Zakharov, Roman [UCL]
Dupont, Pierre [UCL]
This paper describes a novel feature selection algorithm embedded into logistic regression. It specifically addresses high-dimensional data with few observations, such as the microarray data commonly found in the biomedical domain. The overall objective is to optimize the predictive performance of a classifier while also favoring sparse and stable models.
Feature relevance is first estimated with a simple t-test ranking. This initial relevance is treated as a feature sampling probability, and a multivariate logistic regression is iteratively re-estimated on subsets of randomly, non-uniformly sampled features. At each iteration, the feature sampling probability is adapted according to the predictive performance and the weights of the logistic regression. Globally, the proposed selection method can be seen as an ensemble of logistic regression models voting jointly for the final relevance of features.
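The iterative scheme described above can be sketched as follows, using scikit-learn. The subset size, number of iterations, performance proxy, and the exact probability-update rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LogisticRegression

def ensemble_lr_selection(X, y, n_iter=100, subset_size=50, seed=None):
    """Rank features by an ensemble of logistic regressions fitted on
    non-uniformly sampled feature subsets (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Initial relevance: absolute two-sample t-statistic per feature.
    t_stat, _ = ttest_ind(X[y == 1], X[y == 0], axis=0)
    prob = np.abs(t_stat)
    prob = prob / prob.sum()

    relevance = np.zeros(n_features)
    for _ in range(n_iter):
        # Sample a feature subset according to the current probabilities.
        idx = rng.choice(n_features, size=subset_size, replace=False, p=prob)
        clf = LogisticRegression(max_iter=1000).fit(X[:, idx], y)

        # Training accuracy is used here as a crude stand-in for the
        # predictive-performance estimate mentioned in the abstract.
        acc = clf.score(X[:, idx], y)
        w = np.abs(clf.coef_).ravel()

        # Adapt sampling probabilities: reward features carrying large
        # weights in accurate models (illustrative update rule).
        prob[idx] += acc * w / (w.sum() + 1e-12)
        prob = prob / prob.sum()

        # Ensemble "vote": accumulate weighted evidence per feature.
        relevance[idx] += acc * w

    return np.argsort(relevance)[::-1]  # feature indices, most relevant first
```

On synthetic data with a single informative feature, the ranking produced by this sketch quickly concentrates on that feature, mirroring the behavior the abstract describes.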
Practical experiments on several microarray datasets show that the proposed method offers comparable or better stability, and significantly better predictive performance, than logistic regression regularized with Elastic Net. It also outperforms a selection based on Random Forests, another popular embedded feature selection method built from an ensemble of classifiers.
- Abeel Thomas, Helleputte Thibault, Van de Peer Yves, Dupont Pierre, Saeys Yvan, Robust biomarker identification for cancer diagnosis with ensemble feature selection methods, 10.1093/bioinformatics/btp630
- Bach Francis R., Bolasso: model consistent Lasso estimation through the bootstrap, 10.1145/1390156.1390161
- Breiman Leo, Random Forests, 10.1023/a:1010933404324
- Chandran Uma R, Ma Changqing, Dhir Rajiv, Bisceglia Michelle, Lyons-Weiler Maureen, Liang Wenjing, Michalopoulos George, Becich Michael, Monzon Federico A, Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process, 10.1186/1471-2407-7-64
- Cox, D.R., Snell, E.J.: Analysis of binary data. Monographs on statistics and applied probability. Chapman and Hall (1989)
- Desmedt C., Piette F., Loi S., Wang Y., Lallemand F., Haibe-Kains B., Viale G., Delorenzi M., Zhang Y., d'Assignies M. S., Bergh J., Lidereau R., Ellis P., Harris A. L., Klijn J. G.M., Foekens J. A., Cardoso F., Piccart M. J., Buyse M., Sotiriou C., Strong Time Dependence of the 76-Gene Prognostic Signature for Node-Negative Breast Cancer Patients in the TRANSBIG Multicenter Independent Validation Series, 10.1158/1078-0432.ccr-06-2765
- Dietterich Thomas G., Ensemble Methods in Machine Learning, Multiple Classifier Systems (2000) ISBN:9783540677048 p.1-15, 10.1007/3-540-45014-9_1
- Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. (eds.): Feature Extraction. Foundations and Applications. Studies in Fuzziness and Soft Computing. Physica-Verlag, Springer (2006)
- Helleputte Thibault, Dupont Pierre, Feature Selection by Transfer Learning with Linear Regularized Models, Machine Learning and Knowledge Discovery in Databases (2009) ISBN:9783642041792 p.533-547, 10.1007/978-3-642-04180-8_52
- Hoerl Arthur E., Kennard Robert W., Ridge Regression: Biased Estimation for Nonorthogonal Problems, 10.1080/00401706.1970.10488634
- Kalousis Alexandros, Prados Julien, Hilario Melanie, Stability of feature selection algorithms: a study on high-dimensional spaces, 10.1007/s10115-006-0040-8
- Kuncheva, L.I.: A stability index for feature selection. In: Proceedings of the 25th International Multi-Conference Artificial Intelligence and Applications, pp. 390–395. ACTA Press, Anaheim (2007)
- Li Qiyuan, Eklund Aron C., Juul Nicolai, Haibe-Kains Benjamin, Workman Christopher T., Richardson Andrea L., Szallasi Zoltan, Swanton Charles, Minimising Immunohistochemical False Negative ER Classification Using a Complementary 23 Gene Expression Signature of ER Status, 10.1371/journal.pone.0015031
- Nadeau Claude, Bengio Yoshua, Inference for the Generalization Error, 10.1023/a:1024068626366
- Ng Andrew Y., Feature selection, L1 vs. L2 regularization, and rotational invariance, 10.1145/1015330.1015435
- Roth V., The Generalized LASSO, 10.1109/tnn.2003.809398
- Saeys Y., Inza I., Larranaga P., A review of feature selection techniques in bioinformatics, 10.1093/bioinformatics/btm344
- Shipp Margaret A., Ross Ken N., Tamayo Pablo, Weng Andrew P., Kutok Jeffery L., Aguiar Ricardo C.T., Gaasenbeek Michelle, Angelo Michael, Reich Michael, Pinkus Geraldine S., Ray Tane S., Koval Margaret A., Last Kim W., Norton Andrew, Lister T. Andrew, Mesirov Jill, Neuberg Donna S., Lander Eric S., Aster Jon C., Golub Todd R., Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning, 10.1038/nm0102-68
- Singh Dinesh, Febbo Phillip G., Ross Kenneth, Jackson Donald G., Manola Judith, Ladd Christine, Tamayo Pablo, Renshaw Andrew A., D'Amico Anthony V., Richie Jerome P., Lander Eric S., Loda Massimo, Kantoff Philip W., Golub Todd R., Sellers William R., Gene expression correlates of clinical prostate cancer behavior, 10.1016/s1535-6108(02)00030-2
- Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267–288 (1996)
- Witten, D.M., Tibshirani, R.: A comparison of fold-change and the t-statistic for microarray data analysis. Stanford University. Technical report (2007)
- Zou Hui, Hastie Trevor, Regularization and variable selection via the elastic net, 10.1111/j.1467-9868.2005.00503.x
Bibliographic reference: Zakharov, Roman; Dupont, Pierre. Ensemble logistic regression for feature selection. 6th IAPR International Conference on Pattern Recognition in Bioinformatics (Delft, The Netherlands, 2–4 November 2011). In: Lecture Notes in Bioinformatics, no. 7036, p. 133-144 (2011)
Permanent URL: http://hdl.handle.net/2078.1/87509