Paul, Jérôme
[UCL]
Verleysen, Michel
[UCL]
Dupont, Pierre
[UCL]
Embedded feature selection can be performed by analyzing the variables used in a Random Forest. Such a multivariate selection takes into account the interactions between variables but is not easy to interpret in a statistical sense. We propose a statistical procedure to measure variable importance that tests whether variables are significantly useful in combination with others in a forest. We show experimentally that this new importance index correctly identifies relevant variables. The top of the variable ranking is, as expected, strongly correlated with Breiman's importance index, which is based on a permutation test. Our measure has the additional benefit of producing p-values from the forest voting process. Such p-values offer a very natural way to decide which features are significantly relevant while controlling the false discovery rate.
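The abstract does not name the multiple-testing correction used to control the false discovery rate. A common choice when one p-value per feature is available is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure and assumes the per-feature p-values have already been computed from the forest. The function name `benjamini_hochberg` and the p-values in the usage example are hypothetical illustrations, not the authors' code or results.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of features declared significant by the
    Benjamini-Hochberg step-up procedure at FDR level `alpha`.

    Assumption: `p_values[i]` is a p-value for feature i (hypothetical input).
    """
    m = len(p_values)
    # Rank features by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Every feature ranked at or below k_max is declared significant.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Illustrative, made-up p-values for six features.
    p = [0.001, 0.20, 0.012, 0.03, 0.04, 0.5]
    print(benjamini_hochberg(p, alpha=0.05))  # [0, 2]
```

With six features and `alpha=0.05`, the rank thresholds are 0.0083, 0.0167, 0.025, and so on; only the two smallest p-values fall under their thresholds, so features 0 and 2 are selected. Because the rule is step-up, a feature is kept whenever any higher-ranked p-value passes its threshold, even if its own p-value does not.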
Bibliographic reference
Paul, Jérôme; Verleysen, Michel; Dupont, Pierre. Identification of Statistically Significant Features from Random Forests. ECML workshop on Solving Complex Machine Learning Problems with Ensemble Methods (Prague, Czech Republic, 27/09/2013).
Permalink
http://hdl.handle.net/2078.1/133615