Abstract |
: |
This thesis addresses three challenges of machine learning: high-dimensional data, label noise and limited computational resources. Learning is usually hard in high-dimensional spaces, due to the curse of dimensionality and other phenomena such as the concentration of distances. One can either handle such data with specific tools or try to reduce their dimensionality, for example through feature selection. The first contribution of this thesis is to study the adequacy of mutual information to select relevant subsets of features. For both classification and regression problems, mutual information is shown to be a sensible criterion for feature selection in most cases. Counterexamples are discussed, where mutual information fails to select optimal features with respect to common error criteria for classification and regression. However, the probability and impact of such failures are also shown to be limited. The second contribution of this thesis is a survey of the label noise literature. Label noise is an important problem in classification, whose consequences are various and complex. For example, this thesis shows that label noise affects the segmentation of electrocardiogram signals and the results of feature selection. In each case, a new algorithm is proposed to deal with label noise using a probabilistic model introduced by Lawrence and Schölkopf. Afterwards, a more generic framework is proposed to deal with instances which have too large an influence on learning. This framework is used to robustify several probabilistic learning algorithms. The last contribution of this thesis is the study of large extreme learning machines. Extreme learning is a recent trend in machine learning which allows learning non-linear models much faster than other state-of-the-art methods. Extreme learning machines are single-layer feedforward neural networks whose hidden layer is randomly initialised and not optimised during learning.
Only the output weights of such networks have to be optimised, which explains why learning becomes much faster. This thesis shows that when the number of hidden neurons is large, overfitting can be avoided using regularisation. In this case, a new kernel can be defined using extreme learning, which is shown to give good results for both classification and regression problems. This kernel offers a compromise between prediction accuracy and computational needs which can be useful in contexts where computational time is precious. |
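As an illustration of the extreme learning machine described above, the following minimal Python sketch draws the hidden layer at random, leaves it fixed, and fits only the output weights by ridge regression. It is not the implementation used in the thesis: the function names `fit_elm` and `predict_elm`, the `tanh` activation, the Gaussian initialisation, the values of `n_hidden` and `ridge`, and the synthetic data are all assumptions made for this example. The kernel derived in the thesis is not covered here.

```python
import numpy as np


def fit_elm(X, y, n_hidden=200, ridge=1e-2, seed=0):
    """Fit a regularised extreme learning machine (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Hidden layer: drawn at random once and never optimised (assumed Gaussian).
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations, shape (n_samples, n_hidden)
    # Output weights: the only trained parameters, obtained by ridge regression,
    # i.e. by solving (H^T H + ridge * I) beta = H^T y.
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ y)
    return W, b, beta


def predict_elm(model, X):
    """Apply the fixed random hidden layer, then the learned output weights."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


# Toy regression problem on synthetic data, for illustration only.
rng = np.random.default_rng(1)
X_train = rng.uniform(-3.0, 3.0, size=(200, 1))
y_train = np.sin(X_train[:, 0])
model = fit_elm(X_train, y_train)

X_test = np.linspace(-2.5, 2.5, 50).reshape(-1, 1)
y_test = np.sin(X_test[:, 0])
mse = np.mean((predict_elm(model, X_test) - y_test) ** 2)
print(f"test MSE: {mse:.4f}")
```

Training reduces to a single linear solve, which is why learning is fast. The `ridge` term is what keeps the solution stable when the number of hidden neurons is as large as the number of training samples.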