Abstract |
: |
[eng] Linear dimensionality reduction techniques such as principal component analysis are powerful tools for the analysis of high-dimensional data. In this thesis, we explore a closely related problem, namely nonnegative matrix factorization (NMF), a low-rank matrix approximation problem with nonnegativity constraints. More precisely, we seek to approximate a given nonnegative matrix with the product of two low-rank nonnegative matrices. These nonnegative factors can be interpreted in the same way as the data, e.g., as images (described by pixel intensities) or texts (represented by vectors of word counts), and lead to an additive and sparse representation. However, they render the problem much more difficult to solve (i.e., NP-hard).
A first goal of this thesis is to study theoretical issues related to NMF. In particular, we make connections with well-known problems in graph theory, combinatorial optimization and computational geometry. We also study computational complexity issues and show, using reductions from the maximum-edge biclique problem, NP-hardness of several low-rank matrix approximation problems, including the rank-one subproblems arising in NMF, a problem involving underapproximation constraints (NMU) and the unconstrained version of the factorization problem where some data is missing or unknown.
Our second goal is to improve existing techniques and develop new tools for the analysis of nonnegative data. We propose accelerated variants of several NMF algorithms based on a careful analysis of their computational cost. We also introduce a multilevel approach to speed up their initial convergence. Finally, a new greedy heuristic based on NMU is presented and used for the analysis of hyperspectral images, in which each pixel is measured along hundreds of wavelengths, which allows for example spectroscopy of satellite images. |