Authors 
: 

Document type 
: 
Communication à un colloque (Conference Paper) – Présentation orale avec comité de sélection

Abstract 
: 
Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a maxsum submatrix is a related but distinct problem for which one looks for a (nonnecessarily contiguous) rectangular submatrix with a maximal sum of its entries. Le Van et al. (Ranked Tiling, 2014) already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (CPLNS). In this work, we exhibit some key properties of this NPhard problem and define a bounding function such that larger problems can be solved in reasonable time. Two different algorithms are proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and mixed integer linear programming (MILP). Practical experiments conducted both on synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the original CPLNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive. 
Access type 
: 
Accès libre

Publication date 
: 
2017 
Language 
: 
Anglais 
Conference 
: 
"6th International Workshop on New Frontiers in Mining Complex Patterns in conjunction with ECMLPKDD 2017", Skopje (MK) (22/09/2017) 
Journal information 
: 
"Proceedings of the 6th International Workshop on New Frontiers in Mining Complex Patterns in conjunction with ECMLPKDD 2017" 
Peer reviewed 
: 
yes 
Publication status 
: 
Publié 
Affiliation 
: 
UCL
 SST/ICTM/INGI  Pôle en ingénierie informatique

Keywords 
: 
Pattern mining ; Data mining ; Optimization ; Constraint programming ; Mixed integer programming ; Linear programming ; Biclustering ; Gene expression data

Links 
: 
