Bosquillon de Jenlis, Armand
[UCL]
Schaus, Pierre
[UCL]
Aoga, John Oscar
[UCL]
The tasks of finding all frequent or closed frequent itemsets is a widely studied subject in the field of data mining. Recently, there have been some attempts to tackle those issues using the constraint programming (CP) paradigm. These attempts proved to be useful as CP is a declarative paradigm which enables to add any combination of constraints. This method contrasts with the traditional approaches taken to solve the itemset mining problems with specialized algorithms. These specialized algorithms show less flexibility as they must be modified in order to add a new combination of constraints on the itemsets found. For some specific tasks, the CP approach showed significant improvements over the existing algorithms, but its performances are worse on standard tasks. This thesis focuses on closing the gap between the performances of the traditional approach and the CP approach. The experiments conducted in this work highlight significant performance improvements over previous CP attempts. It also provides new insights regarding the use of ideas coming from state-of-the-art itemset mining systems in order to enhance CP approaches and vice versa. This thesis gives new theoretical bounds for the CP approach. It also shows that the operations executed to find the frequent (resp. closed frequent itemsets) are very similar to the one executed by the state-of-the-art Eclat (resp. LCM algorithms). It proves that these problems can be handled very efficiently by the use of a CP approach as demonstrated by LCM victory at the FIMI04 competition.


Bibliographic reference |
Bosquillon de Jenlis, Armand. Data Mining using Constraint Programming. Ecole polytechnique de Louvain, Université catholique de Louvain, 2016. Prom. : Schaus, Pierre ; Aoga, John Oscar. |
Permanent URL |
http://hdl.handle.net/2078.1/thesis:8142 |