de Walque, Arthur
[UCL]
Delvenne, Jean-Charles
[UCL]
In biology, a phylogenetic tree (or phylogeny) is a graphical representation of the evolutionary relationships between organisms, species, genes, molecular sequences, or more generally a set of taxa. Phylogenetic analysis has many fields of application in systematics, biodiversity, epidemiology, and life sciences. It has proven its usefulness during the SARS-CoV-2 pandemic in the identification of variants and the search for contamination clusters, for example. In this thesis, we will focus on a tree reconstruction method belonging to the distance-based methods, and more specifically, the BMEP (Balanced Minimum Evolution), whose objective function encodes the length function of phylogeny. BMEP is NP-hard, it is a good model, it has one of the best accuracies, and its optimal solution has been proven statistically consistent. The increasing number of sequencing techniques and the growing amount of molecular data available make it necessary for biologists to take into account an ever-increasing number of taxa. BMEP is widely used in approximate methods, but its computational cost makes its optimal solution difficult to obtain for a relatively small number of taxa. We will first look at the earlier optimal solution proposal, introduced by Pardi in 2009, based on the combinatorial properties of BMEP and using a branch-and-bound technique. In a second step, we will try to give a counterexample to a conjecture that remains open and that concerns the characterization of the set of path-length matrices. A recent work proved that the identified necessary conditions were sufficient for a number of taxa n = [3, 11]. This characterization is crucial for the development of methods to reduce the computational cost of the optimal solution using integer linear programming. Our approach will be to propose an enumeration algorithm for all possible path-length matrices in the hope of reaching n = 12. If our enumeration results in at least an expected number of phylogenies + 1, this would mean that the necessary conditions are not sufficient beyond n = 12. But the number of phylogenies being hyper exponential, the challenge is far from simple.


Référence bibliographique |
de Walque, Arthur. Reconstructing phylogenies : BMEP model implementation and research on the definition of the optimization set. Ecole polytechnique de Louvain, Université catholique de Louvain, 2023. Prom. : Delvenne, Jean-Charles. |
Permalien |
http://hdl.handle.net/2078.1/thesis:38743 |