He, Yuqing
[UCL]
Significant advances have been made in predicting new topological materials using high-throughput ab initio and symmetry-indicator calculations. To date, thousands of materials have been identified as topologically nontrivial by screening existing databases. However, this approach is largely restricted to non-magnetic systems with well-defined symmetries, leaving a much larger materials space unexplored. Moreover, existing results have not linked the topology of materials to basic factors that are accessible in experimental work, such as the difference in electronegativity between the constituent elements. Machine learning applied to topological-materials data offers an effective way to overcome these shortcomings. A dataset of 35,608 entries is obtained by merging two symmetry-indicator-based databases: Materiae and the Topological Materials Database. We address two main tasks. The first is a five-class classification that assigns each material to one of the following classes: trivial insulator, high-symmetry-point semimetal, high-symmetry-line semimetal, topological insulator, or topological crystalline insulator. The second is a binary classification that distinguishes trivial insulators from topologically nontrivial materials. The first task aims to train a model for exploring new materials. After benchmarking several methods with an identical nested cross-validation procedure, the method based on gradient boosted trees achieves the highest accuracy, 85.2%. We further apply this method in a series of tests to assess how well the model generalizes to new, unseen data in practical applications. A robust model is then trained on the full dataset and used to predict the topological type of 906 materials. The second task aims to identify the essential factors that influence the topology of materials. We interpret some of the most important features used in our model and combine them with an unsupervised algorithm to analyze their effectiveness.
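
As an illustration of the kind of experimentally accessible descriptor mentioned above, the following minimal Python sketch computes two composition-level electronegativity features for a chemical formula. It assumes Pauling electronegativities (only a small illustrative subset of elements is tabulated) and defines the electronegativity difference as the range, maximum minus minimum, over the constituent elements. The function name `electronegativity_features` and both feature definitions are hypothetical choices for illustration, not the feature set actually used in this work.

```python
# Pauling electronegativities for a small illustrative subset of elements.
# A real featurizer would need values for the whole periodic table.
PAULING_EN = {
    "Na": 0.93,
    "Cl": 3.16,
    "Bi": 2.02,
    "Sb": 2.05,
    "Se": 2.55,
    "Te": 2.10,
}


def electronegativity_features(composition):
    """Return simple electronegativity descriptors (hypothetical definitions).

    composition: dict mapping element symbol -> number of atoms in the
    formula unit, e.g. {"Bi": 2, "Se": 3} for Bi2Se3.
    Raises KeyError for elements missing from PAULING_EN.
    """
    if not composition:
        raise ValueError("composition must contain at least one element")

    # Electronegativity of each distinct element in the formula.
    values = [PAULING_EN[element] for element in composition]
    total_atoms = sum(composition.values())

    # Stoichiometry-weighted mean electronegativity.
    weighted_mean = sum(
        PAULING_EN[element] * count for element, count in composition.items()
    ) / total_atoms

    return {
        # Max minus min equals the largest pairwise difference.
        "en_range": max(values) - min(values),
        "en_mean": weighted_mean,
    }


if __name__ == "__main__":
    print(electronegativity_features({"Bi": 2, "Se": 3}))
    print(electronegativity_features({"Na": 1, "Cl": 1}))
```

For Bi2Se3 this gives an `en_range` of about 0.53 and an `en_mean` of about 2.338; for NaCl the range is about 2.23. Descriptors of this kind can be fed to a classifier alongside other features, and their importance in the trained model can then be inspected.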


Bibliographic reference
He, Yuqing. Machine Learning topological characteristics from multiple electronic materials databases. Promoters: Rignanese, Gian-Marco; Giantomassi, Matteo
Permanent URL
http://hdl.handle.net/2078.1/284193