Peel, Leto
[UCL]
Larremore, Daniel B.
Clauset, Aaron
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
- Evans T S, Clique graphs and overlapping communities, 10.1088/1742-5468/2010/12/p12037
- Adamic Lada A., Glance Natalie, The political blogosphere and the 2004 U.S. election : divided they blog, 10.1145/1134271.1134277
- Hric Darko, Darst Richard K., Fortunato Santo, Community detection in networks: Structural communities versus ground truth, 10.1103/physreve.90.062805
- Leskovec Jure, Lang Kevin J., Mahoney Michael, Empirical comparison of algorithms for network community detection, 10.1145/1772690.1772755
- Yang Jaewon, Leskovec Jure, Community-Affiliation Graph Model for Overlapping Network Community Detection, 10.1109/icdm.2012.139
- Zachary Wayne W., An Information Flow Model for Conflict and Fission in Small Groups, 10.1086/jar.33.4.3629752
- Holland Paul W., Laskey Kathryn Blackmond, Leinhardt Samuel, Stochastic blockmodels: First steps, 10.1016/0378-8733(83)90021-7
- Nowicki Krzysztof, Snijders Tom A. B, Estimation and Prediction for Stochastic Blockstructures, 10.1198/016214501753208735
- Jianbo Shi, Malik J., Normalized cuts and image segmentation, 10.1109/34.868688
- Cheng Xue-Qi, Shen Hua-Wei, Uncovering the community structure associated with the diffusion dynamics on networks, 10.1088/1742-5468/2010/04/p04024
- Krzakala F., Moore C., Mossel E., Neeman J., Sly A., Zdeborova L., Zhang P., Spectral redemption in clustering sparse networks, 10.1073/pnas.1312486110
- Fortunato Santo, Community detection in graphs, 10.1016/j.physrep.2009.11.002
- Karrer Brian, Newman M. E. J., Stochastic blockmodels and community structure in networks, 10.1103/physreve.83.016107
- Newman M. E. J., Clauset Aaron, Structure and inference in annotated networks, 10.1038/ncomms11863
- Good Benjamin H., de Montjoye Yves-Alexandre, Clauset Aaron, Performance of modularity maximization in practical contexts, 10.1103/physreve.81.046106
- Bordenave Charles, Lelarge Marc, Massoulie Laurent, Non-backtracking Spectrum of Random Graphs: Community Detection and Non-regular Ramanujan Graphs, 10.1109/focs.2015.86
- Ghasemian, Phys. Rev. X, 6, 031005 (2016)
- Massoulié Laurent, Community detection thresholds and the weak Ramanujan property, 10.1145/2591796.2591857
- E. Mossel, J. Neeman, A. Sly, Belief propagation, robust reconstruction and optimal recovery of block models, Proceedings of the 27th Conference on Learning Theory, Barcelona, Spain, 13 to 15 June 2014, vol. 35, pp. 356–370.
- Mossel Elchanan, Neeman Joe, Sly Allan, Reconstruction and estimation in the planted partition model, 10.1007/s00440-014-0576-6
- Guimerà Roger, Sales-Pardo Marta, Amaral Luís A. Nunes, Modularity from fluctuations in random graphs and complex networks, 10.1103/physreve.70.025101
- D. Taylor , R. S. Caceres , P. J. Mucha , Detectability of small communities in multilayer and temporal networks: Eigenvector localization, layer aggregation, and time series discretization. arXiv:1609.04376 (2016).
- Lancichinetti Andrea, Fortunato Santo, Community detection algorithms: A comparative analysis, 10.1103/physreve.80.056117
- Holme P., Huss M., Jeong H., Subnetwork hierarchies of biochemical pathways, 10.1093/bioinformatics/btg033
- Yang Jaewon, Leskovec Jure, Defining and evaluating network communities based on ground-truth, 10.1007/s10115-013-0693-z
- Wolpert David H., The Lack of A Priori Distinctions Between Learning Algorithms, 10.1162/neco.1996.8.7.1341
- L. Peel, Topological feature based classification, Proceedings of the 14th International Conference on Information Fusion, Chicago, IL, 5 to 8 July 2011 (IEEE, 2011), pp. 1–8.
- L. Peel , Supervised blockmodelling. arXiv:1209.5561 (2012).
- Fosdick Bailey K., Hoff Peter D., Testing and Modeling Dependencies Between a Network and Nodal Attributes, 10.1080/01621459.2015.1008697
- Airoldi, J. Mach. Learn. Res., 9, 1981 (2007)
- Ball Brian, Karrer Brian, Newman M. E. J., Efficient and principled method for detecting communities in networks, 10.1103/physreve.84.036103
- Larremore Daniel B., Clauset Aaron, Jacobs Abigail Z., Efficiently inferring community structure in bipartite networks, 10.1103/physreve.90.012805
- Peixoto, Phys. Rev. X, 4, 011047 (2014)
- Guimerà Roger, Nunes Amaral Luís A., Functional cartography of complex metabolic networks, 10.1038/nature03288
- Bianconi Ginestra, Pin Paolo, Marsili Matteo, Assessing the relevance of node features for network structure, 10.1073/pnas.0811511106
- Larremore Daniel B., Clauset Aaron, Buckee Caroline O., A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes, 10.1371/journal.pcbi.1003268
- Ahn Yong-Yeol, Bagrow James P., Lehmann Sune, Link communities reveal multiscale complexity in networks, 10.1038/nature09182
- Soundarajan Sucheta, Hopcroft John, Using community information to improve the precision of link prediction methods, 10.1145/2187980.2188150
- Chakrabort Tanmoy, Sikdar Sandipan, Tammana Vihar, Ganguly Niloy, Mukherjee Animesh, Computer science fields as ground-truth communities : their impact, rise and fall, 10.1145/2492517.2492536
- von Luxburg, J. Mach. Learn. Res., 27, 65 (2012)
- J. Kleinberg , An impossibility theorem for clustering. Adv. Neural Inf. Process. Syst. 463–470 (2003).
- A. Browet , J. M. Hendrickx , A. Sarlette , Incompatibility boundaries for properties of community partitions. arXiv:1603.00621 (2016).
- Clauset Aaron, Moore Cristopher, Newman M. E. J., Hierarchical structure and the prediction of missing links in networks, 10.1038/nature06830
- Peel, J. Adv. Inf. Fusion, 6, 119 (2011)
- Yang Zhao, Algesheimer René, Tessone Claudio J., A Comparative Analysis of Community Detection Algorithms on Artificial Networks, 10.1038/srep30750
- C. Cortes, D. Pregibon, C. Volinsky, Communities of interest, in Advances in Intelligent Data Analysis, vol. 2189 of Lecture Notes in Computer Science, F. Hoffmann, D. Hand, N. Adams, D. Fisher, G. Guimaraes, Eds. (Springer, 2001), pp. 105–114.
- D. Hric , T. P. Peixoto , S. Fortunato , Network structure, metadata and the prediction of missing nodes. arXiv:1604.00255 (2016).
- Lazega Emmanuel, The Collegial Phenomenon, ISBN:9780199242726, 10.1093/acprof:oso/9780199242726.001.0001
- Peixoto Tiago P., Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models, 10.1103/physreve.89.012804
- Peixoto Tiago P., Entropy of stochastic blockmodel ensembles, 10.1103/physreve.85.056122
- J. Parkkinen, J. Sinkkonen, A. Gyenge, S. Kaski, A block model suitable for sparse graphs, Proceedings of the 7th International Workshop on Mining and Learning with Graphs (MLG 2009), Leuven, Belgium, 2 to 4 July 2009, vol. 5.
- J. Cichoń, Z. Gołe¸biewski, On Bernoulli sums and Bernstein polynomials, 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’12), Montreal, Quebec, Canada, 18 to 22 June 2012, pp. 179–190.
- U. Brandes , D. Delling , M. Gaertler , R. Goerke , M. Hoefer , Z. Nikoloski , D. Wagner , Maximizing modularity is hard. arXiv:physics/0608255 (2006).
- Vinh Nguyen Xuan, Epps Julien, Bailey James, Information theoretic measures for clusterings comparison : is a correction for chance necessary?, 10.1145/1553374.1553511
- I. Borg, P. J. F. Groenen, Modern Multidimensional Scaling: Theory and Applications (Springer Science & Business Media, 2005).
- Haggerty Leanne S., Jachiet Pierre-Alain, Hanage William P., Fitzpatrick David A., Lopez Philippe, O’Connell Mary J., Pisani Davide, Wilkinson Mark, Bapteste Eric, McInerney James O., A Pluralistic Account of Homology: Adapting the Models to the Data, 10.1093/molbev/mst228
- Meilă Marina, Comparing Clusterings by the Variation of Information, Learning Theory and Kernel Machines (2003) ISBN:9783540407201 p.173-187, 10.1007/978-3-540-45167-9_14
- Erdős, Publ. Math. Debrecen, 6, 290 (1959)
- Decelle Aurelien, Krzakala Florent, Moore Cristopher, Zdeborová Lenka, Inference and Phase Transitions in the Detection of Modules in Sparse Networks, 10.1103/physrevlett.107.065701
- Girvan M., Newman M. E. J., Community structure in social and biological networks, 10.1073/pnas.122653799
Bibliographic reference |
Peel, Leto ; Larremore, Daniel B. ; Clauset, Aaron. The ground truth about metadata and community detection in networks. In: Science Advances, Vol. 3, no.5, p. e1602548 (2017) |
Permanent URL |
http://hdl.handle.net/2078.1/184574 |