Model selection in overlapping community detection

Overview

Communities in networks can overlap, since a node may belong to several clusters at once. Because detected communities are difficult to evaluate and scalable algorithms are scarce, overlapping community detection in large networks remains largely an open problem. An efficient algorithm, BigCLAM, was recently proposed [1]; it casts overlapping community detection as a probabilistic model closely related to non-negative matrix factorization and logistic matrix factorization. We aim to follow this general direction while focusing on automatic selection of the number of communities. This can be done with a Bayesian approach built on the general machinery of Dirichlet processes and variational inference.
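To make the probabilistic model of [1] concrete, here is a minimal sketch of its likelihood. In that model each node u has a non-negative affiliation vector F_u with one entry per community, and nodes u and v are linked with probability 1 - exp(-F_u · F_v). The function names, the list-of-lists layout of F, and the clamping constant are assumptions made for illustration; the brute-force loop over all node pairs is chosen for readability and is not the scalable procedure of [1].

```python
import math


def edge_probability(f_u, f_v):
    """Probability of an edge between two nodes: 1 - exp(-F_u . F_v)."""
    dot = sum(a * b for a, b in zip(f_u, f_v))
    return 1.0 - math.exp(-dot)


def log_likelihood(F, edges):
    """Log-likelihood of an undirected graph given affiliations F.

    F     -- list of non-negative affiliation vectors, one per node
             (hypothetical layout chosen for this sketch)
    edges -- set of frozenset({u, v}) pairs
    """
    n = len(F)
    total = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            dot = sum(a * b for a, b in zip(F[u], F[v]))
            if frozenset((u, v)) in edges:
                # Observed edge: log(1 - exp(-dot)), clamped to avoid log(0).
                total += math.log(max(1.0 - math.exp(-dot), 1e-12))
            else:
                # Missing edge: log(exp(-dot)) = -dot.
                total -= dot
    return total


# Toy graph: a path 0 - 1 - 2 with two communities; node 1 belongs to both.
F = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
edges = {frozenset((0, 1)), frozenset((1, 2))}
ll = log_likelihood(F, edges)
```

Note that the number of columns of F, that is, the number of communities, has to be fixed in advance here; choosing it automatically is the model selection question this project addresses.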

Tasks

  1. Study the BigCLAM algorithm.
  2. Learn about Dirichlet processes and the variational inference approach.
  3. Develop and implement an algorithm for automatic determination of the number of communities.
  4. Evaluate its performance on synthetic and real data.
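As background for the second and third tasks, the sketch below shows the truncated stick-breaking construction of Dirichlet process weights, one common representation used with variational inference. It only illustrates why such a prior lets the data decide how many components matter: most of the mass falls on a few components and the rest become negligible. How the prior would be coupled to the community model is not specified here, and the function names, the truncation level, and the 0.01 threshold are assumptions made for illustration.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Draw truncated stick-breaking weights with concentration alpha.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick;
    the last component takes whatever is left so the weights sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def effective_components(weights, threshold=0.01):
    """Count components whose weight exceeds a (hypothetical) threshold."""
    return sum(1 for w in weights if w > threshold)


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights = stick_breaking(alpha=2.0, truncation=50, rng=rng)
n_effective = effective_components(weights)
```

A smaller alpha concentrates the mass on fewer components, while a larger alpha spreads it over more of them, so alpha acts as a soft control on the expected number of communities.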

References

[1] Jaewon Yang and Jure Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 587–596, 2013.