# Topics for Students

- Community detection with adaptive weights

Responsible persons: Maxim Panov, Igor Silin, Kirill Efimov, Alexey Naumov, Vladimir Spokoiny

- Semisupervised learning

Responsible persons: Maxim Panov, Igor Silin, Kirill Efimov, Vladimir Spokoiny

- Comparison of random graphs

Responsible persons: Maxim Panov, Igor Silin, Kirill Efimov, Vladimir Spokoiny

- Efficient dimension reduction

Responsible persons: Alexey Naumov, Igor Silin, Alexander Podkopaev, Vladimir Spokoiny

- Gaussian approximation and bootstrap confidence set

Responsible persons: Alexey Naumov, Arshak Minasyan, Vladimir Spokoiny

- Adaptive topological data analysis

Responsible persons: Alexey Naumov, Kirill Efimov, Vladimir Spokoiny

- Bound of the density of the weighted non-centred chi-squared distribution (solved by Y. Tavyrikov)

Responsible persons: Alexey Naumov

- Central limit theorem in high-dimension

Responsible persons: Alexey Naumov

- The development of new efficient Markov Chain Monte Carlo methods for solving high-dimensional integration and optimization problems in machine learning

Responsible persons: Denis Belomestny, Leonid Iosipoi

- Decisions in energy markets via deep learning and optimal control

Responsible persons: Denis Belomestny, Vladimir Spokoiny

- Sparse inductive matrix completion

Responsible persons: Maxim Panov

- Model selection in overlapping community detection

Responsible persons: Maxim Panov

- Improper prediction of node labels in graphs

Responsible persons: Maxim Panov

- Local algorithms for community detection

Responsible persons: Maxim Panov

- Inference for hidden Markov models

### Optimal transportation and statistics

Many problems in modern data analysis require dealing with large data sets that have a complex underlying geometric structure. The observed points can usually be represented as measures supported on $\mathbb{R}^d$ or on more complicated spaces, e.g. high- or infinite-dimensional feature spaces. Examples include sets of medical images and long sequences of symbols (DNA, proteins). Among the most popular approaches to statistical inference on such spaces is the methodology based on optimal transportation (OT) distances, e.g. the Monge-Kantorovich and Hellinger-Kantorovich distances. The OT distance between two measures is the minimum amount of work needed to transform one measure into the other with respect to a predefined cost function. The deep connection between the transportation distance and the metric geometry of the underlying space on which the measures are supported makes it a powerful tool for statistical inference and gives rise to many interesting problems. We list some of them below. For those interested in optimal transportation problems, we recommend the following excellent surveys: [SAN15], [VIL08].
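To make the notion of "minimum amount of work" concrete, here is a minimal sketch for the simplest possible setting: two discrete probability measures on the real line with cost $|x - y|^p$, $p \ge 1$. In this one-dimensional case the monotone coupling (match mass in sorted order) is known to be optimal, so no linear-programming solver is needed. The function name `ot_cost_1d` and its arguments are hypothetical, chosen for illustration only; the topics listed on this page concern far more general spaces and distances.

```python
def ot_cost_1d(xs, a, ys, b, p=1):
    """Optimal transport cost between two discrete measures on the real line.

    Source measure: mass a[i] at point xs[i]; target: mass b[j] at ys[j].
    Both mass vectors are assumed non-negative with equal total mass.
    Cost of moving a unit of mass from x to y is |x - y| ** p with p >= 1,
    for which the monotone (sorted-order) coupling is optimal in 1D.
    """
    src = sorted(zip(xs, a))  # atoms of the source, left to right
    dst = sorted(zip(ys, b))  # atoms of the target, left to right
    i = j = 0
    rem_a, rem_b = src[0][1], dst[0][1]  # mass still to ship / to receive
    total = 0.0
    eps = 1e-12  # tolerance for floating-point leftovers
    while i < len(src) and j < len(dst):
        moved = min(rem_a, rem_b)
        total += moved * abs(src[i][0] - dst[j][0]) ** p
        rem_a -= moved
        rem_b -= moved
        if rem_a <= eps:  # source atom exhausted, go to the next one
            i += 1
            if i < len(src):
                rem_a = src[i][1]
        if rem_b <= eps:  # target atom filled, go to the next one
            j += 1
            if j < len(dst):
                rem_b = dst[j][1]
    return total


# Shifting a two-point measure by 1 costs exactly 1 unit of work.
shift_cost = ot_cost_1d([0.0, 1.0], [0.5, 0.5], [1.0, 2.0], [0.5, 0.5])
# Splitting a unit mass at 0 into halves at -1 and 1 also costs 1.
split_cost = ot_cost_1d([0.0], [1.0], [-1.0, 1.0], [0.5, 0.5])
print(shift_cost, split_cost)  # 1.0 1.0
```

With `p=1` the returned value is the Monge-Kantorovich (Wasserstein-1) distance; for `p > 1` it is the $p$-th power of the corresponding distance. In higher dimensions the sorted-order shortcut no longer applies and the problem is typically solved as a linear program or by regularized methods.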

- Hypothesis testing with Hellinger-Kantorovich distance

Responsible persons: Alexandra Suvorikova, Pavel Dvurechensky, Alexey Kroshnin, Andrey Sobolevskii, Vladimir Spokoiny

- Domain adaptation using optimal transportation

Responsible persons: Alexandra Suvorikova, Pavel Dvurechensky, Alexey Kroshnin, Andrey Sobolevskii, Vladimir Spokoiny

- Bootstrap for empirical barycenters

Responsible persons: Alexandra Suvorikova, Alexey Kroshnin, Andrey Sobolevskii, Vladimir Spokoiny

- Two-sample test for high-dimensional data using Monge-Kantorovich transform

Responsible persons: Alexandra Suvorikova, Alexey Kroshnin, Andrey Sobolevskii, Vladimir Spokoiny

**References:**

[SAN15] Santambrogio, F. Optimal transport for applied mathematicians. Birkhäuser, NY, 2015.

[VIL08] Villani, C. Optimal transport: old and new. Springer Science and Business Media, 2008.