Topics for Students

Optimal transportation and statistics

Many problems in modern data analysis require the ability to handle large data sets with a complex underlying geometric structure. Observed data points can often be represented as measures supported on \mathbb{R}^d, d \geq 1, or on more complicated spaces, e.g. high- or infinite-dimensional feature spaces. Examples include sets of medical images and long sequences of symbols (DNA, proteins). Among the most popular approaches to statistical inference on such spaces is the methodology based on so-called optimal transportation (OT) distances (e.g. the Monge-Kantorovich and Hellinger-Kantorovich distances). The OT distance between two measures is the minimum amount of work needed to transform one measure into the other with respect to a predefined cost function. The deep connection between the transportation distance and the metric geometry of the underlying space on which the measures are supported makes it a powerful tool for statistical inference and gives rise to many interesting problems. We list some of them below. For those interested in optimal transportation problems, we recommend the following excellent surveys: [SAN15], [VIL08].
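
To make the notion of "minimum amount of work" concrete, here is a minimal sketch in the simplest possible setting, which is an assumption of this example and not part of the topics themselves: two empirical measures on the real line with the same number of points and uniform weights, and cost |x - y|^p. In this special case the optimal plan matches the sorted samples in order, so no optimization solver is needed. The function name wasserstein_1d is hypothetical.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two uniform empirical measures on the line.

    Assumes both samples have the same size n and each point carries mass 1/n.
    In one dimension with cost |x - y|**p (p >= 1), the optimal transport plan
    pairs the i-th smallest point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")

    # Monotone matching: sort both samples and pair them in order.
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)

    # Total transport cost, averaged because each point has mass 1/n.
    mean_cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return mean_cost ** (1.0 / p)


if __name__ == "__main__":
    # Shifting every point by 5 costs exactly 5 units of work per unit of mass.
    print(wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))  # 5.0
```

In higher dimensions, or with non-uniform weights, the sorting shortcut no longer applies and the distance is typically obtained by solving a linear program over transport plans.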

[SAN15] Santambrogio, F. Optimal transport for applied mathematicians. Birkhäuser, NY, 2015.
[VIL08] Villani, C. Optimal transport: old and new. Springer Science and Business Media, 2008.