# Structural Learning Seminar

Structural Learning Seminar is a seminar for researchers, Master’s and Ph.D. students of Skoltech, IITP RAS and HSE. This seminar is devoted to the problems of Mathematics and Computer Science and covers areas such as Mathematical Statistics, Machine Learning, Random Matrix Theory, Optimization, Information Theory, etc.

The organizer of the seminar: Vladimir Spokoiny.
Local organizers: Leonid Iosipoi, Alexey Naumov, Maxim Panov, Nikita Zhivotvskiy.

The seminar takes place on Thursday, room 615 at 5 PM.

Follow the seminar announcements in Telegram: https://t.me/strlearn

#### Spring 2018

 1.02.2018 Oleg Matveevskiy (MSU) Graph-Based Methods in Semi-Supervised Learning. In my review I would like to talk about Graph-Based Methods in Semi-Supervised Learning. Such algorithms as Label Propagation, Label Spreading and some others will be presented. We will consider the general mathematical principles of constructing these algorithms. 8.02.2018 Denis Belomestny (University of Duisburg-Essen; HSE; IITP) Introduction to MCMC algorithms and their convergence analysis. In this talk, we introduce main MCMC algorithms and discuss their basic properties. Some applications to Bayesian statistics will also be given. 15.02.2018 Vladimir Spokoiny (WIAS, Berlin) Community detection using adaptive weights The talk discusses the problem of network clustering. Inside of each cluster (community), the edge probability is higher than between clusters, this gives a hint how the community structure can be detected by a multi-scale testing procedure. The approach extends the clustering procedure from Efimov, Spokoiny, Adamyan (2017), http://arxiv.org/abs/1709.09102. 22.02.2018 at 18:10 CS HSE Eric Moulines (École Polytechnique) Perturbed Proximal Gradient Algorithms. We study a version of the proximal gradient algorithm for which the gradient is intractable and is approximated by Monte Carlo methods (and in particular Markov Chain Monte Carlo). We derive conditions on the step size and the Monte Carlo batch size under which convergence is guaranteed: both increasing batch size and constant batch size are considered. We also derive non- asymptotic bounds for an averaged version. Our results cover both the cases of biased and unbiased Monte Carlo approximation. To support our findings, we discuss the inference of a sparse generalized linear model with random effect and the problem of learning the edge structure and parameters of sparse undirected graphical models. Joint seminar with the colloquium of Faculty of Computer Science HSE. Please, register. 1.03.2018 Nikita Zhivotovskiy (Technion, Israel; IITP RAS, Skoltech) Random graphs: concentration of the spectral radius. In this talk, we consider several questions related to the concentration of the spectral radius of the random graph. We start with the Efron-Stein type inequalities for the moments of functions of independent random variables. Then we discuss the results of Bandeira and Van Handel applied to our case and finally we show the complete delocalization of the eigenvector corresponding to the largest eigenvalue for the adjacency matrix of a random graph. 22.03.2018 Denis Belomestny (University of Duisburg Essen, HSE) and Alexey Naumov (Skoltech, HSE, IITP) Statistical Learning Theory Day 2 Taming the uncertainty: new challenges in MC and MCMC methods We discuss recent developments and open problems in Monte-Carlo and Markov Chain Monte-Carlo methodology. Statistical challenges in Topological Data Analysis Topological Data Analysis can broadly be described as a collection of data analysis methods that find structure in data. We review some of these methods and open questions. 29.03.2018 Sergey Ivanov (Skoltech) Network Embeddings for Graph Classification Problem. A wide range of real world applications deal with network analysis and classification tasks. An ease of representing data with graphs makes them very valuable asset in any data mining toolbox; however, the complexity of working with graphs led researchers to seek for new ways of representing and analyzing graphs, of which network embeddings have become broadly popular due to their success in several machine learning areas such as graph classification, visualization, and pattern recognition. In this talk, we will present a history of solutions to graph classification problem, starting from graph kernels to the latest distributed embeddings based on convolutional neural networks and auto-encoders. In addition, we will present our recent result on anonymous walk embeddings, which show substantial uplift in classification accuracy to state-of-the-art algorithms. We will wrap up with a discussion of network embeddings applications and future directions. The slides are available here. 5.04.2018 Alexander Lipatev (MSU) On sharp error bounds in general linear model. General linear model was comprehensively described in seminal Introduction to Multivariate Analysis by T.W. Anderson. This model includes two tests that could be considered analogs of F-test in simple linear regression, namely Bartlett-Nanda-Pillai test and Lawley-Hotelling test. As usual for n (sample size)->infinity and p (dimensionality) fixed both statistics are asymptotically chi-square and for n->infinity, p->infinity, p/n 12.04.2018 Vladimir Spokoiny (WIAS, Berlin) Statistical inference with optimal transport (SLT day 3) Optimal transportation (OT) theory provides a powerful toolbox for data analysis in nonlinear spaces, where nonlinearity appears as an inevitable consequence of complexity of objects of interest (e.g. medical images or meta-genomes). OT opens a new direction in creating complete package of statistical instruments which takes into account the underlying geometry of an observed data set. In this talk we introduce basics on statistical inference based on OT and present recent results. Download presentation 19.04.2018 Yury Maximov (Skoltech) Mathematical Aspects of Emergency Control in Energy Grids In this talk, we consider various mathematical aspects of control in power grids. First, we consider a simple linearized model (DC) of power flow and analyze the probability of the system being outside the operational limits. In the second-half, we consider non-linear non-convex model (AC) of the power flow and provide an inner polyhedral approximation of the region where the solution of the power flow equations exists. After that, we propose the adaptive importance sampling to estimate the probability of the system to be out of the polyhedral security region. The results are joint with Michael Chertkov (Skoltech), and Art Owen (Stanford). 26.04.2018 Alexander Zimin (HSE,  Faculty of Mathematics) Multidimensional transport problem. Some partial cases are solved. For example: Find f(x, y) : 0, 1^2 \to \mathbb{R} satisfying the condition f(x, y) + f(y, z) + f(z, x) \le xyz with the maximal integral over square. Generalization will be illustrated by particular examples. 10.05.2018 Alexandra Suvorikova (WIAS, Berlin) Construction of non-asymptotic confidence sets in 2-Wasserstein space. In this paper, we consider a probabilistic setting where the probability measures are considered to be random objects. We propose a procedure of construction non-asymptotic confidence sets for empirical barycenters in 2-Wasserstein space and develop the idea further to construction of a non-parametric two-sample test that is then applied to the detection of structural breaks in data with complex geometry. 15.05.2018 Dmitry Zaporozhets (PDMI RAS) Non-absorption probability for multidimensional random walk. Suppose that we have a symmetric random walk in $R^d$ of the length $n$. We will calculate the probability that it’s convex hull does not contain the origin which is distribution-free. To this end, we will solve the equivalent problem of finding the number of Weyl chambers of type $B_n$ intersected by a generic linear subspace of $R^n$ of codimension $d$. This will be done using Whitney’s and Zaslavsky’s theorems from the theory of hyperplanes arrangements. The case $d=1$ corresponds to the formula by Sparre Andersen (1949) for the probability that such random walk in dimension one stays positive. Based on the joint works with Z.~Kabluchko, V.~Vysotsky. 17.05.2018 at 16:00 CS HSE Sergey Bobkov (University of Minnesota) Mini-course “Strong probability distances and limit theorems” The lectures explore strong distances in the space of probability distributions, including total variation, relative entropy, chi squared and more general Renyi/Tsallis informational divergences, as well as relative Fisher information. Special attention is given to the distances from the normal law. The first part of the course is devoted to the general theory, and the second part to the behavior of such informational distances along convolutions and associated central limit theorem. Lectures 1-2. General theory of Renyi and Tsallis informational divergences. Relative entropy and chi squared distance as particular cases. Relationship with total variation. Pinsker-type inequalities. Lectures 3-4. Entropy power inequality. Basic properties of relative entropy. Fisher information. Stam’s inequality and its applications. 24.05.2018 at 16:00 CS HSE Sergey Bobkov (University of Minnesota) Mini-course “Strong probability distances and limit theorems” The lectures explore strong distances in the space of probability distributions, including total variation, relative entropy, chi squared and more general Renyi/Tsallis informational divergences, as well as relative Fisher information. Special attention is given to the distances from the normal law. The first part of the course is devoted to the general theory, and the second part to the behavior of such informational distances along convolutions and associated central limit theorem. Lectures 5-6. Informational divergences from the normal law. Poincare and logarithmic Sobolev inequalities. Cramer’s exponential series and their applications to the chi-squared distance. Superadditivity with respect to dimension. Lectures 7-8. Local limit theorem (the central limit theorem for densities) and Edgeworth expansions. Convergence to the normal law in Renyi’s distances. 31.05.2018 Alexey Kroshnin (IITP RAS; HSE) On new complexity bounds for Sinkhorn’s and Iterative Bregman Projections algorithms Sinkhorn’s algorithm is a popular method of solving Monge-Kantorovich problem and Iterative Bregman Projections (Cuturi et al., 2014) is its generalization to other related problems e.g. computing of Wasserstein barycenters. In this talk we present a new complexity bound O(n2/ϵ2) for this family of algorithms what improves the best known bound O(n2/ϵ3) for Sinkhorn’s algorithm. 7.06.2018 Egor Kosov (HSE) Estimates of the total variation distance between polynoimial random variables. The talk discusses the recent results on the estimates of the total variation distance in terms of the Kantorovich distance between distributions of two polynoimials in Gaussian and logarithmically concave random vectors. The estimates are based on the fractional smoothness of the distributions.

#### Fall 2017

 22.06.2017 Любовь Маркович (ИППИ) Separability and entanglement in the Hilbert space reference frames related through the generic unitary transform for four level system Quantum correlations in the state of four-level atom are investigated by using generic unitary transforms of the classical (diagonal) density matrix. Partial cases of pure state, $X$-state, Werner state are studied in details. The geometrical meaning of  unitary Hilbert reference-frame rotations generating entanglement in the initially separable state is discussed. Characteristics of the entanglement in terms of concurrence, entropy and negativity are obtained as functions of the unitary matrix rotating the reference frame. Саша Дорофеева (МГУ) “A tutorial on spectral clustering” by Ulrike von Luxburg Spectral clustering is one of the most popular clustering algorithms. It is simple to implement, can be solved efficiently by standard software, and very often outperforms traditional clustering algorithms. In this tutorial different graph Laplacians and their basic properties are described, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Moreover, advantages and disadvantages of the different spectral clustering algorithms are discussed. 15.06.2017 Владимир Панов (ВШЭ) Low-frequency estimation for moving average Levy processes This talk is devoted to statistical inference for the stochastic integrals of the type $Z_{t}=\int_{R} K(t-s) dL_{s},$ where K is a deterministic function, and L is a Levy process.  We study the problem of statistical inference for the Levy density of the process L  from the observations of the process  Z. Максим Панов (МГУ) Stochastic Block Model with Overlaps This talk considers the parameter estimation problem in Stochastic Block Model with Overlaps (SBMO), which is a quite general instance of random graph model allowing for overlapping community structure. We present the new algorithm called successive projection overlapping clustering (SPOC) which combines the ideas of spectral clustering and geometric approach for separable non-negative matrix factorization. The proposed algorithm is provably consistent under SBMO with general conditions on the parameters of the model. SPOC is also shown to perform well experimentally in comparison to other algorithms. 25.05.2017 Юрий Янович(ИППИ) Асимптотические свойства процедур статистического оценивания на многообразиях Математическая модель многомерных нелинейных данных, названная моделью многообразия (manifold model), в соответствии с которой высокоразмерные данные расположены на (или вблизи) низкоразмерной нелинейной поверхности (многообразия) вложенного в высокоразмерное пространство наблюдений, появилась в 2000 году и набирает популярность в анализе данных. Разработано множество вычислительных процедур для работы с подобными данными в области оценивания многообразий (manifold learning). Их статистические свойства мало изучены и являются предметом исследования в работе. 18.05.2017 Владимир Спокойный(WIAS-Сколтех-ИППИ) Gaussian approximation of the squared norm of a high dimensional vector The paper considers the problem of approximating the distribution of the norm of a vector sum in a high dimension by a similar Gaussian vector sum with the same covariance structure. The aim is to understand how the accuracy depends on the sample size and the dimensionality of the space. 11.05.2017 Алексей Наумов(Сколтех-ИППИ) On some inequalities in probability theory and their applications to statistics and random matrix theory We will discuss Gaussian comparison and anti–concentration inequalities for norms of Gaussian random elements. We will also consider a general approach to estimation the moments of a class of statistics of independent random variables by Stein’s method. The talk will be based on recent joint results with V. Spokoiny, V. Ulyanov and A. Tikhomirov. 20.04.2017 Ростислав Гориславский(ВШЭ) Recovering communities in the general stochastic block model without knowing the parameters Community detection in graphs is one of the important problems in statistics and machine learning. In this talk, I am going to make a quick introduction to one of the fundamental random graph models stochastic block model. We will define two figures of merits: recovery and detection, and mention some thresholds for solving both problems. I will also introduce an algorithm for solving community detection problem in SBMs. 20.04.2017 Ростислав Гориславский (ВШЭ) Recovering communities in the general stochastic block model without knowing the parameters Community detection in graphs is one of the important problems in statistics and machine learning. In this talk, I am going to make a quick introduction to one of the fundamental random graph models stochastic block model. We will define two figures of merits: recovery and detection, and mention some thresholds for solving both problems. I will also introduce an algorithm for solving community detection problem in SBMs. Никита Ваганов (ВШЭ) Statistical self-similarity of financial data and stable-based modelling The first probabilistic framework for description price evolution was introduced by L.Bachelier in his thesis in 1900: the Brownian Motion was proposed as the first model for stock price evolution. Since then, many natural questions arose: how valid the normality assumption is, how to deal with skewed and heavy-tailed data? Seminal works of P.L\’evy, K.Ito, A.Y. Khinchin, B.Mandelbrot and others lead to the important concept of stability, and more general one of infinite divisibility, quite natural and popular in modern stochastic financial mathematics. I am going to talk about the concept of stability and the relative one of self-similarity, and how it is related to modeling of financial data (stock prices, interest rates, currency exchange rates). In particular, we will consider models with stochastic time and how to simulate them. My talk is based on the classical results in mathematical finance (Ane, Geman, Order Flow, Transaction Clock, and Normality of Asset Returns, The Journal of Finance, Vol. LV, NO 5, OCT 2000 among them), and modern research. 13.04.2017 Александра Суворикова (WIAS-ИППИ) Two-sample test based on Monge-Kantorovich statistical depth The lack of natural ordering in multidimensional spaces prevents immediate extension of ranking (two-sample) tests on the real line to higher dimensions. We present a new testing framework in $$R^d, d \geq 1$$ based on the concept of Monge-Kantorovich (MK) statistical depth by Chernozhukov et.al. (2014). The approach is distribution-free and relies on the fact, that optimal mapping (in Monge sense) preserves the relative ordering of the transported data cloud. In this talk we discuss MK depth and its application to distribution-free testing approaches Никита Пронько (ВШЭ) Транспорт эмпирических мер на последовательности низкой неравномерности Концепция “discrepancy” (расхождение) была введена более ста лет назад Германом Вейлем, и с тех пор нашла применение в численных методах квази-Монте-Карло, в контексте которой её можно называть мерой неравномерности, так как она используется в  качестве статистики, ограничивающей ошибку интегрирования на фиксированном наборе точек.  В 98 году Ф. Хикернеллом (arxive) была получена L_2 релаксация, делающая эту статистику удобной для вычислительных задач. В связи с недавними успехами методов транспорта меры, в частности работы Черножукова и других (arxive), встает вопрос о транспорте эмпирических мер на последовательности низкой неравномерности, то есть на детерминированный последовательности, сходящиеся к данному распределению наискорейшим образом в смысле слабого предела эмпирических мер. Я  расскажу  про основные математические результаты транспортной теории и теории меры неравномерности, и как они могут быль использованы вместе для обнаружения аномалий в наборах данных. 6.04.2017 Владимир Спокойный(WIAS-Сколтех-ИППИ) Subset selection using the “smallest accepted” rule Selection of significant features for a high dimensional regression is one of the classical problems in statistics and machine learning. This talk discusses a new procedure for subset selection based on the “smallest accepted rule” idea. We present and compare two ways of calibrating the procedure, one is based on the familywise error, and the other uses the False Discovery Rate criterium. 23.03.2017 Леонид Иосипой(Сколтех) Johnson-Lindenstrauss Lemma, Gordon’s Theorem and Restricted Isometry Property In this talk I am going to make a quick introduction to the Restricted Isometry Property (RIP). We will discuss generalizations of the Johnson-Lindenstrauss Lemma, and the use of Gordon’s theorem by understanding which projections are expected to keep the norm of sparse vectors and low-rank matrices. I will also mention several open problems in this active area of research. 16.03.2017 Мария Веретенникова(ВШЭ) Stochastic models for electroencephalogram data during coma In this research the focus is on finding stochastic processes which reflect the empirical properties of observed electroencephalogram time series. The talk will start with an overview of EEG as a technique for brain analysis. Then we will discuss several existing modeling approaches, their uses and limitations. After that we will advance to stochastic models suitable for describing EEG data in case of coma due to cerebral malaria. One of our methods is based on frequency band separation and the other just uses the observed properties of the increment process. 9.03.2017 Алексей Крошнин(Физтех-ИППИ) Расстояние Монжа-Канторовича и барицентры мер Существует множество способов определить расстояние между распределениями, но большинство из них не позволяют учитывать геометрию исходного пространства X, на котором рассматриваются меры. Транспортное расстояние Монжа-Канторовича примечательно тем, что в какой-то степени переносит глобальную структуру X на пространство мер. В докладе будут рассмотрены свойства пространства распределений, снабженного таким транспортным расстоянием. Также будет рассмотрено усреднение мер по отношению к расстоянию Монжа-Канторовича — барицентры Фреше, обобщающие понятие барицентров Вассерштейна. Будет показана их непрерывность и состоятельность в статистическом смысле. 02.03.2017 Денис Беломестный (University of Duisburg-Essen) Efficient simulation of Levy areas and related problems TBA