Structural Learning Seminar

Structural Learning Seminar is a seminar for researchers, Master’s and Ph.D. students of Skoltech, IITP RAS and HSE. This seminar is devoted to the problems of Mathematics and Computer Science and covers areas such as Mathematical Statistics, Machine Learning, Random Matrix Theory, Optimization, Information Theory, etc.

The organizer of the seminar: Vladimir Spokoiny.
Local organizers: Leonid Iosipoi, Alexey Naumov, Maxim Panov, Nikita Zhivotvskiy.

The seminar takes place on Thursday, room 615 at 5 PM.

Follow the seminar announcements in Telegram:

Spring 2018


Oleg Matveevskiy (MSU)

Graph-Based Methods in Semi-Supervised Learning.

In my review I would like to talk about Graph-Based Methods in Semi-Supervised Learning. Such algorithms as Label Propagation, Label Spreading and some others will be presented. We will consider the general mathematical principles of constructing these algorithms.


Denis Belomestny (University of Duisburg-Essen; HSE; IITP)

Introduction to MCMC algorithms and their convergence analysis.

In this talk, we introduce main MCMC algorithms and discuss their basic properties. Some applications to Bayesian statistics will also be given.


Vladimir Spokoiny (WIAS, Berlin)

Community detection using adaptive weights

The talk discusses the problem of network clustering. Inside of each cluster (community), the edge probability is higher than between clusters, this gives a hint how the community structure can be detected by a multi-scale testing procedure. The approach extends the clustering procedure from Efimov, Spokoiny, Adamyan (2017),


at 18:10


Eric Moulines (École Polytechnique)

Perturbed Proximal Gradient Algorithms.

We study a version of the proximal gradient algorithm for which the gradient is intractable and is approximated by Monte Carlo methods (and in particular Markov Chain Monte Carlo). We derive conditions on the step size and the Monte Carlo batch size under which convergence is guaranteed: both increasing batch size and constant batch size are considered. We also derive non- asymptotic bounds for an averaged version. Our results cover both the cases of biased and unbiased Monte Carlo approximation. To support our findings, we discuss the inference of a sparse generalized linear model with random effect and the problem of learning the edge structure and parameters of sparse undirected graphical models.

Joint seminar with the colloquium of Faculty of Computer Science HSE. Please, register.


Nikita Zhivotovskiy (Technion, Israel; IITP RAS, Skoltech)

Random graphs: concentration of the spectral radius.

In this talk, we consider several questions related to the concentration of the spectral radius of the random graph. We start with the Efron-Stein type inequalities for the moments of functions of independent random variables. Then we discuss the results of Bandeira and Van Handel applied to our case and finally we show the complete delocalization of the eigenvector corresponding to the largest eigenvalue for the adjacency matrix of a random graph.


Denis Belomestny (University of Duisburg Essen, HSE) and Alexey Naumov (Skoltech, HSE, IITP)

Statistical Learning Theory Day 2

Taming the uncertainty: new challenges in MC and MCMC methods

We discuss recent developments and open problems in Monte-Carlo and Markov Chain Monte-Carlo methodology.

Statistical challenges in Topological Data Analysis

Topological Data Analysis can broadly be described as a collection of data analysis methods that find structure in data. We review some of these methods and open questions.


Sergey Ivanov (Skoltech)

Network Embeddings for Graph Classification Problem.

A wide range of real world applications deal with network analysis and classification tasks. An ease of representing data with graphs makes them very valuable asset in any data mining toolbox; however, the complexity of working with graphs led researchers to seek for new ways of representing and analyzing graphs, of which network embeddings have become broadly popular due to their success in several machine learning areas such as graph classification, visualization, and pattern recognition. In this talk, we will present a history of solutions to graph classification problem, starting from graph kernels to the latest distributed embeddings based on convolutional neural networks and auto-encoders. In addition, we will present our recent result on anonymous walk embeddings, which show substantial uplift in classification accuracy to state-of-the-art algorithms. We will wrap up with a discussion of network embeddings applications and future directions.

The slides are available here.


Alexander Lipatev (MSU)

On sharp error bounds in general linear model.

General linear model was comprehensively described in seminal Introduction to Multivariate Analysis by T.W. Anderson. This model includes two tests that could be considered analogs of F-test in simple linear regression, namely Bartlett-Nanda-Pillai test and Lawley-Hotelling test. As usual for n (sample size)->infinity and p (dimensionality) fixed both statistics are asymptotically chi-square and for n->infinity, p->infinity, p/n


Vladimir Spokoiny (WIAS, Berlin)

Statistical inference with optimal transport (SLT day 3)

Optimal transportation (OT) theory provides a powerful toolbox for data analysis in nonlinear spaces, where nonlinearity appears as an inevitable consequence of complexity of objects of interest (e.g. medical images or meta-genomes). OT opens a new direction in creating complete package of statistical instruments which takes into account the underlying geometry of an observed data set. In this talk we introduce basics on statistical inference based on OT and present recent results.

Download presentation


Yury Maximov (Skoltech)

Mathematical Aspects of Emergency Control in Energy Grids

In this talk, we consider various mathematical aspects of control in power grids. First, we consider a simple linearized model (DC) of power flow and analyze the probability of the system being outside the operational limits. In the second-half, we consider non-linear non-convex model (AC) of the power flow and provide an inner polyhedral approximation of the region where the solution of the power flow equations exists. After that, we propose the adaptive importance sampling to estimate the probability of the system to be out of the polyhedral security region. The results are joint with Michael Chertkov (Skoltech), and Art Owen (Stanford).


Alexander Zimin (HSE,  Faculty of Mathematics)

Multidimensional transport problem.

Some partial cases are solved. For example: Find f(x, y) : 0, 1^2 \to \mathbb{R} satisfying the condition f(x, y) + f(y, z) + f(z, x) \le xyz with the maximal integral over square. Generalization will be illustrated by particular examples.


Alexandra Suvorikova (WIAS, Berlin)

Construction of non-asymptotic confidence sets in 2-Wasserstein space.

In this paper, we consider a probabilistic setting where the probability measures are considered to be random objects. We propose a procedure of construction non-asymptotic confidence sets for empirical barycenters in 2-Wasserstein space and develop the idea further to construction of a non-parametric two-sample test that is then applied to the detection of structural breaks in data with complex geometry.


Dmitry Zaporozhets (PDMI RAS)

Non-absorption probability for multidimensional random walk.

Suppose that we have a symmetric random walk in $R^d$ of the length $n$. We will calculate the probability that it’s convex hull does not contain the origin which is distribution-free. To this end, we will solve the equivalent problem of finding the number of Weyl chambers of type $B_n$ intersected by a generic linear subspace of $R^n$ of codimension $d$. This will be done using Whitney’s and Zaslavsky’s theorems from the theory of hyperplanes arrangements. The case $d=1$ corresponds to the formula by Sparre Andersen (1949) for the probability that such random walk in dimension one stays positive. Based on the joint works with Z.~Kabluchko, V.~Vysotsky.


at 16:00

Sergey Bobkov (University of Minnesota)

Mini-course “Strong probability distances and limit theorems”

The lectures explore strong distances in the space of probability distributions, including total variation, relative entropy, chi squared and more general Renyi/Tsallis informational divergences, as well as relative Fisher information. Special attention is given to the distances from the normal law. The first part of the course is devoted to the general theory, and the second part to the behavior of such informational distances along convolutions and associated central limit theorem.

Lectures 1-2.

General theory of Renyi and Tsallis informational divergences. Relative entropy and chi squared distance as particular cases. Relationship with total variation. Pinsker-type inequalities.

Lectures 3-4.

Entropy power inequality. Basic properties of relative entropy. Fisher information. Stam’s inequality and its applications.


Mikhail Urusov (University of Duisburg-Essen)




at 16:00

Sergey Bobkov (University of Minnesota)

Mini-course “Strong probability distances and limit theorems”

The lectures explore strong distances in the space of probability distributions, including total variation, relative entropy, chi squared and more general Renyi/Tsallis informational divergences, as well as relative Fisher information. Special attention is given to the distances from the normal law. The first part of the course is devoted to the general theory, and the second part to the behavior of such informational distances along convolutions and associated central limit theorem.

Lectures 5-6.

Informational divergences from the normal law. Poincare and logarithmic Sobolev inequalities. Cramer’s exponential series and their applications to the chi-squared distance. Superadditivity with respect to dimension.

Lectures 7-8.

Local limit theorem (the central limit theorem for densities) and Edgeworth expansions. Convergence to the normal law in Renyi’s distances.

Fall 2017


Nikita Zhivoroskiy (Skoltech, IITP)

A simple proof of Haussler’s lemma

A well-known result of Haussler says that if the subset V of the cube {0, 1}^n has the Vapnik-Chervonenkis dimension d, then the packing number V with respect to the Hamming metric is bounded above (сn/k) ^ d, where с is an absolute constant, and k is the smallest Hamming distance between the elements of the maximum package. This upper bound is used in the theory of empirical processes, statistics and computational geometry. In this talk we present a simple proof, which is a consequence of minimax analysis of classification algorithms.


Andzhey Koziuk (WIAS Berlin)

Gaussian comparison and approximation

Modern applications challenge classical statistics to understand its limitations and study underlying modelling assumptions. Particularly, important to realise how the noise misspecification, parametric assumption and asymptotic techniques influence statistical inference. It turns out that some of the applications are robust to the modelling limitations (see Spokoiny 2012 [1]). In the spirit of the challenge, we aim at studying explicitly closeness of multivariate probabilities addressing both parametric restriction and asymptotic techniques. First, we limit ourselves to the multivariate Gaussian family and compare two members of a family using Kolmogorov distance on the set of centred Euclidean balls. Moving further we extend comparison technique to an arbitrary family and prove non-classical Berry-Essen inequality. 

Articles: 1. Spokoiny, V., 2012. Parametric estimation. Finite sample theory. The Annals of Statistics, 40(6), pp.2877-2909.


Nikita Puchkin (MIPT-Skoltech)

Multiclass learning via spatial aggregation of local likelihood estimates

We consider a problem of multiclass classification and apply a procedure based on spatial stagewise aggregation of local likelihood estimates, proposed by D. Belomestny and V. Spokoiny. We establish convergence rates, obtain an oracle inequality and discuss the influence of the number of classes.

Igor Silin (MIPT-Skoltech)

Posterior distribution of spectral projector of covariance matrix

In this talk, we will discuss the spectral projectors of the covariance matrix in the multivariate normal model in the Bayesian framework. Specifically, we will prove a recent result that the posterior distribution of the Frobenius distance between the projector and the empirical projector can be approximated by the distribution of the Euclidean norm of the Gaussian random vector with properly chosen covariance. The conditions on a prior distribution will be described and several examples of possible priors will be provided. The result can be applied to the construction of the confidence sets for the true spectral projector as an alternative to the bootstrap technique developed by Naumov, Spokoiny and Ulyanov.


Vladimir Spokoiny (WIAS Berlin, Skoltech, IITP)

Research topics overview.

We will discuss the research projects published at topics for students


Alexander Podkopaev (MIPT-Skoltech)

Non-Gaussian Component Analysis: from Stein-like identity to spectral methods

Extracting relevant low-dimensional information from high-dimensional data is a common pre-processing task. In general, dimension reduction attempts to deal with the poor statistical outcomes associated with the “curse of dimensionality.” As to NGCA model, the main assumption is that the data is a mixture of two sources: a low-dimensional signal, which has a non-Gaussian distribution, and independent high-dimensional Gaussian noise. In this talk, we will begin with a general description of the topic and then discuss different approaches to extract the information needed for construction of a projector onto low-dimensional non-Gaussian subspace from the observations: beginning with Stein-identity based methods, moving forward towards spectral methods based on the analysis of reweighed covariance matrix where weights are chosen according to Gaussian damping technique, which will be also covered in the talk.


Anastasia Koloskova (Skoltech)

Steepest coordinate descent for SVM & Lasso.

We will discuss steepest coordinate descent algorithms for smooth and non-smooth problems. The talk will consist of two parts. The first part will be about extending existing analysis of the steepest coordinate descent with GS-s steepest rule for some class of non-smooth functions. The second part will be about how can we implement GS and GS-s rules efficiently.

Arshak Minasyan (Skoltech)

Minimization on Stiefel manifold with gradient type methods

Consider the minimization of smooth convex matrix function on Stiefel manifold, i.e. on the set of orthogonal matrices. The easiest example of such problem is the minimzation of quadratic form on the sphere (i.e. finding the eigenvector associated with the smallest eigenvalue). Another important example of such a problem is the robust version of principal component analysis (PCA) suggested by B. Polyak and M. Khlebnikov (Automatics & Telemechanics, 2017, N3). In the talk we will consider gradient type methods for such problems.


Leonid Iosipoi (Skoltech)

Variance reduction in Monte Carlo via empirical variance minimization.

One of the methods for increasing the efficiency of Monte Carlo simulation is by reducing the variance of simulation estimates. We are going to discuss an approach which is based on minimization of the empirical variance over a suitable class of zero mean control functionals. The convergence analysis of this approach and analyze its complexity will be presented.


Quentin Paris (HSE)

On the suprema of shifted-type empirical processes and their use in learning theory.

The talk presents new upper bounds on the suprema of shifted-type empirical processes. We first investigate adaptations of the symmetrization and contraction principle. We then provide upper bounds for generalized versions of so-called offset Rademacher processes as well as for shifted versions of so-called square and multiplier processes. The interest for our main results is illustrated in the context of statistical learning theory.


Alexey Kroshnin (IITP-MIPT)

Central Limit Theorem in the Wasserstein Distance

Estimation of convergence rate in the central limit theorem is an important and extensively developed area in probability theory. The Berry-Esseen theorem states that in the one-dimensional case the rate is n^{-1/2}. Similar results hold for various metrics and in multidimensional case as well. However, in this setting the rate also depends on dimension d, and obtaining explicit bounds w.r.t both d and n is still an open problem in many cases. In the talk, we consider the CLT with respect to the Wasserstein distance, which is an important and quite natural tool for investigation of distribution convergence, especially in high dimensions. Next, we describe new near optimal bounds for convergence rate in a special case considered by A. Zhai in the work arXiv:1602.05565.


Thibaut Le Gouic (Ecole centrale de Marseille, Marseille)

Recovering metric from full ordinal information.

Given a geodesic space (E,d), we show that full ordinal knowledge on the metric d determines uniquely – up to a constant factor – the metric d. For a subspace E_n of n points of E, converging in Hausdorff distance to E, we construct a metric d_n on E_n, based only on the knowledge of the ordinal information of d on E_n and establish a sharp upper bound of the Gromov-Hausdorff distance between (E_n,d_n) and (E,d).


Alexey Zaitsev (Skoltech)

Deep Boosting for Imbalanced Classification

We can upper bound binary classification error using the sum of empirical margin error and complexity penalty. For ensembles there are two ways to bound complexity of the model. The naive one uses the same bound for all weak classifiers in ensemble, while recent results suggest that we can provide separate complexity penalty for each weak classifier thus leading to more complex on average models. We consider boosting procedure based on these separate penalties named Deep Boosting by M.Mohri. Then we consider how to use Deep Boosting approach to improve ensemble classifiers for imbalanced classification problems.


at 16:00


Konstantin Tikhomirov (Princeton University)

Invertibility of non-Hermitian random square matrices: sparse and non-centered models

Invertibility of square matrices is very important in connection with the study of the spectral distribution, as well as numerical analysis, where the condition number of the matrix serves as a measure of loss of precision when solving systems of linear equations. In this talk I will consider two models of random matrices — non-centered square matrices with continuous distributions of the entries and adjacency matrices of random regular directed graphs — and discuss their invertibility. The talk is partially based on a joint work with A. Litvak, A. Lytova, N. Tomczak-Jaegermann and P. Youssef.

Elizaveta Rebrova (University of Michigan)

Non-asymptotic spectral properties of heavy-tailed random matrices

We study large n by n random matrices with i.i.d. entries. If the distribution of the entries have mean zero and at least gaussian decay, then it is known that the largest singular value is O(sqrt(n)) and the smallest singular value is O(1/sqrt(n)) with high probability. But would they typically have the same order if the entries distribution has heavier tails, say, just two finite moments? For s_min the answer turns out to be ”yes,” and for s_max the answer is ”no” (however, a local regularization is possible). I will talk about what is this local regularization, why the second moment condition makes sense, and, time permitting, discuss some applications and proof ideas. The talk is based on our joint work with K. Tikhomirov and R. Vershynin.


Alexandra Suvorikova (WIAS Berlin)

Gaussian process forecast with multidimensional distributional input

In this work we focus on forecasting a Gaussian process indexed by probability distributions. We introduce a family of positive definite kernels constructed with the use of optimal transportation distance and provide their probabilistic understanding. The technique allows to forecast efficiently Gaussian processes, which opens new perspective in Gaussian process modelling.

Spring 2017


Любовь Маркович (ИППИ)

Separability and entanglement in the Hilbert space reference frames related through the generic unitary transform for four level system

Quantum correlations in the state of four-level atom are investigated by using generic unitary transforms of the classical (diagonal) density matrix. Partial cases of pure state, $X$-state, Werner state are studied in details. The geometrical meaning of  unitary Hilbert reference-frame rotations generating entanglement in the initially separable state is discussed. Characteristics of the entanglement in terms of concurrence, entropy and negativity are obtained as functions of the unitary matrix rotating the reference frame.

Саша Дорофеева (МГУ)

“A tutorial on spectral clustering” by Ulrike von Luxburg

Spectral clustering is one of the most popular clustering algorithms. It is simple to implement, can be solved efficiently by standard software, and very often outperforms traditional clustering algorithms. In this tutorial different graph Laplacians and their basic properties are described, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Moreover, advantages and disadvantages of the different spectral clustering algorithms are discussed.


Владимир Панов (ВШЭ)

Low-frequency estimation for moving average Levy processes

This talk is devoted to statistical inference for the stochastic integrals of the type
\[Z_{t}=\int_{R} K(t-s) dL_{s},\] where K is a deterministic function, and L is a Levy process.  We study the problem of statistical inference for the Levy density of the process L  from the observations of the process  Z.

Максим Панов (МГУ)

Stochastic Block Model with Overlaps

This talk considers the parameter estimation problem in Stochastic Block Model with Overlaps (SBMO), which is a quite general instance of random graph model allowing for overlapping community structure. We present the new algorithm called successive projection overlapping clustering (SPOC) which combines the ideas of spectral clustering and geometric approach for separable non-negative matrix factorization. The proposed algorithm is provably consistent under SBMO with general conditions on the parameters of the model. SPOC is also shown to perform well experimentally in comparison to other algorithms.


Юрий Янович(ИППИ)

Асимптотические свойства процедур статистического оценивания на многообразиях

Математическая модель многомерных нелинейных данных, названная моделью многообразия (manifold model), в соответствии с которой высокоразмерные данные расположены на (или вблизи) низкоразмерной нелинейной поверхности (многообразия) вложенного в высокоразмерное пространство наблюдений, появилась в 2000 году и набирает популярность в анализе данных. Разработано множество вычислительных процедур для работы с подобными данными в области оценивания многообразий (manifold learning). Их статистические свойства мало изучены и являются предметом исследования в работе.


Владимир Спокойный(WIAS-Сколтех-ИППИ)

Gaussian approximation of the squared norm of a high dimensional vector

The paper considers the problem of approximating the distribution of the norm of a vector sum in a high dimension by a similar Gaussian vector sum with the same covariance structure. The aim is to understand how the accuracy depends on the sample size and the dimensionality of the space.


Алексей Наумов(Сколтех-ИППИ)

On some inequalities in probability theory and their applications to statistics and random matrix theory

We will discuss Gaussian comparison and anti–concentration inequalities for norms of Gaussian random elements. We will also consider a general approach to estimation the moments of a class of statistics of independent random variables by Stein’s method. The talk will be based on recent joint results with V. Spokoiny, V. Ulyanov and A. Tikhomirov.


Ростислав Гориславский(ВШЭ)

Recovering communities in the general stochastic block model without knowing the parameters

Community detection in graphs is one of the important problems in statistics and machine learning. In this talk, I am going to make a quick introduction to one of the fundamental random graph models stochastic block model. We will define two figures of merits: recovery and detection, and mention some thresholds for solving both problems. I will also introduce an algorithm for solving community detection problem in SBMs.


Ростислав Гориславский (ВШЭ)

Recovering communities in the general stochastic block model without knowing the parameters

Community detection in graphs is one of the important problems in statistics and machine learning. In this talk, I am going to make a quick introduction to one of the fundamental random graph models stochastic block model. We will define two figures of merits: recovery and detection, and mention some thresholds for solving both problems. I will also introduce an algorithm for solving community detection problem in SBMs.

Никита Ваганов (ВШЭ)

Statistical self-similarity of financial data and stable-based modelling

The first probabilistic framework for description price evolution was introduced by L.Bachelier in his thesis in 1900: the Brownian Motion was proposed as the first model for stock price evolution. Since then, many natural questions arose: how valid the normality assumption is, how to deal with skewed and heavy-tailed data? Seminal works of P.L\’evy, K.Ito, A.Y. Khinchin, B.Mandelbrot and others lead to the important concept of stability, and more general one of infinite divisibility, quite natural and popular in modern stochastic financial mathematics. I am going to talk about the concept of stability and the relative one of self-similarity, and how it is related to modeling of financial data (stock prices, interest rates, currency exchange rates). In particular, we will consider models with stochastic time and how to simulate them. My talk is based on the classical results in mathematical finance (Ane, Geman, Order Flow, Transaction Clock, and Normality of Asset Returns, The Journal of Finance, Vol. LV, NO 5, OCT 2000 among them), and modern research.


Александра Суворикова (WIAS-ИППИ)

Two-sample test based on Monge-Kantorovich statistical depth

The lack of natural ordering in multidimensional spaces prevents immediate extension of ranking (two-sample) tests on the real line to higher dimensions. We present a new testing framework in \(R^d, d \geq 1\) based on the concept of Monge-Kantorovich (MK) statistical depth by Chernozhukov (2014). The approach is distribution-free and relies on the fact, that optimal mapping (in Monge sense) preserves the relative ordering of the transported data cloud. In this talk we discuss MK depth and its application to distribution-free testing approaches

Никита Пронько (ВШЭ)

Транспорт эмпирических мер на последовательности низкой неравномерности

Концепция “discrepancy” (расхождение) была введена более ста лет назад Германом Вейлем, и с тех пор нашла применение в численных методах квази-Монте-Карло, в контексте которой её можно называть мерой неравномерности, так как она используется в  качестве статистики, ограничивающей ошибку интегрирования на фиксированном наборе точек.  В 98 году Ф. Хикернеллом (arxive) была получена L_2 релаксация, делающая эту статистику удобной для вычислительных задач. В связи с недавними успехами методов транспорта меры, в частности работы Черножукова и других (arxive), встает вопрос о транспорте эмпирических мер на последовательности низкой неравномерности, то есть на детерминированный последовательности, сходящиеся к данному распределению наискорейшим образом в смысле слабого предела эмпирических мер. Я  расскажу  про основные математические результаты транспортной теории и теории меры неравномерности, и как они могут быль использованы вместе для обнаружения аномалий в наборах данных.


Владимир Спокойный(WIAS-Сколтех-ИППИ)

Subset selection using the “smallest accepted” rule

Selection of significant features for a high dimensional regression is one of the classical problems in statistics and machine learning. This talk discusses a new procedure for subset selection based on the “smallest accepted rule” idea. We present and compare two ways of calibrating the procedure, one is based on the familywise error, and the other uses the False Discovery Rate criterium.


Леонид Иосипой(Сколтех)

Johnson-Lindenstrauss Lemma, Gordon’s Theorem and Restricted Isometry Property

In this talk I am going to make a quick introduction to the Restricted Isometry Property (RIP). We will discuss generalizations of the Johnson-Lindenstrauss Lemma, and the use of Gordon’s theorem by understanding which projections are expected to keep the norm of sparse vectors and low-rank matrices. I will also mention several open problems in this active area of research.


Мария Веретенникова(ВШЭ)

Stochastic models for electroencephalogram data during coma

In this research the focus is on finding stochastic processes which reflect the empirical properties of observed electroencephalogram time series. The talk will start with an overview of EEG as a technique for brain analysis. Then we will discuss several existing modeling approaches, their uses and limitations. After that we will advance to stochastic models suitable for describing EEG data in case of coma due to cerebral malaria. One of our methods is based on frequency band separation and the other just uses the observed properties of the increment process.


Алексей Крошнин(Физтех-ИППИ)

Расстояние Монжа-Канторовича и барицентры мер

Существует множество способов определить расстояние между распределениями, но большинство из них не позволяют учитывать геометрию исходного пространства X, на котором рассматриваются меры. Транспортное расстояние Монжа-Канторовича примечательно тем, что в какой-то степени переносит глобальную структуру X на пространство мер. В докладе будут рассмотрены свойства пространства распределений, снабженного таким транспортным расстоянием. Также будет рассмотрено усреднение мер по отношению к расстоянию Монжа-Канторовича — барицентры Фреше, обобщающие понятие барицентров Вассерштейна. Будет показана их непрерывность и состоятельность в статистическом смысле.


Денис Беломестный (University of Duisburg-Essen)

Efficient simulation of Levy areas and related problems


Fall 2016


Назарий Тупица (ФРТК МФТИ)

Линейная регрессия со случайным дизайном

В докладе будут коротко рассмотрены теоремы Фишера и Вилкса, а также неравенство концентрации для score-вектора в случае известного шума, затем они будут обобщены на случай неизвестного шума. Для этого будет построена оценка ковариационной матрицы score-вектора с помощью бутсрепа. Будет рассмотрено качество этой оценки.

Евгений Маршаков (Сколтех)

Differential Geometry with Applications to Low-Rank Matrix Completion

We will tell a few words about Riemannian optimization for low-rank matrix completion problem, more precisely, about algorithm that minimizes the least-square distance on the sampling set over fixed-rank matrices manifold. During the talk we will discuss in detail differential geometry that appears in proposed algorithm.


Андрей Купавский (EPFL, МФТИ)

Верхние и нижние оценки размера эпсилон-сетей

Пусть дано множество [n] из n элементов и некоторое семейство подмножеств этого множества. Подножество X в [n] называется эпсилон-сетью, если любое множество из семейства размера больше эпсилон n пересекается с X. В этом докладе я расскажу про последние результаты, касающиеся размера минимальных эпсилон-сетей для различных систем множеств. В частности, речь пойдет о верхней оценке Чана и др., основанной на так называемой shallow cell complexity, и о нижних оценках размера эпсилон-сетей для семейств множеств, заданных геометрически.


Денис Беломестный (University of Duisburg-Essen)

Threshold estimation for sparse high-dimensional deconvolution

The problem of covariance estimation for a p-dimensional normal vector X ? N(0, ?) observed with additional noise is studied. Only a very general non-parametric assumption is imposed on the distribution of the noise. In this semi-parametric deconvolution problem spectral thresholding estimators are constructed that adapt to sparsity in ?. We prove that the minimax convergence rates logarithmic in log p/n with n being the sample size.


Евгений Фролов (Сколтех)

Tensor Methods and Recommender Systems

Доклад посвящен применению тензорных методов в рекомендательных системах. Будут разобраны задачи построения моделей поведения пользователей на основе тензорных разложений, таких как каноническое разложение и разложение Такера. Области применения рассматриваемых подходов будут включать контекстно-зависимые рекомендации, рекомендации на основе ключевых слов или меток (тэгов), рекомендации с учетом трендов и изменений во времени. Будет также рассмотрена задача моделирования субъективной функции полезности и ряд концептуальных проблем, устраняемых с помощью тензорной формулировки.


Павел Яськов (МИ РАН им. В.А. Стеклова)

Вокруг теоремы Марченко-Пастура

Доклад будет строиться вокруг теоремы Марченко-Пастура для выборочных ковариационных матриц и условия слабой концентрации для квадратичных форм. Будет продемонстрировано, что данное условие при некоторых дополнительных ограничениях является необходимым и достаточным для выполнимости теоремы Марченко-Пастура. Также будет рассказано, как данное условие приводит к глобальному анизотропному закону для резольвент выборочных ковариационных матриц.

Александр Подкопаев (Сколтех)

Dimension reduction in unsupervised learning via Non-Gaussian Component Analysis

The report will be focused on the dimension reduction topic. In this talk, we will discuss methods based on Non-gaussian component analysis (NGCA). It can be formulated as a problem of identifying a low-dimensional non-Gaussian component of the whole distribution in an iterative and structure adaptive way. In more details, we will mainly consider NGCA procedure of identification of the non-Gaussian subspace using Principle Component Analysis (PCA) method, sparse NGCA (SNGCA) which replaces the PCA-based procedure with an algorithm based on convex projection and approach for direct estimation of the projector on the target subspace based on semidefinite programming. Recovering the structure when its effective dimension is unknown will be also discussed.


Quentin Paris (ФЭН ВШЭ)

Recent advances on the occupancy problem

From a classical point of view, the occupancy problem (also known as the urn scheme) is to describe the spread (or repartition) of a random sample over its support when it is drawn from a discrete distribution. In this talk we will start by briefly describing the problem and state a few well known results concerning the asymptotic behaviour and concentration of the so called occupancy probabilities. In the second part of the talk, we will present new results concerning finite sample (upper and lower) bounds for the occupancy probabilities. Finally, the third and last part of the talk will discuss some results and perspectives outside of the i.i.d. and discrete context: arbitrary distributions in a metric space and Markov chains.

Игорь Силин (Сколтех)

Эффективное обучение модели Изинга в произвольных графах

В докладе рассматривается алгоритм восстановления графа по наблюдениям состояний системы в модели Изинга, которая так же известна как Markov Random Field. Последние годы данная тема представляет большой интерес в статистике, машинном обучении и статистической физике. В начале будет введена вероятностная модель, затем будут отмечены некоторые ее свойства и поставлена задача. Затем будет предложен алгоритм, решающий задачу. Наконец, будет дано теоретическое объяснение его корректности. Предлагаемый алгоритм не требует никаких сильных предположений, кроме идентифицируемости модели. В основе алгоритма лежат свойства такой величины как “условное влияние” одной вершины на другую.


Алексей Зайцев (ИППИ РАН)

Минимаксно оптимальное отношение размеров выборок данных разной точности

В докладе речь пойдет о минимаксной ошибке интерполяции для регрессии на основе гауссовских процессов для многомерного входа. Как следствие получается минимаксная ошибка интерполяции для широко используемой модели данных разной точности. Такая минимаксная ошибка позволяет установить оптимальное соотношение между размерами выборок разной точности и сравнить ошибки, получаемые при использовании только данных высокой точности или данных разной точности для заданного бюджета. Получаемое соотношение зависит только от “гладкости” функций разной точности, коэффициента корреляции и относительной стоимости вычислений точной и грубой функции.


Любовь Маркович (ИПУ РАН)

Вероятностные, информационные и корреляционные характеристики квантовых систем

Доклад будет посвящен краткому обзору проведенных мною исследований свойств квантовых систем, включая квантовые корреляции, явления запутанности, соотношений неопределенностей и неравенств на статистические характеристики (энтропии и информации) квантовых систем кубитов и кудитов, систем с непрерывными переменными типа квантовых цепочек и многоуровневых атомов. Будут приведены вводные положения и примеры использования в квантовой фильтрации.

Никита Животовский (ИППИ РАН)

High probability upper bounds, online learning, and stability

In statistical learning, the guarantees for many learning algorithms are provided both in expectation and with high probability with respect to the learning sample. Usually, the concentration of measure inequalities provide sharp tails and reduce the problem to the analysis in expectation. But when considering some learning algorithms other than empirical risk minimization powerful concentration results are of no use and tight high probability bounds become an inaccessible dream. Simultaneously, tight results in expectation can be proven as a two line exercise via stability arguments. In this talk, we will discuss several successful approaches towards getting high probability bounds including the online to batch conversion and the analysis of monotonic learning rules.


Massih-Reza Amini (Universite Grenoble Alpes)

Multi-class to Binary reduction of Large-scale classification Problems

In the context of large-scale problems, traditional multiclass classification approaches have to deal with class imbalancement and complexity issues which make them inoperative in some extreme cases. In this talk we present a transformation that reduces the initial multiclass classification of examples into a binary classification of pairs of examples and classes. We present generalization error bounds that exhibit the interdependency between the pairs of examples and which recover known results on binary classification with i.i.d. data. We show the efficiency of the deduced algorithm compared to state-of-the-art multiclass classification strategies on two large-scale document collections especially in the interesting case where the number of classes becomes very large.

Никита Пучкин (ФУПМ МФТИ)

Spatial aggregation of local likelyhood estimates with applications to classification
(D. Belomestny, V. Spokoiny)

This paper presents a new method for spatially adaptive local (constant) likelihood estimation which applies to a broad class of nonparametric models, including the Gaussian, Poisson and binary response models. The main idea of the method is, given a sequence of local likelihood estimates (“weak” estimates), to construct a new aggregated estimate whose pointwise risk is of order of the smallest risk among all “weak” estimates. We also propose a new approach toward selecting the parameters of the procedure by providing the prescribed behavior of the resulting estimate in the simple parametric situation. We establish a number of important theoretical results concerning the optimality of the aggregated estimate. In particular, our “oracle” result claims that its risk is, up to some logarithmic multiplier, equal to the smallest risk for the given family of estimates. The performance of the procedure is illustrated by application to the classification problem. A numerical study demonstrates its reasonable performance in simulated and real-life examples.

Spring 2016


Никита Животовский (ИППИ РАН)

Схемы сжатия выборок

В докладе речь пойдет о классической задаче статистического обучения: каково наименьшее количество объектов нужно оставить в выборке, чтобы можно было безошибочно восстановить ответы на исходной выборке. Будет обсуждаться связь схем сжатия выборки с обучаемостью классов, минимаксными классификаторами и бустингом.


Алексей Наумов (ВМК МГУ, ИППИ РАН)

Structured Random Matrices by Ramon van Handel (Часть II)

В докладе речь пойдет о спектральных характеристиках случайных матриц, наделенных некоторой структурой (например, разреженные матрицы Вигнера или матрицы, элементы которых обладают заданным шаблоном дисперсии). Мы рассмотрим некоторые последние результаты в этой области, а также обсудим открытые проблемы.

Во второй части доклада мы получим верхнюю оценку для мат. ожидания нормы отклонения выборочной ковариационной матрицы от истинного значения в терминах эффективного ранга (Теорема 5.1 в прикрепленном файле).


Владимир Григорьевич Спокойный (институт им. Вейерштрасса, Берлин)

Bootstrap tuning in ordered model selection

In the problem of model selection for a given family of linear estimators, ordered by their variance, we offer a new “smallest accepted” approach motivated by Lepski’s method and multiple testing theory. The procedure selects the smallest model which satisfies an acceptance rule based on comparison with all larger models. The method is completely data-driven and does not use any prior information about the variance structure of the noise: its parameters are adjusted to the underlying possibly heterogeneous noise by the so-called “propagation condition” using a wild bootstrap method.


Валентин Сотсков (мехмат МГУ)

Roughness penalty for dimension reduction

В докладе речь пойдет об одном простом способе уменьшить сложность модели — пенализации. В случае, когда размерность оцениваемого параметра является большой (например, когда она сравнима с размерностью выборки), многие классические статистические результаты неприменимы, так как соответствующая ошибка пропорциональна этой размерности. Мы рассмотрим пенализацию квадратичного типа (регуляризация Тихонова) и ее влияние на &laquoэффективную размерность&raquo.


Александра Суворикова (университет им. Гумбольдта, Берлин)

Wasserstein barycenters of probability measures

In this work we discuss Wasserstein barycenters of probability measures with support in R^d. In particular, we consider a probabilistic setting where the probability measures are considered to be random objects. Our main focus lies on Gaussian probability measures. We study the explicit form of the Wasserstein distance and the barycenter for these measures. We also consider a probability distribution on a set of Gaussian measures with commuting covariance matrices for which we prove the explicit form of the population barycenter and the consistency of the sample estimate. In addition, we proof the validity of a bootstrap procedure that allows to compute confidence sets for the population barycenter of this distribution.


Никита Животовский (ИППИ РАН)

Равномерные законы больших чисел и локализация

В докладе речь пойдет об условиях равномерной сходимости частот к вероятностям. Будут рассматриваться не только абсолютные отклонения, но и относительные. Для последних в общих условиях будут обсуждаться эффекты локализации. В конце доклада будут обсуждены приложения описанной теории в различных задачах математической статистики.


Назарий Тупица (ФРТК МФТИ)

Linear regression with random design

В докладе будет рассмотрена модель линейной регрессии со случайными предикторами (random design). Мы обсудим, при каких условиях на предикторы, возможно заменить случайную матрицу ее математическим ожиданием. В конце доклада будут сформулированы теоремы Фишера и Вилкса для выборки фиксированного размера с учетом возможной мисспецификации модели (ошибочном параметрическом предположении о модели).


Бекжан Керимкулов (ФКН ВШЭ)

User-friendly tail bounds for sums of random matrices by Joel A. Tropp

В этот раз мы обсудим работу &laquoUser-friendly tail bounds for sums of random matrices&raquo (Joel A. Tropp, 2011). В ней рассматриваются основные вероятностные неравенства (Azuma, Bernstein, Chernoff, Hoeffding, McDiarmid) для уклонений максимального собственного значения суммы независимых случайных самосопряженных матриц. Результаты сформулированы так, что все условия легко проверить на конкретных примерах.


Алексей Наумов (ВМК МГУ, ИППИ РАН)

Structured Random Matrices by Ramon van Handel (Часть I)

В докладе речь пойдет о спектральных характеристиках случайных матриц, наделенных некоторой структурой (например, разреженные матрицы Вигнера или матрицы, элементы которых обладают заданным шаблоном дисперсии). Мы рассмотрим некоторые последние результаты в этой области, а также обсудим открытые проблемы.


Максим Панов (ИППИ РАН)

Finite Sample Bernstein – von Mises Theorem for Semiparametric Problems

Байесовский подход является одним из центральных направлений развития современной математической статистики. В данном подходе изучается апостериорное распределение параметров модели, т.е. распределение, получаемое в результате уточнения априорного распределения по результатам наблюдения данных. В этот раз мы обсудим теорему Бернштейна — фон Мизеса, которая утверждает асимптотическую близость апостериорного распределения к нормальному со средним, близким к оценке максимума правдоподобия, и с апостериорной ковариационной матрицей, близкой к обратной информационной матрице Фишера. Эта теорема дает теоретическое обоснование байесовских вычислений оценки максимума правдоподобия и ее ковариации.


Александра Дорофеева (ВМК МГУ)

Sharp deviation bounds for quadratic forms

В этот раз мы обсудим оценку вероятности больших уклонений для квадратичной формы случайного вектора. Мы рассмотрим гауссовский случай и, более общий, случай случайного вектора с конечным экспоненциальным моментом. Результаты, о которых пойдет речь, являются неасимптотическими, а все константы посчитаны явно.


Леонид Иосипой (мехмат МГУ)

Parametric estimation. Fisher and Wilks Theorems

В докладе речь пойдет об обобщении двух важных результатов из классической асимптотической статистики: теоремах Фишера и Вилкса. Мы обсудим, как их с помощью можно описать некоторые свойства оценок максимального правдоподобия и построить неасимптотический доверительный интервал. Результаты будут сформулированы для выборок фиксированного размера с учетом возможной мисспецификации (что наше параметрическое предположение о модели может быть неверным).