Selected papers:   (Complete publication list is here.)
-
Polynomial Learning of Distribution Families [arxiv]
M. Belkin, K. Sinha, FOCS 2010.
-
+ abstract
-
The question of polynomial learnability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learnability of Gaussian mixture distributions still remained open. The current work resolves the question of polynomial learnability for Gaussian mixtures in high dimension with an arbitrary fixed number of components. The result on learning Gaussian mixtures relies on an analysis of distributions belonging to what we call "polynomial families" in low dimension. These families are characterized by their moments being polynomial in parameters and include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time and using a polynomial number of sample points. The result on learning polynomial families is quite general and is of independent interest. To estimate parameters of a Gaussian mixture distribution in high dimensions, we provide a deterministic algorithm for dimensionality reduction. This allows us to reduce learning a high-dimensional mixture to a polynomial number of parameter estimations in low dimension. Combining this reduction with the results on polynomial families yields our result on learning arbitrary Gaussian mixtures in high dimensions.
-
Convergence of Laplacian Eigenmaps [pdf, bib]
M. Belkin, P. Niyogi
preprint, short version NIPS 2008.
-
+ abstract
-
Geometrically based methods for various tasks of data analysis
have attracted considerable attention over the last few years.
In many of these algorithms, a central role is played by the
eigenvectors of the graph Laplacian of a
data-derived graph. In this paper, we show that if points are
sampled uniformly at random from an unknown submanifold ${\cal M}$
of $\R^N$, then the eigenvectors of a suitably constructed
graph Laplacian converge to the eigenfunctions of the Laplace
Beltrami operator on ${\cal M}$. This basic result directly
establishes the convergence of spectral manifold learning
algorithms such as Laplacian Eigenmaps and Diffusion Maps.
It also has implications for the understanding of geometric algorithms
in data analysis, computational harmonic analysis, geometric
random graphs, and graphics.
-
On Learning with Integral Operators
[pdf, bib]
L. Rosasco, M.Belkin, E. De Vito, Journal of Machine Learning Research, vol.11, pp.905-934, 2010.
-
+ abstract
-
A large number of learning algorithms, for example,
spectral clustering, kernel Principal Components
Analysis and many manifold methods are
based on estimating eigenvalues and eigenfunctions
of operators defined by a similarity function or a
kernel, given empirical data. Thus for the analysis
of algorithms, it is an important problem to
be able to assess the quality of such approximations.
The contribution of our paper is two-fold.
First, we use a technique based on a concentration
inequality for Hilbert spaces to provide new simplified
proofs for a number of results in spectral
approximation. Second, using these methods we
provide several new results for estimating spectral
properties of the graph Laplacian operator extending
and strengthening results from [28].
-
Data spectroscopy: eigenspaces of convolution operators and clustering [pdf, bib]
Tao Shi, Mikhail Belkin, Bin Yu
The Annals of Statistics, vol. 37, Number 6B (2009), 3960-3984.
-
+ abstract
-
This paper focuses on obtaining clustering information about a
distribution from its i.i.d. samples. We develop theoretical results to
understand and use clustering information contained in the eigenvectors
of data adjacency matrices based on a radial kernel function
with a sufficiently fast tail decay. In particular, we provide population
analyses to gain insights into which eigenvectors should be used and
when the clustering information for the distribution can be recovered
from the sample. We learn that a fixed number of top eigenvectors
might at the same time contain redundant clustering information and
miss relevant clustering information. We use this insight to design
the Data Spectroscopic clustering (DaSpec) algorithm that utilizes
properly selected eigenvectors to determine the number of clusters
automatically and to group the data accordingly. Our findings extend
the intuitions underlying existing spectral techniques such as
spectral clustering and Kernel Principal Components Analysis, and
provide new understanding into their usability and modes of failure.
Simulation studies and experiments on real world data are conducted
to show the potential of our algorithm. In particular, DaSpec is found
to handle unbalanced groups and recover clusters of different shapes
better than the competing methods.
- Towards a Theoretical Foundation for Laplacian-Based
Manifold Methods [pdf, bib]
M. Belkin, P. Niyogi
Journal of Computer and System Sciences, 2008.
Volume 74, Issue 8, pp. 1289-1308.
Special Issue on Learning Theory, invited (short version in COLT 2005).
-
+ abstract
-
In recent years manifold methods have attracted a considerable amount of atten-
tion in machine learning. However most algorithms in that class may be termed
"manifold-motivated" as they lack any explicit theoretical guarantees. In this pa-
per we take a step towards closing the gap between theory and practice for a class
of Laplacian-based manifold methods. These methods utilize the graph Laplacian
associated to a data set for a variety of applications in semi-supervised learning,
clustering, data representation.
We show that under certain conditions the graph Laplacian of a point cloud of
data samples converges to the Laplace-Beltrami operator on the underlying mani-
fold. Theorem 3.1 contains the .rst result showing convergence of a random graph
Laplacian to the manifold Laplacian in the context of machine learning.
-
Discrete Laplace Operator for Meshed Surfaces
[pdf, code, bib]
M. Belkin, J. Sun, Y. Wang, 24th Annual Symposium on Computational Geometry (SOCG) 2008.
-
+ abstract
-
In recent years a considerable amount of work in graphics and geometric optimization used tools
based on the Laplace-Beltrami operator on a surface. The applications of the Laplacian include mesh
editing, surface smoothing, and shape interpolations among others. However, it has been shown [12, 23,
25] that the popular cotangent approximation schemes do not provide convergent point-wise (or even
L^2) estimates, while many applications rely on point-wise estimation. Existence of such schemes has
been an open question [12].
In this paper we propose the first algorithm for approximating the Laplace operator of a surface from
a mesh with point-wise convergence guarantees applicable to arbitrary meshed surfaces. We show that
for a sufficiently fine mesh over an arbitrary surface, our mesh Laplacian is close to the Laplace-Beltrami
operator on the surface at every point of the surface.
Moreover, the proposed algorithm is simple and easily implementable. Experimental evidence shows
that our algorithm exhibits convergence empirically and outperforms cotangent-based methods in providing
accurate approximation of the Laplace operator for various meshes.
-
Consistency of Spectral Clustering [pdf, bib]
U. von Luxburg, M. Belkin, O. Bousquet,
The Annals of Statistics 2008, Vol. 36, No. 2, 555-586.
-
+ abstract
-
Consistency is a key property of statistical
algorithms when the data is drawn from some
underlying probability distribution. Surprisingly,
despite decades of work, little is known about
consistency of most clustering algorithms. In this
paper we investigate consistency of the popular
family of spectral clustering algorithms, which
clusters the data with the help of eigenvectors of
graph Laplacian matrices. We develop new methods to
establish that for increasing sample size, those
eigenvectors converge to the eigenvectors of certain
limit operators. As a result we can prove that one of
the two major classes of spectral clustering
(normalized clustering) converges under very general
conditions, while the other (unnormalized clustering)
is only consistent under strong additional
assumptions, which are not always satisfied in real
data. We conclude that our analysis provides strong
evidence for the superiority of normalized spectral
clustering.
-
Manifold Regularization:
a Geometric Framework for Learning from Labeled and
Unlabeled Examples [pdf, bib]
M. Belkin, P. Niyogi, V. Sindhwani
Journal of Machine Learning Research, 7(Nov):2399-2434,
2006.
-
+ abstract
- We propose a family of learning algorithms based on a
new form of regularization that allows us to exploit
the geometry of the marginal distribution. We focus
on a semi-supervised framework that incorporates
labeled and unlabeled data in a general-purpose
learner. Some transductive graph learning algorithms
and standard methods including support vector
machines and regularized least squares can be
obtained as special cases. We use properties of
reproducing kernel Hilbert spaces to prove new
Representer theorems that provide theoretical basis
for the algorithms. As a result (in contrast to
purely graph-based approaches) we obtain a natural
out-of-sample extension to novel examples and so are
able to handle both transductive and truly
semi-supervised settings. We present experimental
evidence suggesting that our semi-supervised
algorithms are able to use unlabeled data
effectively. Finally we have a brief discussion of
unsupervised and fully supervised learning within our
general framework.
-
Heat Flow and a Faster Algorithm to Compute the
Surface Area of a Convex Body [pdf, bib]
M. Belkin, H. Narayanan, P. Niyogi, FOCS 2006.
-
+ abstract
-
We draw on the observation that the amount of heat
diffusing outside of a heated body in a short period
of time is proportional to its surface area, to
design a simple algorithm for approximating the
surface area of a convex body given by a membership
oracle. Our method has a complexity of O*(n^4), where
n is the dimension, compared to O*(n^8.5) for the
previous best algorithm. We show that our complexity
cannot be improved given the current state-of-the-art
in volume estimation.
-
Semi-supervised Learning on Riemannian Manifolds
[pdf, bib]
M. Belkin, P. Niyogi
Machine Learning, 56 (invited, special Issue on
clustering), 209-239, 2004 (short version in NIPS 2002).
-
+ abstract
-
We consider the general problem of utilizing both
labeled and unlabeled data to improve classification
accuracy. Under the assumption that the data lie on a
submanifold in a high dimensional space, we develop
an algorithmic framework to classify a partially
labeled data set in a principled manner. The central
idea of our approach is that classification functions
are naturally defined only on the submanifold in
question rather than the total ambient space. Using
the Laplace-Beltrami operator one produces a basis
(the Laplacian Eigenmaps) for a Hilbert space of
square integrable functions on the submanifold. To
recover such a basis, only unlabeled examples are
required. Once such a basis is obtained, training can
be performed using the labeled data set. Our
algorithm models the manifold using the adjacency
graph for the data and approximates the Laplace-
Beltrami operator by the graph Laplacian. We provide
details of the algorithm, its theoretical
justification, and several practical applications for
image, speech, and text classification.
-
Laplacian Eigenmaps for Dimensionality Reduction and
Data Representation [pdf, bib]
M. Belkin, P. Niyogi
Neural Computation, June 2003; 15 (6):1373-1396 (short version in NIPS 2001).
-
+ abstract
-
One of the central problems in machine learning and
pattern recognition is to develop appropriate
representations for complex data. We consider the
problem of constructing a representation for data
lying on a lowdimensional manifold embedded in a
high-dimensional space. Drawing on the correspondence
between the graph Laplacian, the Laplace Beltrami
operator on the manifold, and the connections to the
heat equation, we propose a geometrically motivated
algorithm for representing the highdimensional data.
The algorithm provides a computationally efficient
approach to non
dimensionality reduction that
has locality-preserving properties and a natural
connection to clustering. Some potential applications
and illustrative examples are discussed.
|