Publications
of Mikhail Belkin
[Journal papers] [Conference papers] [Ph.D. Thesis] [Book Chapter] [Technical Reports]
Journal Papers and preprints:
-
Convergence of Laplacian Eigenmaps [pdf, bib]
M. Belkin, P. Niyogi
preprint,
2008.
-
+ abstract
-
Geometrically based methods for various tasks of data analysis
have attracted considerable attention over the last few years.
In many of these algorithms, a central role is played by the
eigenvectors of the graph Laplacian of a
data-derived graph. In this paper, we show that if points are
sampled uniformly at random from an unknown submanifold ${\cal M}$
of $\R^N$, then the eigenvectors of a suitably constructed
graph Laplacian converge to the eigenfunctions of the Laplace
Beltrami operator on ${\cal M}$. This basic result directly
establishes the convergence of spectral manifold learning
algorithms such as Laplacian Eigenmaps and Diffusion Maps.
It also has implications for the understanding of geometric algorithms
in data analysis, computational harmonic analysis, geometric
random graphs, and graphics.
-
Manifold Regularization:
a Geometric Framework for Learning from Labeled and
Unlabeled Examples [pdf, bib]
M. Belkin, P. Niyogi, V. Sindhwani
Journal of Machine Learning Research, 7(Nov):2399-2434,
2006.
-
+ abstract
- We propose a family of learning algorithms based on a
new form of regularization that allows us to exploit
the geometry of the marginal distribution. We focus
on a semi-supervised framework that incorporates
labeled and unlabeled data in a general-purpose
learner. Some transductive graph learning algorithms
and standard methods including support vector
machines and regularized least squares can be
obtained as special cases. We use properties of
reproducing kernel Hilbert spaces to prove new
Representer theorems that provide theoretical basis
for the algorithms. As a result (in contrast to
purely graph-based approaches) we obtain a natural
out-of-sample extension to novel examples and so are
able to handle both transductive and truly
semi-supervised settings. We present experimental
evidence suggesting that our semi-supervised
algorithms are able to use unlabeled data
effectively. Finally we have a brief discussion of
unsupervised and fully supervised learning within our
general framework.
-
Towards a Theoretical Foundation for Laplacian-Based
Manifold Methods [pdf]
M. Belkin, P. Niyogi
Journal of Computer and System Sciences, 2008. Invited,
special issue on learning theory (to appear).
-
Consistency of Spectral Clustering [pdf, bib]
U. von Luxburg, M. Belkin, O. Bousquet,
The Annals of Statistics 2008, Vol. 36, No. 2, 555-586.
-
+ abstract
-
Consistency is a key property of statistical
algorithms when the data is drawn from some
underlying probability distribution. Surprisingly,
despite decades of work, little is known about
consistency of most clustering algorithms. In this
paper we investigate consistency of the popular
family of spectral clustering algorithms, which
clusters the data with the help of eigenvectors of
graph Laplacian matrices. We develop new methods to
establish that for increasing sample size, those
eigenvectors converge to the eigenvectors of certain
limit operators. As a result we can prove that one of
the two major classes of spectral clustering
(normalized clustering) converges under very general
conditions, while the other (unnormalized clustering)
is only consistent under strong additional
assumptions, which are not always satisfied in real
data. We conclude that our analysis provides strong
evidence for the superiority of normalized spectral
clustering.
-
Semi-supervised Learning on Riemannian Manifolds
[pdf, bib]
M. Belkin, P. Niyogi
Machine Learning, 56, Invited, secial Issue on
Clustering, 209-239, 2004.
-
+ abstract
-
We consider the general problem of utilizing both
labeled and unlabeled data to improve classification
accuracy. Under the assumption that the data lie on a
submanifold in a high dimensional space, we develop
an algorithmic framework to classify a partially
labeled data set in a principled manner. The central
idea of our approach is that classification functions
are naturally defined only on the submanifold in
question rather than the total ambient space. Using
the Laplace-Beltrami operator one produces a basis
(the Laplacian Eigenmaps) for a Hilbert space of
square integrable functions on the submanifold. To
recover such a basis, only unlabeled examples are
required. Once such a basis is obtained, training can
be performed using the labeled data set. Our
algorithm models the manifold using the adjacency
graph for the data and approximates the Laplace-
Beltrami operator by the graph Laplacian. We provide
details of the algorithm, its theoretical
justification, and several practical applications for
image, speech, and text classification.
-
Laplacian Eigenmaps for Dimensionality Reduction and
Data Representation [pdf, bib]
M. Belkin, P. Niyogi
Neural Computation, June 2003; 15 (6):1373-1396.
-
+ abstract
-
One of the central problems in machine learning and
pattern recognition is to develop appropriate
representations for complex data. We consider the
problem of constructing a representation for data
lying on a lowdimensional manifold embedded in a
high-dimensional space. Drawing on the correspondence
between the graph Laplacian, the Laplace Beltrami
operator on the manifold, and the connections to the
heat equation, we propose a geometrically motivated
algorithm for representing the highdimensional data.
The algorithm provides a computationally efficient
approach to non
dimensionality reduction that
has locality-preserving properties and a natural
connection to clustering. Some potential applications
and illustrative examples are discussed.
Refereed and Invited Conference
Proceedings:
-
Data Spectroscopy: Learning Mixture Models using Eigenspaces of
Convolution Operators
[pdf, bib]
T. Shi, M. Belkin, B. Yu, ICML 2008.
-
+ abstract
-
In this paper we develop a spectral frame-
work for estimating mixture distributions,
specifically Gaussian mixture models. In
physics, spectroscopy is often used for the
identification of substances through their
spectrum. Treating a kernel function K(x, y)
as “light” and the sampled data as “sub-
stance”, the spectrum of their interaction
(eigenvalues and eigenvectors of the kernel
matrix K) unveils certain aspects of the un-
derlying parametric distribution p, such as
the parameters of a Gaussian mixture. Our
approach extends the intuitions and analyses
underlying the existing spectral techniques,
such as spectral clustering and Kernel Prin-
cipal Components Analysis (KPCA).
We construct algorithms to estimate param-
eters of Gaussian mixture models, includ-
ing the number of mixture components, their
means and covariance matrices, which are im-
portant in many practical applications. We
provide a theoretical framework and show en-
couraging experimental results.
-
Discrete Laplace Operator for Meshed Surfaces
[pdf, bib]
M. Belkin, J. Sun, Y. Wang, SOCG 2008.
-
+ abstract
-
In recent years a considerable amount of work in graphics and geometric optimization used tools
based on the Laplace-Beltrami operator on a surface. The applications of the Laplacian include mesh
editing, surface smoothing, and shape interpolations among others. However, it has been shown [12, 23,
25] that the popular cotangent approximation schemes do not provide convergent point-wise (or even
L^2) estimates, while many applications rely on point-wise estimation. Existence of such schemes has
been an open question [12].
In this paper we propose the first algorithm for approximating the Laplace operator of a surface from
a mesh with point-wise convergence guarantees applicable to arbitrary meshed surfaces. We show that
for a sufficiently fine mesh over an arbitrary surface, our mesh Laplacian is close to the Laplace-Beltrami
operator on the surface at every point of the surface.
Moreover, the proposed algorithm is simple and easily implementable. Experimental evidence shows
that our algorithm exhibits convergence empirically and outperforms cotangent-based methods in providing
accurate approximation of the Laplace operator for various meshes.
-
The value of labeled and unlabeled examples when the model is imperfect
[pdf, bib]
K. Sinha, M. Belkin, NIPS 2007.
-
+ abstract
-
Semi-supervised learning, i.e. learning from both labeled and unlabeled data has
received signi.cant attention in the machine learning literature in recent years.
Still our understanding of the theoretical foundations of the usefulness of unlabeled
data remains somewhat limited. The simplest and the best understood situation
is when the data is described by an identi.able mixture model, and where
each class comes from a pure component. This natural setup and its implications
ware analyzed in [11, 5]. One important result was that in certain regimes, labeled
data becomes exponentially more valuable than unlabeled data.
However, in most realistic situations, one would not expect that the data comes
from a parametric mixture distribution with identifiable components. There have
been recent efforts to analyze the non-parametric situation, for example, .cluster.
and manifold assumptions have been suggested as a basis for analysis. Still,
a satisfactory and fairly complete theoretical understanding of the nonparametric
problem, similar to that in [11, 5] has not yet been developed.
In this paper we investigate an intermediate situation, when the data comes from a
probability distribution, which can be modeled, but not perfectly, by an identi.able
mixture distribution. This seems applicable to many situation, when, for example,
a mixture of Gaussians is used to model the data. the contribution of this paper is
an analysis of the role of labeled and unlabeled data depending on the amount of
imperfection in the model.
-
Convergence of Laplacian
Eigenmaps
M. Belkin, P. Niyogi, NIPS 2006.
-
On the Relation Between Low Density Separation,
Spectral Clustering and Graph Cuts [pdf, bib]
H. Narayanan, M. Belkin, P. Niyogi, NIPS 2006.
-
+ abstract
-
One of the intuitions underlying many graph-based
methods for clustering and semi-supervised learning,
is that class or cluster boundaries pass through
areas of low probability density. In this paper we
provide some formal analysis of that notion for a
probability distribution. We introduce a notion of
weighted boundary volume, which measures the length
of the class/cluster boundary weighted by the density
of the underlying probability distribution. We show
that sizes of the cuts of certain commonly used data
adjacency graphs converge to this continuous weighted
volume of the boundary.
-
Heat Flow and a Faster Algorithm to Compute the
Surface Area of a Convex Body [pdf, bib]
M. Belkin, H. Narayanan, P. Niyogi, FOCS 2006.
-
+ abstract
-
We draw on the observation that the amount of heat
diffusing outside of a heated body in a short period
of time is proportional to its surface area, to
design a simple algorithm for approximating the
surface area of a convex body given by a membership
oracle. Our method has a complexity of O*(n^4), where
n is the dimension, compared to O*(n^8.5) for the
previous best algorithm. We show that our complexity
cannot be improved given the current state-of-the-art
in volume estimation.
-
Maximum Margin Semi-Supervised Learning for Structured
Variables [pdf, bib]
Y. Altun, D. McAllester, M. Belkin, NIPS 2005.
-
+ abstract
-
Many real-world classification problems involve the
prediction of multiple inter-dependent variables
forming some structural dependency. Recent progress
in machine learning has mainly focused on supervised
classification of such structured variables. In this
paper, we investigate structured classification in a
semi-supervised setting. We present a discriminative
approach that utilizes the intrinsic geometry of
input patterns revealed by unlabeled data points and
we derive a maximum-margin formulation of
semi-supervised learning for structured variables.
Unlike transductive algorithms, our formulation
naturally extends to new test points.
-
Beyond the Point Cloud: from Transductive to
Semi-supervised Learning [pdf, bib]
V. Sindhwani, P. Niyogi, M. Belkin, ICML 2005.
-
+ abstract
-
Due to its occurrence in engineering domains and
implications for natural learning, the problem of
utilizing unlabeled data is attracting increasing
attention in machine learning. A large body of recent
literature has focussed on the transductive setting
where labels of unlabeled examples are estimated by
learning a function defined only over the point cloud
data. In a truly semi-supervised setting however, a
learning machine has access to labeled and unlabeled
examples and must make predictions on data points
never encountered before. In this paper, we show how
to turn transductive and standard supervised learning
algorithms into semi-supervised learners. We
construct a family of data-dependent norms on
Reproducing Kernel Hilbert Spaces (RKHS). These norms
allow us to warp the structure of the RKHS to reflect
the underlying geometry of the data. We derive
explicit formulas for the corresponding new kernels.
Our approach demonstrates state of the art
performance on a variety of classification tasks.
-
A Co-Regularization Approach to Semi-supervised
Learning with Multiple Views [pdf]
V. Sindhwani, P. Niyogi, M. Belkin,
ICML Workshop on Learning with Multiple Views, 2005.
-
+ abstract
-
The Co-Training algorithm uses unlabeled
examples in multiple views to bootstrap classifiers in each view, typically in a greedy
manner, and operating under assumptions
of view-independence and compatibility. In
this paper, we propose a Co-Regularization
framework where classifiers are learnt in each
view through forms of multi-view regularization. We propose algorithms within this
framework that are based on optimizing measures of agreement and smoothness over la-
beled and unlabeled examples. These algorithms naturally extend standard
regularization methods like Support Vector Machines
(SVM) and Regularized Least squares (RLS)
for multi-view semi-supervised learning, and
inherit their benefits and applicability to
high-dimensional classification problems. An
empirical investigation is presented that confirms the promise of this approach.
-
Linear Manifold Regularization for Large Scale
Semi-supervised Learning
V. Sindhwani, P. Niyogi, M. Belkin, S. Keerthi
ICML Workshop on Learning with Partially Classified
Training Data, 2005
-
Towards a Theoretical Foundation for Laplacian-Based
Manifold Methods [pdf, bib]
M. Belkin, P. Niyogi, COLT 2005
-
+ abstract
-
The Co-Training algorithm uses unlabeled
examples in multiple views to bootstrap classifiers in each view, typically in a greedy
manner, and operating under assumptions
of view-independence and compatibility. In
this paper, we propose a Co-Regularization
framework where classifiers are learnt in each
view through forms of multi-view regularization. We propose algorithms within this
framework that are based on optimizing measures of agreement and smoothness over la-
beled and unlabeled examples. These algorithms naturally extend standard
regularization methods like Support Vector Machines
(SVM) and Regularized Least squares (RLS)
for multi-view semi-supervised learning, and
inherit their benefits and applicability to
high-dimensional classification problems. An
empirical investigation is presented that confirms the promise of this approach.
-
On Manifold Regularization [pdf]
M. Belkin, P. Niyogi, V. Sindhwani, AISTATS 2005
-
Limits of Spectral Clustering [pdf]    Outstanding
Student Paper Award
U. von Luxburg, O. Bousquet, M. Belkin, NIPS 2004.
-
Regularization and Semi-supervised Learning on Large
Graphs [pdf, bib]
M. Belkin, I. Matveeva, P. Niyogi, COLT 2004.
-
On the Convergence of Spectral Clustering on Random
Samples: the Normalized Case [pdf]
U. von Luxburg, O. Bousquet, M. Belkin, COLT 2004.
-
Tikhonov Regularization and Semi-supervised Learning
on Large Graphs
M. Belkin, I. Matveeva, P. Niyogi
ICASSP 2004, Special Session: Manifolds and Geometry in
Signal Processing (Invited paper).
-
Using Manifold Structure for Partially Labeled
Classification [pdf]
M. Belkin, P. Niyogi, NIPS 2002
-
Laplacian Eigenmaps and Spectral Techniques for
Embedding and Clustering [pdf]
M. Belkin, P. Niyogi, NIPS 2001.
-
Using Eigenvectors of the Bigram Graph to Infer
Morpheme Identity
[pdf]
M. Belkin, J. Goldsmith
Morphological and Phonological Learning: Proceedings of
the 6th Workshop
of the ACL Special Interest Group in Computational
Phonology (SIGPHON), Philadelphia, July 2002, pp.
41-47.
Ph.D. Thesis:
-
Problems of Learning on Manifolds
[pdf, bib]
Ph.D. Dissertation, University of Chicago, Dept. of
Mathematics, 2003.
Advisor: Partha Niyogi.
Book chapter:
-
The Geometric Basis of Semi-supervised Learning
[pdf]
V. Sindhwani, M. Belkin, P. Niyogi.
Book chapter in "Semi-supervised Learning", Chapelle, O.,
B. Schölkopf and A. Zien, Eds., MIT Press, 2006.
Technical Reports:
-
Consistency of Spectral Clustering [pdf]
U. von Luxburg, M. Belkin, O. Bousquet
Max Planck Insitute for Biological Cybernetics Technical
Report TR-134, 2004. (The Annals of Statistics, to
appear)
-
+ abstract
-
Consistency is a key property of statistical
algorithms when the data is drawn from some
underlying probability distribution. Surprisingly,
despite decades of work, little is known about
consistency of most clustering algorithms. In this
paper we investigate consistency of the popular
family of spectral clustering algorithms, which
clusters the data with the help of eigenvectors of
graph Laplacian matrices. We develop new methods to
establish that for increasing sample size, those
eigenvectors converge to the eigenvectors of certain
limit operators. As a result we can prove that one of
the two major classes of spectral clustering
(normalized clustering) converges under very general
conditions, while the other (unnormalized clustering)
is only consistent under strong additional
assumptions, which are not always satisfied in real
data. We conclude that our analysis provides strong
evidence for the superiority of normalized spectral
clustering.
-
Manifold Regularization: a Geometric Framework for
Learning from Examples
M. Belkin, P. Niyogi, V. Sindhwani
University of Chicago CS Technical Report TR-2004-06,
2004.
-
Regression and Regularization on Large Graphs
[link]
M. Belkin, I. Matveeva, P. Niyogi
The University of Chicago CS Technical Report
TR-2003-11.
-
Semi-supervised learning on manifolds
[link]
M. Belkin, P. Niyogi
The University of Chicago CS Technical Report
TR-2002-12.
-
Laplacian Eigenmaps for Dimensionality Reduction and
Data Representation
[link]
M. Belkin, P. Niyogi
The University of Chicago CS Technical Report TR-2002-01.