Publications of Mikhail Belkin

[Journal papers]  [Conference papers]  [Ph.D. Thesis]  [Book Chapter]  [Technical Reports]


Journal Papers and preprints:

  1. Convergence of Laplacian Eigenmaps [pdf, bib]
    M. Belkin, P. Niyogi
    preprint, 2008.
    + abstract
    Geometrically based methods for various tasks of data analysis have attracted considerable attention over the last few years. In many of these algorithms, a central role is played by the eigenvectors of the graph Laplacian of a data-derived graph. In this paper, we show that if points are sampled uniformly at random from an unknown submanifold ${\cal M}$ of $\R^N$, then the eigenvectors of a suitably constructed graph Laplacian converge to the eigenfunctions of the Laplace Beltrami operator on ${\cal M}$. This basic result directly establishes the convergence of spectral manifold learning algorithms such as Laplacian Eigenmaps and Diffusion Maps. It also has implications for the understanding of geometric algorithms in data analysis, computational harmonic analysis, geometric random graphs, and graphics.

  2. Manifold Regularization: a Geometric Framework for Learning from Labeled and Unlabeled Examples [pdf, bib]
    M. Belkin, P. Niyogi, V. Sindhwani
    Journal of Machine Learning Research, 7(Nov):2399-2434, 2006.
    + abstract
    We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning algorithms and standard methods including support vector machines and regularized least squares can be obtained as special cases. We use properties of reproducing kernel Hilbert spaces to prove new Representer theorems that provide theoretical basis for the algorithms. As a result (in contrast to purely graph-based approaches) we obtain a natural out-of-sample extension to novel examples and so are able to handle both transductive and truly semi-supervised settings. We present experimental evidence suggesting that our semi-supervised algorithms are able to use unlabeled data effectively. Finally we have a brief discussion of unsupervised and fully supervised learning within our general framework.

  3. Towards a Theoretical Foundation for Laplacian-Based Manifold Methods [pdf]
    M. Belkin, P. Niyogi
    Journal of Computer and System Sciences, 2008. Invited, special issue on learning theory (to appear).

  4. Consistency of Spectral Clustering [pdf, bib]
    U. von Luxburg, M. Belkin, O. Bousquet,
    The Annals of Statistics 2008, Vol. 36, No. 2, 555-586.
    + abstract
    Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.

  5. Semi-supervised Learning on Riemannian Manifolds [pdf, bib]
    M. Belkin, P. Niyogi
    Machine Learning, 56, Invited, secial Issue on Clustering, 209-239, 2004.
    + abstract
    We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace- Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.

  6. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation [pdf, bib]
    M. Belkin, P. Niyogi
    Neural Computation, June 2003; 15 (6):1373-1396.
    + abstract
    One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. We consider the problem of constructing a representation for data lying on a lowdimensional manifold embedded in a high-dimensional space. Drawing on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for representing the highdimensional data. The algorithm provides a computationally efficient approach to non dimensionality reduction that has locality-preserving properties and a natural connection to clustering. Some potential applications and illustrative examples are discussed.


Refereed and Invited Conference Proceedings:
  1. Data Spectroscopy: Learning Mixture Models using Eigenspaces of Convolution Operators [pdf, bib]
    T. Shi, M. Belkin, B. Yu, ICML 2008.
    + abstract
    In this paper we develop a spectral frame- work for estimating mixture distributions, specifically Gaussian mixture models. In physics, spectroscopy is often used for the identification of substances through their spectrum. Treating a kernel function K(x, y) as “light” and the sampled data as “sub- stance”, the spectrum of their interaction (eigenvalues and eigenvectors of the kernel matrix K) unveils certain aspects of the un- derlying parametric distribution p, such as the parameters of a Gaussian mixture. Our approach extends the intuitions and analyses underlying the existing spectral techniques, such as spectral clustering and Kernel Prin- cipal Components Analysis (KPCA). We construct algorithms to estimate param- eters of Gaussian mixture models, includ- ing the number of mixture components, their means and covariance matrices, which are im- portant in many practical applications. We provide a theoretical framework and show en- couraging experimental results.

  2. Discrete Laplace Operator for Meshed Surfaces [pdf, bib]
    M. Belkin, J. Sun, Y. Wang, SOCG 2008.
    + abstract
    In recent years a considerable amount of work in graphics and geometric optimization used tools based on the Laplace-Beltrami operator on a surface. The applications of the Laplacian include mesh editing, surface smoothing, and shape interpolations among others. However, it has been shown [12, 23, 25] that the popular cotangent approximation schemes do not provide convergent point-wise (or even L^2) estimates, while many applications rely on point-wise estimation. Existence of such schemes has been an open question [12]. In this paper we propose the first algorithm for approximating the Laplace operator of a surface from a mesh with point-wise convergence guarantees applicable to arbitrary meshed surfaces. We show that for a sufficiently fine mesh over an arbitrary surface, our mesh Laplacian is close to the Laplace-Beltrami operator on the surface at every point of the surface. Moreover, the proposed algorithm is simple and easily implementable. Experimental evidence shows that our algorithm exhibits convergence empirically and outperforms cotangent-based methods in providing accurate approximation of the Laplace operator for various meshes.

  3. The value of labeled and unlabeled examples when the model is imperfect [pdf, bib]
    K. Sinha, M. Belkin, NIPS 2007.
    + abstract
    Semi-supervised learning, i.e. learning from both labeled and unlabeled data has received signi.cant attention in the machine learning literature in recent years. Still our understanding of the theoretical foundations of the usefulness of unlabeled data remains somewhat limited. The simplest and the best understood situation is when the data is described by an identi.able mixture model, and where each class comes from a pure component. This natural setup and its implications ware analyzed in [11, 5]. One important result was that in certain regimes, labeled data becomes exponentially more valuable than unlabeled data. However, in most realistic situations, one would not expect that the data comes from a parametric mixture distribution with identifiable components. There have been recent efforts to analyze the non-parametric situation, for example, .cluster. and manifold assumptions have been suggested as a basis for analysis. Still, a satisfactory and fairly complete theoretical understanding of the nonparametric problem, similar to that in [11, 5] has not yet been developed. In this paper we investigate an intermediate situation, when the data comes from a probability distribution, which can be modeled, but not perfectly, by an identi.able mixture distribution. This seems applicable to many situation, when, for example, a mixture of Gaussians is used to model the data. the contribution of this paper is an analysis of the role of labeled and unlabeled data depending on the amount of imperfection in the model.

  4. Convergence of Laplacian Eigenmaps
    M. Belkin, P. Niyogi, NIPS 2006.

  5. On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts [pdf, bib]
    H. Narayanan, M. Belkin, P. Niyogi, NIPS 2006.
    + abstract
    One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary.

  6. Heat Flow and a Faster Algorithm to Compute the Surface Area of a Convex Body [pdf, bib]
    M. Belkin, H. Narayanan, P. Niyogi, FOCS 2006.
    + abstract
    We draw on the observation that the amount of heat diffusing outside of a heated body in a short period of time is proportional to its surface area, to design a simple algorithm for approximating the surface area of a convex body given by a membership oracle. Our method has a complexity of O*(n^4), where n is the dimension, compared to O*(n^8.5) for the previous best algorithm. We show that our complexity cannot be improved given the current state-of-the-art in volume estimation.

  7. Maximum Margin Semi-Supervised Learning for Structured Variables [pdf, bib]
    Y. Altun, D. McAllester, M. Belkin, NIPS 2005.
    + abstract
    Many real-world classification problems involve the prediction of multiple inter-dependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our formulation naturally extends to new test points.

  8. Beyond the Point Cloud: from Transductive to Semi-supervised Learning [pdf, bib]
    V. Sindhwani, P. Niyogi, M. Belkin, ICML 2005.
    + abstract
    Due to its occurrence in engineering domains and implications for natural learning, the problem of utilizing unlabeled data is attracting increasing attention in machine learning. A large body of recent literature has focussed on the transductive setting where labels of unlabeled examples are estimated by learning a function defined only over the point cloud data. In a truly semi-supervised setting however, a learning machine has access to labeled and unlabeled examples and must make predictions on data points never encountered before. In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. We construct a family of data-dependent norms on Reproducing Kernel Hilbert Spaces (RKHS). These norms allow us to warp the structure of the RKHS to reflect the underlying geometry of the data. We derive explicit formulas for the corresponding new kernels. Our approach demonstrates state of the art performance on a variety of classification tasks.

  9. A Co-Regularization Approach to Semi-supervised Learning with Multiple Views [pdf]
    V. Sindhwani, P. Niyogi, M. Belkin,
    ICML Workshop on Learning with Multiple Views, 2005.
    + abstract
    The Co-Training algorithm uses unlabeled examples in multiple views to bootstrap classifiers in each view, typically in a greedy manner, and operating under assumptions of view-independence and compatibility. In this paper, we propose a Co-Regularization framework where classifiers are learnt in each view through forms of multi-view regularization. We propose algorithms within this framework that are based on optimizing measures of agreement and smoothness over la- beled and unlabeled examples. These algorithms naturally extend standard regularization methods like Support Vector Machines (SVM) and Regularized Least squares (RLS) for multi-view semi-supervised learning, and inherit their benefits and applicability to high-dimensional classification problems. An empirical investigation is presented that confirms the promise of this approach.

  10. Linear Manifold Regularization for Large Scale Semi-supervised Learning
    V. Sindhwani, P. Niyogi, M. Belkin, S. Keerthi
    ICML Workshop on Learning with Partially Classified Training Data, 2005

  11. Towards a Theoretical Foundation for Laplacian-Based Manifold Methods [pdf, bib]
    M. Belkin, P. Niyogi, COLT 2005
    + abstract
    The Co-Training algorithm uses unlabeled examples in multiple views to bootstrap classifiers in each view, typically in a greedy manner, and operating under assumptions of view-independence and compatibility. In this paper, we propose a Co-Regularization framework where classifiers are learnt in each view through forms of multi-view regularization. We propose algorithms within this framework that are based on optimizing measures of agreement and smoothness over la- beled and unlabeled examples. These algorithms naturally extend standard regularization methods like Support Vector Machines (SVM) and Regularized Least squares (RLS) for multi-view semi-supervised learning, and inherit their benefits and applicability to high-dimensional classification problems. An empirical investigation is presented that confirms the promise of this approach.

  12. On Manifold Regularization [pdf]
    M. Belkin, P. Niyogi, V. Sindhwani, AISTATS 2005

  13. Limits of Spectral Clustering [pdf]    Outstanding Student Paper Award
    U. von Luxburg, O. Bousquet, M. Belkin, NIPS 2004.

  14. Regularization and Semi-supervised Learning on Large Graphs [pdf, bib]
    M. Belkin, I. Matveeva, P. Niyogi, COLT 2004.

  15. On the Convergence of Spectral Clustering on Random Samples: the Normalized Case [pdf]
    U. von Luxburg, O. Bousquet, M. Belkin, COLT 2004.

  16. Tikhonov Regularization and Semi-supervised Learning on Large Graphs
    M. Belkin, I. Matveeva, P. Niyogi
    ICASSP 2004, Special Session: Manifolds and Geometry in Signal Processing (Invited paper).

  17. Using Manifold Structure for Partially Labeled Classification  [pdf]
    M. Belkin, P. Niyogi, NIPS 2002

  18. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering  [pdf]
    M. Belkin, P. Niyogi, NIPS 2001.

  19. Using Eigenvectors of the Bigram Graph to Infer Morpheme Identity   [pdf]
    M. Belkin, J. Goldsmith
    Morphological and Phonological Learning: Proceedings of the 6th Workshop
    of the ACL Special Interest Group in Computational Phonology (SIGPHON), Philadelphia, July 2002, pp. 41-47.


Ph.D. Thesis:
  • Problems of Learning on Manifolds   [pdf, bib]
    Ph.D. Dissertation, University of Chicago, Dept. of Mathematics, 2003.
    Advisor: Partha Niyogi.


Book chapter:
  • The Geometric Basis of Semi-supervised Learning   [pdf]
    V. Sindhwani, M. Belkin, P. Niyogi.
    Book chapter in "Semi-supervised Learning", Chapelle, O., B. Schölkopf and A. Zien, Eds., MIT Press, 2006.


Technical Reports:

  • Consistency of Spectral Clustering [pdf]
    U. von Luxburg, M. Belkin, O. Bousquet
    Max Planck Insitute for Biological Cybernetics Technical Report TR-134, 2004. (The Annals of Statistics, to appear)
    + abstract
    Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.

  • Manifold Regularization: a Geometric Framework for Learning from Examples
    M. Belkin, P. Niyogi, V. Sindhwani
    University of Chicago CS Technical Report TR-2004-06, 2004.

  • Regression and Regularization on Large Graphs [link]
    M. Belkin, I. Matveeva, P. Niyogi
    The University of Chicago CS Technical Report TR-2003-11.

  • Semi-supervised learning on manifolds [link]
    M. Belkin, P. Niyogi
    The University of Chicago CS Technical Report TR-2002-12.

  • Laplacian Eigenmaps for Dimensionality Reduction and Data Representation [link]
    M. Belkin, P. Niyogi
    The University of Chicago CS Technical Report TR-2002-01.