Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling

Jerome Bellegarda, Apple Computer

Abstract

Originally formulated in the context of information retrieval, latent semantic analysis exhibits three main characteristics: (i) discrete entities (namely words and documents) are mapped onto a continuous vector space; (ii) this mapping is determined by global correlation patterns; and (iii) dimensionality reduction is an integral part of the process. Because these fairly generic properties may be advantageous in a variety of contexts, they have sparked interest in a broader interpretation of the underlying paradigm. The outcome is latent semantic mapping, a data-driven framework for modeling global relationships implicit in large volumes of (not necessarily textual) data. The purpose of this talk is to give a general overview of the framework, and then touch on a number of applications (some within NLP, some outside) where it has recently proven beneficial. We conclude with a discussion of the inherent trade-offs associated with the approach, and some perspectives on its potential applicability to other areas of pattern recognition.
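
To make the three characteristics above concrete, the following is a minimal sketch of the classical latent semantic analysis computation: a truncated singular value decomposition of a word-document co-occurrence matrix. The toy corpus, the variable names, and the choice of two latent dimensions are illustrative assumptions, not details from the talk.

    # Sketch of LSA: discrete entities (words, documents) are mapped
    # into a shared low-dimensional continuous space via a truncated
    # SVD of a word-document count matrix. Toy data is hypothetical.
    import numpy as np

    # Toy corpus: each document is a bag of words (illustrative only).
    docs = [
        "apple banana apple",
        "banana fruit salad",
        "engine car road",
        "car road trip",
    ]

    # Build the word-document count matrix W (rows: words, cols: docs).
    vocab = sorted({w for d in docs for w in d.split()})
    W = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            W[vocab.index(w), j] += 1

    # Truncated SVD: W ~ U_k diag(s_k) Vt_k. Keeping only the top k
    # singular values performs the dimensionality reduction; the
    # decomposition is driven by global correlation patterns in W.
    k = 2
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    word_vecs = U[:, :k] * s[:k]    # words as points in the latent space
    doc_vecs = Vt[:k, :].T * s[:k]  # documents as points in the same space

    # Words that co-occur in similar documents land close together:
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    i, j = vocab.index("apple"), vocab.index("banana")
    print(f"sim(apple, banana) = {cos(word_vecs[i], word_vecs[j]):.2f}")

Because both words and documents end up as points in the same continuous space, similarity can be measured between any pair of entities, which is what makes the paradigm portable to non-textual data.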