Index

k nearest neighbor classification (kNN), multinomial Naive Bayes vs.; k nearest neighbor classification (kNN), as nonlinear classification

Properties of Naive Bayes

K-medoids

K-means

kappa statistic

Assessing relevance | References and further reading | References and further reading

kernel

kernel function

kernel trick

key-value pairs

keyword-in-context

Results snippets

kNN classification

k nearest neighbor

Kruskal's algorithm

References and further reading

Kullback-Leibler divergence

Extended language modeling approaches | Exercises | References and further reading

KWIC

see keyword-in-context

label

The text classification problem

labeling

Text classification and Naive

Labeling, defined

Text classification and Naive

language

Finite automata and language

language identification

Tokenization | References and further reading

language model

Finite automata and language

Laplace smoothing

Naive Bayes text classification

Latent Dirichlet Allocation

References and further reading

latent semantic indexing

Latent semantic indexing

LDA

References and further reading

learning algorithm

The text classification problem

learning error

The bias-variance tradeoff

learning method

The text classification problem

lemma

Stemming and lemmatization

lemmatization

Stemming and lemmatization

lemmatizer

Stemming and lemmatization

length-normalization

Dot products

Levenshtein distance

Edit distance

lexicalized subtree

A vector space model

lexicon

An example information retrieval

likelihood

Review of basic probability

likelihood ratio

Finite automata and language

linear classifier

Linear versus nonlinear classifiers | A simple example of

linear problem

Linear versus nonlinear classifiers

linear separability

Linear versus nonlinear classifiers

link farms

References and further reading

link spam

Spam | Link analysis

LLRUN

References and further reading

LM

Using query likelihood language

Logarithmic merging

Dynamic indexing

lossless

Statistical properties of terms

lossy compression

Statistical properties of terms

low-rank approximation

Low-rank approximations

LSA

Latent semantic indexing

LSI as soft clustering

Latent semantic indexing

machine translation

Types of language models | Using query likelihood language | Extended language modeling approaches

machine-learned relevance

Learning weights | A simple example of

Macroaveraging

Evaluation of text classification

MAP

Evaluation of ranked retrieval | Probability estimates in theory | Naive Bayes text classification

Map phase

Distributed indexing

MapReduce

Distributed indexing | References and further reading

margin

Support vector machines: The

marginal relevance

Critiques and justifications of

marginal statistic

Assessing relevance

Master node

Distributed indexing

matrix decomposition

Matrix decompositions

maximization step

Model-based clustering

maximum a posteriori

Probability estimates in theory | Properties of Naive Bayes

maximum a posteriori class

Naive Bayes text classification

maximum likelihood estimate

Probability estimates in theory | Naive Bayes text classification

Maximum likelihood estimate (MLE)

Naive Bayes text classification | Mutual information

maximum likelihood estimation

Estimating the query generation

Mean Average Precision

see MAP

medoid

K-means

memory capacity

The bias-variance tradeoff

memory-based learning

Time complexity and optimality

Mercator

Crawling

Mercer kernel

Nonlinear SVMs

merge

postings: Processing Boolean queries

merge algorithm

Processing Boolean queries

metadata

microaveraging

Evaluation of text classification

minimum spanning tree

References and further reading | Exercises

minimum variance clustering

References and further reading

MLE

see maximum likelihood estimate

ModApte split

Evaluation of text classification | References and further reading

model complexity

The bias-variance tradeoff | Cluster cardinality in K-means

model-based clustering

Model-based clustering

monotonicity

Hierarchical agglomerative clustering

multiclass classification

Classification with more than

multiclass SVM

References and further reading

multilabel classification

Classification with more than

multimodal class

Rocchio classification

Multinomial Naive Bayes, random variable X / U

Properties of Naive Bayes

multinomial classification

Classification with more than

multinomial distribution

Multinomial distributions over words

Multinomial model

Relation to multinomial unigram | The Bernoulli model | A variant of the

multinomial Naive Bayes

Naive Bayes text classification

Multinomial Naive Bayes, in text classification

Naive Bayes text classification | Relation to multinomial unigram

Multinomial Naive Bayes, optimal classifier

Properties of Naive Bayes

Multinomial Naive Bayes, positional independence assumption

Naive Bayes text classification | Properties of Naive Bayes

Multinomial Naive Bayes, sparseness

Naive Bayes text classification

multinomial NB

see multinomial Naive Bayes

multivalue classification

Classification with more than

multivariate Bernoulli model

The Bernoulli model

mutual information

Mutual information | Evaluation of clustering

Naive Bayes assumption

Deriving a ranking function

named entity tagging

XML retrieval | Features for text

National Institute of Standards and Technology

Standard test collections

natural language processing

navigational queries

User query needs

NDCG

Evaluation of ranked retrieval

nested elements

Challenges in XML retrieval

NEXI

Basic XML concepts

next word index

Combination schemes

Nibble

Variable byte codes

NLP

see natural language processing

NMI

Evaluation of clustering

noise document

Linear versus nonlinear classifiers

noise feature

Properties of Naive Bayes | Feature selection

nonlinear classifier

Linear versus nonlinear classifiers

nonlinear problem

Linear versus nonlinear classifiers

normal vector

Rocchio classification

normalized discounted cumulative gain

Evaluation of ranked retrieval

normalized mutual information

Evaluation of clustering

novelty detection

Optimality of HAC

NTCIR

Standard test collections | References and further reading

objective function

Problem statement | K-means

odds

Review of basic probability

odds ratio

Deriving a ranking function

Okapi weighting

Okapi BM25: a non-binary

one-of classification

The text classification problem | Evaluation of text classification | Classification with more than

optimal classifier

Properties of Naive Bayes | The bias-variance tradeoff

optimal clustering

Optimality of HAC

optimal learning method

The bias-variance tradeoff

ordinal regression

Result ranking by machine

out-links

The web graph

outlier

K-means

overfitting

Feature selection | The bias-variance tradeoff

Oxford English Dictionary

Statistical properties of terms

PageRank

PageRank

paid inclusion

Spam

parameter tuning

Information retrieval system evaluation | References and further reading | References and further reading | References and further reading

parameter tying

Separate feature spaces for

parameter-free compression

Gamma codes

parameterized compression

References and further reading

parametric index

Parametric and zone indexes

parametric search

XML retrieval

Parser

Distributed indexing

partition rule

Review of basic probability

partitional clustering

A note on terminology.

passage retrieval

References and further reading

patent databases

XML retrieval

perceptron algorithm

References and further reading | References and further reading

performance

Evaluation of text classification

permuterm index

Permuterm indexes

personalized PageRank

Topic-specific PageRank

phrase index

Biword indexes

phrase queries

Positional postings and phrase | References and further reading

phrase search

The extended Boolean model

pivoted document length normalization

Pivoted normalized document length

Pointwise mutual information

Mutual information | References and further reading | References and further reading

polychotomous

Classification with more than

polytomous classification

Classification with more than

polytope

k nearest neighbor

pooling

Assessing relevance | References and further reading

pornography filtering

Text classification and Naive | Features for text

Porter stemmer

Stemming and lemmatization

positional independence

Properties of Naive Bayes

positional index

Positional indexes

posterior probability

Review of basic probability

posting

An example information retrieval | A first take at | Blocked sort-based indexing | Index compression

Postings

compression and: Index compression
in block sort-based indexing: Blocked sort-based indexing

postings list

An example information retrieval

power law

Zipf's law: Modeling the | The web graph

precision

An example information retrieval | Evaluation of unranked retrieval

precision at $k$

Evaluation of ranked retrieval

precision-recall curve

Evaluation of ranked retrieval

prefix-free code

Gamma codes

Preprocessing, effects of

Statistical properties of terms

principal direction divisive partitioning

References and further reading

principal left eigenvector

Markov chains

prior probability

Review of basic probability

Probability Ranking Principle

The 1/0 loss case

probability vector

Markov chains

prototype

Vector space classification

proximity operator

The extended Boolean model

proximity weighting

Query-term proximity

pseudo relevance feedback

Pseudo relevance feedback

pseudocounts

Probability estimates in theory

pull model

References and further reading

purity

Evaluation of clustering

push model

References and further reading

Quadratic Programming

Support vector machines: The

query

An example information retrieval

free text: The extended Boolean model | Term frequency and weighting
simple conjunctive: Processing Boolean queries

query expansion

Query expansion

query likelihood model

Using query likelihood language

query optimization

Processing Boolean queries

query-by-example

Basic XML concepts | Language modeling versus other

R-precision

Evaluation of ranked retrieval | References and further reading

Rand index

Evaluation of clustering

adjusted: References and further reading

random variable

Review of basic probability

random variable $X$

Properties of Naive Bayes

random variable $U$

Properties of Naive Bayes

random variable

Properties of Naive Bayes

Random variables, C

Properties of Naive Bayes

rank

Linear algebra review

Ranked Boolean retrieval

Weighted zone scoring

ranked retrieval

Other types of indexes | References and further reading

model: The extended Boolean model

Ranked retrieval models

described: Other types of indexes

ranking SVM

Result ranking by machine

recall

An example information retrieval | Evaluation of unranked retrieval

Reduce phase

Distributed indexing

reduced SVD

Term-document matrices and singular | Low-rank approximations

regression

Result ranking by machine

regular expressions

An example information retrieval | References and further reading

regularization

Soft margin classification

relational database

XML retrieval | Text-centric vs. data-centric XML

relative frequency

Probability estimates in theory

relevance

An example information retrieval | Information retrieval system evaluation

relevance feedback

Relevance feedback and pseudo

residual sum of squares

K-means

results snippets

Putting it all together

retrieval model

Boolean: An example information retrieval

Retrieval Status Value

Deriving a ranking function

retrieval systems

Other types of indexes

Reuters-21578

Standard test collections

Reuters-21578 collection, text classification in

Evaluation of text classification

Reuters-RCV1

Blocked sort-based indexing | Standard test collections

Reuters-RCV1 collection

described: Blocked sort-based indexing | References and further reading
dictionary-as-a-string storage: Dictionary compression | Dictionary as a string

RF

Relevance feedback and pseudo

Robots Exclusion Protocol

Crawler architecture

ROC curve

Evaluation of ranked retrieval

Rocchio algorithm

The Rocchio (1971) algorithm.

Rocchio classification

Rocchio classification

Routing

Text classification and Naive | References and further reading

RSS

K-means

rule of 30

Statistical properties of terms

Rules in text classification

Text classification and Naive

Scatter-Gather

Clustering in information retrieval

schema

Basic XML concepts

schema diversity

Challenges in XML retrieval

schema heterogeneity

Challenges in XML retrieval

search advertising

Advertising as the economic

search engine marketing

Advertising as the economic

Search Engine Optimizers

Spam

search result clustering

Clustering in information retrieval

search results

Clustering in information retrieval

security

Other types of indexes

seed

K-means

seek time

Hardware basics

Segment file

Distributed indexing

semi-supervised learning

Choosing what kind of

semistructured query

XML retrieval

semistructured retrieval

Boolean retrieval | XML retrieval

sensitivity

Evaluation of ranked retrieval

sentiment detection

Text classification and Naive

Sequence model

Properties of Naive Bayes

shingling

Near-duplicates and shingling

single-label classification

Classification with more than

single-link clustering

Single-link and complete-link clustering

single-linkage clustering

see single-link clustering

single-pass in-memory indexing

Single-pass in-memory indexing

Single-pass in-memory indexing (SPIMI)

Blocked sort-based indexing | Single-pass in-memory indexing | References and further reading

singleton

Hierarchical agglomerative clustering

singleton cluster

K-means

singular value decomposition

Term-document matrices and singular

skip list

Faster postings list intersection | References and further reading

slack variables

Soft margin classification

SMART

The Rocchio (1971) algorithm.

smoothing

Maximum tf normalization | Probability estimates in theory

add $\frac{1}{2}$: Probability estimates in theory | Probabilistic approaches to relevance | Okapi BM25: a non-binary | Relation to multinomial unigram
add $\alpha$: Probability estimates in theory
Bayesian prior: Probability estimates in theory | Probabilistic approaches to relevance | Estimating the query generation
linear interpolation: Estimating the query generation

snippet

Results snippets

soft assignment

Flat clustering

soft clustering

Flat clustering | A note on terminology. | Hierarchical clustering

Sort-based multiway merge

References and further reading

sorting

in index construction: A first take at

soundex

Phonetic correction

spam

Features for text | Spam

email: Text classification and Naive
web: Text classification and Naive

sparseness

Types of language models | Estimating the query generation | Naive Bayes text classification

specificity

Evaluation of ranked retrieval

spectral clustering

References and further reading

speech recognition

Types of language models

spelling correction

Putting it all together | Types of language models | Multinomial distributions over words

spider

Overview

spider traps

Index size and estimation

SPIMI

Single-pass in-memory indexing

splits

Distributed indexing

sponsored search

Advertising as the economic

Standing query

Text classification and Naive

static quality scores

Static quality scores and

static web pages

Web characteristics

statistical significance

$\chi^2$ Feature selection

Statistical text classification

Text classification and Naive

steady-state

Definition: | The PageRank computation

stemming

Stemming and lemmatization | References and further reading

stochastic matrix

Markov chains

stop list

Dropping common terms: stop

stop words: Term frequency and weighting

stop words

Tokenization | Dropping common terms: stop | Combination schemes | Term frequency and weighting | Maximum tf normalization

structural SVM

Result ranking by machine

structural SVMs

Multiclass SVMs

structural term

A vector space model

structured document retrieval principle

Challenges in XML retrieval

structured query

XML retrieval

structured retrieval

XML retrieval

summarization

References and further reading

summary

dynamic: Results snippets
static: Results snippets

Supervised learning

The text classification problem

support vector

Support vector machines: The

support vector machine

Support vector machines and | References and further reading

multiclass: Multiclass SVMs

Support vector machines (SVMs), effectiveness

Evaluation of text classification

SVD

References and further reading | References and further reading | Term-document matrices and singular

SVM

see support vector machine

symmetric diagonal decomposition

Matrix decompositions | Term-document matrices and singular

synonymy

Relevance feedback and query

teleport

PageRank

term

An example information retrieval | The term vocabulary and | Tokenization

term frequency

The extended Boolean model | Term frequency and weighting

term normalization

Normalization (equivalence classing of

term partitioning

Distributing indexes

term-at-a-time

Computing vector scores | Impact ordering

term-document matrix

Dot products

term-partitioned index

Distributed indexing

termID

Blocked sort-based indexing

Test data

The text classification problem

test set

The text classification problem | Evaluation of text classification

text categorization

Text classification and Naive

text classification

Text classification and Naive

Text classification, defined

Text classification and Naive

Text classification, feature selection

Feature selection | Comparison of feature selection

Text classification, overview

The text classification problem

Text classification, vertical search engines

Text classification and Naive

text summarization

Results snippets

text-centric XML

Text-centric vs. data-centric XML

tf

see term frequency

tf-idf

Tf-idf weighting

tiered indexes

Tiered indexes

token

The term vocabulary and | Tokenization

token normalization

Normalization (equivalence classing of

top docs

References and further reading

top-down clustering

Divisive clustering

topic

Standard test collections | Text classification and Naive

in XML retrieval: Evaluation of XML retrieval

topic classification

Text classification and Naive

topic spotting

Text classification and Naive

topic-specific PageRank

Topic-specific PageRank

topical relevance

Evaluation of XML retrieval

training set

The text classification problem | Evaluation of text classification

transactional query

User query needs

transductive SVMs

Choosing what kind of

translation model

Extended language modeling approaches

TREC

Standard test collections | References and further reading

trec_eval

References and further reading

truecasing

Capitalization/case-folding. | References and further reading

truncated SVD

Term-document matrices and singular | Low-rank approximations | Latent semantic indexing

two-class classifier

Evaluation of text classification

type

Tokenization

unary code

Gamma codes

unigram language model

Types of language models

union-find algorithm

Optimality of HAC | Near-duplicates and shingling

universal code

Gamma codes

unsupervised learning

Flat clustering

URL

Background and history

URL normalization

Crawler architecture

Utility measure

References and further reading | References and further reading

Variable byte encoding

Postings file compression | Variable byte codes

variance

The bias-variance tradeoff

vector space model

The vector space model

vertical search engine

Text classification and Naive

vocabulary

An example information retrieval

Voronoi tessellation

k nearest neighbor

Ward's method

References and further reading

web crawler

Overview

weight vector

Support vector machines: The

weighted zone scoring

Parametric and zone indexes

Wikipedia

Evaluation of XML retrieval

wildcard query

An example information retrieval | Dictionaries and tolerant retrieval | Wildcard queries

within-point scatter

Exercises

word segmentation

Tokenization

XML

Obtaining the character sequence | XML retrieval

XML attribute

XML DOM

XML DTD

XML element

XML fragment

References and further reading

XML Schema

Basic XML concepts

XML tag

Basic XML concepts

XPath

Basic XML concepts

Zipf's law

Zipf's law: Modeling the

zone

Parametric and zone indexes | Improving classifier performance | Document zones in text | Connections to text summarization.

zone index

Parametric and zone indexes

zone search

XML retrieval

© 2008 Cambridge University Press
2009-04-07