
Slides 14-16

The vector space model is the most widely used IR model. It has little theoretical justification, but it is fast and efficient, and it works in practice. Note that the similarity metric used in the example here is the dot product. In practice, more complex formulae, such as the cosine measure (the dot product normalized by the lengths of the two vectors), are employed.
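The difference between the two metrics can be seen in a minimal sketch. It assumes raw term counts as vector weights and whitespace tokenization, neither of which comes from the slides; the function names and example texts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to raw term counts (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())


def dot_product(u, v):
    """Sum of products of weights for the terms the two vectors share."""
    return sum(weight * v[term] for term, weight in u.items())


def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths."""
    norm_u = math.sqrt(dot_product(u, u))
    norm_v = math.sqrt(dot_product(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot_product(u, v) / (norm_u * norm_v)


query = term_vector("water pistol")
short_doc = term_vector("water pistol")
# Same content repeated three times: a longer document, no more relevant.
long_doc = term_vector("water pistol water pistol water pistol")

print(dot_product(query, short_doc), dot_product(query, long_doc))
print(cosine_similarity(query, short_doc), cosine_similarity(query, long_doc))
```

Under these assumptions the dot product scores the longer document three times higher simply because its counts are larger, while the cosine measure gives both documents essentially the same score. This length normalization is one reason the cosine measure is often preferred.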

The vector space model assumes that the terms in a document are independent of one another. We know that this is not true: for example, the terms "water" and "pistol" in "water pistol" have a dependency relation. In practice, however, making this assumption simplifies things greatly for the engineer.
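A short sketch of what the independence assumption means in practice. It again assumes raw term counts and whitespace tokenization; the helper name and the example sentence are hypothetical. Because each term is counted on its own, the phrase and a sentence that merely contains both words look alike to the model.

```python
from collections import Counter


def bag_of_words(text):
    """Count each term separately, discarding order and adjacency."""
    return Counter(text.lower().split())


def dot_product(u, v):
    """Sum of products of weights for the terms the two vectors share."""
    return sum(weight * v[term] for term, weight in u.items())


phrase = bag_of_words("water pistol")
reversed_phrase = bag_of_words("pistol water")
# Both terms occur, but not as the compound "water pistol".
scattered = bag_of_words("he dropped the pistol in the water")

# Word order is lost: the two vectors are identical.
print(phrase == reversed_phrase)

# The scattered sentence matches the query as strongly as the phrase itself.
print(dot_product(phrase, phrase), dot_product(phrase, scattered))
```

The dependency between the two terms is invisible to the model, which is the price paid for the simplicity the assumption buys.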

Latent Semantic Indexing (LSI) was developed and patented by a group at Bellcore that included Susan Dumais; a description of it can be found in [DDL+90]. The most successful probabilistic retrieval system is the University of Massachusetts INQUERY system [CCH92], which uses a Bayesian belief network to determine the probability of relevance. Mercure [BSD97] uses a neural network approach to match queries with documents.



Nerbonne J.
1999-09-20