Lemmatisers go for a more fine-grained analysis and transform words to their canonical dictionary form. This means that plurals are changed to singular, and verbs to infinitives (including irregular verbs). There are many lemmatisers; the Gate lemmatiser, for example, was written by John Carroll and is maintained by the University of Sheffield.
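To make the idea concrete, here is a minimal sketch of a dictionary-plus-rules lemmatiser. It is not how the Gate lemmatiser works; the lookup table `IRREGULAR`, the function name `lemmatise`, and the handful of suffix rules are assumptions made up for illustration, and they will mishandle many real words (regular verb endings such as "-ed" and "-ing" are left out entirely).

```python
# Toy lemmatiser: a lookup table for irregular forms plus a few plural rules.
# Table contents and rules are illustrative assumptions only.
IRREGULAR = {
    "went": "go", "gone": "go",
    "was": "be", "were": "be", "is": "be",
    "mice": "mouse", "children": "child", "feet": "foot",
}

def lemmatise(word):
    """Return a guess at the dictionary form of a single English word."""
    w = word.lower()
    if w in IRREGULAR:                      # irregular forms need a lookup
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:    # queries -> query
        return w[:-3] + "y"
    if w.endswith("sses"):                  # classes -> class
        return w[:-2]
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                       # documents -> document
    return w

print([lemmatise(w) for w in ["Mice", "went", "documents", "queries"]])
```

The irregular forms are the reason a lemmatiser needs a dictionary at all: no suffix rule maps "went" to "go".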
You will find a list of standard English stopwords as an appendix to [VR79]. For more information on index term selection, see [VR79,SM83]. [SJ72] describes an experiment to determine which types of terms are best at discriminating between relevant and irrelevant documents.
Comparing a term's frequency within a document with its inverse frequency across a database of documents can be fooled if, e.g., terms are artificially made frequent in a page. Tony Mullen wants his automatic poetry dispenser to be found by lots of searches that specify "poems" or "poetry". To see how he does it, be sure to select "View" -> "Page Source" in your browser.
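The effect can be sketched with a small calculation. The weighting used here (relative term frequency times the logarithm of the inverse document frequency) is one common variant, chosen as an assumption; the toy pages, the function name `tf_idf`, and the choice of 20 repetitions are likewise invented for illustration and are not taken from any real page.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Relative term frequency in doc times log inverse document frequency."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)      # documents containing term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Hypothetical collection: two unrelated pages plus the page under study.
others = [
    "notes on index term selection".split(),
    "a list of standard english stopwords".split(),
]
honest = "a page that dispenses poetry on request".split()
stuffed = honest + ["poetry"] * 20                # term repeated artificially

score_honest = tf_idf("poetry", honest, others + [honest])
score_stuffed = tf_idf("poetry", stuffed, others + [stuffed])
print(score_honest, score_stuffed)
```

The inverse document frequency is the same in both cases, since only one page in the collection contains the term; the stuffed page scores higher purely because the repeated term inflates its within-document frequency.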