Lemur
The Lemur Toolkit is designed to facilitate research in language modeling and information retrieval, where IR is broadly interpreted to include such technologies as ad hoc and distributed retrieval, cross-language IR, summarization, filtering, and classification.
Within the CIIR, we have extended Lemur in several ways:
- implementation of a true Kullback-Leibler (KL) divergence measure (Haizheng Zhang and Fernando Diaz)
- LM composition and analysis, a tool for building, mixing, and analyzing groups of language models (Fernando Diaz). It can:
  - build weighted mixtures of LMs. Primitive LMs can be built from subsets of documents, and mixtures can be composed of these primitives or of other mixtures.
  - dump a sorted list of the components of D(M1||M2), where M1 and M2 are arbitrary LMs built from a collection.
  - compute the distance matrix for a set of LMs using J-divergence (symmetric KL).
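The mixture building, D(M1||M2) component dump, and J-divergence matrix described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not Lemur's own (C++) implementation or API. The function names, the dict-of-probabilities representation, and the Jelinek-Mercer smoothing of each primitive against the collection model (weight `lam`, here 0.5) are all hypothetical choices made so that every KL term stays finite. J-divergence is taken here as D(P||Q) + D(Q||P). Some authors halve this sum.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

LM = Dict[str, float]  # hypothetical representation: term -> probability


def ml_model(docs: Sequence[Sequence[str]]) -> LM:
    """Maximum-likelihood unigram model over a set of tokenized documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def primitive(docs: Sequence[Sequence[str]], collection: LM, lam: float = 0.5) -> LM:
    """Primitive LM from a document subset, Jelinek-Mercer smoothed with the
    collection model (an assumption) so every collection term has nonzero mass."""
    ml = ml_model(docs)
    return {t: (1 - lam) * ml.get(t, 0.0) + lam * pc for t, pc in collection.items()}


def mixture(parts: Sequence[Tuple[float, LM]]) -> LM:
    """Weighted mixture of LMs. Parts may be primitives or other mixtures.
    Weights are renormalized to sum to 1."""
    z = sum(w for w, _ in parts)
    out: Dict[str, float] = {}
    for w, lm in parts:
        for t, p in lm.items():
            out[t] = out.get(t, 0.0) + (w / z) * p
    return out


def kl_components(m1: LM, m2: LM) -> List[Tuple[str, float]]:
    """Per-term contributions p1(t) * log(p1(t) / p2(t)) to D(M1||M2), largest first.
    Requires m2[t] > 0 wherever m1[t] > 0 (the smoothing above ensures this)."""
    comps = [(t, p * math.log(p / m2[t])) for t, p in m1.items() if p > 0]
    return sorted(comps, key=lambda c: c[1], reverse=True)


def kl(m1: LM, m2: LM) -> float:
    """D(M1||M2) in nats: the sum of its per-term components."""
    return sum(c for _, c in kl_components(m1, m2))


def j_divergence(m1: LM, m2: LM) -> float:
    """Symmetric KL (Jeffreys' J-divergence): D(M1||M2) + D(M2||M1)."""
    return kl(m1, m2) + kl(m2, m1)


def distance_matrix(models: Sequence[LM]) -> List[List[float]]:
    """Pairwise J-divergence matrix for a set of LMs."""
    return [[j_divergence(a, b) for b in models] for a in models]


if __name__ == "__main__":
    docs = [["lemur", "retrieval", "model"],
            ["language", "model", "model"],
            ["retrieval", "filtering"]]
    collection = ml_model(docs)
    a = primitive(docs[:1], collection)
    b = primitive(docs[1:2], collection)
    mix = mixture([(0.7, a), (0.3, b)])  # a mixture of two primitives
    print(kl_components(a, b)[:3])      # top contributors to D(a||b)
    print(distance_matrix([a, b, mix]))  # symmetric, zero diagonal
```

Smoothing every primitive against the same collection model gives all models a shared, nonzero support. Any mixture of them keeps that support, so D(M1||M2) is defined for arbitrary pairs.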
Caveats and Notes
- Lemur does not handle relative paths well.
- RelFBEval has an undocumented parameter, resultFormat, which appears to behave the same as TRECResultFormat.
Other items
Lemur FAQ
--
EricGalis - 07 Feb 2003