Main.KwikLook2 r1.2 - 30 Jun 2003 - 20:12 - AlvaroBolivar

This paper presents an approach to New Event Detection (NED) that combines evidence from two distinct representations of a document's content. One representation is the usual free text vector. The other uses lexical chains (built with WordNet) to identify the most prevalent topics discussed in the document, again expressed as a vector of terms. An example of the latter is {car, truck, engine, vehicle}. Note that building the chains automatically disambiguates terms. The two vectors are combined linearly, and the usual cluster-document similarity-threshold approach is followed.
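The fusion and thresholding steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the linear combination is applied to the two cosine similarity scores, that clusters are represented by summed term weights, and that documents arrive as dicts with hypothetical keys `text` and `chain`. The function names, default weights, and threshold are all made up for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_similarity(doc, cluster, w_text=0.5, w_chain=0.5):
    """Linear combination of the free text and lexical chain similarities.

    Assumption: the scores (not the raw vectors) are what gets combined.
    """
    return (w_text * cosine(doc["text"], cluster["text"])
            + w_chain * cosine(doc["chain"], cluster["chain"]))


def detect_new_events(docs, threshold=0.3, w_text=0.5, w_chain=0.5):
    """Return one flag per document: True if it starts a new cluster."""
    clusters = []
    flags = []
    for doc in docs:
        sims = [fused_similarity(doc, c, w_text, w_chain) for c in clusters]
        best = max(sims, default=0.0)
        if best < threshold:
            # No existing cluster is similar enough: treat as a new event.
            clusters.append({"text": Counter(doc["text"]),
                             "chain": Counter(doc["chain"])})
            flags.append(True)
        else:
            # Fold the document into its closest cluster.
            closest = clusters[sims.index(best)]
            closest["text"].update(doc["text"])
            closest["chain"].update(doc["chain"])
            flags.append(False)
    return flags
```

Setting `w_chain=0` or `w_text=0` gives the two single-representation baselines, so the same routine can cover all three models under these assumptions.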

This 'data fusion' model is compared with two baselines: one that uses only the free text vector, and another that uses only the prevalent topic vector. The authors concluded that a marginal increase in effectiveness can be achieved when lexical chain representations are used in conjunction with the free text representation, i.e. the data fusion model was marginally better. The best performance was found by varying the weights assigned to the constituent models in the data fusion model. Treating the free text representation as the weaker evidence gave better performance. However, disregarding the free text representation completely actually hurt performance. This was attributed to WordNet's inability to capture the relationship between proper nouns and semantically related concepts, for example {Bill Clinton, US President}.
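The weight-variation experiment can be pictured as a simple sweep over the free text weight. The sketch below is only illustrative: the error count, the grid of weights, the threshold, and the similarity values are assumptions made up for the example and are not figures or metrics from the paper.

```python
def fuse(text_sim, chain_sim, w_text):
    """Linear fusion where the two weights sum to 1 (an assumption)."""
    return w_text * text_sim + (1.0 - w_text) * chain_sim


def sweep_text_weight(pairs, labels, threshold=0.5, steps=5):
    """Count new-event decision errors for each free text weight.

    pairs  : (text_sim, chain_sim) of each document to its closest cluster
    labels : True if the document really is a new event
    Returns a list of (w_text, error_count) tuples.
    """
    results = []
    for i in range(steps):
        w_text = i / (steps - 1)
        errors = sum(
            (fuse(t, c, w_text) < threshold) != is_new
            for (t, c), is_new in zip(pairs, labels)
        )
        results.append((w_text, errors))
    return results


def best_text_weight(results):
    """Pick the weight with the fewest errors; ties go to the lower weight."""
    return min(results, key=lambda r: (r[1], r[0]))[0]


# Toy similarities, invented for illustration only.
toy_pairs = [(0.2, 0.9), (0.9, 0.4), (0.1, 0.2)]
toy_labels = [False, False, True]
toy_results = sweep_text_weight(toy_pairs, toy_labels)
```

In this toy data the second pair stands in for the proper-noun case: the chain similarity is low even though the story is old, so a weight of zero on free text misclassifies it, while too high a weight misclassifies the first pair.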

-- GiridharKumaran - 26 Mar 2003