Yin Zhao, George Karypis, Evaluation of Hierarchical Clustering
Algorithms for Document Datasets, Technical Report #02-022, 2002
DESCRIPTION:
This paper explores the off-line hierarchical clustering problem. It
investigates two approaches: partitional (top-down) and agglomerative
(bottom-up). An intermediate approach, constrained agglomerative
clustering, is also introduced and found to be the most effective in the
evaluation. It constrains the space over which agglomeration decisions are
made by first computing a partial partition of the data. Multiple criterion
functions were used for both approaches, all of them based on the Vector
Space Model with TF-IDF weighting.
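A minimal sketch of the constrained agglomerative idea, in Python. It
assumes the partial partition is already given (from any partitional
algorithm) as lists of document indices, and it merges subtrees by cosine
similarity of their centroids. That merge rule and all function names are
illustrative assumptions, not the criterion functions used in the paper.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def leaves(tree):
    """Document indices under a tree; a leaf is an int, a merge is a pair."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def centroid(vectors, members):
    """Mean vector of the documents listed in members."""
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in members) / len(members)
            for d in range(dim)]

def agglomerate(vectors, trees):
    """Repeatedly merge the two most similar subtrees until one remains.

    Assumes trees is non-empty. Similarity is centroid cosine, an
    assumption made for this sketch.
    """
    trees = list(trees)
    while len(trees) > 1:
        i, j = max(
            combinations(range(len(trees)), 2),
            key=lambda p: cosine(centroid(vectors, leaves(trees[p[0]])),
                                 centroid(vectors, leaves(trees[p[1]]))),
        )
        merged = (trees[i], trees[j])
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0]

def constrained_agglomerative(vectors, partition):
    """Build one tree per partition block, then merge the block trees."""
    subtrees = [agglomerate(vectors, block) for block in partition]
    return agglomerate(vectors, subtrees)

# Toy TF-IDF-like vectors: documents 0 and 1 are similar, as are 2 and 3.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
tree = constrained_agglomerative(docs, [[0, 1], [2, 3]])
```

Because merges inside a block happen before any merge across blocks, two
documents in different blocks can only meet near the top of the tree, which
is the constraint the description refers to.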
CRITIQUE:
Although the idea is to create hierarchies, the evaluation only used
collections whose documents are labeled with flat, non-hierarchical
classes. The evaluation measure takes, for each class, the best-matching
node in the hierarchy according to the F-measure, and sums these scores
over all classes.
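The measure described above can be sketched as follows, in Python. The
hierarchy is represented as a list of nodes, each a set of document ids,
and the classes as a dict from label to set of document ids. Weighting each
class by its share of the documents is an assumption of this sketch, as are
the function names.

```python
def f_measure(class_docs, node_docs):
    """Harmonic mean of precision and recall of one node for one class."""
    overlap = len(class_docs & node_docs)
    if overlap == 0:
        return 0.0
    precision = overlap / len(node_docs)
    recall = overlap / len(class_docs)
    return 2 * precision * recall / (precision + recall)

def hierarchy_fscore(classes, nodes):
    """Sum over classes of the best per-node F-measure.

    Each class is weighted by its fraction of all labeled documents
    (an assumption of this sketch).
    """
    total = sum(len(docs) for docs in classes.values())
    score = 0.0
    for docs in classes.values():
        best = max(f_measure(docs, node) for node in nodes)
        score += len(docs) / total * best
    return score

# Two flat classes and a small tree: a root with two children.
classes = {"a": {0, 1, 2}, "b": {3, 4}}
good_tree = [{0, 1, 2, 3, 4}, {0, 1, 2}, {3, 4}]
poor_tree = [{0, 1, 2, 3, 4}, {0, 1}, {2, 3, 4}]
good_score = hierarchy_fscore(classes, good_tree)
poor_score = hierarchy_fscore(classes, poor_tree)
```

A tree that contains a node exactly matching each class scores highest, no
matter how the rest of the hierarchy is arranged, which is why this measure
says little about the quality of the hierarchy itself.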
COMMENTS:
Not very useful for TDT. TDT is an on-line task.