|
The goal of this project is to build a unified model that segments paper citations and performs coreference among papers, venues, and authors. We build on the work of Wellner et al. (http://www.cs.umass.edu/~wellner/....? ), UAI'04.
Javadocs
|
> > |
AronCulotta - 19 May 2004:
|
|
Notes:
|
< < |
- Charles skillfully ported and refactored Fuchun's and Ben's segmentation and coreference code from their users directories to projects/seg_plus_coref
|
> > |
- CharlesSutton skillfully ported and refactored FuchunPeng's and BenWellner's segmentation and coreference code from their user directories to projects/seg_plus_coref
|
|
- Rewrote SGMLStringOperation to handle tag attributes
|
|
Summary: Venue coref raises cluster precision substantially (91.9 to 97.8); recall barely changes (98.5 to 98.6).
|
< < |
| no venue | 91.9 | 98.5 | 95.1 |
| venue | 97.8 | 98.6 | 98.2 |
|
> > |
| | Pr | Re | F1 |
| no venue | 91.9 | 98.5 | 95.1 |
| venue | 97.8 | 98.6 | 98.2 |
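As a sanity check on the table above, the sketch below reads its three columns as precision, recall, and F1. That reading is an assumption, based on the summary line and on the Pr/Re/F1 header of the venue coreference table. It recomputes the third column as the harmonic mean of the first two. The class name `F1Sketch` is hypothetical and is not part of projects/seg_plus_coref.

```java
import java.util.Locale;

// Hypothetical helper for checking the table; not code from the project.
final class F1Sketch {
    /** Harmonic mean of precision and recall; returns 0 when both are 0. */
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Inputs are the first two columns of each row, in percent.
        System.out.printf(Locale.ROOT, "no venue: %.1f%n", f1(91.9, 98.5)); // prints "no venue: 95.1"
        System.out.printf(Locale.ROOT, "venue:    %.1f%n", f1(97.8, 98.6)); // prints "venue:    98.2"
    }
}
```

Both recomputed values agree with the third column to one decimal place, which supports the column reading.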
|
|
Next steps:
- run trial using CRF output
- train classifier for venue coref
- co-cluster venues and papers
|
< < |
-- AronCulotta - 19 May 2004
|
> > |
AronCulotta - 21 May 2004:
Obtained baselines for venue coreference (coreference/VenueCoreference.java). Adapted Ben's code to do venue coreference (independent of paper coreference). See coreference/CitationUtils.java. Again, we assume perfect segmentation.
- base: same features as paper coref
- jn-bt-acr: base + one feature for approximate journal match and one for approximate booktitle match + acronym feature
- venue-acr: base + one feature for approximate "venue" match + acronym feature
- all: base + jn-bt-acr + venue
| | Pr | Re | F1 |
| base | 98.7 | 52.7 | 68.7 |
| jn-bt-acr | 97.8 | 75.8 | 85.4 |
| venue-acr | 97.5 | 78.4 | 86.9 |
| all | 97.7 | 74.3 | 84.4 |
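The acronym feature named in the feature sets above is not spelled out in these notes. The following is only a minimal sketch of what such a feature could look like, assuming it fires when one venue string equals the initials of the other. The class `AcronymFeatureSketch`, its methods, and the stopword list are all hypothetical and are not taken from coreference/CitationUtils.java.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the real feature in the project may differ.
final class AcronymFeatureSketch {
    // Assumed stopword list: function words usually left out of venue acronyms.
    private static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("of", "the", "on", "in", "and", "for", "to", "a", "an"));

    /** Builds an acronym from the first character of each non-stopword token (ASCII only). */
    static String acronymOf(String venue) {
        StringBuilder sb = new StringBuilder();
        for (String token : venue.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue;
            }
            sb.append(token.charAt(0));
        }
        return sb.toString().toUpperCase(Locale.ROOT);
    }

    /** True if either string is the acronym of the other. */
    static boolean acronymMatch(String a, String b) {
        return isAcronymOf(a, b) || isAcronymOf(b, a);
    }

    private static boolean isAcronymOf(String shortForm, String longForm) {
        String acronym = acronymOf(longForm);
        // Require at least two initials so single-word venues do not match trivially.
        return acronym.length() >= 2 && compact(shortForm).equals(acronym);
    }

    /** Strips everything except ASCII letters and digits, then upper-cases. */
    private static String compact(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(acronymMatch("UAI", "Uncertainty in Artificial Intelligence")); // prints true
        System.out.println(acronymMatch("UAI", "Machine Learning")); // prints false
    }
}
```

A binary feature like this targets recall: it can link an abbreviated venue mention to its spelled-out form, which exact or approximate string matching alone would likely miss.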
Still need to run with noisy CRF output, then implement joint clustering of papers and venues. Open question: how to pass clustering information back to segmentation?
|