Getting started with UIMA
You'll need to download it:
http://www.alphaworks.ibm.com/tech/uima/download
IBM is a pain: they make you register before you can download,
but often you can just use
http://www.bugmenot.com/
to get yourself a login and password. Alternatively, I've downloaded
the available files and you can find them at:
/usr/dan/users10/smucker/dea/uima/uima-downloads
Once installed, or within the SDK zip, you'll find
the UIMA documentation. You'll want to start with
the UIMA_SDK_Users_Guide_Reference.pdf.
IBM really pushes the use of the Eclipse IDE:
http://www.eclipse.org/
I did my coding from within this tool, but
others avoid it completely and use traditional
editors. IBM's selling point is that chapter
3 of the SDK guide explains how to set up
Eclipse to provide tools for easier editing of
the configuration files that UIMA uses.
Chapter 4 gets you going on building your first
annotator. The end goal of the UIMA side of this
exercise will be the creation of a CPE (Collection
Processing Engine). That's in chapter 5: think
readers and consumers.
To make your life easier, I suggest modifying
IBM's examples, which they use in the guide and
for which they supply code.
Gotchas:
As usual with Java stuff, it helps to put your
developed classes in your CLASSPATH. The packaged
UIMA scripts, like runCPE.bat, don't add the
CLASSPATH environment variable to the classpath
they build, so you will need to modify the script
if you want your classes to be picked up.
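One quick way to check this gotcha is to run a tiny Java program under the same classpath the script ends up using and see what the JVM actually got. The following is only a sketch: ClasspathCheck and isLoadable are made-up names, not part of UIMA, and the class names you pass in are whatever classes you developed.

```java
// Hypothetical helper (not part of UIMA): reports the JVM's effective
// classpath and whether a given class can be loaded from it.
class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The classpath the JVM was actually started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // Each argument is a fully qualified class name to look for
        for (String name : args) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

If your annotator class shows up as not loadable, the script's classpath is the first thing to look at.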
Indri Notes
Download the latest version:
http://www.lemurproject.org/
Trevor's page is very useful:
http://ciir.cs.umass.edu/~strohman/indri/
To understand extents and contexts, you'll need to
read Don's pages:
http://ciir.cs.umass.edu/~metzler/presentations/uiuc-indri.pdf
http://ciir.cs.umass.edu/~metzler/indriquerylang.html
http://ciir.cs.umass.edu/~metzler/indriretmodel.html
For those interested in producing offset annotations,
it appears that Indri 2.1 supports them:
http://www.lemurproject.org/lemur/offsetannotations.html
My marked up documents can be found in:
/usr/dan/users10/smucker/dea/uima/toy-collection
I processed the ft91.dat file from trec_vol_4.
Sentences are marked with the tag: ciir.uima.SentenceAnnotation
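For anyone who has not worked with sentence annotations before, each one is essentially a begin and end character offset into the document text. The sketch below computes such spans with the JDK's java.text.BreakIterator. It is an illustration only and makes no claim to match the annotator that produced the collection above; SentenceSpans and find are made-up names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: sentence spans as {begin, end} character offsets.
class SentenceSpans {

    // Returns one {begin, end} pair (end exclusive) per sentence in text,
    // with trailing whitespace trimmed off each span.
    static List<int[]> find(String text) {
        List<int[]> spans = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int begin = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; begin = end, end = it.next()) {
            int trimmedEnd = end;
            while (trimmedEnd > begin && Character.isWhitespace(text.charAt(trimmedEnd - 1))) {
                trimmedEnd--;
            }
            if (trimmedEnd > begin) {
                spans.add(new int[] {begin, trimmedEnd});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        for (int[] s : find(text)) {
            System.out.println(s[0] + "-" + s[1] + ": " + text.substring(s[0], s[1]));
        }
    }
}
```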
--
MarkSmucker - 17 Oct 2005