HARD, High Accuracy Retrieval
from Documents
TREC 2003 clarification form guidelines
This document describes guidelines for creating, submitting, and getting
responses to clarification forms (CFs).
General rules
In order to make this a scalable process, the following rules must
be honored. Failure to do so means that your CF will either be
misread or ignored:
- The CF must display correctly on Netscape V4.78 running
on Solaris 2.5.1
- The CF cannot be larger than can be displayed on a 16-inch
monitor (an earlier draft indicated incorrectly that a 17-inch monitor
was the minimum)
- The screen real estate you have available is 1152 x 900
- The CF must be an HTML Web page. No Javascript, no
Java, no flash, no anything but HTML.
- The page may not refer to external
images: it must be self-contained
- The following types of data entry will be
permitted (others are possible, but check in advance):
- text boxes
- radio and check buttons
- drop-down menu selections
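Some of the rules above can be checked mechanically before submission. The following is a minimal sketch, not an official LDC tool: the names FORBIDDEN_TAGS, ALLOWED_INPUT_TYPES, FormChecker, and check_form are hypothetical, and the deny-list of tags is an assumption about what "nothing but HTML" and "self-contained" rule out. It does not check display size or browser rendering.

```python
from html.parser import HTMLParser

# Tags that pull in scripting, plugins, or external resources.
# This deny-list is an illustrative assumption, not an official list.
FORBIDDEN_TAGS = {"script", "applet", "object", "embed", "img", "link", "iframe"}

# Input types named as permitted (text boxes, radio and check buttons),
# plus the hidden fields and submit button that every form must carry.
ALLOWED_INPUT_TYPES = {"text", "radio", "checkbox", "hidden", "submit"}


class FormChecker(HTMLParser):
    """Collect likely rule violations in a clarification form."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in FORBIDDEN_TAGS:
            self.problems.append(f"forbidden tag <{tag}>")
        if tag == "input":
            # An input with no type attribute defaults to a text box.
            input_type = (attrs.get("type") or "text").lower()
            if input_type not in ALLOWED_INPUT_TYPES:
                self.problems.append(f"unlisted input type '{input_type}'")
        # Event-handler attributes such as onclick imply Javascript.
        for name in attrs:
            if name.startswith("on"):
                self.problems.append(f"event handler attribute '{name}'")


def check_form(html_text):
    """Return a list of problem descriptions (empty if none were found)."""
    checker = FormChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

An empty list means only that none of these particular checks fired. Data entry types outside the permitted list should still be cleared in advance, as noted above.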
The assessor will spend no more than three (3) minutes filling
out the form for a particular topic, meaning up to 150 minutes per
site.
Some crude forms that show what this might look like and what
the returned results would be are available
here. Be sure to look at the source to see some of the hidden
fields.
NOTE: Information gathered in response to the clarification
form will be made public once the track is complete. For each
topic, the following information will be collated across all forms:
the site originating the form, the questions/options/etc asked, and the
answers.
Fields in clarification forms
Your form must include the following items:
- <form action="http://dev.ldc.upenn.edu/cgi-bin/Projects/HARD/cf_form.cgi"
method="post">
This indicates the script where the output will be generated. You
are welcome to use this cgi URL during development of your form, since all
it does is output the selected information.
- <input type="hidden" name="site" value="XXXXn">
Here, "XXXX" is a 4-letter code designating your site and "n" is a run
number. The site codes are below. The run numbers should reflect
your priority order. That is, XXXX1 will be processed first, then XXXX2, and
so on. If you only have one set of forms, please use 1. For
example, the first submission from UMass would be UMAS1.
- <input type="hidden" name="topicid" value="000">
Indicates the topic number. It should be a 3-digit code, padded with
leading zeros as needed: 001 rather than 01 or 1.
- <input type="submit" name="send" value="submit">
This is the submit button that should appear somewhere on your page.
In addition, please include somewhere on the page the topic number (e.g.,
"001") and the title of the topic. The purpose of including this is
to provide a sanity check that the annotators are, indeed, answering the
correct questions.
Note that the annotators will have the topic description, narrative, and
their metadata values available to them for reference. You do not
need to put any of that information on the clarification form (except
the title and topic number, as just mentioned).
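As an illustration of the required items, here is a minimal sketch of a generator that wraps site-specific questions in a page carrying the form action, the two hidden fields, the submit button, and the topic number and title. The function name render_form, its parameters, and the page layout (the heading and title elements) are assumptions made for this example; only the form, input, and field values come from the requirements above.

```python
from html import escape

# The development CGI script named in the guidelines.
CGI_URL = "http://dev.ldc.upenn.edu/cgi-bin/Projects/HARD/cf_form.cgi"


def render_form(site, run, topic, title, questions_html):
    """Return a clarification form page with the required fields.

    questions_html is the site-specific part of the form (text boxes,
    radio buttons, and so on) and is inserted unchanged.
    """
    topic_id = f"{topic:03d}"   # zero-padded, e.g. 1 -> "001"
    site_run = f"{site}{run}"   # e.g. "UMAS" and 1 -> "UMAS1"
    return "\n".join([
        "<html>",
        f"<head><title>Topic {topic_id}: {escape(title)}</title></head>",
        "<body>",
        # Topic number and title shown as a sanity check for annotators.
        f"<h2>Topic {topic_id}: {escape(title)}</h2>",
        f'<form action="{CGI_URL}" method="post">',
        f'<input type="hidden" name="site" value="{site_run}">',
        f'<input type="hidden" name="topicid" value="{topic_id}">',
        questions_html,
        '<input type="submit" name="send" value="submit">',
        "</form>",
        "</body>",
        "</html>",
    ])
```

For example, render_form("UMAS", 1, 2, "Some topic title", '<input type="text" name="q1">') would produce a page for topic 002 of the first UMass run; the title and the field name q1 are placeholders.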
Put the forms in separate files, one per topic, with the filename XXXXn_000.html
where, again, XXXX is your site code, n is the run number,
and 000 is the topic number. Here are example clarification
forms for training topics 002 and 008.
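The naming convention can be captured in a small helper. This is a sketch under stated assumptions: the names FORM_NAME, form_filename, and parse_form_filename are hypothetical, and the pattern assumes a site code of four uppercase letters followed by a run number of one or more digits, since the guidelines do not say how long the run number may be.

```python
import re

# XXXXn_000.html: 4-letter site code, run number, 3-digit topic number.
FORM_NAME = re.compile(r"^([A-Z]{4})(\d+)_(\d{3})\.html$")


def form_filename(site, run, topic):
    """Build a form file name such as UMAS1_008.html."""
    return f"{site}{run}_{topic:03d}.html"


def parse_form_filename(name):
    """Return (site, run, topic_id), or None if the name does not fit."""
    match = FORM_NAME.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)
```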
Submitting clarification forms
Two clarification forms from each site will be filled out. If participation
and time permit, the LDC will fill out additional forms from each site.
The naming convention for forms makes clear the order in which you
want them filled out (e.g., XXXX1, then XXXX2, then, if time permits, XXXX3
and so on).
Create a tar file that contains all of the forms for a run and optionally
gzip the file. The file should be named XXXXn.tar where XXXX
is your site code and n is the run number. The file should contain
exactly 50 files named XXXXn_000.html, where 000 is the topic number. Here is an example file that contains only two forms.
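The packaging step can be sketched with Python's standard tarfile module. The function name build_archive and its parameters are hypothetical, and encoding the pages as UTF-8 is an assumption; the check for exactly 50 files with the run's prefix follows the rule above.

```python
import io
import tarfile


def build_archive(site_run, forms, expected=50):
    """Return the bytes of a tar archive holding the given forms.

    forms maps file names such as "UMAS1_001.html" to page text.
    """
    if len(forms) != expected:
        raise ValueError(f"expected {expected} forms, got {len(forms)}")
    for name in forms:
        if not (name.startswith(site_run + "_") and name.endswith(".html")):
            raise ValueError(f"unexpected file name: {name}")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(forms):
            data = forms[name].encode("utf-8")  # encoding is an assumption
            info = tarfile.TarInfo(name=name)
            info.size = len(data)               # required before addfile
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Writing the returned bytes to a file named after the run (for example UMAS1.tar) gives the archive to upload.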
Submit your forms to ftp://ftp.ldc.upenn.edu/pub/ldc/csr_incoming.
You will not see any files listed, but you will have permission to upload
a file from your local machine to this ftp directory.
- With Microsoft's Internet Explorer, the process involves clicking
with the right mouse button on the chosen file in your "desktop" or Windows
Explorer, selecting the "Copy" menu option, then going to the browser and
hitting the "Paste" button
- With Netscape, I believe there is an "Upload" option in the "File"
menu.
If you cannot upload the file with a browser, you can use old-fashioned
ftp:
- ftp ftp.ldc.upenn.edu
- username "anonymous", password of your email address
- cd pub/ldc/csr_incoming
- send your_file
- bye
NOTE: After you have uploaded the file, you will not be able to see
that it is there. The incoming directory is a write-only directory that
does not allow you to read its contents. Do not be surprised that it
is not readable.
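The manual ftp steps above can also be scripted. The following sketch uses Python's standard ftplib module; the function name upload_submission and its connect parameter are hypothetical, and it assumes the server accepts anonymous login as described. Because the incoming directory is write-only, the script does not try to list the directory afterwards.

```python
import os
from ftplib import FTP


def upload_submission(local_path, email, host="ftp.ldc.upenn.edu", connect=FTP):
    """Upload one file by following the manual ftp steps.

    connect defaults to ftplib.FTP; it is a parameter only so that a
    stand-in can be supplied when trying the function without a network.
    """
    ftp = connect(host)                          # ftp ftp.ldc.upenn.edu
    ftp.login(user="anonymous", passwd=email)    # anonymous / email address
    ftp.cwd("pub/ldc/csr_incoming")              # cd pub/ldc/csr_incoming
    with open(local_path, "rb") as handle:       # send your_file
        ftp.storbinary("STOR " + os.path.basename(local_path), handle)
    ftp.quit()                                   # bye
```

A call might look like upload_submission("UMAS1.tar", "you@example.org"), where both arguments are placeholders.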
When done, you must send an email message to Meghan Glenn (mlglenn@ldc.upenn.edu)
and Stephanie Strassel (strassel@ldc.upenn.edu) so that LDC knows that the
transfer has happened.
If you have any problems (e.g., because of firewall restrictions at your
site), please check with your local networking support people, if possible,
and contact the LDC only if you are unable to resolve the problem locally.
Getting results back
You'll get back a tar file that looks like this.
The archive contains files named XXXXn_topicid.seq
where XXXX is the ID for the site, n is the run number,
topicid is the three-digit topic number, and seq is a suffix intended
to ensure that doubly-filled forms do not lose information. The
seq value will almost always be 001, though the example tar file includes
a repeated value to illustrate.
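When reading the results, it can help to group the returned files by topic so that doubly-filled forms are not overlooked. This is a minimal sketch: the names RESULT_NAME and group_results are hypothetical, and the pattern assumes a three-digit seq suffix (as in 001) and a run number of one or more digits.

```python
import re
from collections import defaultdict

# XXXXn_topicid.seq, e.g. UMAS1_008.001
RESULT_NAME = re.compile(r"^([A-Z]{4})(\d+)_(\d{3})\.(\d{3})$")


def group_results(names):
    """Map each topic id to the sorted list of result files for it."""
    grouped = defaultdict(list)
    for name in names:
        match = RESULT_NAME.match(name)
        if match is None:
            continue  # skip anything that does not follow the convention
        grouped[match.group(3)].append(name)
    return {topic: sorted(files) for topic, files in grouped.items()}
```

A topic with more than one entry in the result was filled out more than once.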
Site codes (4 letters)
CA SU San Marcos (Rocio Guillen): CASU
Clairvoyance (David Evans): CLAI
Glasgow: GLAS
Illinois Urbana-Champaign (ChengXiang Zhai): ILUC
IIT Bombay (Ganesh Ramakrishnan): IITB
Microsoft Research Cambridge (Steve Robertson): MRCA
Queens College SUNY (KL Kwok): QCSU
Rutgers (Nick Belkin): RUTG
Open Systems and Chinese Acad of Sciences (Zeng Wu): OPEN
Sabir (Chris Buckley): SABI
Tsinghua Univ, Computer Science IR Group (ShaoPing Ma): TUCS
Tsinghua Univ, Knowledge Engineering Group (Bin Wang): TUKE
Univ Buffalo, CEDAR (Rohini Srihari): UBUF
Univ of Helsinki (Antoine Doucet): UHEL
Univ Maryland (Daqing He): UMAR
UMass Amherst (James Allan): UMAS
UNC Chapel Hill/Arctic: UNCC
Univ Waterloo (Olga Vechtomova): UWAT