INQUERY PROJECT Environment
PROJECT environment variables
- /home/irdata/ciir-vars
A shell script used to set up the INQUERY working environment. Besides
setting the environment variables described below, it also adds
directories for group-wide tools and programs to the user's path.
The settings depend on the INQUERY $VERSION value, or on a default
$VERSION if one has not already been set.
- $PROJECT
This is the root INQUERY project directory.
- $VERSION
The INQUERY version setting. Versions are of the form n.n[.n], e.g. 3.1
or 3.2, and correspond to a working version of INQUERY.
One may change versions by using the switchv alias.
switchv 3.2
This alias not only resets your $VERSION value, but also updates your
PATH and other environment settings to be consistent. When compiling or
using INQUERY software, take care that the source or executables are
consistent with this variable's value. Pay attention to version settings
during compilation to ensure correct source file inclusions.
- $INQ_ARCH
The architecture type of the machine being used. This value is required
during compilations to determine correct byte ordering. Compilations
automatically use this value as an input macro definition. $INQ_ARCH
values currently include solaris (Sun) and alpha (DEC).
- $COLLECTIONS
The IR Lab possesses many different document collections. A collection is
one or more files containing a sequence of documents. The format of these
collections varies. This environment variable points to the
directory containing the source files for the document collections.
There may or may not be standard query and relevance judgement files
associated with these collections. Collections such as TIPSTER, CACM,
WEST, INSPEC and NPL do have such additional associated files. The
$COLLECTIONS variable is independent of $VERSION settings.
- $INQDATA
INQUERY databases built from various collection files may be found in
the directory named by the $INQDATA environment variable.
These databases are usually of group-wide significance, and are thus made
available to the group by "publicizing" their presence in this directory.
Certain collections residing here (medical collections, for example) may
still be limited to certain group memberships. This variable incorporates the
$PROJECT and $VERSION variables. One may use the list_cols program to
determine the collection files that comprise an INQDATA (or any INQUERY)
database. This variable determines which INQUERY databases may be
accessed when the inquery or xinquery retrieval interfaces are used without
specifying a collection to open.
- $DOCSTOPS
The location of the default stopword list. A stopword list is a list of
words deemed to have no retrieval value, which are therefore dropped
during query processing and database building. The $DOCSTOPS value
incorporates the current $PROJECT and $VERSION values and is currently
equivalent to $INQDATA. The default stopword file is named default.stp.
This file may be referenced as $DOCSTOPS/default.stp.
- $STEM_DIR
The location of stemming support files. This variable includes both
$PROJECT and $VERSION values and is currently equivalent to $DOCSTOPS.
This directory includes the exception lists and dictionary supplement
files for the kstem stemmer, as well as stopword and stemmer support files
for the Spanish, Japanese and Chinese languages.
- $INQ_CITY_FILE
The name of a file of US cities used by a location recognizer.
- $INQ_HELP
Directory location of various INQUERY program help (hlp) files, used by
the Xinquery X interface. These files live in the help directory of a
$PROJECT/$VERSION INQUERY project directory hierarchy.
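A quick way to confirm that the variables above are set consistently is to
print them and sanity-check the $VERSION format. The following is a minimal
sketch in POSIX sh, not part of the INQUERY tools: the function names
inq_valid_version and inq_check_env are hypothetical, and the sketch assumes
the variables have already been set in the current shell (for example by the
ciir-vars script).

```shell
# Hypothetical helper: succeed if $1 has the form n.n or n.n.n (e.g. 3.1, 3.2.1).
inq_valid_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+(\.[0-9]+)?$'
}

# Hypothetical helper: print each INQUERY variable described above and
# return non-zero if any of them is unset or empty.
inq_check_env() {
    missing=0
    for name in PROJECT VERSION INQ_ARCH COLLECTIONS INQDATA \
                DOCSTOPS STEM_DIR INQ_CITY_FILE INQ_HELP; do
        # The names come from the fixed list above, so eval is safe here.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%-14s %s\n' "$name" "$value"
        else
            printf '%-14s (not set)\n' "$name"
            missing=1
        fi
    done
    return $missing
}

# Example use (commented out so that loading this file has no side effects):
#   inq_check_env || echo "environment incomplete; was ciir-vars loaded?"
#   inq_valid_version "$VERSION" || echo "unexpected VERSION: $VERSION"
```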
Version Directory Structure
- doc, src, h, lib, bin, utils, build
- Object vs. source release directory structure
- switchv macro for changing versions
Using Gmake
- Gmake Overview
The inquery gmake script assumes you're in the correct directory for the
architecture you're running on! If you have more than one architecture
build directory and you're running on an Alpha, cd to your alpha build
directory (usually "build" or "alpha-build") before compiling with gmake.
If you fail to do this, you may end up with mixed object file types and
sometimes cryptic warnings about why something won't compile.
- Gmake.rules
- Where does gmake look to find files?
- The Gmake -I switch
- Setting gmake compilation command in emacs
- Setting compilation VERSION without changing environment
One may compile a version different from the current $VERSION value by
adding VERSION=n.n[.n] on the gmake command line. This passes the
specified version to the compilation without changing any environment
settings.
- Example Use
cd $work_dir_31
# change to another work version
switchv 3.1 # ($VERSION setting can affect gmake!)
# update the INQUERY source
cvs update
# build alpha versions of programs
cd alpha-build
# clean old libraries and objects
rm *.a *.o
# make everything, or perhaps just one thing
gmake
# or just make xinquery
gmake xinquery
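The build-directory warning and the VERSION override described above can be
combined in a small guard around gmake. This is a sketch only, in POSIX sh:
inq_in_build_dir and inq_gmake are hypothetical helpers, and the
"<arch>-build" directory naming is an assumption generalized from the
"alpha-build" example.

```shell
# Hypothetical helper: succeed if directory $1 looks like a build directory
# for architecture $2. Assumes directories are named "build" or "<arch>-build".
inq_in_build_dir() {
    dir=$(basename "$1")
    arch=$2
    case $dir in
        build|"$arch-build") return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: refuse to run gmake outside a matching build directory.
# All arguments are passed through unchanged to gmake.
inq_gmake() {
    if inq_in_build_dir "$PWD" "${INQ_ARCH:-}"; then
        gmake "$@"
    else
        echo "inq_gmake: $PWD does not look like a ${INQ_ARCH:-unknown} build directory" >&2
        return 1
    fi
}

# Example use: build xinquery as version 3.1 without touching $VERSION.
#   inq_gmake VERSION=3.1 xinquery
```

Because "build" is accepted for any architecture, the guard cannot catch a
shared "build" directory that holds objects from another machine type; it
only catches the case of running from a differently named directory.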
CVS
CVS is the file version control mechanism used for saving and
tracking INQUERY source files. CVS allows a user to check out a
local copy of a source file or set of files (e.g. all INQUERY
sources) to be edited as needed. These copies may then be updated
with changes others have independently committed, or committed
with your own changes included. One should always update one's CVS
directory sources before doing a commit.
One may also create branches of a CVS version tree to save customizations
to a version that are not to be merged back with a main branch, yet still
need to be tracked using version controls.
To create an inquery tree:
cvs checkout inquery
This will check out the current mainline INQUERY sources into a new
sub-directory of the current directory, named "inquery". For a branch,
say the 3.1 version, use "cvs checkout -r V3_1_PATCH inquery".
cd inquery
make-build-dirs
This creates the test-build directories for compiling the sources on various
IR Lab platforms.
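The branch tags mentioned in this document (V3_1_PATCH, V3_0_PATCH,
V2_1_PATCH) suggest a naming pattern based on the version number. The sketch
below derives a tag from a version string under that assumption; the helper
name inq_branch_tag is hypothetical, and the pattern may not hold for every
branch, so check the actual tag names before relying on it.

```shell
# Hypothetical helper: map a version such as 3.1 to a tag such as V3_1_PATCH.
# Assumes branch tags follow the V<major>_<minor>_PATCH pattern seen above.
inq_branch_tag() {
    printf 'V%s_PATCH\n' "$(printf '%s' "$1" | tr . _)"
}

# Example use (commented out so that loading this file runs no cvs command):
#   cvs checkout -r "$(inq_branch_tag 3.1)" inquery
```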
To create a CVS branch:
cvs tag -b "branch-name"
cvs update -r "branch-name"
To update a branch from the mainline:
cvs-update-branch
After update, test, then commit files to branch.
To update your CVS file structure with changes committed by others to the same
CVS branch:
cvs update
This will update all the CVS files in the current directory. Thus, if you
are in the doc CVS directory, "cvs update" will update only doc files. If
one is in the inquery (root) CVS directory, an update will act upon all CVS
sub-directories within.
To update the mainline from the branch:
cvs-merge-patch TAG_O_BRANCH
another michelle script
After merge, test, then commit to mainline CVS
To commit changes to a version:
cvs commit
This will commit changes made to all CVS files in the current directory.
Thus, if only src files were changed, one may move to the src directory and
do a commit. One may also name just the changed file, e.g. "cvs
commit query.y.y". This will commit only the named file.
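Since sources should be updated before a commit, it can help to preview what
an update would do first. The cvs global option -n runs a command without
changing any files, and "cvs -n -q update" lists one status letter per
affected file. The sketch below summarizes that listing; the function name
cvs_pending_summary and the grouping of letters into categories are
illustrative choices, not part of CVS.

```shell
# Hypothetical helper: read "cvs -n -q update" output on stdin and count files
# by status letter (U/P = incoming changes, M/A/R = local changes,
# C = conflicts, ? = files unknown to CVS).
cvs_pending_summary() {
    awk '
        /^[UPMCAR?] / { count[$1]++ }
        END {
            printf "incoming=%d modified=%d conflicts=%d unknown=%d\n",
                count["U"] + count["P"],
                count["M"] + count["A"] + count["R"],
                count["C"], count["?"]
        }'
}

# Example use (commented out so that loading this file runs no cvs command):
#   cvs -n -q update | cvs_pending_summary
```

A non-zero conflicts count is a signal to update and resolve by hand before
attempting the commit.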
To add a file to a branch:
cvs add filename
cvs commit filename
cvs tag -b "branch-name" filename
cvs update -r "branch-name" filename
Review CVS logs:
o Display the CVS log for a specified file:
cvs log filename
o Print the CVS log messages for a given file within a specified branch:
cvs-branch-log [-h] [-d date]
where
-h Help -- this message
-d Since date -- print all messages up to the first one with
this date if it exists (ex. 1995/06/27)
o To check the logs for a regular expression:
cvs-query-logs V2_1_PATCH "November 9, 1994" | less
o Even better, did someone introduce a bug into your perfectly conceived
and executed code? Gotta find the devil who did it and run over their
foot in the parking lot! Here's how, courtesy of a Matt King script
(Careful! He's run over many a foot!).
Use cvs-find-revision.pl to find a CVS revision number containing a
specified regular expression. Then use cvs-get-revision.pl to find
the dog that did the deed!
E.g.
Who checked the following line of merge_btl.c into the CVS
mainline inquery directory, and when?
Int_t num_of_elm;
# cvs-find-revision.pl "Int_t\s+num_of_elm" src/merge_btl.c
Found one! Try:
cvs diff -r 2.86 -r 2.85 src/merge_btl.c
# cvs-get-revision.pl src/merge_btl.c 2.86
Looking for the log of src/merge_btl.c, revision 2.80
revision 2.80
date: 1996/04/24 18:05:21; author: smith; state: Exp; lines: +20 -7
patchmerge from V3_0_PATCH into the mainline.
...
cvs-find-revision.pl can take regular expressions in the search string
(just don't forget to use quotes). They're perl regular expressions,
so see the perl man page for more information.
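Where the local scripts are not available, stock CVS offers a rougher
alternative: "cvs annotate" prints each line of a file prefixed with the
revision and author that last changed it. The sketch below filters that
output. It is not how cvs-find-revision.pl works; the function name
cvs_blame_grep is hypothetical, the parsing assumes the usual annotate layout
of "revision (author date): text", and the pattern is a POSIX extended
regular expression rather than a perl one.

```shell
# Hypothetical helper: read "cvs annotate" output on stdin and print the
# distinct "revision author" pairs for lines matching the pattern in $1.
# Note that the pattern is matched against the whole annotated line,
# including the revision/author/date prefix.
cvs_blame_grep() {
    pattern=$1
    grep -E -- "$pattern" |
        awk '{ rev = $1; author = $2; sub(/^\(/, "", author); print rev, author }' |
        sort -u
}

# Example use (commented out so that loading this file runs no cvs command):
#   cvs annotate src/merge_btl.c | cvs_blame_grep 'Int_t[[:space:]]+num_of_elm'
```

Unlike the scripts above, this reports only the most recent revision to touch
a line, so it cannot find a line that has since been changed or removed.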
"cvs-find-revision.pl -h" will display usage info.
Spell Checking
SPELLING:
use "look" and any reasonable approximation of the spelling you
are looking for! way cool.
Naive User
Purify and the Galahad User
Purify is a memory-checking tool available on Sun machines to help detect
memory leaks. Running it is now considered a necessary step in testing new
code before it is made a part of an INQUERY release.
To use PURIFY:
- Become the "galahad" user on one of the Sun machines
- IMPure
- See gmake.rules for PURIFY information. Generally, uncomment the
$LINK definition to produce "purify" versions of source files
- Run your program, paying attention to the purify output or logs. See
the man page for purify to learn the various error codes in purify
output.
Coding Standards and Specifications
Testing
To test a build:
- quick-test-build
builds the sources on all the CIIR-supported platforms.
- local-test-build
does a "local test build" (NOT quick-test-build)