INQUERY PROJECT Environment

PROJECT environment variables

  - /home/irdata/ciir-vars
    A shell script used to set up the INQUERY working environment.  Besides
    setting the environment variables described below, it also adds
    directories for group-wide tools and programs to the user's path.
    The settings will depend on a defined INQUERY $VERSION value, or will be
    set based on a default $VERSION if one has not already been provided.

  - $PROJECT
    This is the root INQUERY project directory. 

  - $VERSION 
    The INQUERY version setting.  Versions are of the form n.n[.n], e.g. 3.1
    or 3.2, and correspond to a working version of INQUERY.
    One may change versions by using the switchv alias.

       switchv 3.2

    This alias not only resets your $VERSION value, but also updates your
    PATH and other environment settings to be consistent.  When compiling or
    using INQUERY software, take care that the source or executables you use
    are consistent with this variable's value.  Pay attention to version
    settings during compilation to ensure correct source file inclusions.

  - $INQ_ARCH 
    The architecture type of the machine being used.  This value is required
    during compilations to determine correct byte ordering.  Compilations
    automatically use this value as an input macro definition.  $INQ_ARCH
    values currently include solaris (Sun) and alpha (DEC).

  - $COLLECTIONS
    The IR Lab possesses many different document collections.  A collection is
    one or more files containing a sequence of documents.  The format of these
    collections will vary.  This environment variable is a pointer to the
    directory containing various source files for the document collections.
    There may or may not be standard query and relevance judgement files
    associated with these collections.  Collections such as TIPSTER, CACM,
    WEST, INSPEC and NPL do have such additional associated files.  The
    $COLLECTIONS variable is independent of $VERSION settings.

  - $INQDATA 
    INQUERY databases built from various collection files may be found in the
    directory named by the $INQDATA environment variable.  These databases
    are usually of group-wide significance, and are thus made available to
    the group by "publicizing" their presence in this directory.
    Certain medical collections (for example) residing here may still be
    limited to certain group memberships.  This variable incorporates the
    $PROJECT and $VERSION variables.  One may use the list_cols program to
    determine the collection files that comprise an INQDATA (or any INQUERY)
    database.  This variable will determine what INQUERY databases may be
    accessed when inquery or xinquery retrieval interfaces are used without
    specifying a collection to open.

  - $DOCSTOPS
    The location of the default stopword list.  A stopword list is a list of
    words deemed to have no retrieval value, which are therefore dropped
    during query processing and database building.  The $DOCSTOPS value
    incorporates the current $PROJECT and $VERSION values and is currently
    equivalent to $INQDATA.  The default stopword file is named default.stp
    and may be referenced as $DOCSTOPS/default.stp.

  - $STEM_DIR
    The location of stemming support files.  This variable includes both
    $PROJECT and $VERSION values and is currently equivalent to $DOCSTOPS.
    This directory includes the exception lists and dictionary supplement
    files for the kstem stemmer, as well as stopword and stemmer support
    files for the Spanish, Japanese and Chinese languages.

  - $INQ_CITY_FILE
    The name of a file of US cities used by a location recognizer.  

  - $INQ_HELP
    Directory location of various INQUERY program help (hlp) files, used by
    the Xinquery X interface.  These files live in the help directory of a
    $PROJECT/$VERSION INQUERY project directory hierarchy.
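
The relationships described above can be checked with a small shell function.
This is only a sketch, not part of the INQUERY tools: the names valid_version
and check_inq_env are hypothetical, and the check that $INQDATA "incorporates"
$VERSION assumes this simply means the version string appears in the path.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the INQUERY distribution.

# Succeed if $1 has the form n.n or n.n.n (digits only), e.g. 3.1 or 3.2.1.
valid_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+(\.[0-9]+)?$'
}

# Print one line per problem found in the current INQUERY environment
# settings and return non-zero if there were any.
check_inq_env() {
    status=0
    for name in PROJECT VERSION INQ_ARCH COLLECTIONS INQDATA DOCSTOPS STEM_DIR
    do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: \$$name"
            status=1
        fi
    done
    if [ -n "${VERSION:-}" ] && ! valid_version "$VERSION"; then
        echo "bad version form: $VERSION"
        status=1
    fi
    # Assumption: "incorporates" means the version appears in the path.
    case "${INQDATA:-}" in
        *"${VERSION:-}"*) ;;
        *) echo "\$INQDATA does not mention version $VERSION"; status=1 ;;
    esac
    # The default stopword file should be reachable through $DOCSTOPS.
    if [ -n "${DOCSTOPS:-}" ] && [ ! -f "$DOCSTOPS/default.stp" ]; then
        echo "missing: $DOCSTOPS/default.stp"
        status=1
    fi
    return "$status"
}
```

Under these assumptions, running check_inq_env after a switchv should print
nothing when the settings are consistent.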

Version Directory Structure

  - doc, src, h, lib, bin, utils, build
  - Object vs. source release directory structure
  - switchv macro for changing versions

Using Gmake

  - Gmake Overview
     
    The inquery gmake script assumes you're in the correct directory for the
    architecture you're running on!  If you're running on an Alpha, cd to your
    alpha build directory (usually "build", or "alpha-build" if you have more
    than one architecture build directory) before compiling with gmake.  If
    you fail to do this, you may end up with mixed object file types and
    sometimes cryptic warnings about why something won't compile.

  - Gmake.rules

  - Where does gmake look to find files?

  - The Gmake -I switch

  - Setting gmake compilation command in emacs

  - Setting compilation VERSION without changing environment

    One may compile a version different from the current $VERSION value by
    adding VERSION=n.n[.n] on the gmake command line.  This will pass the
    specified version to the compile, without having to change environment
    settings. 

  - Example Use

      cd $work_dir_31

      # change to another work version
      switchv 3.1             # ($VERSION setting can affect gmake!)

      # update the INQUERY source
      cvs update 

      # build alpha versions of programs
      cd alpha-build

      # clean old libraries and objects
      rm *.a *.o

      # make everything, or perhaps just one thing
      gmake 

      # or just make xinquery
      gmake xinquery
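
The two points above (build from the directory that matches your architecture,
and optionally pass VERSION on the command line) can be combined in a small
wrapper.  This is a sketch only: is_build_dir_for and safe_gmake are
hypothetical names, and the rule that a build directory is called "build" or
"<arch>-build" is an assumption drawn from the naming mentioned above.

```shell
#!/bin/sh
# Hypothetical wrapper, not an INQUERY tool.  Assumes build directories are
# named "build" or "<arch>-build", e.g. "alpha-build".

# Succeed if directory $1 looks like a build directory for architecture $2.
is_build_dir_for() {
    case "$(basename "$1")" in
        build|"$2-build") return 0 ;;
        *) return 1 ;;
    esac
}

# Run gmake only from a matching build directory.
# Usage: safe_gmake [version] [target...]
# Pass "" as the version to keep the $VERSION already in the environment.
safe_gmake() {
    version=${1:-}
    if [ $# -gt 0 ]; then shift; fi
    if [ -z "${INQ_ARCH:-}" ]; then
        echo "INQ_ARCH is not set" >&2
        return 1
    fi
    if ! is_build_dir_for "$PWD" "$INQ_ARCH"; then
        echo "not in a $INQ_ARCH build directory: $PWD" >&2
        return 1
    fi
    if [ -n "$version" ]; then
        # VERSION=n.n[.n] on the command line overrides the environment.
        gmake "VERSION=$version" "$@"
    else
        gmake "$@"
    fi
}
```

For example, "safe_gmake 3.1 xinquery" run from alpha-build on an Alpha would
invoke "gmake VERSION=3.1 xinquery", while the same command run from the src
directory would refuse to start the build.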

CVS

CVS is the version control mechanism used for saving and tracking INQUERY
source files.  CVS allows a user to check out a local copy of a source file or
set of files (e.g. all INQUERY sources) to be edited as needed.  These copies
may then be updated with changes others have independently committed, or
committed with your own changes included.  Always update your CVS directory
sources before doing a commit.  One may also create branches of a CVS version
tree to save customizations to a version that are not to be merged back into
the main branch, yet still need to be tracked under version control.

To create an inquery tree:
    cvs checkout inquery

  This will check out the current mainline INQUERY sources into a new
  sub-directory of the current directory, naming it "inquery".  For a branch,
  say the 3.1 version, use "cvs checkout -r V3_1_PATCH inquery".

    cd inquery

    make-build-dirs

  This creates the test-build directories for compiling the sources on various
  IR Lab platforms.

To create a CVS branch:
  cvs tag -b "branch-name"
  cvs update -r "branch-name"

To update a branch from the mainline:
  cvs-update-branch

  After the update, test, then commit the files to the branch.

To update your CVS file structure with changes committed by others to the same
CVS branch:

  cvs update

  This will update all the CVS files in the current directory.  Thus, if you
  are in the doc CVS directory, "cvs update" will update only doc files.  If
  one is in the inquery (root) CVS directory, an update will act upon all CVS
  sub-directories within.

To update the mainline from the branch:
  cvs-merge-patch TAG_O_BRANCH

  This is another Michelle script.  After the merge, test, then commit to
  the mainline CVS.

To commit changes to a version:
  cvs commit

  This will commit changes made to all CVS files in the current directory.
  Thus, if only src files were changed, one may move to the src directory and
  do a commit.  One may also name just the changed file, e.g. "cvs commit
  query.y.y", to commit only that file.
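
The advice to update before committing can be turned into a quick pre-commit
check.  The sketch below relies on the standard "cvs -n -q update" command,
which reports what an update would do without changing any files; the function
names out_of_date and safe_commit are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not INQUERY tools.

# Read "cvs -n -q update" output on stdin and print the lines for files that
# an update would bring in (U or P) or that would conflict (C).
out_of_date() {
    awk '/^[UPC] /'
}

# Commit only if the working directory has nothing pending from others.
# Any arguments are passed through to "cvs commit".
safe_commit() {
    pending=$(cvs -n -q update 2>/dev/null | out_of_date)
    if [ -n "$pending" ]; then
        echo "run 'cvs update' first; out-of-date files:" >&2
        echo "$pending" >&2
        return 1
    fi
    cvs commit "$@"
}
```

This only catches files that others have changed and you have not.  For a file
changed both locally and in the repository, CVS itself refuses the commit with
an up-to-date check failure, so the advice to update first still applies.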

To add a file to a branch:
  cvs add filename
  cvs commit filename
  cvs tag -b "branch-name" filename
  cvs update -r "branch-name" filename

Review CVS logs:

  o Display the CVS log for a specified file:

    cvs log filename

  o Print the CVS log messages for a given file within a specified branch:

    cvs-branch-log [-h] [-d ]  

    where 
      -h         Help       -- this message
      -d   Since date -- print all messages up to the first one with
                               this date if it exists  (ex. 1995/06/27)

  o To check the logs for a regular expression:

    cvs-query-logs V2_1_PATCH "November 9, 1994" | less

  o Even better, did someone introduce a bug into your perfectly conceived
    and executed code?  Gotta find the devil who did it and run over their
    foot in the parking lot!  Here's how, courtesy of a Matt King script
    (Careful!  He's run over many a foot!).

    Use cvs-find-revision.pl to find a CVS revision number containing a
    specified regular expression.  Then use cvs-get-revision.pl to find 
    the dog that did the deed!
    
    E.g.

      Who checked the following line of merge_btl.c into the CVS mainline
      inquery directory, and when?

        Int_t          num_of_elm;

      # cvs-find-revision.pl "Int_t\s+num_of_elm" src/merge_btl.c
      Found one!  Try:
      cvs diff -r 2.86 -r 2.85 src/merge_btl.c

      # cvs-get-revision.pl src/merge_btl.c 2.86
      Looking for the log of src/merge_btl.c, revision 2.80

      revision 2.80
      date: 1996/04/24 18:05:21;  author: smith;  state: Exp;  lines: +20 -7
      patchmerge from V3_0_PATCH into the mainline.
      ...

    cvs-find-revision.pl can take regular expressions in the search string
    (just don't forget to use quotes).  They're perl regular expressions, 
    so see the perl man page for more information.

    "cvs-find-revision.pl -h" will display usage info.

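Where those scripts are not at hand, the stock "cvs annotate" command gives
roughly similar information: it prefixes every line of a file with the
revision, author and date that last changed it.  The filter below is a sketch
built on that command rather than on the scripts above, whose internals are
not shown here.  The name who_wrote is hypothetical, and the assumed annotate
line layout is "revision (author date): text".

```shell
#!/bin/sh
# Hypothetical helper, not an INQUERY tool.

# Read "cvs annotate" output on stdin and print "revision author date" for
# each source line whose text matches the awk regular expression $1.
# Note: the pattern is passed with awk -v, so avoid backslash escapes in it.
who_wrote() {
    awk -v re="$1" '
        {
            # Assumed layout: "1.2          (author   24-Apr-96): text"
            cut = index($0, "): ")
            if (cut == 0) next            # not an annotated source line
            text = substr($0, cut + 3)
            if (text ~ re) {
                author = $2; sub(/^\(/, "", author)
                date = $3;   sub(/\):$/, "", date)
                print $1, author, date
            }
        }'
}
```

A typical use would be:

    cvs annotate src/merge_btl.c 2>/dev/null | who_wrote 'Int_t +num_of_elm'

followed by "cvs log" on the reported revision to read the commit message.
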
Spell Checking

SPELLING:
        Use "look" with any reasonable approximation of the spelling you
        are looking for!  It lists the dictionary words that begin with the
        string you give it.  Way cool.

Naive User

Purify and the Galahad User

Purify is a memory-checking tool available on Sun machines to help detect
memory leaks.  Running it is now considered a necessary step in testing new
code before it is made a part of an INQUERY release.

To use PURIFY:
  - Become the "galahad" user on one of the Sun machines
  - IMPure
  - Check out gmake.rules for PURIFY information.  Generally, uncomment the
    $LINK definition to produce "purify" versions of source files
  - Run your program paying attention to the purify output or logs.  Check 
    out the man page for purify to learn the various error codes for purify
    output.

Coding Standards and Specifications

Testing

To test a build:
  - quick-test-build
      will build the sources on all the CIIR supported platforms.

  - local-test-build
      do a "local test build" (NOT quick-test-build)