Galago | ||||||||
| Changed: | ||||||||
| < < |
(UNDER CONSTRUCTION) | |||||||
| > > |
(UNDER CONSTRUCTION) | |||||||
| Created by Trevor Strohman | ||||||||
| Line: 28 to 28 | ||||||||
|---|---|---|---|---|---|---|---|---|
| Added: | ||||||||
| > > |
||||||||
| Line: 120 to 121 | ||||||||
| ||||||||
| Changed: | ||||||||
| < < |
The source code for each of these is maintained in a separate repository; however, there is an additional repository which stores the most recent Jar files, including libraries, from the other three. The latest version of any of these can be checked out using the following commands (you must have an account on Sydney to access these): | |||||||
| > > |
The source code for each of these is maintained in a separate repository; however, there is an additional repository which stores the most recent Jar files, including libraries, from new_galago and edu.umass. The latest version of any of these can be checked out using the following commands (you must have an account on Sydney to access these):
| |||||||
svn co svn+ssh://sydney.cs.umass.edu/home/hfeild/svn/galago/tags/latest galago svn co svn+ssh://sydney.cs.umass.edu/home/hfeild/svn/pig_galago/tags/latest pig_galago svn co svn+ssh://sydney.cs.umass.edu/home/hfeild/svn/edu.umass/tags/latest edu.umass | ||||||||
| Changed: | ||||||||
| < < |
svn co svn+ssh://sydney.cs.umass.edu/home/hfeild/svn/ciir_galago/tags/latest ciir_galago | |||||||
| > > |
svn co svn+ssh://sydney.cs.umass.edu/home/hfeild/svn/ciir_galago_bin/tags/latest ciir_galago_bin | |||||||
| Line: 152 to 153 | ||||||||
| Changed: | ||||||||
| < < |
The ciir_galag package is just a directory of Jar files, so no building is necessary.
| |||||||
| > > |
The ciir_galago_bin package has a directory of Jar files, so no building is necessary.
It also has a directory of sample Galago parameter files and a scripts/ directory. See
the README it contains for more information.
| |||||||
| Next, you will need to set up your CLASSPATH environment variable so that Java will know where to find the Jar files you checked out. | ||||||||
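One way to do this, sketched here under assumptions, is to build the CLASSPATH from every Jar file in a single directory. The function name `jar_classpath`, the variable `GALAGO_JARS`, and the default path `$HOME/ciir_galago_bin/jars` are all hypothetical; point `GALAGO_JARS` at wherever the Jar directory actually lives in your checkout.

```shell
# Build a colon-separated class path from every Jar file in one directory.
# Prints nothing if the directory contains no Jar files.
jar_classpath() {
    cp=""
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue   # skip the unexpanded pattern when no Jars match
        cp="${cp:+$cp:}$jar"
    done
    printf '%s' "$cp"
}

# Hypothetical Jar directory (an assumption, not taken from the repository layout).
GALAGO_JARS="${GALAGO_JARS:-$HOME/ciir_galago_bin/jars}"

new_cp="$(jar_classpath "$GALAGO_JARS")"
if [ -n "$new_cp" ]; then
    # Put the Galago Jars first, keeping any CLASSPATH that was already set.
    CLASSPATH="${new_cp}${CLASSPATH:+:$CLASSPATH}"
    export CLASSPATH
fi
```

Sourcing these lines from your shell startup file keeps the setting across sessions; if the directory is missing or empty, the existing CLASSPATH is left untouched.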
| Line: 1213 to 1216 | ||||||||
| Added: | ||||||||
| > > |
Running TupleFlow
To run TupleFlow on the above parameter file (or on one that you have created), do one of the following:
Local (non-distributed):
mkdir /path/to/my/tmp/dir
java -Xmx900m galago.tupleflow.execution.JobExecutor \
local paramFile.xml /path/to/my/tmp/dir
DRMAA (distributed):
mkdir /path/to/my/tmp/dir
java -Xmx900m galago.tupleflow.execution.JobExecutor \
drmaa paramFile.xml /path/to/my/tmp/dir
In local mode, messages are printed to stderr on your screen. However, if you use the distributed mode,
these messages are all kept in files within the temporary directory (in the example above,
these files are located in /path/to/my/tmp/dir/stderr/).
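The two invocations differ only in the mode argument, so they can be wrapped in one small shell function. This is a convenience sketch, not part of Galago: the function name `run_tupleflow` is hypothetical, and it assumes the Galago Jars are already on your CLASSPATH.

```shell
# Run TupleFlow in either mode, creating the temporary directory if needed.
# Usage: run_tupleflow local|drmaa paramFile.xml /path/to/my/tmp/dir
run_tupleflow() {
    mode="$1"
    params="$2"
    tmpdir="$3"

    # Only the two modes described above are accepted.
    case "$mode" in
        local|drmaa) ;;
        *)
            echo "usage: run_tupleflow local|drmaa paramFile.xml tmpdir" >&2
            return 2
            ;;
    esac

    if [ ! -f "$params" ]; then
        echo "parameter file not found: $params" >&2
        return 1
    fi

    # mkdir -p succeeds when the directory already exists, so an earlier
    # run's temporary directory can be passed again unchanged.
    mkdir -p "$tmpdir" || return 1

    java -Xmx900m galago.tupleflow.execution.JobExecutor \
        "$mode" "$params" "$tmpdir"
}
```

For example, `run_tupleflow drmaa paramFile.xml /path/to/my/tmp/dir` runs the distributed command shown above.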
Galago keeps track of which stages fail and succeed. A quick way to check is to do:
ls /path/to/my/tmp/dir/jobs/*/
hashCount specified at the top of the parameter file). Stages that completed successfully will have
an additional file with the job number followed by .complete. If the job failed, it will
be followed by .error.
If you fix the bug that caused the error, and your changes only affect the data flow from
the failed stages onward, pass the same temporary directory to TupleFlow again. It will not
redo the stages that completed successfully, but will instead start at the failed stages.
However, if you do change something in your code that affects the flow of data from stages
that have already finished successfully, or if you want a clean start, delete the contents of
the temporary directory before running TupleFlow.
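The status check described above can be scripted by counting the .complete and .error marker files for each stage. This is a minimal sketch: the function name `stage_status` is hypothetical, and it assumes the marker files sit directly inside each directory under jobs/, as the ls command suggests.

```shell
# Print one line per stage with the number of completed and failed jobs.
# Usage: stage_status /path/to/my/tmp/dir
stage_status() {
    tmpdir="$1"
    for stage in "$tmpdir"/jobs/*/; do
        [ -d "$stage" ] || continue   # no stage directories yet
        # Count marker files; tr strips the padding some wc versions add.
        complete=$(find "$stage" -maxdepth 1 -type f -name '*.complete' | wc -l | tr -d ' ')
        errors=$(find "$stage" -maxdepth 1 -type f -name '*.error' | wc -l | tr -d ' ')
        printf '%s complete=%s error=%s\n' "$(basename "$stage")" "$complete" "$errors"
    done
}
```

Any stage that reports a nonzero error count is one that TupleFlow would restart when given the same temporary directory.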
| |||||||
Indexing on Sydney / Swarm | ||||||||
| Line: 1224 to 1283 | ||||||||
java -Xmx900m galago.tupleflow.execution.JobExecutor \ | ||||||||
| Changed: | ||||||||
| < < |
local index_param_file.xml tmp_files/ | |||||||
| > > |
local index_param_file.xml /path/to/tmp/dir/ | |||||||
| Line: 1234 to 1293 | ||||||||
java -Xmx900m galago.tupleflow.execution.JobExecutor \ | ||||||||
| Changed: | ||||||||
| < < |
drmaa index_param_file.xml tmp_files/ | |||||||
| > > |
drmaa index_param_file.xml /path/to/tmp/dir/ | |||||||
| ||||||||
| Changed: | ||||||||
| < < |
Galago (UNDER CONSTRUCTION) | |||||||
| > > |
Galago(UNDER CONSTRUCTION) | |||||||
| Created by Trevor Strohman | ||||||||
| Line: 47 to 48 | ||||||||
|---|---|---|---|---|---|---|---|---|
| Galago is made up of several components. The most powerful component is TupleFlow. TupleFlow is a kind of MapReduce framework that allows | ||||||||
| Changed: | ||||||||
| < < |
for the stages of the Map-Reduce to have multiple inputs and outputs. Trevor | |||||||
| > > |
for the stages of the map-reduce to have multiple inputs and outputs. Trevor | |||||||
| describes TupleFlow as, "a mix between MapReduce, make/ant, and a database system. TupleFlow is like MapReduce in that it can efficiently parallelize a large computation. It is like make or ant in that it runs based on a file that | ||||||||
| Line: 57 to 58 | ||||||||
| It is on top of the TupleFlow framework that Galago's indexers are built. The indexers are made of several Java classes which can be used in TupleFlow stages; depending on which ones you combine, you can make a traditional indexer (i.e. one that produces | ||||||||
| Changed: | ||||||||
| < < |
an inverted index) or a query likelihood binned indexer, among other. | |||||||
| > > |
an inverted index) or a query likelihood binned indexer, among others. | |||||||
| These are examples of the type of applications that TupleFlow can enhance. | ||||||||
| Changed: | ||||||||
| < < |
Galago's third most useful component is the retrieval system. | |||||||
| > > |
Another useful component of Galago is the retrieval system. | |||||||
| This does not rely on TupleFlow, but it knows how to read and interact with the | ||||||||
| Changed: | ||||||||
| < < |
inverted indexes created by Galago's indexer. It is also easily extended and | |||||||
| > > |
inverted indexes created by Galago's indexers. It is also easily extended and | |||||||
| is a great way to prototype new retrieval operators. There are other features that come with Trevor Strohman's release of Galago. However, | ||||||||
| Line: 84 to 85 | ||||||||
Trevor's Galago Branch | ||||||||
| Changed: | ||||||||
| < < |
Trevor's branch is available via this Git repository. To checkout a copy from here, you'll first need to get Git, which is available here. It is pretty easy to install. | |||||||
| > > |
Trevor's branch is available via this Git repository. To check out a copy, you'll first need to get Git, which is available here. It is pretty easy to install. | |||||||
| Once you have Git, you can download the current version of Galago by issuing the command: | ||||||||
| Line: 92 to 93 | ||||||||
| git clone git://repo.or.cz/galago.git | ||||||||
| Changed: | ||||||||
| < < |
For information about what is included in Trevor's Galago branch and how to use it, see Galago Guidebook. | |||||||
| > > |
For information about what is included in Trevor's Galago branch and how to use it, see the Galago Guidebook. | |||||||
<!--
That should create a directory called galago with all of the Galago files in it. Change directories to:
| ||||||||
| Line: 117 to 118 | ||||||||
| ||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
| The source code for each of these is maintained in a separate repository; however, there is an additional repository which stores the most recent Jar files, including libraries, from the other three. The latest version of any of these can be checked out using the following commands (you must have an account on Sydney to access these): | ||||||||
| ||||