Downloads
This page lists tidbits of code and other electronic resources that are
available for download. If you make changes that you find helpful, let me
know and I'll add them to my version. Not all of these resources were
authored by me.
Datasets
- Open Library Query-Click Dataset
— This is an evaluation dataset collected during an analysis of search behavior from Open Library server logs. The dataset consists of 22,622 frequently submitted queries and their associated clicks plus a collection of 46,561,553 Open Library metadata records crawled on November 30, 2011. For more information, see the README.
-
View Readme
Download data set (two .bz2 files plus the README; uploaded 06-Nov-2012)
- Searcher Frustration User Study Data
— This is a dataset collected during a user study of frustration during
web search at the University of Massachusetts Amherst in October 2009. The
study consists of query logs and sensor readings for thirty participants. For
more information, see the README.
This is available under an Open Database/Database Content license. Feel free to use, redistribute, and modify the dataset, but make sure to make it available under the same license and to give due attribution in any public use of the dataset.
-
View Readme
Download data set (.tar.bz2 file; uploaded 29-Jan-2010)
Random Code
- Fisheye menus using CSS zoom —
See my blog post on adapting the jQuery Interface Fisheye library to use CSS zoom.
Download (.tar.gz file; packaged on 06-Feb-2012)
- Download wikitext of a Wikipedia page
— A Ruby script that downloads the wikitext for a given Wikipedia page title. Requires Ruby 1.9+, rubygems, and json.
-
Download
(.rb file; uploaded on 06-Jul-2011; right-click and choose 'Save-as' to download)
-
TagCloud generator
— A Ruby script that takes a listing (either as a file or from stdin) of <count, word/phrase> pairs and produces a simple tag cloud HTML file. This is pretty simplistic, so if you want to change the font sizes, you'll need to tweak the minFontSize and maxFontSize variables. At some point, maybe I'll add parameters for that.
-
Download (.rb file; uploaded on 06-Jul-2011; right-click and choose 'Save-as' to download)
- Shamir's Secret Sharing Scheme
— A JavaScript implementation of Shamir's Secret Sharing
Scheme (see the
Wikipedia page).
-
See a demo
Download current version (.tar.gz file; uploaded on 26-Apr-2011).
Introduction to Statistical Thought
Michael L. Lavine, a professor of statistics at UMass and a member of my dissertation committee, has a free e-book called Introduction to Statistical Thought, which is a great introduction to statistics at both the undergraduate and graduate level. It has lots of examples in R, which I found particularly useful since R is my stats tool of choice. Check out the PDF!
TupleFlow Program Downloads
- Extract N-Grams
— Meant for use on the Swarm at UMass, this program will extract the counts
associated with all 1- through 5-gram phrases in an input file. This is
done by processing the Google NGrams counts with the map-reduce framework,
TupleFlow. At some point, I would like to modify this to use any cluster with a
DRMAA interface and any n-gram collection. Note: this might not work any more; TupleFlow has changed bunches since 2008.
-
View Readme.
Download current version (.tar.gz file; uploaded on 08-Dec-2008).
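To give a rough idea of what extracting counts for 1- through 5-gram phrases involves, here is a minimal Python sketch. It is not the TupleFlow program: it runs in memory on a single machine, and the function names, the whitespace tokenization, the lowercasing, and the `collection_counts` dictionary (a stand-in for an n-gram collection such as Google NGrams) are all assumptions made for illustration.

```python
def extract_ngrams(tokens, max_n=5):
    """Yield every 1- through max_n-gram in tokens as a space-joined phrase."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def lookup_counts(text, collection_counts, max_n=5):
    """Map each distinct n-gram in text to its count in collection_counts.

    collection_counts is a hypothetical in-memory stand-in for an n-gram
    collection; phrases missing from it get a count of 0.
    """
    tokens = text.lower().split()  # assumption: whitespace tokens, lowercased
    phrases = set(extract_ngrams(tokens, max_n))
    return {phrase: collection_counts.get(phrase, 0) for phrase in phrases}
```

For example, `lookup_counts("New York City", {"new york": 10})` returns a dictionary with six phrases, where only `"new york"` has a non-zero count. A real collection is far too large for a dictionary, which is presumably why the original program relies on a cluster.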
QALP Downloads
- QALP::IdentifierSplitter
— A Perl module containing several functions that split
compound source code identifiers (or any sequence of concatenated words) into
their constituent parts. For example, 'spongebobsquarepants' will be split into
'sponge_bob_square_pants'. Two interface scripts are also included. Note: I've heard this is buggy (surprise!).
-
View Readme.
Download Version 0.01 (.tar.gz file; uploaded on 14-Feb-2008).
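One common way to split concatenated words is dictionary-based segmentation, sketched below in Python. This is only an illustration of the general idea and is not necessarily how the Perl module works: the function name `split_identifier`, the caller-supplied word set, and the "fewest pieces wins" rule are all assumptions.

```python
def split_identifier(identifier, words):
    """Split identifier into dictionary words, preferring the fewest pieces.

    Returns the pieces joined by underscores, or the identifier unchanged
    if it cannot be fully covered by words from the given set.
    """
    text = identifier.lower()
    # best[i] holds the shortest list of words covering text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in words:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return "_".join(best[-1]) if best[-1] is not None else identifier


# Hypothetical word list; a real one would come from a dictionary file.
WORDS = {"sponge", "bob", "square", "pants"}
print(split_identifier("spongebobsquarepants", WORDS))  # sponge_bob_square_pants
```

The result depends entirely on the word list, so an identifier made of abbreviations or words missing from the list comes back unsplit.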
© 2013 Henry A. Feild