- Open Library Query-Click Dataset
— This is an evaluation dataset collected during an analysis of search behavior from Open Library server logs. The dataset consists of 22,622 frequently submitted queries and their associated clicks plus a collection of 46,561,553 Open Library metadata records crawled on November 30, 2011. For more information, see the README.
- Searcher Frustration User Study Data
— This is a dataset collected during a user study of frustration during
web search at the University of Massachusetts Amherst in October 2009. The
study consists of query logs and sensor readings for thirty participants. For
more information, see the README.
This is available under an Open Database/Database Content license. Feel free to use, redistribute, and modify the dataset, but make sure to make it available under the same license and to give due attribution in any public use of the dataset.
- Fisheye menus using CSS zoom —
See my blog post on adapting the jQuery Interface Fisheye library to use CSS zoom.
Download (.tar.gz file; packaged on 06-Feb-2012)
Download wikitext of a Wikipedia page — A Ruby script that downloads the wikitext for a given Wikipedia page title. Requires Ruby 1.9+, rubygems, and json.
(.rb file; uploaded on 06-Jul-2011; right-click and choose 'Save-as' to download)
— A Ruby script that takes a listing (either as a file or from stdin) of <count, word/phrase> pairs and produces a simple tag cloud HTML file. This is pretty simplistic, so if you want to change the font sizes, you'll need to tweak the minFontSize and maxFontSize variables in the script. At some point, maybe I'll add parameters for that.
Download (.rb file; uploaded on 06-Jul-2011; right-click and choose 'Save-as' to download)
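The font-size scaling described above can be sketched roughly as follows. This is a minimal Python sketch, not the original Ruby script: the input format (one whitespace-separated `count phrase` pair per line), the default sizes, the linear scaling, and all function names are assumptions made for illustration. Only the idea of a minimum and maximum font size comes from the description.

```python
from html import escape

# Assumed defaults, playing the role of minFontSize / maxFontSize.
MIN_FONT_SIZE = 10
MAX_FONT_SIZE = 40


def parse_pairs(lines):
    """Parse lines of the assumed form '<count> <word or phrase>'."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        count, phrase = line.split(None, 1)
        pairs.append((int(count), phrase))
    return pairs


def font_size(count, lo, hi, min_size=MIN_FONT_SIZE, max_size=MAX_FONT_SIZE):
    """Linearly map a count in [lo, hi] onto [min_size, max_size]."""
    if hi == lo:
        return max_size  # all counts equal: use a single size
    return round(min_size + (count - lo) * (max_size - min_size) / (hi - lo))


def tag_cloud_html(pairs):
    """Build a very simple HTML page with one sized <span> per phrase."""
    if not pairs:
        return "<html><body></body></html>"
    counts = [count for count, _ in pairs]
    lo, hi = min(counts), max(counts)
    spans = [
        f'<span style="font-size: {font_size(count, lo, hi)}px">{escape(phrase)}</span>'
        for count, phrase in pairs
    ]
    return "<html><body>\n" + "\n".join(spans) + "\n</body></html>"
```

For example, `tag_cloud_html(parse_pairs(open("counts.txt")))` would produce the page for a hypothetical `counts.txt` file; changing the two constants changes the size range.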
- Shamir's Secret Sharing Scheme
Scheme (see the
Introduction to Statistical Thought
Michael L. Lavine, a professor of statistics at UMass and a member of my dissertation committee, has a free e-book called Introduction to Statistical Thought, which is a great introduction to statistics at both the undergraduate and graduate level. It has lots of examples in R, which I found particularly useful since R is my stats tool of choice. Check out the PDF!
TupleFlow Program Downloads
- Extract N-Grams
— Meant for use on the Swarm at UMass, this program will extract the counts
associated with all 1- through 5-gram phrases in an input file. This is
done by processing the Google NGrams counts with the map-reduce framework,
TupleFlow. At some point, I would like to modify this to use any cluster with a
DRMAA interface and any n-gram collection. Note: this might not work any more; TupleFlow has changed bunches since 2008.
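The core idea (enumerate every 1- through 5-gram in the input, then look up its count) can be sketched in a few lines. This is only an illustrative Python sketch under stated assumptions: it does not use TupleFlow or the Swarm, whitespace tokenization is assumed, and a small in-memory dictionary (`ngram_counts`, hypothetical) stands in for the Google NGrams collection. The function names are made up for this example.

```python
def extract_ngrams(tokens, max_n=5):
    """Yield every 1- through max_n-gram in a token list, shortest first."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def candidate_phrases(lines, max_n=5):
    """Collect the distinct n-gram phrases found in the input lines."""
    phrases = set()
    for line in lines:
        # Assumption: tokens are separated by whitespace.
        phrases.update(extract_ngrams(line.split(), max_n))
    return phrases


def lookup_counts(phrases, ngram_counts):
    """Keep only the phrases that appear in the n-gram count table."""
    return {p: ngram_counts[p] for p in phrases if p in ngram_counts}
```

At real scale the lookup table would not fit in memory, which is presumably why the original program runs as a map-reduce job.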
— A Perl module containing several functions that split compound source
code identifiers (or any sequence of concatenated words) into their
constituent parts. For example, 'spongebobsquarepants' will be split into
'sponge_bob_square_pants'. Two interface scripts are also included. Note: I've heard this is buggy (surprise!).
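One common way to do this kind of splitting is dictionary-based segmentation with dynamic programming. The Python sketch below illustrates that general technique only. It is not a port of the Perl module, whose actual method is not described here. The word list, the "fewest words wins" rule, and the function names are all assumptions.

```python
def split_identifier(identifier, dictionary):
    """Split a run of concatenated words using a known-word dictionary.

    Returns the segmentation with the fewest words, or None if the
    identifier cannot be fully covered by dictionary words.
    """
    s = identifier.lower()
    n = len(s)
    max_len = max(map(len, dictionary), default=0)
    # best[i] holds the best segmentation of s[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = s[start:end]
            if best[start] is not None and word in dictionary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


def join_parts(parts, separator="_"):
    """Join the split parts, e.g. with underscores."""
    return separator.join(parts)


# Hypothetical word list; a real one would be much larger.
WORDS = {"sponge", "bob", "square", "pants", "pant", "are", "ants"}
print(join_parts(split_identifier("spongebobsquarepants", WORDS)))
# prints: sponge_bob_square_pants
```

Preferring the fewest words is a simple heuristic; a frequency-weighted score would likely handle ambiguous identifiers better.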
© 2013 Henry A. Feild