README
06-Nov-2012
Contact: Henry Feild (hfeild@cs.umass.edu)

CONTENTS
========
1. Overview
2. Files
3. References


1. OVERVIEW
===========

This collection consists of 46,561,553 metadata records crawled from the
Open Library on November 30, 2011, and click distributions over records for
22,622 queries, recorded from October 2010 through September 2011. These
data are further described in [1].


2. FILES
========

The data is kept in two files:

open-library-metadata.tsv.bz2 (4.6 GB compressed, 34 GB uncompressed)

    This is the metadata. It is in a tab-delimited format. Judging from the
    example record below, the five fields are: type, key, revision,
    last-modified timestamp, and a JSON object holding the metadata.

    Different types have different metadata fields inside the JSON object.
    We do not offer a comprehensive schema of the metadata fields.

    Here is an example record (the fields are separated by tabs):

    /type/author	/authors/OL1000057A	2	2008-08-20T17:57:09.66187	{"name": "Kha\u0304lid Muh\u0323ammad \u02bbAli\u0304 al-H\u0323a\u0304jj", "personal_name": "Kha\u0304lid Muh\u0323ammad \u02bbAli\u0304 al-H\u0323a\u0304jj", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:09.66187"}, "key": "/authors/OL1000057A", "type": {"key": "/type/author"}, "revision": 2}

open-library-eval-set.tar.bz2 (1.3 MB compressed, 7.2 MB uncompressed)

    This unpacks to a directory consisting of:

        open-library-eval-set/
            README
            test/
            train/

    See open-library-eval-set/README for details.


3. REFERENCES
=============

[1] J.Y. Kim, H. Feild, and M. Cartright. "Understanding Book Search on
    the Web." CIKM 2012.
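
As an illustration of the metadata format described under FILES, the
following is a minimal Python sketch for reading open-library-metadata.tsv.bz2
without decompressing it to disk first. It is not part of the distribution.
It assumes one record per line and five tab-separated fields in the order
shown by the example record (type, key, revision, last-modified timestamp,
JSON). The names Record, parse_record, and iter_records are hypothetical
and chosen only for this example.

```python
import bz2
import json
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    """One metadata line; the field names are illustrative assumptions."""
    record_type: str
    key: str
    revision: int
    last_modified: str
    data: dict


def parse_record(line: str) -> Record:
    # Split on at most four tabs so the JSON field is kept whole.
    record_type, key, revision, last_modified, raw_json = (
        line.rstrip("\n").split("\t", 4)
    )
    return Record(record_type, key, int(revision), last_modified,
                  json.loads(raw_json))


def iter_records(path: str) -> Iterator[Record]:
    # Stream the compressed file line by line (assumes one record per line).
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line)
```

For example, iterating over iter_records("open-library-metadata.tsv.bz2")
and keeping only records whose record_type is "/type/author" would select
the author entries. Streaming this way avoids holding the 34 GB of
uncompressed data in memory or on disk.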