NB: appearance in this list means only that we think the collection exists in machine-readable form somewhere and might be available; there are serious copyright as well as other availability issues for most of these!
Exception: collections whose name is prefixed with "*" don't yet exist, as far as we know.
| Name | Representation (encoding) | Description |
| --- | --- | --- |
| MELDEX (NZDML) Folksongs | CMN (MELDEX) | 9400 German, Chinese, and Anglo-American folksongs from two sources. Better: "MELDEX Plus": add back in the c. 200 containing tuplets they removed. |
| NZDML Fake book collection | CMN (MELDEX?) | over 1200 popular tunes |
| *Barlow and Morgenstern | CMN (??) | 10,000 themes of classical pieces |
| Bach Chorales | MIDI (SMF) | short 4-part contrapuntal pieces: 185 (from BG v.39) + c.200 (from elsewhere) |
| CCARH MuseData | CMN (MuseData, kern) | 2461 complete movements of 634 classical pieces. NB: includes 185 Bach chorales. |
| RISM | CMN (Plaine & ...) | 100K? incipits from 300K? works |
| Huron | CMN (Humdrum kern) | c.5000 pieces |
| JHU/Levy sheet music | CMN (image) | scanned sheet music (29,000 pieces, c. 100,000 pages, c. 80% public-domain) |
| L of C/Duke sheet music | CMN (image) | " ("Historic American Sheet Music, 1850-1920": 3042 pieces from the Duke coll.) |
| L of C/copyright sheet music | CMN (image) | " ("American Sheet Music, 1870-1885": 22,000 pieces copyrighted in those years) |
| NZDML MidiMax | MIDI (SMF?) | c.100K MIDI files (collected from the Web?) |
| Uitdenbogerd & Zobel | MIDI (SMF) | 10,466 MIDI files (collected from the Web) |
| ?? *Audio files | Audio (??) | ?? (much prefer w/o lossy compression!) |
The model was first described in a classic report from the 1960s:
CLEVERDON, C.W., MILLS, J. and KEEN, M.: Factors Determining the Performance of Indexing Systems, Volume I - Design, Volume II - Test Results, ASLIB Cranfield Project, Cranfield (1966).
This article is reprinted in:
SPARCK JONES, K., and WILLETT, P., eds. Readings in Information Retrieval (San Francisco: Morgan Kaufmann, 1997).
It is also covered in some depth in:
van RIJSBERGEN, C. J.: INFORMATION RETRIEVAL (first edn: London: Butterworths, 1975). (online version at: http://www.dcs.gla.ac.uk/Keith/Preface.html )
and many others (apologies for omissions).
The following excellent work doesn't seem to mention either Cleverdon or the Cranfield model, but it includes a good discussion of IR evaluation in general and of TREC, which is based on the model:
WITTEN, I., MOFFAT, A., and BELL, T.: Managing Gigabytes (current edn.: Morgan Kaufmann, 1999).
Van Rijsbergen sums up the idea thus (here quoted from http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html without permission):
The ... question (what to evaluate?) boils down to what can we measure that will reflect the ability of the system to satisfy the user. ... Cleverdon ... listed six main measurable quantities:
- (1) The coverage of the collection, that is, the extent to which the system includes relevant matter;
- (2) the time lag, that is, the average interval between the time the search request is made and the time an answer is given;
- (3) the form of presentation of the output;
- (4) the effort involved on the part of the user in obtaining answers to his search requests;
- (5) the recall of the system, that is, the proportion of relevant material actually retrieved in answer to a search request;
- (6) the precision of the system, that is, the proportion of retrieved material that is actually relevant.
It is claimed that (1)-(4) are readily assessed. It is recall and precision which attempt to measure what is now known as the effectiveness of the retrieval system. In other words it is a measure of the ability of the system to retrieve relevant documents while at the same time holding back non-relevant ones. It is assumed that the more effective the system the more it will satisfy the user. It is also assumed that precision and recall are sufficient for the measurement of effectiveness.
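The definitions of recall and precision quoted above can be written in set notation. The symbols are our own choice for illustration, not van Rijsbergen's: Rel is the set of documents judged relevant to a query, and Ret is the set of documents the system retrieved for it.

```latex
\[
\text{recall} = \frac{|\mathit{Rel} \cap \mathit{Ret}|}{|\mathit{Rel}|},
\qquad
\text{precision} = \frac{|\mathit{Rel} \cap \mathit{Ret}|}{|\mathit{Ret}|}
\]
```

For example, if 10 documents are relevant to a query and the system retrieves 20 documents, 5 of which are relevant, then recall is 5/10 = 0.5 and precision is 5/20 = 0.25.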
On the all-important definition of 'relevance', van Rijsbergen has this to say:
Relevance is a subjective notion. Different users may differ about the relevance or non-relevance of particular documents to given questions. However, the difference is not large enough to invalidate experiments which have been made with document collections for which test questions with corresponding relevance assessments are available. These questions are usually elicited from bona fide users, that is, users in a particular discipline who have an information need. The relevance assessments are made by a panel of experts in that discipline. So we now have the situation where a number of questions exist for which the 'correct' responses are known. It is a general assumption in the field of IR that should a retrieval strategy fare well under a large number of experimental conditions then it is likely to perform well in an operational situation where relevance is not known in advance.
In summary, Cranfield-model evaluation requires three things: a collection of documents; a set of queries; and a set of relevance judgments for those queries and documents.
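The three ingredients can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the document IDs, query IDs, relevance judgments, and system output below are invented toy data, the function name `precision_recall` is hypothetical, and scoring an empty result list or an empty judgment set as 0.0 is a convention chosen here, not part of the model.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a single query.

    Assumption: an empty result list gives precision 0.0 and an empty
    judgment set gives recall 0.0, to avoid dividing by zero.
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 1. A collection of documents (hypothetical IDs).
collection: Set[str] = {"d1", "d2", "d3", "d4", "d5"}

# 2 and 3. A set of queries, each with its relevance judgments (toy data).
judgments: Dict[str, Set[str]] = {
    "q1": {"d1", "d3"},
    "q2": {"d2", "d4", "d5"},
}

# Every judged document should belong to the collection.
assert all(rel <= collection for rel in judgments.values())

# Output of some hypothetical retrieval system, one result list per query.
results: Dict[str, List[str]] = {
    "q1": ["d1", "d2", "d3", "d4"],
    "q2": ["d2", "d5"],
}

scores = {q: precision_recall(results[q], judgments[q]) for q in judgments}
for query, (p, r) in sorted(scores.items()):
    print(f"{query}: precision={p:.2f} recall={r:.2f}")
```

With this toy data the system retrieves both relevant documents for q1 but pads the list with two non-relevant ones (high recall, lower precision), while for q2 everything it retrieves is relevant but one relevant document is missed (high precision, lower recall).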
Musical 'relevance' remains to be defined ...