The movie-search-ml20 dataset

If you use this dataset, please cite the following papers:

Hamed Zamani and W. Bruce Croft, "Learning a Joint Search and Recommendation Model from User-Item Interactions". In WSDM 2020.

Matthias Hagen, Daniel Wagner, and Benno Stein, "A Corpus of Realistic Known-Item Topics with Associated Web Pages in the ClueWeb09". In ECIR 2015.

If you have any questions, do not hesitate to contact Hamed Zamani (zamani@cs.umass.edu).

The dataset consists of 919 questions. Each question is associated with one movie ID in the MovieLens 20M movie set.

Download: The data is publicly available for research purposes: click here.

The file is tab-separated (TSV). Each row contains the following fields:

Timestamp: the time at which the question was asked on Yahoo! Answers.
Subject: the subject of the question asked by the user.
Content: the content of the question post written by the user.
Answer Doc ID: the document ID of the relevant movie in the ClueWeb09 collection.
Answer URL: the URL of the relevant movie (usually a Wikipedia page).
Answer MovieLens20m ID: the ID of the relevant movie in the MovieLens 20M dataset.
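
The field list above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader. The column names in FIELDS are our own labels for the six fields in the order listed above. It assumes that the file is UTF-8 encoded and has no header row; neither is stated here, so check the downloaded file first. The sample row is an invented placeholder, not real data from the dataset.

```python
import csv
import io

# Our own labels for the six fields, in the order listed above (assumed order).
FIELDS = [
    "timestamp",
    "subject",
    "content",
    "answer_doc_id",
    "answer_url",
    "movielens_id",
]


def read_questions(handle):
    """Yield one dict per well-formed row of a tab-separated file object."""
    # QUOTE_NONE keeps quote characters in user-written text as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip rows that do not have exactly six fields
        yield dict(zip(FIELDS, row))


# Invented placeholder row, used only to show the expected shape of the input.
sample = io.StringIO(
    "TIMESTAMP\tSUBJECT\tCONTENT\tDOC_ID\thttp://example.org/movie\t1\n"
)
rows = list(read_questions(sample))
print(rows[0]["subject"], rows[0]["movielens_id"])
```

To read the real file, pass an opened file object instead of the sample, for example open(path, newline="", encoding="utf-8"), where path is the location of the downloaded file. All values are returned as strings, so convert the MovieLens ID with int() if a numeric ID is needed.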