nfL6: Yahoo Non-Factoid Question Dataset
Background
This dataset is derived from Yahoo's Webscope L6 collection, using machine learning techniques to select questions that call for non-factoid answers.
The dataset contains 87,361 questions and their corresponding answers. Each question comes with its best answer and any other answers submitted by users. Only the best answer was reviewed in determining the quality of the question-answer pair.
Each question is stored as a dictionary with the following fields:
- question
- The string representing the question.
- answer
- A string representing the best answer to the question.
- nbestanswers
- One or more strings representing other submitted answers to the question.
- main_category
- A string representing the Yahoo category for the submitted question.
- id
- The unique Yahoo ID string for the question.
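The field list above can be sketched as a Python type, together with a small check that a parsed dictionary carries every listed field. The names `NfL6Record` and `is_valid_record` are hypothetical, and treating `nbestanswers` as a list of strings is an assumption based on the field descriptions, not on a published schema.

```python
from typing import Any, List, TypedDict


class NfL6Record(TypedDict):
    """One question entry, following the field list above (hypothetical name)."""
    question: str
    answer: str
    nbestanswers: List[str]  # assumed to be a list of strings
    main_category: str
    id: str


def is_valid_record(obj: Any) -> bool:
    """Return True if obj has every listed field with the assumed type."""
    if not isinstance(obj, dict):
        return False
    string_fields = ("question", "answer", "main_category", "id")
    if not all(isinstance(obj.get(name), str) for name in string_fields):
        return False
    others = obj.get("nbestanswers")
    return isinstance(others, list) and all(isinstance(a, str) for a in others)


# Made-up record for illustration only; not taken from the dataset.
sample = {
    "question": "How do I house-train a puppy?",
    "answer": "Take it outside often.",
    "nbestanswers": ["Use a crate."],
    "main_category": "Pets",
    "id": "demo-1",
}

print(is_valid_record(sample))
```

A check like this may help catch malformed entries before further processing.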
Components of a question may be obtained using specific keywords to access its parts. For example, the data in one question dictionary inside the json file may be accessed as follows:
obj['question']: The question string
obj['answer']: The highest voted answer string
obj['nbestanswers']: A list of strings representing other submitted answers for the question
obj['main_category']: A string representing the Yahoo category of the submitted question
obj['id']: The unique Yahoo ID string for the question
An example script parsing this collection in Python is shown below:
import json

# Collect the question string from every entry in the dataset.
questions = []
with open('nfL6.json', 'r') as f:
    mydata = json.load(f)
for q_a in mydata:
    questions.append(q_a['question'])
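Building on the script above, the following is a minimal sketch of two common follow-up steps: counting questions per category and gathering every answer for one question. The helper names (`load_records`, `count_by_category`, `all_answers`) are hypothetical, UTF-8 is an assumed file encoding, and the demo records are made up for illustration.

```python
import json
from collections import Counter


def load_records(path):
    """Load the list of question dictionaries from the uncompressed JSON file."""
    # UTF-8 is an assumption about the file's encoding.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_by_category(records):
    """Count how many questions fall under each main_category value."""
    return Counter(r["main_category"] for r in records)


def all_answers(record):
    """Return the best answer followed by the other submitted answers."""
    best = record["answer"]
    # Skip any entry equal to the best answer, in case the list repeats it.
    return [best] + [a for a in record["nbestanswers"] if a != best]


# Made-up records for illustration only; not taken from the dataset.
demo = [
    {"question": "How do I house-train a puppy?",
     "answer": "Take it outside often.",
     "nbestanswers": ["Take it outside often.", "Use a crate."],
     "main_category": "Pets", "id": "demo-1"},
    {"question": "Why does my cat knead blankets?",
     "answer": "It is a comfort habit.",
     "nbestanswers": ["Kittens do it when nursing."],
     "main_category": "Pets", "id": "demo-2"},
    {"question": "How do tides work?",
     "answer": "The moon's gravity pulls on the oceans.",
     "nbestanswers": [],
     "main_category": "Science & Mathematics", "id": "demo-3"},
]

print(count_by_category(demo).most_common())
print(all_answers(demo[0]))
```

With the real file unpacked, `load_records('nfL6.json')` would supply the list in place of `demo`.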
Email Downloads for questions or comments concerning the dataset or this web page.
Dataset
This collection consists of a README.txt file containing information about the dataset and the dataset itself as compressed rar nfL6.json.rar or gzip'ed nfL6.json.gz files.
Download
Uncompress the rar archive using 7zip or rar/unrar on Windows machines. Both these
utilities may also be installed on Unix machines.
rar x nfL6.json.rar
7z x nfL6.json.rar
On Unix machines, uncompress the gzip'ed file using gunzip:
gunzip nfL6.json.gz
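As an alternative to unpacking the file first, the gzip'ed file can also be read directly from Python with the standard `gzip` module. This is a minimal sketch: the helper name `load_gzipped_json` is hypothetical, and UTF-8 is an assumed encoding.

```python
import gzip
import json


def load_gzipped_json(path):
    """Parse a gzip-compressed JSON file without unpacking it to disk first."""
    # "rt" opens the compressed stream in text mode; UTF-8 is an assumption.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

For example, `records = load_gzipped_json('nfL6.json.gz')` would return the same list of question dictionaries as loading the unpacked nfL6.json.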
The download comprises the following files:
- README.txt
- Gzip'ed nfL6 JSON data file (nfL6.json.gz)
- Rar nfL6 JSON data file (nfL6.json.rar)
We would like to thank Yahoo for collecting and distributing Webscope L6.