nfL6: Yahoo Non-Factoid Question Dataset
Background
This dataset is derived from Yahoo's Webscope L6 collection, using machine learning techniques to select questions whose answers are non-factoid.
The dataset contains 87,361 questions and their corresponding answers. Each question is paired with its best answer, along with any other answers submitted by users. Only the best answer was reviewed in determining the quality of the question-answer pair.
Each question is stored as a dictionary with the following fields:

- question
  - A string representing the question.
- answer
  - A string representing the best answer to the question.
- nbestanswers
  - One or more strings representing other submitted answers to the question.
- main_category
  - A string representing the Yahoo category for the submitted question.
- id
  - The unique Yahoo ID string for the question.
 
The components of a question may be obtained by using these field names as keys. For example, the data in one question dictionary inside the JSON file may be accessed as follows:

          obj['question']         The question string
          obj['answer']           The best answer string
          obj['nbestanswers']     A list of strings representing other submitted answers for the question
          obj['main_category']    A string representing the Yahoo category for the submitted question
          obj['id']               The unique Yahoo ID string for the question
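
The field list above can also be used as a quick sanity check on a loaded record. The following is a minimal sketch, assuming each record is a Python dictionary with the five documented keys and that `nbestanswers` is a list; the names `EXPECTED_FIELDS` and `missing_or_mistyped` are hypothetical, and the sample record is made up for illustration.

```python
# Field names and types as documented above; the helper itself is hypothetical.
EXPECTED_FIELDS = {
    "question": str,
    "answer": str,
    "nbestanswers": list,
    "main_category": str,
    "id": str,
}


def missing_or_mistyped(obj):
    """Return the documented field names that are absent or of an unexpected type."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in obj or not isinstance(obj[name], expected_type):
            problems.append(name)
    return problems


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "question": "Why is the sky blue?",
    "answer": "Placeholder best answer.",
    "nbestanswers": ["Placeholder other answer."],
    "main_category": "Placeholder Category",
    "id": "0",
}
print(missing_or_mistyped(sample))  # prints []
```

An empty list means every documented field is present with the expected type.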
        
      
      
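        An example Python script that parses this collection is shown below:

          import json

          # Load the collection: a list of question dictionaries.
          with open('nfL6.json', 'r', encoding='utf-8') as f:
              mydata = json.load(f)
          # Collect the question string from each record.
          questions = []
          for q_a in mydata:
              questions.append(q_a['question'])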
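
Once the records are loaded, the same key-based access supports simple summaries. As a sketch under the assumption that the loaded data is a list of dictionaries, the snippet below counts questions per `main_category`; the function name `count_by_category` is hypothetical, and the records are made-up stand-ins for the real data.

```python
from collections import Counter


def count_by_category(records):
    """Count how many question records fall under each main_category value."""
    return Counter(record["main_category"] for record in records)


# Made-up records for illustration; with the real data, pass the list
# returned by json.load instead.
records = [
    {"question": "q1", "main_category": "Category A"},
    {"question": "q2", "main_category": "Category B"},
    {"question": "q3", "main_category": "Category A"},
]
for category, n in count_by_category(records).most_common():
    print(category, n)
```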
        
      
      
Email Downloads for questions or comments concerning the dataset or this web page.
Dataset
      
      
This collection consists of a README.txt file containing information about the dataset, and the dataset itself, available either as a rar archive (nfL6.json.rar) or as a gzip'ed file (nfL6.json.gz).
Download
        Uncompress the rar archive using 7-Zip or rar/unrar on Windows machines. Both
        utilities may also be installed on Unix machines.

               rar x nfL6.json.rar
               7z x nfL6.json.rar
        
      
        On Unix machines, uncompress the gzip'ed file using gunzip:
        
        
               gunzip nfL6.json.gz
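
As an alternative to decompressing on the command line, the gzip'ed file can be read directly from Python with the standard `gzip` module. This is a minimal sketch, assuming the archive holds a single UTF-8 encoded JSON document; the function name `load_nfl6_gz` is hypothetical.

```python
import gzip
import json


def load_nfl6_gz(path):
    """Parse a gzip-compressed JSON file and return the decoded object."""
    # "rt" opens the compressed stream in text mode so json.load can read it.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Usage, assuming nfL6.json.gz is in the current directory:
# mydata = load_nfl6_gz("nfL6.json.gz")
# print(len(mydata))
```

This avoids keeping an uncompressed copy on disk, at the cost of decompressing the file on every load.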
        
      
| File | Size | Size |
|---|---|---|
| README.txt | | |
| Gzip'ed nfL6 JSON data file | | |
| Rar nfL6 JSON data file | | |
      
We would like to thank Yahoo for collecting and distributing Webscope L6.
