MSDialog

Introduction


The MSDialog dataset is a labeled dialog dataset of question answering (QA) interactions between information seekers and answer providers, collected from an online forum on Microsoft products (Microsoft Community). The dataset contains more than 2,000 multi-turn information-seeking conversations with over 10,000 utterances that are annotated with user intent at the utterance level. Annotations were done through crowdsourcing on Amazon Mechanical Turk. MSDialog has several versions, including the complete set (MSDialog-Complete) and a labeled subset (MSDialog-Intent). We also preprocessed the data to produce MSDialog-ResponseRank for conversation response ranking.

MSDialog-Complete


We crawled over 35,000 dialogs from Microsoft Community, a forum that provides technical support for Microsoft products. This well-moderated forum contains user-generated questions with high-quality answers provided by Microsoft staff and other experienced users, including Microsoft Most Valuable Professionals (MVPs). In technical support online forums, a thread is typically initiated by a user-generated question and answered by experienced users (agents). The users may also exchange clarifications with the agents or give feedback based on answer quality. Thus the flow of a technical support thread resembles the information-seeking process if we consider threads as dialogs and posts as turns/utterances in dialogs. In addition to the dialog title and utterances, we also collected rich metadata, including question popularity, answer votes, and user affiliation.

Data Fields


Example Data Format


{
    "20481": {
        "category": "Word",
        "dialog_time": "2017-09-21T04:15:54",
        "title": "Line and paragraph spacing in Office Word 2007",
                        "frequency": "0",
        "utterances": [{
            "affiliation": "Common User",
            "utterance_time": "2017-09-21T04:15:54",
            "utterance_pos": 1,
            "id": 192941,
            "user_id": "Michael",
            "actor_type": "User",
            "utterance": "Hello. Whenever I open a new Office Word ...",
            "is_answer": 0,
            "vote": "Freq_0"
        }, {
            "affiliation": "MVP",
            "utterance_time": "2017-09-21T05:16:23",
            "utterance_pos": 2,
            "id": 192944,
            "user_id": "Robin",
            "actor_type": "Agent",
            "utterance": "When using ...",
            "is_answer": 0,
            "vote": "0"
        }, { more utterances ... }]
    }
}
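
As a minimal sketch of working with this format (assuming the complete set is distributed as a single JSON file, here called "MSDialog.json"; the file name is an assumption, not part of the release), the dialogs can be loaded and iterated as follows:

import json

# Load the complete dialog collection. The file name is an assumption;
# use whatever name the downloaded archive provides.
with open("MSDialog.json", encoding="utf-8") as f:
    dialogs = json.load(f)

for dialog_id, dialog in dialogs.items():
    print(dialog_id, dialog["category"], dialog["title"])
    for utt in dialog["utterances"]:
        # Each utterance records who spoke ("User" or "Agent"), their
        # affiliation, and the utterance text itself.
        print(f'  [{utt["utterance_pos"]}] {utt["actor_type"]} '
              f'({utt["affiliation"]}): {utt["utterance"][:60]}...')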


MSDialog-Intent


Based on MSDialog-Complete, we selected a subset of dialogs for user intent annotation on AMT. To ensure the quality and consistency of the dataset, we selected about 2,400 dialogs that meet the following criteria for annotation: (1) with 3 to 10 turns; (2) with 2 to 4 participants; (3) with at least one correct answer selected by the community; (4) falling into one of the categories Windows, Office, Bing, and Skype, which are major categories of Microsoft products. We classify user intent in dialogs into the 12 classes shown in the following table. Each utterance can be assigned multiple labels because an utterance can be associated with multiple intents (e.g., GG + FQ).

Taxonomy

Code | Label | Description | Example
OQ | Original Question | The first question by a user that initiates the QA dialog. | If a computer is purchased with win 10 can it be downgraded to win 7?
RQ | Repeat Question | Posters other than the original user repeat a previous question. | I am experiencing the same problem ...
CQ | Clarifying Question | Users or agents ask for clarification to get more details. | Your advice is not detailed enough. I'm not sure what you mean by ...
FD | Further Details | Users or agents provide more details. | Hi. Sorry for taking so long to reply. The information you need is ...
FQ | Follow Up Question | Users ask follow-up questions about relevant issues. | Thanks. I really have one more simple question -- if I ...
IR | Information Request | Agents ask for information from users. | What is the make and model of the computer? Have you tried installing ...
PA | Potential Answer | A potential answer or solution provided by agents. | Hi. To change your PIN in Windows 10, you may follow the steps below: ...
PF | Positive Feedback | Users provide positive feedback for working solutions. | Hi. That was exactly the right fix. All set now. Tx!
NF | Negative Feedback | Users provide negative feedback for useless solutions. | Thank you for your help, but the steps below did not resolve the problem ...
GG | Greetings/Gratitude | Users or agents greet each other or express gratitude. | Thank you all for your responses to my question ...
JK | Junk | There is no useful information in the post. | Emojis. Sigh .... Thread closed by moderator ...
O | Others | Posts that cannot be categorized using other classes. | N/A

Data Fields

Same as MSDialog-Complete, with an extra user intent field under "utterances" called "tags". "tags" may include multiple user intent labels separated by spaces (e.g., "GG OQ").
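
For illustration, a hedged sketch of converting the space-separated "tags" string into a multi-hot label vector over the 12 intent codes (the helper name and the ordering of the codes are our own, not part of the release):

# The 12 intent codes from the taxonomy above; the ordering here is arbitrary.
INTENT_CODES = ["OQ", "RQ", "CQ", "FD", "FQ", "IR", "PA", "PF", "NF", "GG", "JK", "O"]

def tags_to_multihot(tags: str) -> list:
    """Convert a space-separated tag string (e.g. "GG OQ") to a multi-hot vector."""
    present = set(tags.split())
    return [1 if code in present else 0 for code in INTENT_CODES]

print(tags_to_multihot("GG OQ"))  # -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]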

Statistics

Items | Min | Max | Mean | Median
# Turns Per Dialog | 3 | 10 | 4.56 | 4
# Participants Per Dialog | 2 | 4 | 2.79 | 3
Dialog Length (Words) | 27 | 1,469 | 296.90 | 241
Utterance Length (Words) | 1 | 939 | 65.16 | 47

Item | MSDialog-Complete | MSDialog-Intent
# Dialogs | 35,000 | 2,199
# Utterances | 300,000 | 10,020
Avg. # Participants | 3.18 | 2.79
Avg. # Turns Per Dialog | 8.94 | 4.56
Avg. # Words Per Utterance | 75.91 | 65.16

We also provide the data split of MSDialog-Intent that we used for our paper "User Intent Prediction in Information-seeking Conversations". The split version is referred to as MSDialog-IntentPred. We also include the feature files; feel free to refer to our source code to see how the feature files are generated.

MSDialog-ResponseRank


We also preprocessed the MSDialog-Complete data to construct a benchmark data set for response ranking in information-seeking conversations. Starting from MSDialog-Complete, we filtered out dialogs whose number of turns falls outside the range [3, 99]. We then split the data into training/validation/testing partitions by question time. Specifically, the training data contains 25,019 dialogs from "2005-11-12" to "2017-08-20", the validation data contains 4,654 dialogs from "2017-08-21" to "2017-09-20", and the testing data contains 5,064 dialogs from "2017-09-21" to "2017-10-04".
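
A minimal sketch of this filtering and time-based split, assuming the dialogs are loaded as in the MSDialog-Complete example above and that the question time corresponds to the "dialog_time" field (boundary dates are treated as inclusive; the exact boundary handling in the released split may differ):

from datetime import datetime

def split_by_time(dialogs):
    """Partition dialogs into train/valid/test by dialog time."""
    train, valid, test = {}, {}, {}
    for did, d in dialogs.items():
        # Keep only dialogs with a number of turns in [3, 99].
        if not 3 <= len(d["utterances"]) <= 99:
            continue
        day = datetime.fromisoformat(d["dialog_time"]).date().isoformat()
        if day <= "2017-08-20":
            train[did] = d
        elif day <= "2017-09-20":
            valid[did] = d
        else:
            test[did] = d
    return train, valid, test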

The next step is to generate the dialog context and response candidates. For each utterance by the "User" (we consider every user utterance except the first, since the first has no associated dialog context), we collected the previous c utterances as the dialog context, where c = min(t-1, 10) and t-1 is the total number of utterances before the t-th utterance. The true response by the "Agent" becomes the positive response candidate. For the negative response candidates, we followed previous work and adopted negative sampling. For each dialog context, we first used the true response as the query to retrieve the top 1,000 results from the whole set of agent responses with BM25, and then randomly sampled 9 of them as the negative response candidates. For more details of the data preprocessing, please see our SIGIR '18 paper on response ranking in information-seeking conversations, listed in the Citations section.
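
The candidate construction can be sketched as follows, using the rank_bm25 package as a stand-in for the BM25 retrieval step described above (the package choice, whitespace tokenization, and function name are our assumptions, not the authors' pipeline):

import random
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def build_candidates(true_response, agent_responses, n_neg=9, pool_size=1000):
    """Return the positive response plus n_neg BM25-sampled negative candidates."""
    tokenized = [r.lower().split() for r in agent_responses]
    bm25 = BM25Okapi(tokenized)
    # Use the true agent response as the query against the full agent response set.
    scores = bm25.get_scores(true_response.lower().split())
    ranked = sorted(range(len(agent_responses)), key=lambda i: -scores[i])
    pool = [agent_responses[i] for i in ranked[:pool_size]
            if agent_responses[i] != true_response]
    negatives = random.sample(pool, min(n_neg, len(pool)))
    return [(1, true_response)] + [(0, r) for r in negatives]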

Data Fields

MSDialog-ResponseRank includes three TSV files for the training, validation, and testing of response ranking models. The format of these files is as follows:

label \t utterance_1 \t utterance_2 \t ...... \t candidate_response

This format is also adopted by the Ubuntu Dialog Corpus used in several papers. Each line corresponds to a conversation context/candidate response pair. Suppose there are n_i tab-separated columns in the i-th line. The first column is a binary label indicating whether the candidate response is the positive candidate returned by the agent or a sampled negative candidate. The next (n_i - 2) columns are the utterances in the conversation context, including the current input utterance by the user. The last column is the candidate response, prefixed with "<<<AGENT>>>:".
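
Under these conventions, a line of the TSV files can be parsed with a few lines of code (a minimal sketch; the helper and file names are ours):

def parse_line(line: str):
    """Split one TSV line into (label, context utterances, candidate response)."""
    cols = line.rstrip("\n").split("\t")
    label = int(cols[0])    # 1 = true agent response, 0 = sampled negative
    context = cols[1:-1]    # all utterances in the conversation context
    response = cols[-1]     # candidate response, prefixed with "<<<AGENT>>>:"
    return label, context, response

with open("train.tsv", encoding="utf-8") as f:  # file name is an assumption
    for line in f:
        label, context, response = parse_line(line)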

Example Data Format

We show an example conversation context/response pair below. For better readability, each column of the line is placed on its own row.
        0
        \t
        I upgraded last week with no apparent problems and used Sticky Notes as recently as two nights ago. ......  Sticky Notes doesn't even appear in the list of programs on this machine.  Argh!  Help!!!!  I have information on those notes I NEED.
        \t
        Hello,  Thank you for contacting Microsoft Community.  I can understand the inconvenience caused, be assured that we are here to help you with your concern.   Method: 1 SFC scan: Method: 2 If the issue persists I would suggest you to put .....         
        \t
        Hi Jenith,  It's an Acer Aspire 5750-P5WE0, previously running Windows 7 Home Premium.  I do not get any error messages if I open C:\Windows.old\Users, but all .....
        \t
         <<<AGENT>>>: We suggest that you perform a Clean Boot and try to reset the app. You may refer to this Microsoft article for more information.  Note: Follow the ......

Statistics

The statistics of the MSDialog-ResponseRank data are as follows:

Items | Train | Valid | Test
# Context-response pairs | 173,680 | 37,210 | 35,110
# Candidates per context | 10 | 10 | 10
# Positive candidates per context | 1 | 1 | 1
Min # turns per context | 2 | 2 | 2
Max # turns per context | 11 | 11 | 11
Avg # turns per context | 5.0 | 4.9 | 4.4
Avg # words per context | 271.4 | 263.2 | 227.4
Avg # words per response | 66.7 | 67.6 | 66.8

Note that the statistics on average words per context/response are based on the preprocessed version of the data, after removing stop words and words that appear fewer than 5 times in the whole corpus.

User ID Anonymization before Data Release

To protect user privacy, we performed user ID anonymization on all versions of MSDialog prior to data release. For MSDialog-Complete and MSDialog-Intent, we replaced the user IDs in the "user_id" data field, and their mentions in utterances, with fake user IDs. For MSDialog-ResponseRank, we used the Stanford Named Entity Recognizer to recognize person names in the data and replaced them with "PERSON_PLACEHOLDER". Note that the anonymization process may affect the results reported in our paper.
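
For readers who want to reproduce a comparable anonymization step on their own data, the following sketch uses the Stanza toolkit as a stand-in for the Stanford Named Entity Recognizer used for the released data (the toolkit choice is ours, and its output will differ from the official release):

import stanza  # pip install stanza; run stanza.download("en") once beforehand

nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")

def anonymize(text: str) -> str:
    """Replace recognized person names with PERSON_PLACEHOLDER."""
    doc = nlp(text)
    out = text
    # Replace from the end of the string so earlier character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.type == "PERSON":
            out = out[:ent.start_char] + "PERSON_PLACEHOLDER" + out[ent.end_char:]
    return out

print(anonymize("Hi Jenith, thanks for the quick reply."))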

Instructions on Getting the Data


Any researchers interested in using the dataset for internal research should contact info@ciir.cs.umass.edu for access. Please include your name, institution, and the country you will be downloading from. If you are a legitimate researcher, we will supply you with a password for a URL where you can download the data.

By downloading the data, you agree that the data will be used only for internal research and that you will not share the dataset(s) with others.

Citations


If you use MSDialog in your paper, please include citations to the following papers:

BibTeX

        @inproceedings{InforSeek_User_Intent,
            author = {Qu, C. and Yang, L. and Croft, W. B. and Trippas, J. and Zhang, Y. and Qiu, M.},
            title = {Analyzing and Characterizing User Intent in Information-seeking Conversations},
            booktitle = {SIGIR '18},
            year = {2018},
        } 

        @inproceedings{InforSeek_Response_Ranking,
            author = {Yang, L. and Qiu, M. and Qu, C. and Guo, J. and Zhang, Y. and Croft, W. B. and Huang, J. and Chen, H.},
            title = {Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems},
            booktitle = {SIGIR '18},
            year = {2018},
        } 

        @inproceedings{InforSeek_User_Intent_Pred,
            author = {Qu, C. and Yang, L. and Croft, W. B. and Zhang, Y. and Trippas, J. and Qiu, M.},
            title = {User Intent Prediction in Information-seeking Conversations},
            booktitle = {CHIIR '19},
            year = {2019},
        }


Have Questions?


Ask us questions in our Google group or via email to the authors of the papers.

Acknowledgement


This work was supported in part by the Center for Intelligent Information Retrieval and in part by NSF grant #IIS-1419693 and NSF grant #IIS-1715095. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.

MSDialog Agreement


Use of the MSDialog Dataset
By downloading the MSDialog data, you as the “User” agree to use the data distributed by the Center for Intelligent Information Retrieval (CIIR) subject to the following understandings, terms and conditions:

Permitted Uses
The Data may only be used for internal evaluation and research purposes, and the User will not share the dataset(s) with others.

Small excerpts of the information may be displayed to others or published in a scientific or technical context, solely for the purpose of describing the research and development and related issues, provided that the User includes citations to the CIIR publications listed in the MSDialog data ReadMe file if the MSDialog data is used in a research paper. All efforts must be made not to infringe on the rights of any third party, including, but not limited to, the authors and publishers of the excerpts.

Copyright
The Information has been obtained by crawling the Internet. Due to the amount of data it has not been practicable to obtain permission from copyright owners to provide the data for the uses permitted under this Agreement (“Permitted Uses”).

User understands that all the documents in the data are documents which have been at some time made publicly available on the Internet and which have been collected using a process which respects the commonly accepted methods (such as robots.txt) for indicating that the documents should not be so collected.

The copyright holders retain ownership and reserve all rights pertaining to the use and distribution of the information. Except as specifically permitted above and as necessary to use and maintain the integrity of the information on computers used by the organization, the display, reproduction, transmission, distribution or publication of the information is prohibited. Violations of the copyright restrictions on the information may result in legal liability.

Disclaimer of Warranty
USER ACKNOWLEDGES AND AGREES THAT “DATA” RECEIVED ARE PROVIDED BY THE CENTER FOR INTELLIGENT INFORMATION RETRIEVAL AND OTHER CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR DATA CONTRIBUTORS BE LIABLE FOR SPECIAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, INCIDENTAL OR OTHER DAMAGES, LOSSES, COSTS, CHARGES, CLAIMS, DEMANDS, FEES OR EXPENSES OF ANY NATURE OR KIND ARISING IN ANY WAY FROM THE FURNISHING OF OR USER’S USE OF THE DATA RECEIVED.