CIIR Talk Series: Mohammad Aliannejadi | Center for Intelligent Information Retrieval

Speaker: Mohammad Aliannejadi, University of Amsterdam

Title: Data Augmentation and User Simulation using Large Language Models

Date: Friday, October 6, 2023, 1:30 - 2:30 PM EDT (Eastern Daylight Time) via Zoom. On-campus attendees will gather in CS 151 to view the presentation.

Abstract: With the recent advancements in large language models (LLMs), the generation of synthetic data and the simulation of user-system interactions have gained attention. The impressive success of LLMs in multiple NLP tasks provides opportunities for the IR community in various areas, including data augmentation and user simulation. In this talk, I will present our recent work on data augmentation. Unlike other studies, we leverage LLMs to generate synthetic documents from queries in a few-shot setting, outperforming state-of-the-art data augmentation methods. Additionally, I will provide an overview of our work on user simulation using LLMs, focusing on providing feedback in mixed-initiative conversations and transitioning from reactive to proactive user simulators. We demonstrate that GPT-2 and GPT-3.5 can match human performance in providing user feedback in single-turn and multi-turn settings. Moreover, we utilize GPT-4 in a proactive user simulation setting, where the simulated user can lead the conversation by delving into a given topic. In this study, we follow a similar setup to the QuAC dataset and examine the effectiveness of GPT-4 in playing the roles of both the student and the teacher.

Bio: Mohammad Aliannejadi is an Assistant Professor at the University of Amsterdam, The Netherlands. His research interests include single- and mixed-initiative conversational information access, user simulation, and recommender systems. Previously, he completed his PhD at the Università della Svizzera italiana, Switzerland, where he focused on novel information access approaches in conversations. He is passionate about advancing conversational search systems and has co-organized multiple data challenges in this area, including the ClariQ Conversational AI Challenge (ConvAI 3), the NeurIPS competition on Interactive Grounded Language Understanding in a Collaborative Environment (IGLU), the TREC Conversational Assistance Track (CAsT), and the TREC Interactive Knowledge Assistance Track (iKAT).

Zoom Link: Subscribe to the mailing list (details above) for Zoom link and passcode notifications, or click here for the Zoom link and contact Alex Taubman for the passcode.