Learning and Language Technologies Symposium

Thursday, May 4, 2023

Poster session 11:00 am - 1:45 pm in Computer Science Building, Room CS 150/151

Research Talks 2:00 - 4:30 pm in Computer Science Building, Room CS 150/151*

Join us for our first Learning and Language Technologies Symposium! We will be presenting research that addresses how computing can be used to help people learn and how human language can be modeled or mined for those and related challenges. The symposium has two parts: (1) a poster session where undergraduate students (in particular) can have some pizza while they meet doctoral students to learn about how and why they chose to be researchers in these areas, and (2) a series of short talks where graduate students (and undergraduates) and faculty can hear from CICS and visiting faculty on their research in this exciting field.

*Note -- the talks were initially scheduled for the LGRC. They have been moved to the CS Building (Room CS 150/151). Please check back and RSVP so that you get updates in case of any other changes.

Please register in advance (one RSVP site for both the poster session and research talks):

RSVP


POSTER SESSION

11:00 a.m. to 1:45 p.m. in the UMass Amherst Computer Science Building, Room CS 151

Stop by to talk with Manning CICS doctoral students to hear about their research and to learn about why research could be the career for you. Pizza will be provided.

  • Metaphors in Pre-Trained Language Models: Probing and Generalization Across Datasets and Languages - Ehsan Aghazadeh
  • Efficient and modality independent zero shot event extraction of entities with actor representatives - Erica Cai, Brendan O'Connor
  • ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution - Ankita Gupta, Marzena Karpinska, Wenlong Zhao, Kalpesh Krishna, Jack Merullo, Luke Yeh, Mohit Iyyer, Brendan O’Connor
  • Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation - Zhiqi Huang, Puxuan Yu, James Allan
  • Gender and Power in Latin Narratives - Marisa Hudspeth, Sam Kovaly, Minhwa Lee, Chau Pham, and Przemyslaw Grabowicz
  • Towards Improving Information Flow: NLP Approaches for Fact-Checking and Content Moderation - Nazanin Jafari
  • Query-driven Segment Selection for Ranking Long Documents - Youngwoo Kim
  • Generalized Weak Supervision for Neural Information Retrieval - Yen-Chieh Lien
  • Query and Document Representation for Tip-of-the-Tongue Known-Item Retrieval - Mahta Rafiee
  • When Large Language Models Meet Personalization - Alireza Salemi
  • Conversational Information Seeking - Chris Samarinas
  • Curriculum Learning for Dense Retrieval Distillation - Hansi Zeng
  • Predicting Prerequisite Relations for Unseen Concepts - Yaxin Zhu

RESEARCH TALKS

(View recordings of the research talks)

2:00 p.m. to 4:30 p.m. in the UMass Amherst Computer Science Building, Room CS 151

Hear from Manning CICS and visiting faculty about their learning and language technologies research.

Jaime Arguello, UNC Chapel Hill
2:00 pm
Understanding the “Pathway” Towards a Searcher’s Learning Objective[collapsed title="ABSTRACT/BIO"]ABSTRACT: Many studies in search-as-learning have aimed to understand factors that influence learning during search. Specifically, studies have focused on characteristics of the individual searcher, their objective, and the system. Less research has focused on the process through which searchers learn during a search session. When people search to learn, they often have a specific learning objective in mind. Additionally, achieving the objective often involves a pathway or sequence of learning-oriented subgoals. In this talk, I will present a study that investigated the effects of a searcher’s learning objective on three types of outcomes: (1) perceptions of the task, (2) search behaviors, and (3) the pathways followed by searchers toward the objective. To manipulate learning objectives and to characterize pathways, our study leveraged both dimensions of the Anderson & Krathwohl (A&K) taxonomy. Participants pursued learning objectives that varied across three cognitive processes (apply, evaluate, create) and three knowledge types (factual, conceptual, procedural). To understand the pathways followed by participants, the study used a think-aloud protocol. Think-aloud comments and recorded screen activities were used to represent search sessions as sequences of learning-oriented subgoals that were each manually assigned to a cell from A&K’s taxonomy. Our results found effects on all three types of outcomes. Interestingly, the learning objective influenced the types of pathways followed by participants. For example, factual objectives involved more remember-level subgoals, conceptual objectives involved more understand-level subgoals, and procedural objectives involved more create-level subgoals. In the talk, I will discuss implications, lessons learned, and opportunities for future work.

BIO: Jaime Arguello is a Professor at the School of Information and Library Science at the University of North Carolina (UNC) at Chapel Hill. Jaime received his Ph.D. from the Language Technologies Institute at Carnegie Mellon University in 2011. Since then, his research has focused on a wide range of areas, including aggregated search, voice query reformulation, understanding search behaviors during complex tasks, developing search assistance tools to support searchers during complex tasks, and understanding the effects of specific cognitive abilities on search behaviors and outcomes. His papers have received awards at CSCW 2022, SIGIR 2009, ECIR 2011, IIiX 2014, and ECIR 2017.[/collapsed]

Brendan O’Connor, UMass Amherst
2:30 pm
Event Extraction for Social Science[collapsed title="ABSTRACT/BIO"]ABSTRACT: Natural language processing is coming into its own as a method for computational social science: helping to understand sociocultural phenomena by semi-automatically analyzing textual corpora of news, media, and books. A key challenge is to manage human expertise and input, and ideally relieve the burden of intensive annotation often used in supervised learning methods, while making broader statistical inferences relevant to social scientists.

I'll present some work focused on automated event extraction in low-supervision settings, which in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the IndiaPoliceEvents corpus: all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002. Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations. In contrast to other datasets with structured event representations, we gather annotations by posing natural questions, and evaluate off-the-shelf models for three different tasks: sentence classification, document ranking, and temporal aggregation of target events. We present baseline results from zero-shot BERT-based models fine-tuned on natural language inference and passage retrieval tasks. Our corpus-level evaluations and annotation approach can guide creation of similar social-science-oriented resources in the future.

BIO: Brendan O’Connor (http://brenocon.com) is an associate professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst, who works at the intersection of computational social science and natural language processing. At UMass, he is an Associate Director of the Computational Social Science Institute. He holds a PhD in Machine Learning from Carnegie Mellon University and BS/MS in Symbolic Systems from Stanford University; he has previously been a Visiting Fellow at the Harvard Institute for Quantitative Social Science, and worked at technology companies including Facebook Data Science and Crowdflower.[/collapsed]

Negin Rahimi, UMass Amherst
2:50 pm
Information Seeking as Learning[collapsed title="ABSTRACT/BIO"]ABSTRACT: TBA

BIO: Razieh (Negin) Rahimi is a research assistant professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. Previously, she was a postdoctoral researcher at the Center for Intelligent Information Retrieval, UMass Amherst, where her mentor was James Allan. Her latest work investigates explanation of deep learning-to-rank models for information retrieval and search results. Before joining UMass, she worked at the Chinese University of Hong Kong. Negin received her B.Sc. and Ph.D. in Computer Science at the University of Tehran.[/collapsed]

Rob Capra, UNC Chapel Hill
3:10 pm
How does AI chat change search behaviors?[collapsed title="ABSTRACT/BIO"]ABSTRACT: Generative AI tools such as ChatGPT are poised to change the way people engage with online information. In this talk, I will present an exploratory user study with 10 participants who used a combined Chat+Search system that utilized the OpenAI GPT-3.5 API and the Bing Web Search v5 API. Participants completed three search tasks. I will report on ways that users integrated AI chat into their search process, things they liked and disliked about the chat system, their trust in the chat responses, and their mental models of how the chat system generated responses. In addition, I will report on our research lab’s efforts to understand and support users’ search needs when they are engaged in complex tasks.

BIO: Dr. Robert Capra is a Professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill. His interests include human-computer interaction, interactive information retrieval, and personal information management. In his research, he focuses on how people search for information and on developing tools to support users’ search needs. Dr. Capra is a recipient of a National Science Foundation CAREER grant and he publishes regularly in top computer and information science conferences and journals.

He holds a Ph.D. in computer science from Virginia Tech and Master’s and Bachelor’s degrees in computer science from Washington University in St. Louis. At Virginia Tech, he was part of the Center for Human-Computer Interaction where he investigated multi-platform interfaces, information re-finding, and interfaces for digital libraries. Prior to Virginia Tech, he worked in corporate research and development, spending five years in the Speech and Language Technologies group at SBC Communications (now AT&T Labs Austin) where he focused on voice user interfaces, speech recognition, and natural language processing.

Dr. Capra is an active member in the HCI and information science communities. He has co-edited special issues of IEEE Computer, ACM Transactions on Information Systems (TOIS), and Information Processing & Management, and he is currently on the editorial board of JASIST. He is a senior program committee member for SIGIR and is very active in the ACM Conference on Human Information Interaction and Retrieval (CHIIR), where he previously served as chair of the steering committee.[/collapsed]

Andrew Lan, UMass Amherst
3:40 pm
GPT-based Open-ended Knowledge Tracing[collapsed title="ABSTRACT/BIO"]ABSTRACT: In education applications, knowledge tracing refers to the problem of estimating students’ time-varying concept/skill mastery level from their past responses to questions and predicting their future performance. One key limitation of most existing knowledge tracing methods is that they treat student responses to questions as binary-valued, i.e., whether they are correct or incorrect. Response correctness analysis/prediction ignores important information on student knowledge contained in the exact content of the responses, especially for open-ended questions. In this work, we conduct the first exploration into open-ended knowledge tracing (OKT) by studying the new task of predicting students’ exact open-ended responses to questions. Our work is grounded in the domain of computer science education with programming questions, and we develop a solution by fusing language models with student knowledge tracing methods.

BIO: Andrew Lan is an assistant professor in the Manning College of Information and Computer Sciences, University of Massachusetts Amherst. His research focuses on the development of artificial intelligence (AI) methods to enable scalable and effective personalized learning in education. His research spans areas such as learner modeling, personalization, content generation, and human-in-the-loop AI.[/collapsed]

Ivon Arroyo, UMass Amherst
4:00 pm
A Bilingual Mathematics Intelligent Tutoring System for Latinx Students[collapsed title="ABSTRACT/BIO"]ABSTRACT: I present the results of localizing a Mathematics Intelligent Tutoring System that addresses the motivational aspect of learning to the language and culture of a country in Latin America. We analyzed its impact after three schools in Argentina used the software for seven weeks. Results yielded a significant improvement in mathematics performance in Spanish (after 6 sessions), showing the feasibility of learning mathematics with MathSpring in Spanish as a supplement to classroom instruction, with curriculum/software localized to the region. Further analyses showed an affective impact: students improved on their grit (perseverance and passion for long-term goals) as affective digital characters instilled perseverance, speaking to them in their own local Spanish accent. This work is laying the foundation to create Multi-racial Bilingual Personalized Digital Tutors for Mathematics Learning, for math classes and afterschool programs. This is a math learning platform for grades 5-8, which will adjust the language provided (English/Spanish) depending on preferences and needs. This research addresses several problems: 1) a language barrier while learning math, as math teachers lack the language teaching expertise to teach English learners (ELs); 2) a racial/ethnic representation problem, where students encounter material and characters that don’t reflect them; including diversity in tutoring agents inside educational software is costly and has not been achieved as part of other research projects; 3) increasing the value of bilingualism: for students who speak English as their main language, learning to speak Spanish is important to value/appreciate a different culture; in addition, most Latinx students hear Spanish at home but are not formally taught Spanish in school.

BIO: Ivon Arroyo is an associate professor in the Manning College of Information and Computer Sciences and the College of Education, University of Massachusetts Amherst. She specializes in learning sciences, computer science, and education/cognitive psychology, with a focus on mathematics learning and assessment for K-12 students. Previously, Prof. Arroyo was an associate professor of social science and policy studies at Worcester Polytechnic Institute. She holds a BS in computer science from Universidad Blas Pascal in Argentina, and an MS in computer science and an EdD in math and science education from UMass Amherst.[/collapsed]

The collaborative work between the UMass Amherst CIIR team and the UNC Chapel Hill team is funded by the National Science Foundation under Grant Nos. 2106282 and 2106334 (UNC).