Athena: Learning-oriented Search with Personalized Learning Flows

A Collaborative Project with the University of Massachusetts Amherst
and the University of North Carolina at Chapel Hill

University of Massachusetts Amherst:
James Allan, PI
Razieh Negin Rahimi and Hamed Zamani, co-PIs

University of North Carolina at Chapel Hill:
Jaime Arguello, PI
Robert Capra, co-PI

Project Award Information

NSF Award Number: 2106282
Award Title: Athena: Learning-oriented Search with Personalized Learning Flows
Duration: 10/01/2021 - 09/30/2024

Project Abstract

The Athena project will develop technology for "search as learning": search technologies that encourage and support learning rather than simple document finding. To learn, searchers must engage with information that is both novel and understandable. At its core, Athena will therefore support learning by modeling several important factors: (1) the knowledge connections between documents covering a topic, (2) a user's current state of knowledge on that topic, (3) the types of knowledge a user is likely to gain from a document, and (4) the knowledge required for a user to successfully engage with a document. The Athena project will involve two types of end-to-end systems, both of which will model and leverage the learner's state of knowledge (LSK): an LSK-aware search engine and an LSK-aware question answering system. The Athena systems will guide a user through a topic and find relevant information in the context of previously encountered information and the topic structure captured in a web of topics. The team will evaluate Athena using standard measures as well as a series of studies involving human subjects. If the Athena project is successful, it will make it easier for people to use search engines and related technologies to learn about complex topics, where there are numerous interrelated and dependent subtopics that should be considered. Given that search is among the most common online activities on and off the Web, Athena and its technologies will have a substantial impact on searchers trying to learn such topics.

Athena enables "search as learning" using a data structure referred to as a Learning Flow Graph (LFG). An LFG comprises nodes that represent sub-topics (e.g., concepts) within a given domain and edges that represent relations between sub-topics (e.g., one sub-topic being foundational to understanding another). Athena leverages LFGs to model the different factors mentioned above. It uses probability distributions across nodes in an LFG to model: (1) a user's knowledge state, (2) the potential knowledge gains from an information item, and (3) the prerequisite knowledge required for a user to successfully engage with an information item. The Athena team will develop algorithms for generating LFGs from structured, semi-structured, and unstructured resources (e.g., course syllabi, tables of contents, book indices, knowledge bases, query logs), algorithms for integrating LFGs into search and question-answering models, and algorithms for re-estimating LFGs and a user's knowledge state based on search behaviors (e.g., queries, clicks, skips, dwell times). Structuring textual data to find the optimal learning paths through it is of great interest, though most existing work has focused on extracting information to fill slots in a "knowledge base," a much finer grained task. The LFG representation also provides a type of explanation of a larger topic, connecting to the broad interest in explainable systems. The Athena work will extend the state of the art in text representation, neural approaches including attention techniques, query and topic modeling, contextual text summarization, and understanding human approaches to complex search activities.
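To make the LFG idea concrete, the following is a minimal illustrative sketch in Python, not the Athena implementation. Every name in it (LearningFlowGraph, readiness, novelty, learning_score), the example sub-topics, and the scoring formula are assumptions introduced for illustration. It also simplifies the user's knowledge state to a per-node mastery probability, while a document's prerequisite and gain profiles are distributions over nodes that sum to 1.

```python
from dataclasses import dataclass, field


@dataclass
class LearningFlowGraph:
    """Hypothetical LFG: nodes are sub-topics, directed edges are prerequisite relations."""

    # Maps each sub-topic to the set of sub-topics that are foundational to it.
    prerequisites: dict = field(default_factory=dict)

    def add_relation(self, foundation, dependent):
        """Record that `foundation` should be understood before `dependent`."""
        self.prerequisites.setdefault(foundation, set())
        self.prerequisites.setdefault(dependent, set()).add(foundation)

    def nodes(self):
        return sorted(self.prerequisites)


def normalize(weights):
    """Turn non-negative weights over nodes into a distribution that sums to 1."""
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def readiness(knowledge, required):
    """Expected mastery of the knowledge a document requires (assumed formula)."""
    return sum(p * knowledge.get(node, 0.0) for node, p in required.items())


def novelty(knowledge, gains):
    """Expected share of a document's knowledge gains that is new to the user."""
    return sum(p * (1.0 - knowledge.get(node, 0.0)) for node, p in gains.items())


def learning_score(knowledge, required, gains):
    """Toy score that favors documents that are both understandable and novel."""
    return readiness(knowledge, required) * novelty(knowledge, gains)


# Example sub-topics are invented for illustration.
lfg = LearningFlowGraph()
lfg.add_relation("probability", "language models")
lfg.add_relation("language models", "query likelihood")

# Assumed per-node mastery probabilities for one user.
knowledge = {"probability": 0.9, "language models": 0.2, "query likelihood": 0.0}

intro_doc = {
    "required": normalize({"probability": 1.0}),
    "gains": normalize({"language models": 1.0}),
}
advanced_doc = {
    "required": normalize({"language models": 1.0}),
    "gains": normalize({"query likelihood": 1.0}),
}

intro_score = learning_score(knowledge, intro_doc["required"], intro_doc["gains"])
advanced_score = learning_score(knowledge, advanced_doc["required"], advanced_doc["gains"])
print(intro_score > advanced_score)
```

Under these assumed numbers the introductory document scores higher, because the user already holds most of its prerequisite knowledge and has not yet learned what it teaches. The advanced document is more novel, but the user lacks its prerequisites. Multiplying the two terms is only one possible way to combine them.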

Highlights

On May 4, 2023, the UMass Amherst CIIR and University of North Carolina at Chapel Hill Athena project collaborators organized the team's first Learning and Language Technologies Symposium, held at UMass Amherst. Faculty and doctoral students presented research that addresses how computing can be used to help people learn and how human language can be modeled or mined for those and related challenges. The symposium had two parts: (1) a poster session where undergraduate students met doctoral students to learn about how and why they chose to be researchers in these areas, and (2) a series of short talks where graduate students, undergraduates, and faculty heard from UMass Amherst Manning College of Information and Computer Sciences (CICS) and UNC Chapel Hill faculty on their research in this field. The talks and poster session were attended by 50-60 students.

Publications

IR-1296: Zamani, H. and Zhu, Y., "Predicting Prerequisite Relations for Unseen Concepts," in the Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi, United Arab Emirates, 12/7-12/2022, pp. 8542-8548.

IR-1299: Salemi, A., Altmayer Pizzorno, J. and Zamani, H., "A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering," in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), July 23-27, 2023, Taipei, Taiwan, pp. 110-120.

IR-1303: Salemi, A., Rafiee, M. and Zamani, H., "Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering," in Proceedings of The 13th (9th ACM SIGIR) International Conference on the Theory of Information Retrieval (ICTIR 2023), Taipei, Taiwan, July 23, 2023.

Point of Contact: James Allan - allan@cs.umass.edu
Center for Intelligent Information Retrieval (CIIR)
Manning College of Information and Computer Sciences
140 Governors Drive
University of Massachusetts Amherst
Amherst, MA 01003-9264

This material is based upon work supported in part by the Center for Intelligent Information Retrieval (CIIR) and in part by the National Science Foundation under Grant Nos. 2106282 and 2106334 (UNC). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.