Athena: Learning-oriented Search with Personalized Learning Flows | Center for Intelligent Information Retrieval

A Collaborative Project with the University of Massachusetts Amherst
and the University of North Carolina at Chapel Hill

University of Massachusetts Amherst:
James Allan, PI
Razieh Negin Rahimi and Hamed Zamani, co-PIs

University of North Carolina at Chapel Hill:
Jaime Arguello, PI
Robert Capra, co-PI

Project Award Information

NSF Award Number: 2106282
Award Title: Athena: Learning-oriented Search with Personalized Learning Flows
Duration: 10/01/2021 - 09/30/2025

Project Abstract

The Athena project will develop "search as learning" technology: a set of search technologies that encourage and support learning rather than just simple document finding. To learn, searchers must engage with information that is both novel and understandable. Therefore, at its core, Athena will support learning by modeling several important factors: (1) the knowledge connections between documents covering a topic, (2) a user's current state of knowledge on that topic, (3) the types of knowledge a user is likely to gain from a document, and (4) the knowledge required for a user to successfully engage with a document. The Athena project will involve two types of end-to-end systems, both of which will model and leverage the learner's state of knowledge (LSK): an LSK-aware search engine and an LSK-aware question answering system. The Athena systems will guide a user through a topic and find relevant information in the context of previously encountered information and the topic structure captured in a web of topics. The team will evaluate Athena using standard measures as well as a series of studies involving human subjects. If the Athena project is successful, it will make it easier for people to use search engines and related technologies to learn about complex topics that involve numerous interrelated and dependent subtopics. Given that search is among the most common online activities on and off the Web, Athena and its technologies will have a substantial impact on searchers trying to learn such topics.

Athena enables "search as learning" using a data structure referred to as a Learning Flow Graph (LFG). An LFG comprises nodes that represent sub-topics (e.g., concepts) within a given domain and edges that represent relations between sub-topics (e.g., one sub-topic being foundational to understanding another). Athena leverages LFGs to model the different factors mentioned above. It uses probability distributions across nodes in an LFG to model: (1) a user's knowledge state, (2) the potential knowledge gains from an information item, and (3) the prerequisite knowledge required for a user to successfully engage with an information item. The Athena team will develop algorithms for generating LFGs from structured, semi-structured, and unstructured resources (e.g., course syllabi, tables of contents, book indices, knowledge bases, query logs), algorithms for integrating LFGs into search and question-answering models, and algorithms for re-estimating LFGs and a user's knowledge state based on search behaviors (e.g., queries, clicks, skips, dwell times). Structuring textual data to find the optimal learning paths through it is of great interest, though most existing work has focused on extracting information to fill slots in a "knowledge base," a much finer-grained task. The LFG representation also provides a type of explanation of a larger topic, connecting to the broad interest in explainable systems. The Athena work will extend the state of the art in text representation, neural approaches including attention techniques, query and topic modeling, contextual text summarization, and understanding human approaches to complex search activities.
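
As a rough illustration of the LFG idea described above, the following Python sketch stores sub-topics as nodes and "is foundational to" relations as directed edges, and scores a document by how understandable and how novel it is for a given user. This is not the Athena implementation. The class and function names (LearningFlowGraph, readiness, novelty, learning_score), the example sub-topics, and the scoring formula are all hypothetical. The sketch also assumes that a user's knowledge state is a per-node mastery probability in [0, 1], while a document's required knowledge and knowledge gains are distributions over nodes that sum to 1.

```python
from dataclasses import dataclass, field


@dataclass
class LearningFlowGraph:
    """Sub-topics (nodes) and directed 'is foundational to' relations (edges)."""

    # Maps each sub-topic to the set of sub-topics it directly depends on.
    prerequisites: dict = field(default_factory=dict)

    def add_relation(self, foundation: str, dependent: str) -> None:
        """Record that `foundation` should be understood before `dependent`."""
        self.prerequisites.setdefault(foundation, set())
        self.prerequisites.setdefault(dependent, set()).add(foundation)

    def all_prerequisites(self, subtopic: str) -> set:
        """Return every direct or transitive prerequisite of `subtopic`."""
        seen = set()
        stack = list(self.prerequisites.get(subtopic, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.prerequisites.get(node, ()))
        return seen


def readiness(required: dict, knowledge: dict) -> float:
    """Expected share of a document's required knowledge the user already has."""
    return sum(w * knowledge.get(node, 0.0) for node, w in required.items())


def novelty(gains: dict, knowledge: dict) -> float:
    """Expected share of a document's knowledge gains that is new to the user."""
    return sum(w * (1.0 - knowledge.get(node, 0.0)) for node, w in gains.items())


def learning_score(required: dict, gains: dict, knowledge: dict) -> float:
    """Hypothetical score: high only if a document is understandable AND novel."""
    return readiness(required, knowledge) * novelty(gains, knowledge)


# Hypothetical three-node LFG: probability -> bayes_rule -> naive_bayes.
lfg = LearningFlowGraph()
lfg.add_relation("probability", "bayes_rule")
lfg.add_relation("bayes_rule", "naive_bayes")

# Assumed knowledge state: mastery probability per sub-topic.
knowledge = {"probability": 0.9, "bayes_rule": 0.2, "naive_bayes": 0.0}

# Two made-up documents, each described by (required knowledge, knowledge gains).
intro = learning_score({"probability": 1.0}, {"bayes_rule": 1.0}, knowledge)
advanced = learning_score({"bayes_rule": 1.0}, {"naive_bayes": 1.0}, knowledge)
print(intro > advanced)
```

Under these assumptions, the introductory document outranks the advanced one: the user already knows most of what it requires, and what it teaches is still largely new. Multiplying the two terms is only one possible design choice. It penalizes documents that are novel but not yet understandable, as well as documents that are understandable but teach little.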

Highlights

On May 4, 2023, the UMass Amherst CIIR and University of North Carolina Chapel Hill Athena project collaborators organized the team's first Learning and Language Technologies Symposium, held at UMass Amherst. Faculty and doctoral students presented research that addresses how computing can be used to help people learn and how human language can be modeled or mined for those and related challenges. The symposium had two parts: (1) a poster session where undergraduate students met doctoral students to learn about how and why they chose to be researchers in these areas, and (2) a series of short talks where graduate students (and undergraduates) and faculty heard from UMass Amherst Manning College of Information and Computer Sciences (CICS) and UNC Chapel Hill faculty on their research in this exciting field. The talks and poster session were attended by 50-60 students.

A second NSF-funded Project Athena symposium was held on April 25, 2024 at UNC Chapel Hill. UNC collaborators Dr. Jaime Arguello and Dr. Rob Capra hosted the event. The Symposium on Information and Interaction Learning featured six research lectures, including those by the CIIR's James Allan, Negin Rahimi, and Hamed Zamani.

Publications

IR-1296: Zamani, H. and Zhu, Y., "Predicting Prerequisite Relations for Unseen Concepts," in the Proceedings of 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi, United Arab Emirates, December 7-12, 2022, pp. 8542-8548.

IR-1299: Salemi, A., Altmayer Pizzorno, J. and Zamani, H., "A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering," in the Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), July 23-27, 2023, Taipei, Taiwan, pp. 110-120.

IR-1303: Salemi, A., Rafiee, M. and Zamani, H., "Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering," in the Proceedings of the 13th (9th ACM SIGIR) International Conference on the Theory of Information Retrieval (ICTIR 2023), Taipei, Taiwan, July 23, 2023, pp. 169-176.

IR-1310: Yu, P., Rahimi, N., Huang, Z. and Allan, J., "Search Result Diversification Using Query Aspects as Bottlenecks," in the Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM ’23), October 21–25, 2023, Birmingham, United Kingdom, pp. 3040-3051.

IR-1314: Jafari, N. and Allan, J., "Target Span Detection for Implicit Harmful Content," in the Proceedings of the 2024 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2024), Washington, D.C., July 13, 2024.

IR-1322: Zhu, Y. and Zamani, H., "ICXML: An In-Context Learning Framework for Zero-Shot Extreme Multi-Label Classification," in the Findings of the Association for Computational Linguistics: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico City, Mexico, June 16-21, 2024, pp. 2086-2098.

IR-1327: Kim, Y., "Extracting token-level semantic matching in text-pair classification tasks," Ph.D. Dissertation, University of Massachusetts Amherst, 2024.

IR-1333: Yu, P., "Leveraging Explanations for Information Retrieval Systems Under Data Scarcity," Ph.D. Dissertation, University of Massachusetts Amherst, 2024.

IR-1345: Samarinas, C., Krubner, A., Salemi, A., Kim, Y. and Zamani, H., "Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation," in the Findings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, July 27–August 1, 2025, pp. 13468-13482.

Point of Contact: James Allan - allan@cs.umass.edu
Center for Intelligent Information Retrieval (CIIR)
Manning College of Information and Computer Sciences
140 Governors Drive
University of Massachusetts Amherst
Amherst, MA 01003-9264

This material is based upon work supported in part by the Center for Intelligent Information Retrieval (CIIR) and in part by the National Science Foundation under Grant Nos. 2106282 and 2106334 (UNC). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.