Athena: Learning-oriented Search with Personalized Learning Flows

A Collaborative Project with the University of Massachusetts Amherst
and the University of North Carolina at Chapel Hill

University of Massachusetts Amherst:
James Allan, PI
Razieh Negin Rahimi and Hamed Zamani, co-PIs

University of North Carolina at Chapel Hill:
Jaime Arguello, PI
Robert Capra, co-PI

Project Award Information

NSF Award Number: 2106282
Award Title: Athena: Learning-oriented Search with Personalized Learning Flows
Duration: 10/01/2021 - 09/30/2024

Project Abstract

The Athena project will develop technology for "search as learning," a set of search technologies that encourage and support learning rather than simple document finding. To learn, searchers must engage with information that is both novel and understandable. Therefore, at its core, Athena will support learning by modeling several important factors: (1) the knowledge connections between documents covering a topic, (2) a user's current state of knowledge on that topic, (3) the types of knowledge a user is likely to gain from a document, and (4) the knowledge required for a user to successfully engage with a document. The Athena project will involve two types of end-to-end systems, both of which will model and leverage the learner's state of knowledge (LSK): an LSK-aware search engine and an LSK-aware question answering system. The Athena systems will guide a user through a topic and find relevant information in the context of previously encountered information and the topic structure captured in a web of topics. The team will evaluate Athena using standard measures as well as a series of studies involving human subjects. If the Athena project is successful, it will make it easier for people to use search engines and related technologies to learn about complex topics, where there are numerous interrelated and dependent subtopics that should be considered. Given that search is among the most common online activities on and off the Web, Athena and its technologies will have a substantial impact on searchers trying to learn such topics.

Athena enables "search as learning" using a data structure referred to as a Learning Flow Graph (LFG). An LFG comprises nodes that represent sub-topics (e.g., concepts) within a given domain and edges that represent relations between sub-topics (e.g., one sub-topic being foundational to understanding another). Athena leverages LFGs to model the different factors mentioned above. It uses probability distributions across nodes in an LFG to model: (1) a user's knowledge state, (2) the potential knowledge gains from an information item, and (3) the prerequisite knowledge required for a user to successfully engage with an information item. The Athena team will develop algorithms for generating LFGs from structured, semi-structured, and unstructured resources (e.g., course syllabi, tables of contents, book indices, knowledge bases, query logs), algorithms for integrating LFGs into search and question-answering models, and algorithms for re-estimating LFGs and a user's knowledge state based on search behaviors (e.g., queries, clicks, skips, dwell times). Structuring textual data to find the optimal learning paths through it is of great interest, though most existing work has focused on extracting information to fill slots in a "knowledge base," a much finer-grained task. The LFG representation also provides a type of explanation of a larger topic, connecting to the broad interest in explainable systems. The Athena work will extend the state of the art in text representation, neural approaches including attention techniques, query and topic modeling, contextual text summarization, and understanding human approaches to complex search activities.
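
To make the LFG idea above concrete, here is a minimal Python sketch of a graph whose nodes are sub-topics, with probability distributions over those nodes standing in for a user's knowledge state, a document's potential knowledge gains, and a document's prerequisite knowledge. Everything in it is an assumption made for illustration and is not taken from the Athena project: the class name LearningFlowGraph, the helper functions, the linear-algebra sub-topics, and in particular the scoring rule, which simply multiplies a "readiness" term (overlap between what the user knows and what the document requires) by a "novelty" term (how little the user already knows of what the document teaches).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LearningFlowGraph:
    """Hypothetical LFG: nodes are sub-topics, directed edges are prerequisite relations."""

    nodes: list[str]
    # edges[a] holds the sub-topics for which `a` is foundational.
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_prerequisite(self, foundation: str, dependent: str) -> None:
        """Record that `foundation` should be understood before `dependent`."""
        for topic in (foundation, dependent):
            if topic not in self.nodes:
                raise KeyError(f"unknown sub-topic: {topic}")
        self.edges.setdefault(foundation, set()).add(dependent)

    def prerequisites_of(self, topic: str) -> set[str]:
        """Return the sub-topics that are directly foundational to `topic`."""
        return {a for a, deps in self.edges.items() if topic in deps}


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative weights so they sum to 1 (a distribution over nodes)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {topic: w / total for topic, w in weights.items()}


def overlap(p: dict[str, float], q: dict[str, float]) -> float:
    """Histogram intersection of two distributions; always in [0, 1]."""
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in set(p) | set(q))


def learning_score(
    knowledge: dict[str, float],
    prerequisites: dict[str, float],
    gains: dict[str, float],
) -> float:
    """Assumed scoring rule: readiness times novelty (illustrative only)."""
    readiness = overlap(knowledge, prerequisites)  # can the user follow it?
    novelty = 1.0 - overlap(knowledge, gains)      # does it teach something new?
    return readiness * novelty


# Hypothetical three-node domain.
lfg = LearningFlowGraph(nodes=["vectors", "matrices", "eigenvalues"])
lfg.add_prerequisite("vectors", "matrices")
lfg.add_prerequisite("matrices", "eigenvalues")

# A user who mostly knows vectors and a little about matrices.
user = normalize({"vectors": 3.0, "matrices": 1.0})

# Two hypothetical documents, each with a prerequisite and a gain distribution.
intro_matrices = learning_score(user, {"vectors": 1.0}, {"matrices": 1.0})
intro_eigen = learning_score(user, {"matrices": 1.0}, {"eigenvalues": 1.0})
```

Under these assumed numbers, the introductory matrices document scores higher than the eigenvalues document, because the user already holds most of its prerequisite knowledge while still having something to gain from it. A real system would estimate these distributions from data and search behavior rather than set them by hand.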

Publications
Forthcoming

Point of Contact: James Allan - allan@cs.umass.edu
Center for Intelligent Information Retrieval (CIIR)
College of Information and Computer Sciences
140 Governors Drive
University of Massachusetts Amherst
Amherst, MA 01003-9264

This material is based upon work supported in part by the Center for Intelligent Information Retrieval (CIIR) and in part by the National Science Foundation under Grant Nos. 2106282 and 2106334 (UNC). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.