CIIR Talk Series: Masaharu Yoshioka

Speaker: Masaharu Yoshioka, Hokkaido University (Speaker will be in person)

Talk Title: Information Extraction from the Domain Text
- Application of Legal Information Extraction and Chemical Reaction Process Information Extraction

Date: Friday, February 20, 2026 - 1:30 - 2:30 PM EST (North American Eastern Standard Time)

Abstract: Recent developments in natural language processing techniques, including generative AI, have enabled us to extract useful information from domain-specific texts. In this presentation, we introduce two ongoing research projects. The first is the Competition on Legal Information Extraction and Entailment (COLIEE) [1], which aims to support legal information processing tasks such as retrieving relevant statutes or court cases and performing entailment based on the retrieved results. The second project focuses on extracting chemical reaction process information from scientific papers [2,3].

The COLIEE project consists of four tasks, which are defined by the combination of two types of legal systems (statute law and case law) and two types of subtasks (retrieval and entailment). We have constructed benchmark datasets to evaluate system performance and have been organizing this competition since 2015. At the beginning of the project, the tasks were quite challenging for participants because legal information retrieval requires consideration of contextual information to determine the similarity between queries and target documents. However, the recent advancement of large language models (LLMs), which can effectively capture contextual and semantic relationships between terms, has significantly improved system performance. In this presentation, we introduce the COLIEE project along with examples of participants’ systems and their progress over the years.

For chemical reaction process information extraction, we have proposed an annotation framework to identify and structure reaction-related information from scientific text [2]. Using the annotated corpus, reaction process information can be extracted automatically with machine learning techniques. The extracted data are then converted into machine-actionable code for robotic synthesis systems. Although this conversion can be performed by a generative AI-based system, visualizing the automatically annotated results is highly effective for reviewing and validating the conversion outputs [3].

[1,2,3] See presentation for citations. 

Bio: Masaharu Yoshioka is a Professor in the Faculty of Information Science and Technology and the Institute for Chemical Reaction Design and Discovery at Hokkaido University. He received B.E. and M.E. degrees in precision engineering and a Ph.D. degree in precision machinery engineering from the University of Tokyo, Japan, in 1991, 1993, and 1996, respectively. From April 1996 to March 2000, he was a Research Associate at the National Center for Science and Information Systems, Japan. From April 2000 to May 2001, he was a Research Associate at the National Institute of Informatics, Japan. In June 2001, he joined the Graduate School of Engineering as an Associate Professor; the school was reorganized as the Graduate School of Information Science and Technology in 2004. In January 2019, he became a Professor in the Faculty of Information Science and Technology. He also joined the Institute for Chemical Reaction Design and Discovery in January 2020. Since February 2025, he has also served as a visiting professor at the Center for Juris-Informatics in the Joint Support-Center for Data Science Research. His research interests include the application of knowledge engineering technology to information access and knowledge management, Linked Open Data, and the application of knowledge engineering technology to particular research domains (e.g., cheminformatics and nanoinformatics).