Topical Positioning System (TPS) for Informed Reading of Web Pages

Principal Investigator:
James Allan
allan@cs.umass.edu

Center for Intelligent Information Retrieval (CIIR)
College of Information and Computer Sciences
140 Governors Drive
University of Massachusetts Amherst
Amherst, MA 01003-9264

Project Summary

The Issels Treatment is a “comprehensive immunotherapy for cancer.” It is based on the idea that the body’s own immune system can be supercharged to get rid of cancer cells. A professionally created web site presents evidence and success stories convincingly, and many people reading the site would be ready to give the technique a try. There’s just one problem: there is no evidence whatsoever that the approach works. The American Cancer Society considers it ineffective, and it is listed as a “dubious treatment” by Quackwatch, a well-known recorder of health-care fraud and myths. Nonetheless, an unsuspecting reader is likely to be misled. This project seeks to counter that situation by developing a Topical Positioning System (TPS) that gives the reader a topical “lay of the land” for web pages, showing how a page fits into its broader topic (or topics) to encourage readers to consider its content more carefully.

This project thus addresses the challenge of increasing the critical literacy of people looking for information on the Web, whether that information concerns healthcare, policy, or any other broadly discussed topic. Search engines have made it easy for people to find useful information; however, they have made it equally easy for people to stumble upon advice and comments that are of dubious or even negative value. This project’s goal is to develop technologies that assist people in critically evaluating the material they find and help them understand why a page is or is not informative. It is driven by a vision of a “TPS” tool that shows a person whether the web page in front of them discusses a provocative topic, whether the material is presented in a heavily biased way, whether it represents an outlier (fringe) idea, and how its discussion of issues relates to the broader context and to information presented in authoritative sources. The ultimate goal is not to present the person with a filter or grade for each web page, but to help them recognize that there is a larger discussion and to critically evaluate how the page in question is positioned in that discussion.

This research applies and extends text analysis and comparison techniques to this problem. It uses statistical language modeling, topic modeling, machine learning, and link analysis techniques to represent Web pages and clusters of Web pages. It requires both off-line pre-processing to organize web-scale collections and on-line, query-time fine-tuning of the organization for presentation in a TPS browser add-on. The research builds upon earlier work measuring whether a document’s topic is highly subjective and/or controversial. It will extend topic modeling and clustering approaches to present the relationship between a document and its topic. It will also create new approaches for conveying this found TPS information to users, with the goal of developing their own critical literacy skills for healthcare, politics, technology, or most other topics discussed on the web, enabling them to make careful and appropriate decisions in their future investigations.
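
As a rough illustration of how statistical language modeling could relate a page to its topic, the sketch below compares a smoothed unigram model of a page with a model built from a cluster of pages on the same topic, using KL divergence as the distance. This is only one possible reading of the approach, not the project's actual method: the function names, the crude tokenizer, the Jelinek-Mercer smoothing weight, and the choice of KL divergence are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def unigram_model(tokens, vocab, background, lam=0.5):
    """Jelinek-Mercer smoothed unigram model over a fixed vocabulary.

    lam is a hypothetical smoothing weight, not a value from the project.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {
        w: (1 - lam) * (counts[w] / total if total else 0.0) + lam * background[w]
        for w in vocab
    }


def kl_divergence(p, q):
    """KL(p || q); q must be non-zero wherever p is non-zero."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def topical_distance(page_text, topic_texts, lam=0.5):
    """Distance of one page from a topic cluster (larger = more of an outlier)."""
    page = tokenize(page_text)
    topic = [tok for text in topic_texts for tok in tokenize(text)]
    all_tokens = page + topic
    vocab = set(all_tokens)
    bg_counts = Counter(all_tokens)
    # Background model over page + cluster, so every vocabulary word has mass.
    background = {w: bg_counts[w] / len(all_tokens) for w in vocab}
    p = unigram_model(page, vocab, background, lam)
    q = unigram_model(topic, vocab, background, lam)
    return kl_divergence(p, q)
```

Under these assumptions, a page whose vocabulary overlaps little with the rest of its topic cluster receives a larger distance than one that uses the cluster's common terms. Such a score would be at most one signal of a possible outlier page, not a judgment of the page's quality.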

Publications:

IR-925: (2013) Aktolga, E. and Allan, J., "Sentiment Diversification With Different Biases," Proceedings of the 36th Annual ACM SIGIR Conference (SIGIR 2013), July 28-August 1, 2013, Dublin, Ireland, pp. 593-602.

IR-953: (2013) Dori-Hacohen, S. and Allan, J., "Detecting Controversy on the Web," Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), San Francisco, CA, Oct. 27-Nov. 1, 2013, pp. 1845-1848.

IR-959: (2013) Feild, H., "Exploring Privacy and Personalization in Information Retrieval Applications," Ph.D. Thesis, University of Massachusetts Amherst, 2013.

IR-1006: (2015) Dori-Hacohen, S. and Allan, J., "Automated Controversy Detection on the Web," Proceedings of the 37th European Conference on Information Retrieval (ECIR 2015), Vienna, Austria, March 29-April 2, 2015, pp. 423-434.

IR-1027: (2015) Dori-Hacohen, S., Yom-Tov, E. and Allan, J., "Navigating Controversy as a Complex Search Task," Proceedings of the First International Workshop on Supporting Complex Search Tasks, volume 1338 of CEUR Workshop Proceedings, CEUR-WS.org, 2015.

IR-1034: (2015) Dori-Hacohen, S., "Controversy Detection and Stance Analysis," Proceedings of the Doctoral Consortium of the 38th Annual ACM SIGIR Conference (SIGIR 2015), Santiago, Chile, August 9-13, 2015, p. 1057.

IR-1061: (2016) Jang, M. and Allan, J., "Improving Automated Controversy Detection on the Web," Proceedings of The International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’16), Pisa, Italy, July 17-21, 2016, pp. 865-868.

IR-1064: (2016) Dori-Hacohen, S., Jensen, D. and Allan, J., "Controversy Detection in Wikipedia Using Collective Classification," Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Italy, July 18-20, 2016, pp. 797-800.

This work is supported in part by the Center for Intelligent Information Retrieval (CIIR) and in part by the National Science Foundation (NSF IIS-1217281).