About this course 

Information Retrieval (IR) is the study of methods for capturing, representing, storing, organizing, and retrieving unstructured or loosely structured information.  Its best-known aspect is document retrieval: the process of indexing and retrieving text documents.  However, the field covers almost any type of unstructured or semi-structured data, including newswire stories, transcribed speech, email, blogs, images, and even video.  When the data consists of material found on the Web, Information Retrieval is a critical component of Web search engines.

CMPSCI 646 is a graduate-level class in Information Retrieval.  It covers the basic ideas of IR to give students an intuition for how search engines work, why they succeed, and, to some degree, how they fail.  The course touches on popular and important approaches to the problem, providing both historical context and state-of-the-art results.

The classwork is grounded in a semester-long project.  Students will start with a provided collection of documents, convert them into a form suitable for indexing, index them using an available search engine, and carry out an effectiveness evaluation.  The capstone of the project will be a student-selected, professor-approved additional IR exercise using the same data.
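
For a rough sense of what the project steps involve, here is a minimal Python sketch of the convert, index, and evaluate sequence.  It is illustrative only and is not the course's required tooling: the three-document collection, the relevance judgment, the term-overlap ranking, and all function names are assumptions made up for this example, and the actual project uses a provided collection and an existing search engine.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Convert raw text to index terms: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an inverted index mapping each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by the number of distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / k

# Hypothetical toy collection standing in for the provided documents.
docs = {
    "d1": "Search engines index text documents.",
    "d2": "Email and blogs are semi-structured data.",
    "d3": "Web search engines retrieve documents from the Web.",
}
index = build_index(docs)
ranked = search(index, "web search engines")
relevant = {"d3"}  # hypothetical relevance judgment for this query
print(ranked, precision_at_k(ranked, relevant, 1))
```

Precision at k is only one of many effectiveness measures; it is used here because it is simple to compute by hand and check.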