About this course
Information Retrieval (IR) is the study of methods for capturing,
representing, storing, organizing, and retrieving unstructured or loosely
structured information. Its best-known aspect is document retrieval:
the process of indexing and retrieving text documents.
However, the field of Information Retrieval includes almost any type of
unstructured or semi-structured data, including newswire stories, transcribed
speech, email, blogs, images, or even video. When the data consists of
material found on the Web, Information Retrieval is a critical aspect
of Web search engines.
CMPSCI 646 is a graduate-level class in
Information Retrieval. It covers the basic ideas of IR to provide the
student with an intuition for how search engines work, why they're successful,
and to some degree how they fail. The course touches on popular and
important approaches to the problem, providing both historical context and
state-of-the-art results.
The classwork is grounded in a semester-long
project. In this project, students start with a provided collection
of documents, convert them into a form suitable for indexing, index
them using an available search engine, and carry out an effectiveness
evaluation. The capstone of the project is an additional IR exercise on
the same data, selected by the student and approved by the professor.
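As a rough illustration of the project's stages (convert, index, search, evaluate), here is a minimal Python sketch. It is not the course's actual toolchain: the toy documents, the function names, the term-frequency ranking, and the relevance judgment are all assumptions made for the example, and the real project uses a provided collection and an existing search engine.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy collection; the course supplies its own documents.
DOCS = {
    "d1": "Search engines index text documents.",
    "d2": "Email and blogs are semi-structured data.",
    "d3": "Web search engines retrieve documents from the Web.",
}

def tokenize(text):
    """Convert raw text to index terms: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index

def search(index, query):
    """Rank documents by the summed frequency of the query terms.

    This is a deliberately simple scoring rule chosen for illustration.
    Ties are broken by document id so the ranking is deterministic.
    """
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

def precision_at_k(ranked, relevant, k):
    """Effectiveness measure: fraction of the top k results judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for d in ranked[:k] if d in relevant) / k

index = build_index(DOCS)
ranked = search(index, "web search engines")
# Assumed relevance judgment for this query: only d3 is relevant.
score = precision_at_k(ranked, {"d3"}, k=2)
```

In a real evaluation the relevance judgments would come from human assessors over many queries, and the scores would be averaged across those queries.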