cs446, Fall 2024

cs446, search engines
James Allan
Fall 2024

CMPSCI 446 is an undergraduate-level course in search engines and in Information Retrieval, the science and engineering of indexing, organizing, searching, and making sense of unstructured or mostly unstructured information, particularly text. The class provides an overview of the important issues in information retrieval, and how those issues affect the design and implementation of search engines. The course emphasizes the technology used in Web search engines, and the information retrieval theories and concepts that underlie all search applications. Mathematical experience (as provided by CMPSCI 240) is required. You should also be able to program in Java or Python (other languages may be accepted).

For Fall 2024, cs446 is using the Canvas learning management system (as of March 2024, the course site does not yet exist). This page provides some helpful information before Canvas is open (probably a few days before class starts, but Canvas is new to me!). Once class starts, Canvas will be the only place where you can be certain you have the up-to-date status.

Topics

This class should cover the following topics -- i.e., the chapters of the text. Some topics may be omitted and the order may shift depending in part on student needs and interests:

Meeting times

The course will meet for two lectures a week: Tuesday and Thursday afternoons, 4:00-5:15pm, in a room that is not yet decided.

Textbook

The following text is required for this course.

You can download a free copy of the book if you prefer. It includes corrections of minor errors that were discovered after the book was published, but is otherwise the same as the original.

You may find the following textbook useful for additional presentations of some of the course topics.

Assignments and exams

Your grade in this class will be based upon homeworks, in-class exercises, projects, an in-person midterm exam, and an in-person final exam.

Collaboration and help

You may discuss the ideas behind assignments with others. You may ask others for help understanding class and search engine concepts. You may study with friends. However...

The work that you do and submit must be your own. It may not be copied from the web, from another student in the class, or from anyone else. If you stumble upon and use a solution from the textbook or from class, you are expected to acknowledge the source of the work (for example, "// The following way of solving this problem is on page 215 of the textbook"). In such a case, you should include a brief discussion of how the solution works to indicate that you have some understanding of what is provided.

LLMs and ChatGPT and friends. What are the implications of using large language models and similar tools? You have to cite your sources, so you need to cite the LLM/GPT system you used, the prompt you used to find the information, and how the resulting answer is appropriate (as with any source). You are strongly discouraged from using those sources to find direct answers to your questions because you will be less likely to learn the core ideas and thus less likely to do well on the exams, which are a substantial part of your course grade.

Your effort on exams (midterm or final) must be your own. Cheating on exams will result in your receiving an F in the class, with no exceptions.

Your homework submissions must be your own work and not done in collaboration with anyone, though, as noted above, you are welcome to talk with others about key concepts.

Your in-class exercise work may be group or joint work. What you hand in should be your own work or the result of your group's discussion.

Your project work must be your own work and not a copy of someone else's work, nor done in collaboration with anyone. That means you may not use similar assignments from other offerings of similar classes at UMass or elsewhere. It also means you may not use LLMs/GPTs for anything other than simple parts of the assignment (e.g., getting the code for a QuickSort algorithm or something). You must take care that others are not able to make a copy of your work.