NOTE: If you are hoping to take the honors colloquium part of 446 -- aka H446 -- you must fill out the override form available on the college's override page ("override forms for COMPSCI, INFO, and CICS courses"). Note that the Fall 2025 form will not be available until April 25, 2025.
CMPSCI 446 is an undergraduate-level course in search engines and in Information Retrieval, the science and engineering of indexing, organizing, searching, and making sense of unstructured or mostly unstructured information, particularly text. The class provides an overview of the important issues in information retrieval, and how those issues affect the design and implementation of search engines. The course emphasizes the technology used in Web search engines, and the information retrieval theories and concepts that underlie all search applications. Mathematical experience (as provided by CMPSCI 240) is required. You should also be able to program in Python (other languages may be accepted, though the course is in part designed on the expectation of Python).
For Fall 2025, cs446 is using the Canvas learning management system (as of early April 2025 the course site does not yet exist in Canvas). This page that you are currently reading provides some helpful information before Canvas is open (probably a few days before class starts), but once class starts, Canvas will be the only place where you can be certain you have the up-to-date status.
The course will meet for two lectures a week: Tuesday and Thursday mornings, 8:30-9:45am (sorry), in Thompson 102. I have requested that the class switch to the new CICS building when it is ready, but cannot guarantee if or when that will happen.
The following text is required for this course.
Your grade in this class will be based upon homeworks, in-class exercises, projects, an in-person midterm exam, and an in-person final exam.
You may discuss the ideas behind assignments with others. You may ask others for help understanding class and search engine concepts. You may study with friends. However...
The work that you do and submit must be your own. It may not be copied from the web, from another student in the class, or from anyone else. If you stumble upon and use a solution from the textbook or from class, you are expected to acknowledge the source of the work (for example, "// The following way of solving this problem is on page 215 of the textbook"). In such a case, you should include a brief discussion of how the solution works to indicate that you have some understanding of what is provided.
LLMs and ChatGPT and friends. What are the implications of using large language models and so on? You have to cite your sources, so you need to cite the LLM/GPT system you used and explain how the resulting answer is appropriate (as with any source), including the prompt you used to find the information. You are strongly discouraged from using those sources to find direct answers to your questions because you will be less likely to learn the core ideas and thus less likely to do well on the exams, which are a substantial part of your course grade.
Your effort on exams (midterm or final) must be your own. Cheating of any type on exams will result in your receiving an F in the class, with no exceptions. Cheating includes enabling dishonesty by, for example, allowing someone to copy your work.
Your homework submissions must be your own work and not done in collaboration with anyone. As noted above, though, you are welcome to talk with others about key concepts.
Your in-class exercise work will primarily be group work, usually with a single submission from your group.
Your project work must be your own work and not a copy of someone else's work, nor done in collaboration with anyone. That includes not using similar assignments from other offerings of similar classes at UMass or elsewhere. (Yes, nearly every IR course includes assignments similar to the ones in this class, so it would be easy to copy. However, you will not learn the material and thus be less well prepared for exams if you do not do your own work.) You also should not use LLMs/GPTs that support coding for anything other than simple parts of the assignment (e.g., getting the code for a QuickSort algorithm or something), in which case you should make it clear in your code where that code came from. You must take care that others are not able to make a copy of your work.