Homework will be posted here. There will be 3 homeworks [H1, H2, H3] and 3 programming assignments [P1, P2, P3] during the semester. Note that the class project is separate from these assignments.

Homework one (H1)

A smattering of questions related to things we've talked about so far.  This assignment is due before class on Thursday, October 8th.  Here is the solution [pdf].

Homework two (H2)

Another smattering of questions.  This assignment is due before class on Tuesday, November 2.

Clarifications:
    • In Question B you are asked to propose a way to stop term-at-a-time processing early while still being sure that you have found the top k documents (though not necessarily ranked correctly). You may, and probably should, assume that query processing works as discussed in class; that is, you are traversing an inverted list and adding its portion of a document's score to a set of accumulators.
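
For reference, the accumulator-based processing assumed above might be sketched as follows. This is only an illustration, not the version presented in class: the index layout (a dict mapping each term to a list of (doc_id, partial_score) pairs) and the function name term_at_a_time are assumptions made for the example. It always traverses every list in full and does not attempt the early stopping that Question B asks about.

```python
from collections import defaultdict
import heapq


def term_at_a_time(query_terms, index, k):
    """Score documents one inverted list at a time and return the top k.

    `index` is a hypothetical structure: term -> list of
    (doc_id, partial_score) pairs, where partial_score is that term's
    contribution to the document's total score.
    """
    accumulators = defaultdict(float)  # doc_id -> score accumulated so far
    for term in query_terms:
        # Traverse the whole inverted list for this term before moving on.
        for doc_id, partial_score in index.get(term, []):
            accumulators[doc_id] += partial_score
    # Pick the k documents with the largest accumulated scores.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])
```

Note that after each list is processed the accumulators hold only partial scores, which is why stopping early requires some additional argument about how much the remaining lists could still contribute.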

Homework three (H3)

Programming assignment one (P1)

In this assignment, you extract some statistical information about word occurrences from several corpora of text.  It is due on Wednesday, September 30, at 8:00pm.
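
As a rough idea of what collecting word-occurrence statistics can look like, here is a minimal sketch. The specific statistics, the tokenization rule (lowercased runs of letters, digits, and apostrophes), and the function name word_stats are all assumptions made for illustration; the assignment itself defines what is actually required.

```python
import re
from collections import Counter


def word_stats(text, top_n=3):
    """Return simple word-occurrence statistics for a string of text.

    The tokenizer is a deliberately crude assumption: lowercase the text
    and keep runs of letters, digits, and apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),        # total word occurrences
        "vocabulary": len(counts),    # number of distinct words
        "top": counts.most_common(top_n),  # most frequent words with counts
    }
```

Calling word_stats on the contents of each corpus file would give one summary per corpus that can then be compared.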

Programming assignment two (P2)

This assignment requires running an IR experiment.  It is due on Tuesday, December 1st, at 8:00pm.

Programming assignment three (P3)

This assignment is expected to relate to large-scale data processing.