You will select, design, carry out, and present a project.

You may choose one of the suggested project ideas below.  However, you are welcome to come up with your own idea if there is a topic of interest to you.  Projects will typically involve programming, but you may propose a project that does not.  All projects must be approved by the professor.

You are encouraged to propose a group project, though the overall effort should reflect the number of people in the group.

Phase I (October, due November 16)

  • Select a project and get it approved by 5:00pm on Monday, November 16
    • Send a half-page to full-page description of the project to the professor, preferably by email, though paper is fine.  This description should list all students participating in the project and then sketch the key ideas of the project.  It should list any specifics of the project, whether or not they are resolved yet.  For example, if you need a test collection, what documents, queries, and relevance judgments will you use, and do you already have them?  Do you need access to another program, data collection, or resource?  If so, which one, and do you have access to it?  On what computer system(s) will you do the work?
  • Present a sketch of the project to the class (2-10 minutes, depending on the number of projects).  This will happen on Tuesday, November 17.
    • If you want to use slides for your presentation, please send them to the professor by 7am on Tuesday the 17th.  If you use something other than PowerPoint, it may be tricky, but we'll do our best.  Please bring a backup presentation mechanism (e.g., your own laptop) just in case.  Ideally, all presentations will run from a single laptop to minimize switching between speakers.

Phase II (November and December)

  • Carry out the project.
  • Present the project and its results to the class (10-30 minutes, depending on the number of projects).  These presentations are tentatively scheduled for December 8 and 10.
  • Write up the project.  The writeup is technically part of your final exam, so the two are concurrent.
    • A draft writeup is due a week earlier (December 3rd).  This will almost certainly be incomplete, but it should show the general outline of the final report.

Project suggestions

You may suggest almost any project that you like, provided it touches on Information Retrieval as covered in this class.  Note, though, that your project must be approved, so you may not be able to actually do almost any project that you like.

Here are a few sample project ideas that you could consider.  Feel free to add to these, adjust them in subtle ways, twist them in impressive ways, merge them together, or come up with something inspired by one, some, or all of these.

Some projects may require that a system be implemented.  In many cases, open source starting points may already exist.  You're welcome to use one, provided you make it clear that you did so, and provided you understand that doing so means more work is expected on other parts of the project.  If you looked at all or part of another system to help you make substantial design decisions, you must cite the sources of your inspiration.

  • Signature files.  Implement a signature file retrieval system for one (or more) of the collections used in P1.  Explore the tradeoffs involved in indexing the data (e.g., the size of the signature vector, the types of hash functions).  Explore the impact on retrieval effectiveness.
  • Term discrimination model.  Implement a term discrimination model.  Evaluate the use of term discrimination values as a feature in another IR model: can you add them to a probabilistic model or to a vector space model?  How do they interact with IDF?  Explore their impact on retrieval effectiveness.
  • LM smoothing approaches.  Implement several smoothing techniques for a language modeling approach.  The lecture on text statistics covers several, but you can probably find more.  Do the different techniques make a difference in efficiency and/or effectiveness?
  • IR experiments building on P2.  For programming assignment P2 you will be carrying out an IR experiment using Indri, Galago, Lucene, or another open source search engine.  The project will require that you index a collection, run some baseline retrieval experiments, modify the system in some way, and run new experiments.  (P2's description is pending.)  You could extend that work as part of a project.
  • Term paper.  For this type of project, you will pick a topic, read up on it, and write a paper of at least [tentatively 20] pages on the topic.  You must read serious research papers, not just Wikipedia articles and fluff pieces from newspapers or trade magazines.  You will need to assimilate information on the topic and, to the extent possible, present it in relation to what has been presented in class and/or to other topics covered in class.
    • [Some sample topics pending]
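As a possible starting point for the signature-file suggestion, here is a minimal Python sketch of superimposed coding.  It assumes whitespace tokenization and a hypothetical hashing scheme (MD5 seeded with an integer) that derives `k` bit positions per term.  `SignatureFile`, `width`, and `k` are illustrative names, not part of P1, and this is one possible design rather than a required one.  The two parameters are exactly the knobs the suggestion mentions: signature size and how terms are hashed into it.

```python
import hashlib


def term_bits(term: str, width: int, k: int) -> list[int]:
    """Map a term to k bit positions in [0, width) (hypothetical hashing scheme)."""
    positions = []
    for seed in range(k):
        digest = hashlib.md5(f"{seed}:{term}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % width)
    return positions


def make_signature(terms, width: int, k: int) -> int:
    """Superimpose (OR together) the bit patterns of all terms into one int."""
    sig = 0
    for term in terms:
        for pos in term_bits(term, width, k):
            sig |= 1 << pos
    return sig


class SignatureFile:
    def __init__(self, width: int = 64, k: int = 3):
        self.width = width
        self.k = k
        self.signatures = {}  # doc_id -> signature (an int used as a bit vector)
        self.doc_terms = {}   # doc_id -> set of terms, standing in for the stored text

    def add(self, doc_id: str, text: str) -> None:
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        self.signatures[doc_id] = make_signature(terms, self.width, self.k)

    def candidates(self, query: str) -> list[str]:
        """Docs whose signature covers every query bit; may include false drops."""
        q = make_signature(set(query.lower().split()), self.width, self.k)
        return [d for d, s in self.signatures.items() if (s & q) == q]

    def search(self, query: str) -> list[str]:
        """Conjunctive match: remove false drops by checking the actual terms."""
        q_terms = set(query.lower().split())
        return [d for d in self.candidates(query) if q_terms <= self.doc_terms[d]]
```

Comparing `len(sf.candidates(q))` with `len(sf.search(q))` across different values of `width` and `k` gives a direct measure of the false-drop rate for each configuration.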
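For the term discrimination suggestion, the sketch below computes one common formulation of the term discrimination value, under stated assumptions: documents are sparse term-frequency dictionaries, the density of the document space is the average cosine similarity of each document to the collection centroid, and a term's value is the density with the term removed minus the density with it present.  A positive value means the term spreads documents apart, which makes it a good discriminator.  The function names are hypothetical, and the weighting scheme and density definition are choices a project would likely want to vary.

```python
import math
from collections import Counter


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: list) -> dict:
    total = Counter()
    for v in vectors:
        total.update(v)  # adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def density(vectors: list) -> float:
    """Average cosine similarity of each document to the collection centroid."""
    c = centroid(vectors)
    return sum(cosine(v, c) for v in vectors) / len(vectors)


def discrimination_value(vectors: list, term: str) -> float:
    """Density without the term minus density with it; > 0 suggests a good discriminator."""
    without = [{t: w for t, w in v.items() if t != term} for v in vectors]
    return density(without) - density(vectors)
```

For example, with the two documents {the: 1, cat: 1} and {the: 1, dog: 1}, `the` receives a negative value (it makes the documents look alike) and `cat` a positive one.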
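For the LM smoothing suggestion, here is a minimal query-likelihood sketch with two standard techniques: Jelinek-Mercer (linear interpolation with the collection model) and Dirichlet prior smoothing.  It assumes pre-tokenized documents given as lists of terms; the function names are hypothetical.  Parameter conventions vary across textbooks: here `lam` is the weight on the collection model, and the default values of `lam` and `mu` are illustrative, not tuned.

```python
import math
from collections import Counter


def collection_model(docs: list) -> dict:
    """Maximum-likelihood estimate of p(t | C) over all tokens in the collection."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def score_jm(query: list, doc: list, p_coll: dict, lam: float = 0.5) -> float:
    """log p(query | doc) with Jelinek-Mercer smoothing; lam should be in (0, 1]."""
    tf, dl = Counter(doc), len(doc)
    score = 0.0
    for q in query:
        if q not in p_coll:
            continue  # term unseen in the whole collection: skipped (one convention)
        p_doc = tf[q] / dl if dl else 0.0
        score += math.log((1 - lam) * p_doc + lam * p_coll[q])
    return score


def score_dirichlet(query: list, doc: list, p_coll: dict, mu: float = 2000.0) -> float:
    """log p(query | doc) with Dirichlet prior smoothing."""
    tf, dl = Counter(doc), len(doc)
    score = 0.0
    for q in query:
        if q not in p_coll:
            continue  # same convention as score_jm
        score += math.log((tf[q] + mu * p_coll[q]) / (dl + mu))
    return score
```

Other estimators (e.g., absolute discounting) change only the per-term probability line, so the same harness can be reused to time and evaluate each technique.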