Advanced Topics In Information Retrieval
CMPSCI 791H, Language Models (Advanced Topics in Information Retrieval)
Fall 2002
NOTE THE DATES SO YOU GET THE CORRECT WEEK
For December 9 (last meeting this semester)
The first paper is 15 pages long and the second is 30 pages long, though it is mostly pictures. Read the first paper and then as much of the second as you can.
- Pinar Duygulu, Kobus Barnard, Nando de Freitas, and David Forsyth, "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary," Seventh European Conference on Computer Vision, pp IV:97-112, 2002 (Awarded best paper in cognitive computer vision). Web page with information
- Kobus Barnard, Pinar Duygulu, Nando de Freitas, David Forsyth, David Blei, and Michael I. Jordan, "Matching Words and Pictures," Journal of Machine Learning Research, in press. Web page with information, pdf
For November 25
- "Maximum Entropy Markov Models for Information Extraction and Segmentation". Andrew McCallum, Dayne Freitag and Fernando Pereira. ps.gz ps
- "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data". John Lafferty, Andrew McCallum and Fernando Pereira. ps.gz ps
For November 18
- "A Maximum Entropy Approach to Natural Language Processing". Adam Berger, Stephen Della Pietra, Vincent Della Pietra. (Just first 11 pages) ps
- "Using Maximum Entropy for Text Classification". Kamal Nigam, John Lafferty, Andrew McCallum. ps.gz
- "A comparison of algorithms for maximum entropy parameter estimation". Robert Malouf. ps.gz
For November 4
- "Latent Dirichlet allocation". D. M. Blei, A. Y. Ng, and M. I. Jordan. Technical Report UCB//CSD-02-1194. This paper is available in two forms: read them both. Note that this paper may be particularly challenging, but do your best. PS.gz, 30 pages and PS.gz, 8 pages
- There is no additional paper this week; these will be challenging enough.
For October 28
- "Probabilistic Models of Text and Link Structure for Hypertext Classification", L. Getoor, E. Segal, B. Taskar, D. Koller. IJCAI01 Workshop on Text Learning: Beyond Supervision, Seattle, Washington, August 2001. PS That version seems to print badly (the margins are messed up). Here is a version in PDF that has the margins stripped off. Make sure when you print it you select "center page" or whatever the option is. PDF without margins
- Taskar, B., E. Segal and D. Koller. (2002). Discriminative Probabilistic Models for Relational Data, Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI02), Edmonton, Canada. online
For October 21
- Djoerd Hiemstra, "Term-Specific Smoothing for the Language Modeling Approach to Information Retrieval: The Importance of a Query Term." In SIGIR 2002, pp 35-41 PDF
- John Canny, "Collaborative Filtering with Privacy via Factor Analysis", In SIGIR 2002, pp 238-245 PDF
For October 16 (Wednesday, but Monday class schedule)
- Si and Callan, "Using sampled data and regression to merge search engine results." In SIGIR 2002, pp 19-26. PS
- Si, Jin, Callan, and Ogilvie, "Language modeling framework for resource selection and results merging." To appear in CIKM 2002. PS
For October 14
- October 14th is a holiday. Class will meet on the 16th, which is a Wednesday but follows a Monday class schedule.
For October 7
- Bennett, Dumais, and Horvitz. "Probabilistic combination of text classifiers using reliability indicators: models and results." In SIGIR 2002, pp 207-214. PDF
- Federico and Bertoldi. "Statistical cross-language information retrieval using N-best query translations." In SIGIR 2002, pp. 167-174. Get the PDF via the ACM portal
For September 30
- John Lafferty and Chengxiang Zhai. "Probabilistic relevance models based on document and query generation," In Proceedings of the Workshop on Language Modeling and Information Retrieval, Carnegie Mellon University, 2001, PS
(this is different from the document on the language modeling workshop page)
- S. Robertson, "On Bayesian models and event spaces in information retrieval." Presented at MF/IR Workshop of SIGIR 2002. PDF (older versions... PDF, [[http://citeseer.nj.nec.com/529095.html][HTML]])
For September 23
- C. Zhai and J. Lafferty, "Two-Stage language models for information retrieval." Appears in SIGIR 2002, pp. 49-56. PS
- Y. Zhang, J. Callan, and T. Minka, "Novelty and redundancy detection in adaptive filtering." Appears in SIGIR 2002, pp. 81-88. PS
For September 16
- R. Jin, A.G. Hauptmann, and C. Zhai (Carnegie Mellon University), "Title Language Model for Information Retrieval." Appears in SIGIR 2002, pp. 42-47. PDF
- W. Kraaij, T. Westerveld, and D. Hiemstra, "The Importance of prior probabilities for entry page search." SIGIR 2002, pp. 27-34. [[http://wwwhome.cs.utwente.nl/%7Ewesterve/sigirEPpriors.html][HTML]]
Papers under consideration. Feel free to add suggestions, providing a link to the paper and preferably some thoughts about why you think it'd be interesting.
- "Expectation-propagation for the generative aspect model", John Lafferty and Thomas Minka. Uncertainty in Artificial Intelligence (UAI), 2002 PS
- C. Dwork, R. Kumar, M. Naor, and D. Sivakumar, "Rank Aggregation Methods for the Web." World Wide Web 2001. HTML
- Stephen Robertson and Djoerd Hiemstra, "Language Models and Probability of Relevance," In Proceedings of the Workshop on Language Modeling and Information Retrieval, Carnegie Mellon University, 2001, PDF
- Fernando Pereira. Formal Grammar and Information Theory: Together Again?. Philosophical Transactions of the Royal Society, 358(1769):1239-1253, April 2000. PDF
- Christopher J. C. Burges, (1998). A Tutorial on Support Vector Machines for Pattern Recognition CiteSeerLink
- "Optimal Mixture Models in IR", Lavrenko, V., in the Proceedings of the 24th European Colloquium on IR Research (ECIR'02), Glasgow, Scotland, March 25-27, 2002. PDF
- A. Berger and J. Lafferty. Information Retrieval as Statistical Translation. In Proceedings of SIGIR-99, Berkeley, CA, August 1999. CiteSeerLink
James's page (now hopelessly out of date)
--
EricGalis - 07 Feb 2003