Fernando Diaz, diazf [at] yahoo-inc [dot] com, currently at Yahoo! Montreal | |
Introduction | |
My primary research interest is information retrieval, the formal study of searching large collections of data for small pieces of information. The most familiar instance of information retrieval is web search, where users search a collection of webpages for one or a few relevant pages. Information retrieval, however, goes beyond web search and includes topics such as cross-lingual retrieval, personalization, desktop search, and interactive retrieval. My research experience includes distributed information retrieval approaches to web search, interactive and faceted retrieval, mining of temporal patterns from news and query logs, cross-lingual information retrieval, graph-based retrieval methods, and exploiting information from multiple corpora. In my dissertation work, I studied the relationship between document clustering and document scoring for retrieval using methods from machine learning and statistics. This work led to an algorithm for system self-assessment and self-tuning that significantly improves the performance of retrieval algorithms across a variety of corpora. At Yahoo!, I study the incorporation of content from non-web corpora into web search. | |
Publications | |
Dissertation | F. Diaz, "Autocorrelation and Regularization of Query-Based Retrieval Scores," 2008. |
Journal | F. Diaz, "Regularizing Query-Based Retrieval Scores," Information Retrieval, December 2007. draft available here. |
R. Jones and F. Diaz, "Temporal profiles of queries," TOIS, July 2007. draft available here. | |
Conference | F. Diaz, "A Method for Transferring Retrieval Scores Between Collections with Non-Overlapping Vocabularies," SIGIR 2008 poster. |
F. Diaz, "Improving Relevance Feedback in Language Modeling Retrieval with Score Regularization," SIGIR 2008 poster. | |
F. Diaz, "Robustness of Score Regularization to Similarity Perturbation," SIGIR 2008 poster. | |
F. Diaz, "Performance prediction using spatial autocorrelation," SIGIR 2007. | |
F. Diaz and D. Metzler, "Pseudo-aligned multilingual corpora," IJCAI 2007. | |
F. Diaz and D. Metzler, "Improving the estimation of relevance models using large external corpora," SIGIR 2006. | |
F. Diaz, "Regularizing ad hoc retrieval scores," CIKM 2005. | |
D. Kelly, F. Diaz, N. J. Belkin, and J. Allan, "A user-centered approach to evaluating topic models," ECIR 2004. | |
F. Diaz and R. Jones, "Using temporal profiles of queries for precision prediction," SIGIR 2004. | |
F. Diaz, "Using wearable computers to construct semantic representations of physical spaces," ISWC 2002. | |
Unrefereed | D. Metzler, F. Diaz, T. Strohman, and W. B. Croft, "UMass at Robust 2005: Using mixtures of relevance models for query expansion," TREC 2005. |
F. Diaz and J. Allan, "When less is more: Relevance feedback falls short and term expansion succeeds at HARD 2005," TREC 2005. | |
F. Diaz, M. D. Smucker, and J. Allan, "High precision retrieval via user interaction and metadata," University of Massachusetts Amherst, 2005. | |
F. Diaz and R. Jones, "Temporal profiles of queries," Yahoo! Research Labs, 2004. | |
N. Abdul-Jaleel, J. Allan, W. B. Croft, F. Diaz, L. Larkey, X. Li, M. D. Smucker, and C. Wade, "UMass at TREC 2004: Novelty and HARD," TREC 2004. | |
F. Diaz and J. Allan, "Browsing-based user language models for information retrieval," University of Massachusetts Amherst, 2003. | |
Code available upon request | |
LSR: code for performing local score regularization and autocorrelation. | |
EE: code for performing external expansion. | |