Retrieval Performance Prediction
Retrieval performance prediction addresses the problem of automatically predicting the effectiveness of the search results returned in response to a user's query. My initial attempt at this task was the development of the clarity score approach, which was the earliest work in this area. Recently, I proposed another novel technique called the robustness score, which measures how stable the ranking is in the presence of uncertainty in the top-ranked documents. Both metrics demonstrate significant correlation with retrieval effectiveness on a variety of collections.
Document Quality
The quality of document content, which is usually ignored in the traditional ad hoc retrieval task, is a critical issue for Web search. Web pages vary hugely in quality relative to, for example, newswire articles. To address this problem, I proposed a document quality language model approach that is incorporated into the basic query likelihood retrieval model in the form of a prior probability. The results demonstrate that, on average, the new model is significantly more effective than the baseline. To further investigate the quality problem, I performed a detailed query analysis, which provides some interesting insights into the limitations of the quality model and the relationship between document quality and relevance.
Retrieval in Disruption-Tolerant Networks
In this work, I designed and evaluated a distributed information retrieval system that operates over a mobile network where a wireless infrastructure is unavailable. Such networks are common in developing nations, disaster-stricken areas, and even the rural areas of technologically advanced countries. This poses a new challenge for distributed IR, which normally relies on a wired Internet or always-available wireless coverage among mobile peers. In our mobile system, queries are propagated among peers only when they are intermittently within wireless range of one another. For each query received, peers retrieve top-ranked documents from their local collection and send them to the source of the query. Intermediate peers on the path to the source have to manage a finite buffer filled with documents from multiple collections and multiple queries. When too many documents are in the system, the intermediate peers must drop documents that are either unlikely to be relevant or unlikely to find a successful path to the destination. To enable such a system, I proposed a score normalization technique that works across queries and across multiple collections. The results show that this method returns more relevant documents in the mobile network than existing normalization methods, which are not intended for multiple queries.
IR-636: (2007) Yun Zhou and Croft, W. B., "Weighted Information Gain and User Clicks on Web Search Results," CIIR Technical Report.
IR-635: (2007) Yun Zhou, "Retrieval Performance Prediction and Document Quality," Ph.D. Dissertation.
IR-573: (2007) Yun Zhou and Croft, W. B., "Query Performance Prediction in Web Search Environments," in the Proceedings of the 30th Annual International ACM SIGIR Conference (SIGIR 07), pp. 543-550.
IR-532: (2006) Yun Zhou and Croft, W. B., "Ranking Robustness: A Novel Framework to Predict Query Performance," in the Proceedings of the ACM 15th Conference on Information and Knowledge Management (CIKM 2006), pp. 567-574.
IR-480: (2006) Metzler, D., Strohman, T., Yun Zhou and Croft, W. B., "Indri at TREC 2005: Terabyte Track," the Online Proceedings of 2005 Text REtrieval Conference (TREC 2005). For an earlier version, see IR-449.
IR-449: (2005) Metzler, D., Strohman, T., Yun Zhou and Croft, W. B., "Indri at TREC 2005: Terabyte Track (Notebook Version)," the TREC 2005 Notebook, pp. 175-180.
IR-432: (2005) Yun Zhou and Croft, W. B., "Document Quality Models for Web Ad Hoc Retrieval," the Proceedings of CIKM 2005, pp. 331-334. A longer version is also available.
IR-412: (2005) Yun Zhou, Levine, B. and Croft, W. B., "Distributed Information Retrieval For Disruption-Tolerant Mobile Networks," CIIR Technical Report.
IR-375: (2004) Cronen-Townsend, S., Yun Zhou and Croft, W. B., "A Framework for Selective Query Expansion," Proceedings of CIKM '04, pp. 236-237.
IR-371: (2006) Cronen-Townsend, S., Yun Zhou and Croft, W. B., "Precision Prediction Based on Ranked List Coherence," Journal Information Retrieval, Volume 9, Number 6 / December, 2006, pp. 723-755. No electronic copy available.
IR-367: (2004) Yun Zhou, Croft, W. B. and Levine, B., "Content-based search in peer-to-peer networks," CIIR Technical Report.
IR-338: (2004) Cronen-Townsend, S., Yun Zhou and Croft, W. B., "A Language Modeling Framework for Selective Query Expansion," CIIR Technical Report.
IR-250: (2002) Cronen-Townsend, S., Yun Zhou and Croft, W. B., "Predicting Query Performance," Proceedings of SIGIR 2002, pp. 299-306.
Download my papers
I accepted a software engineer position at Google.