Research Page
Contents By Year
-
2009
- Investigating Searcher Frustration (Spring--Summer)
- Popularity Ranking using Query Logs (Spring--Summer)
- Detecting Task Boundaries in Domain-Specific Query Logs (Fall)
- Spell Correction using Domain-Specific Query Logs (Fall)
- Query Log Mining (Summer)
- Retrieving Document "Hot Spots" (Fall '07 -- Spring '08)
- Testing of Text Retrieval Test Collection Generation Algorithms (Summer)
- Qauality Assessment using Language Processing (QALP) (Fall '04 -- Spring '07)
- Slicing Research (Summer)
Investigating Searcher Frustration
Spring--Summer 2009-
More to come...
(Top of page)
Popularity Ranking using Query Logs
Spring--Summer 2009-
This is joint work with Bob Armstrong,
James Allan, and
Jeff Dalton. We are investigating
learning-to-rank techniques for use with the medical publication search company
that provides us with query logs.
The high-level idea is that we would like to incorporate a document's click-though rate
for a specific query into where that document is ranked in the results list. Ideally,
the most popular documents will be placed closer to the top of the list.
(Top of page)
Detecting Task Boundaries and Tasks in Query Logs
Fall 2008-
As part of an independent study, I implemented the task boundary detection work
described by (Jones and Klinkner, 2007) for use on the medical publication search data
used in my previous projects (see spelling correction
and query log mining). Jones and Klinkner's work
uses logistic regression to learn 1) when any two queries from the same search session are
part of the same task and 2) when there is a boundary between two consecutive queries.
Our study focused just on the first one.
In the context of this work, a task can be defined as a information goal or as an information mission. Goals contain one or more queries from the same search session, while missions contain one or more goals. However, a goal is not allowed to be in more than one mission.
In our study, we asked graduate students to label pairs of queries as belonging to the same goal and mission. We then used a maximum entropy classifier and ten-fold cross validation to learn the labels. After normalizing the features, we attained classification accuracies comparable to those used by Jones and Klinkner on Yahoo! data.
Going beyond previous work, we also used clustering techniques to create goal and mission groups using the pair-wise classifications from above. We then attempted to use these to detect when the user was frustrated (this makes the unfounded assumption that frustration is contained within a task).
This was interesting work, but a little ahead of its time with respect to the frustration detection. I plan to revisit this work in the future, once I've investigated more closely what searcher frustration is and how it can be automatically detected.
References
-
R. Jones and K. L. Klinkner. Beyond the
session timeout: automatic hierarchical segmentation
of search topics in query logs. Proceedings of CIKM
2008, 2008.
(Top of page)
Spell Correction using Medical Domain Query Logs
Fall 2008-
Building off of the exploration from the query log
mining work, Xing Yi and I
created a novel spelling correction algorithm that uses same-session reformulation
information---similar to (Jones et a., 2006)---as well as language models---similar
to (Cucerzan and Brill, 2004)---and trusted domain-specific lexicons.
References
-
S. Cucerzan and E. Brill. Spelling correction as an iterative
process that exploits the collective knowledge of web users.
In EMNLP, pages 293-300, 2004.
R. Jones, B. Rey, O. Madani, and W. Greiner. Generating query substitutions. In WWW, pages 387-396, 2006.
(Top of page)
Query Log Mining
Summer 2008-
This line of work was more just implementation and data exploration. The implementation part
had to do with writing code to use with TupleFlow, a kind of MapReduce framework developed
by Trevor Strohman (now at Google).
Marc Cartright,
Elif Aktolga,
and I wrote a Java package that uses TupleFlow to mine query logs.
The query logs were from a medical document search company. We explored query reformulation,
click-through data, and query similarity.
(Top of page)
Retrieving Document "Hot Spots" (a.k.a. Passage Retrieval)
Fall 2007 - Spring 2008-
O. de Krester and A. Moffat. "Effective Presentation with a Locality-Based
Similarity Heuristic". 22nd Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, San Francisco, August
1999, 113-120.
[web]
Between Fall 2007 and Spring 2008, I worked towards developing a search algorithm to find the `hot spots' within a document relative to a given query.
The goal of this research is to return to a user a ranked list of text passages extracted from documents in a corpus. The passages are ranked according to the retrieval system's belief that a passage is relevant to the user's information need.
To do this, I created an algorithm to calculate the Query-Likelihood score for each document and then distributed that score across the document wherever a query term occurred. As can be seen below, the Query-Likelihood language model calculates the probability of a query, Q (which is made of of one or more terms), given a document, D. P(Q|D) is the product of the probability of each term, q, in the query given the document. This is done using a maximum likelihood estimation (MLE) of term frequencies in the document smoothed with a background model of term frequencies in the collection.
Now, think of a document as a giant array where the indices refer to term positions within the document and the value is a score associated with that position. My method computes each P(q|D) and then assigns a fair portion of that probability to each position in the document array where q occurs. So, if P(q|D) = .6 and q occurs 2 times, then both indices in the array that refer to the positions where q occurs in the document get a score of .3.
Once these scores are distributed, they are smoothed by placing a Gaussian kernel on top of each position in the array that has a non-zero score. All interference between kernel smoothings is additive, so that if two query terms are near by, once they are smoothed out, there should be several indices between the two that have a higher score than if only one of the query terms was present.
After smoothing, each group of contiguous indices with a score above a threshold are considered to be a 'hot spot'. I experimented with various automatic thresholding methods, including local (the threshold used for each document is different) and global (the threshold is constant over all documents for a particular query) methods. These generally involved setting the threshold to some percentage of the max score in the document or for all documents.
Initial results using the Text REtrieval Conference (TREC) Genomics 2006 corpus were not very promising. Because of these poor results and some higher priority projects, my work with locating document hot spots has been shelved for the time being.
References
(Top of page)
Testing of Text Retrieval Test Collection Generation Algorithms
Summer 2007During the Summer of 2007, I was a part of the the Summer Undergraduate Research Fellowship (SURF) program at the National Institute of Standards and Technology (NIST). I spent my time working for the Text REtrieval Conference (TREC) group testing the effectiveness of the Minimal Test Collection algorithm developed at the University of Massachusetts Amherst by Carterette, Allan, & Sitaraman. This was one of the two algorithms used to evaluate retrieval systems in the TREC Million Query Track in 2007 and 2008.
(Top of page)
Quality Assessment using Language Processing (QALP)
2004 - 2007The QALP team consists of:
Dr. David Binkley
Dr. Dawn Lawrie
Henry Feild
Dr. Chris Morrell (Statistical Consulting)
QALP Scores I
Winter-Spring 2005
Running off the idea that good code is accompanied by good comments, the QALP group put together a program to find the cosine similarity between code within a function and its comments. This similarity, referred to as a QALP Score, needed to be correlated to a quality rating for each function. To find a quality rating, a sample of functions were assembled into a survey, which was then given to undergraduate students. The functions in the survey were stripped of all comments and the students were asked to asses the each function's quality using a variety of criteria.
Identifier Splitting
Summer 2005 - Summer 2006
While many identifiers make use of some sort of division marker (e.g. an underscore or camelCasing) to break up concepts within an identifier, many do not, or at least not completely. The QALP group's favorite example of this is thenewestone. This particular identifier should be the_newest_one. Imagine having to analyze source code where identifiers all looked like that and you would have to mentally 'split' them. Wouldn't it be easier if there was a program that would split them for you? We thought so, so we looked into two methods -- a 'greedy' algorithm and a neural network. Turns out the greedy algorithm works better.
Sources
Identifier Makeup Study
The QALP group was curious to see what type of identifiers were mostly easily understood by programmers. Consider variations on a variable that references the index of an array; you might see i, idx, or index. A survey of 12 functions was assembled and put up on the net which asked those who took it to explain the purpose of each function. For each of the 12 functions, there were three versions -- one with single-character abbreviated variables, one with multi-character abbreviated variables, and one with full, non-abbreviated variables. The user was randomly given one of the three versions for each function.
QALP Scores II
Summer 2006
In an attempt to refine the study conducted in the Winter and Spring of 2005, this version upgraded the QALP Scoring process, scoring source code on a per-class basis.
In addition, instead of using a subjective measure of quality with which to correlate the QALP Score, this time defect rates were used.
(Top of page)
Slicing Research
Summer 2004My first summer assisting with research consisted of me finding source code for use with Dr. Binkley's Amorphous Program Slicer. I spent my time familiarizing myself with Linux, running the program slicer on open source programs, and then graphing results.
(Top of page)