Research Scientist Dr.-Ing. Laura Dietz
<lastname> at cs.umass.edu
Department of Computer Science
140 Governors Drive
Amherst MA 01003
PGP ID: 0xF5F7017F
I am a Research Scientist at the Center for Intelligent Information Retrieval (CIIR), working with Bruce Croft at the University of Massachusetts. Before that I did a post-doc with Andrew McCallum. I graduated from the Max Planck Institute for Informatics in Saarbruecken, Germany in January 2011.
My research sits at the intersection of Information Retrieval and Information Extraction, where I am striving towards a deep integration rather than a pipelined combination. My tools of choice are graphical models, often generative probabilistic models. This pattern underlies all the different facets of my research, some of which are detailed in the following:
Together with Jeff Dalton, I am studying how to effectively leverage knowledge bases such as Wikipedia and Freebase in ad hoc document retrieval. In a first step, documents and queries are enriched with links to the knowledge base. During the retrieval stage, these links can be used as an additional vocabulary as well as in feedback-based query expansions. For instance, entities that are linked from the query are expected to also be linked in relevant documents. However, we may compensate for errors in the entity linking stage by also considering terms from the entities’ article text, as well as name variants. Feedback methods are an additional option: documents retrieved in a preliminary pass are inspected for entity links to update the belief about which entities are relevant for the query. We also use the feedback documents to build an entity-context model to understand how each entity is related to the query.
This work is currently under submission.
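The feedback step described above can be illustrated with a minimal sketch. It is not the model from the submission: the function names, the way retrieval scores are turned into document weights, and the `feedback_weight` parameter are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def entity_feedback_distribution(feedback_docs):
    """Estimate a belief over entities from a preliminary retrieval pass.

    feedback_docs: list of (retrieval_score, [entity ids linked in the doc]).
    Assumption: scores are non-negative and are normalized into document
    weights; each document spreads its weight over its entity links in
    proportion to how often each entity is linked.
    """
    total_score = sum(score for score, _ in feedback_docs)
    belief = defaultdict(float)
    for score, entities in feedback_docs:
        if not entities or total_score == 0:
            continue
        doc_weight = score / total_score
        for entity, count in Counter(entities).items():
            belief[entity] += doc_weight * count / len(entities)
    norm = sum(belief.values())
    return {e: p / norm for e, p in belief.items()} if norm else {}


def expand_query(query_terms, query_entities, feedback_docs,
                 name_variants, top_k=2, feedback_weight=0.5):
    """Build a weighted query over terms and entity links (hypothetical format).

    Original terms and query entities get weight 1.0; the top_k feedback
    entities and their name variants are added with a discounted weight.
    """
    belief = entity_feedback_distribution(feedback_docs)
    top = sorted(belief.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

    expanded = {("term", t): 1.0 for t in query_terms}
    for entity in query_entities:
        expanded[("entity", entity)] = 1.0
    for entity, prob in top:
        weight = feedback_weight * prob
        key = ("entity", entity)
        expanded[key] = expanded.get(key, 0.0) + weight
        for variant in name_variants.get(entity, []):
            vkey = ("term", variant)
            expanded[vkey] = expanded.get(vkey, 0.0) + weight
    return expanded
```

Treating entity links as a second vocabulary next to the terms keeps the expansion compatible with any retrieval model that accepts weighted query features.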
Assuming the existence of a large corpus and a large general-purpose knowledge base, we want to support a user in exploring a question in terms of three facets: entities, pertinent relationships, and relevant text passages. We devise a solution that reasons about distributions over entities, relations, and documents in a unified manner. For instance, we can arrive at a prior distribution over entities by issuing a query against the knowledge base. The distribution over entities helps to identify relevant document passages. Applying Bayes' rule, we can update the distribution over entities, given retrieved document passages. This is formalized in a generative model, which includes factors comprising probabilistic retrieval models.
This work was presented at AKBC 2013.
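The Bayes' rule update mentioned above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the generative model from the paper: passages are reduced to lists of entity mentions, the passage likelihood is a smoothed mention frequency, and the name `posterior_over_entities` and the `smoothing` parameter are hypothetical.

```python
import math


def posterior_over_entities(prior, passages, smoothing=0.1):
    """Update a prior over entities given retrieved passages via Bayes' rule.

    prior: dict entity -> P(entity), e.g. from a query against the knowledge
           base; all values must be strictly positive.
    passages: list of passages, each a list of entity ids mentioned in it.
    Assumed likelihood: P(passage | entity) is the additively smoothed
    fraction of the passage's mentions that refer to the entity.
    """
    entities = list(prior)
    log_post = {e: math.log(prior[e]) for e in entities}
    for mentions in passages:
        denom = len(mentions) + smoothing * len(entities)
        for e in entities:
            likelihood = (mentions.count(e) + smoothing) / denom
            log_post[e] += math.log(likelihood)

    # Normalize in log space for numerical stability.
    max_log = max(log_post.values())
    unnorm = {e: math.exp(lp - max_log) for e, lp in log_post.items()}
    total = sum(unnorm.values())
    return {e: v / total for e, v in unnorm.items()}
```

With no passages the function returns the normalized prior, and each additional passage shifts probability mass towards the entities it mentions.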
Entity linking refers to a problem setting where the algorithm is given a string in a document and has to predict which Wikipedia entity it refers to. Our solution involves a retrieval model that incorporates the string itself and the surrounding entity mentions to predict entity candidates as a ranking. We show that this model is an approximation to state-of-the-art models which optimize a joint assignment of mentions to entities. This solution can be further refined with supervised re-rankers but also provides reasonable performance “out-of-the-box”.
The code is available as part of the KB-Bridge project.
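As a rough illustration of ranking entity candidates from the mention string and its neighboring mentions, consider the following sketch. It is not the KB-Bridge implementation: the toy knowledge-base layout, the exact-match name score, and the `name_weight` interpolation are assumptions made for this example.

```python
def rank_candidates(mention, neighbor_mentions, kb, name_weight=0.7):
    """Rank knowledge-base entities for a mention string.

    kb: dict entity_id -> {"names": set of name variants,
                           "context": set of strings associated with the entity}
        (hypothetical layout for this sketch).
    The score interpolates an exact name match with the fraction of
    neighboring mentions that appear in the entity's context.
    """
    mention_lc = mention.lower()
    neighbors = {n.lower() for n in neighbor_mentions}

    scored = []
    for entity_id, entry in kb.items():
        names = {n.lower() for n in entry["names"]}
        context = {c.lower() for c in entry["context"]}
        name_score = 1.0 if mention_lc in names else 0.0
        context_score = len(neighbors & context) / len(neighbors) if neighbors else 0.0
        score = name_weight * name_score + (1.0 - name_weight) * context_score
        if score > 0:
            scored.append((entity_id, score))

    # Highest score first; ties are broken by entity id for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A supervised re-ranker, as mentioned above, could take the top of this ranking as its candidate set.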
To monitor a stream of news and social documents for stories involving one or more target entities, we exploit the symmetric relationships in our entity linking approach to both retrieve relevant documents (KB to text) and entity-link them (text to KB) with the same underlying model. This requires integrating low-level NLP algorithms into a retrieval framework.
We participated with this solution in TREC KBA 2012 and TREC KBA 2013. A paper on time-aware IR-based evaluation was published at TAIA 2013. The time-aware evaluation methods are used to analyze our KBA 2013 results, which were presented in our 2013 talk at TREC.
I further work on “senti-PRF”, a pseudo-relevance feedback approach to optimize retrieval for opinionated questions. This work was published at CIKM 2013.
Relatedly, I am interested in “vague” Question Answering, such as questions asking for opinions, advice, or research questions. Here I work both with general-purpose data sets and with bio-medical question answering.
I am still interested in unsupervised algorithms for identifying shared aspects and quantifying influence in social networks. Work on symmetric networks is published at ICWSM 2012 (Code & Supplement) and work on asymmetric networks at ICML 2007 (talk – Supplement).
My PhD thesis was mainly focused on topic models and other generative models for data with link structure.
Strepsirrhini, a modular, composable toolkit in Scala for retrieval, reranking, and expansion with and without entity annotations. Laura Dietz, 2014.
Riffle, open hardware and software for a water-quality sensor with data analysis software. Benjamin Gamari, Don Blair, Laura Dietz, 2014.
Stream-Eval, an evaluation framework for time-aware evaluation of cumulative citation recommendation systems. Laura Dietz, Jeffrey Dalton, Krisztian Balog, 2013.
KB-Bridge, a framework for entity linking. Jeffrey Dalton, Laura Dietz, 2013.
Hphoton and photon-tools (overview, walkthrough), open source hardware and software for single-molecule fluorescence analysis. Benjamin Gamari, Laura Dietz, Lori Goldner, 2013. (Received OSSI Award 2013 from UMass ICB3)
Bayes-Stack, a framework for inference on probabilistic graphical models. Laura Dietz, Benjamin Gamari, 2012.
Tikz-Bayesnet, open source LaTeX add-on / TikZ package for graphical model diagrams. Laura Dietz, 2010. (Forked and continued by Jaakko Luttinen, 2012).
Dalton, Jeffrey; Dietz, Laura; Allan, James: Entity Query Feature Expansion using Knowledge Base Links. In Proceedings of the 37th Annual International ACM SIGIR conference, Gold Coast, Queensland, Australia, July 6-11, 2014. .pdf – appendix
Dalton, Jeffrey; Dietz, Laura: UMass CIIR at TAC KBP 2013 Entity Linking: Query Expansion using Urban Dictionary. Text Analysis Conference (TAC), Gaithersburg, MD, USA, November 19-20, 2013. .pdf
Dietz, Laura; Dalton, Jeffrey: UMass at TREC 2013 Knowledge Base Acceleration Track: Bi-directional Entity Linking and Time-aware Evaluation. Text Retrieval Conference (TREC), Gaithersburg, MD, USA, November 20-22, 2013. .pdf
Dietz, Laura; Dalton, Jeffrey: Query-specific Knowledge Sketches: A Joint Retrieval Model for Text, Entities, and Relations. CIIR Technical Report, 2013.
Dietz, Laura; Wang, Ziqi; Huston, Samuel; Croft, W. Bruce: Retrieving Opinions from Discussion Forums. Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2013. .pdf
Dietz, Laura; Dalton, Jeffrey; Balog, Krisztian: Time-aware Evaluation of Cumulative Citation Recommendation Systems. Proceedings of SIGIR 2013 Workshop on Time-aware Information Access, TAIA, 2013 .pdf. code & supplement
Dalton, Jeffrey; Dietz, Laura: Constructing Query-Specific Knowledge Bases. Proceedings of the CIKM Workshop on Automated Knowledge Base Construction, 2013. .pdf
Dalton, Jeffrey; Dietz, Laura: A Neighborhood Relevance Model for Entity Linking. Proceedings of the 10th International Conference in the RIAO series (OAIR), 2013. .pdf
Dietz, Laura: A Neighborhood-Relevance Model for Entity Linking. Mt Holyoke College, South Hadley, MA, USA, 20th of February, 2013. talk view in Web browser!
Dietz, Laura: A Neighborhood-Relevance Model for Entity Linking. Machine Learning and Friends Lunch Talk, University of Massachusetts, MA, USA, 14th of February, 2013. talk view in Web browser!
Dalton, Jeffrey; Dietz, Laura: Bi-directional Linkability From Wikipedia to Documents and Back Again: UMass at TREC 2012 Knowledge Base Acceleration Track. In: Proceedings of Text REtrieval Conference (TREC), 2012. .pdf
Dietz, Laura; Gamari, Benjamin; Guiver, John; Snelson, Edward; Herbrich, Ralf: De-Layering Social Networks with Shared Tastes of Friendships. In: Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media (ICWSM), 2012 .pdf – Code & Supplement
Konietzny, Sebastian; Dietz, Laura; McHardy, Alice: Inferring functional modules of protein families with probabilistic topic models. In: BMC Bioinformatics, vol. 12, no. 1, 141+, 2011 .html
Dietz, Laura: Exploiting Graph-Structured Data in Generative Probabilistic Models. PhD Thesis, January 2011. Max Planck Institute for Informatics and Saarland University, 2011 .pdf
Dietz, Laura: Inferring Shared Interests from Social Networks. In: NIPS Workshop on Computational Social Science and the Wisdom of Crowds: Text and Beyond, 2010 .pdf
Dietz, Laura: Directed Factor Graph Notation for Generative Models. Technical Report, 2010 .pdf – TIKZ macros and algorithms module: .zip - Thanks to Jaakko Luttinen for creating an improved version at github.com/jluttine/tikz-bayesnet !
Dietz, Laura: Modeling Shared Tastes in Online Communities. In: NIPS Workshop on Applications for Topic Models: Text and Beyond, 2009 .pdf
Dietz, Laura; Dallmeier, Valentin; Zeller, Andreas; Scheffer, Tobias: Localizing Bugs in Program Executions with Graphical Models. In: Advances in Neural Information Processing Systems, 2009 .pdf – supplement – project
Dietz, Laura; Bickel, Steffen; Scheffer, Tobias: Unsupervised Prediction of Citation Influences. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, Oregon, USA, June 2007 .pdf – Watch the Talk – project
August 2012 - present: Post-doctoral researcher at Center for Intelligent Information Retrieval (CIIR), University of Massachusetts (CIIR, Bruce Croft)
October 2010 - August 2012: Post-doctoral researcher at University of Massachusetts (IESL, Andrew McCallum).
January 2008 - January 2011: PhD Student at Max-Planck-Institute for Informatics (Databases and Information Systems, Prof. Gerhard Weikum), Saarbruecken
January 2007 - December 2008: PhD Student at Max-Planck-Institute for Informatics (Machine Learning, Prof. Tobias Scheffer), Saarbruecken
October 2006 - December 2006: PhD Scholarship at Knowledge Management Group (Prof. Tobias Scheffer), Humboldt University, Berlin
December 2002 - September 2006: Research Associate at Concert Division and I-Info Division, Fraunhofer Institute for Publication and Information Systems (IPSI), Darmstadt
I review for venues spanning natural language processing (ACL, KDD, EMNLP, NAACL), machine learning (ICML, NIPS, UAI), information retrieval (SIGIR, CIKM), and data mining (KDD, CIKM).