Research Scientist Dr.-Ing. Laura Dietz

<lastname> at

Room 366

Department of Computer Science

140 Governors Drive

Amherst MA 01003

PGP ID: 0xF5F7017F

I am a Research Scientist at the Center for Intelligent Information Retrieval (CIIR), working with Bruce Croft at the University of Massachusetts. Before that, I was a post-doc with Andrew McCallum. I graduated from the Max Planck Institute for Informatics in Saarbruecken, Germany, in January 2011.

Research Interests

  • Entity-Linking and Entity-Retrieval
  • Integration of Information Retrieval and Information Extraction (“Information-RetrAction”)
  • Biomedical NLP/IR
  • Question Answering and Passage Retrieval
  • Bayesian Probabilistic Models / Graphical Models
  • Probabilistic Topic Models
  • Graph Structured Data
  • Data from Social Networks with Text
  • Statistical Machine Learning

My research lies at the intersection of Information Retrieval and Information Extraction, where I strive for a deep integration rather than a pipelined combination. My tools of choice are graphical models, often generative probabilistic models. This pattern underlies all facets of my research, some of which are detailed in the following:

Entity-based Enrichment for Document Retrieval

Together with Jeff Dalton, I am studying how to effectively leverage knowledge bases such as Wikipedia and Freebase in ad hoc document retrieval. In a first step, documents and queries are enriched with links to the knowledge base. During retrieval, these links can be used as an additional vocabulary as well as for feedback-based query expansion. For instance, entities that are linked from the query are expected to also be linked in relevant documents. To compensate for errors in the entity linking stage, we also consider terms from the entities’ article text as well as name variants. A further option is feedback: documents retrieved in a preliminary pass are inspected for entity links to update the belief about which entities are relevant for the query. We also use the feedback documents to build an entity-context model that captures how each entity is related to the query.

This work is currently under submission.
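
The feedback step described above, where entity links in top-ranked documents update the belief about which entities are relevant, can be illustrated with a small relevance-model-style sketch. It assumes each feedback document arrives with a retrieval log-score and a list of linked entity ids, and it mixes a uniform prior over the query's own entity links with entity weights collected from the feedback documents. The function name, the softmax document weighting, and the `mix` parameter are hypothetical choices for illustration; this is not the model under submission.

```python
from collections import Counter, defaultdict
from math import exp

def entity_feedback_weights(feedback_docs, query_entities, k=10, mix=0.5):
    """Hypothetical sketch: estimate P(e | q) from query links and feedback documents.

    feedback_docs:  list of (log_score, [linked entity ids]) from a first retrieval pass
    query_entities: entity ids linked in the query itself
    k:              number of top-ranked documents used as feedback
    mix:            weight on the query's own entity links (assumed parameter)
    """
    top = sorted(feedback_docs, key=lambda d: d[0], reverse=True)[:k]
    feedback = defaultdict(float)
    if top:
        # Softmax over retrieval log-scores gives document weights P(d | q).
        best = top[0][0]
        doc_w = [exp(score - best) for score, _ in top]
        z = sum(doc_w)
        for w, (_, links) in zip(doc_w, top):
            if not links:
                continue
            for e, c in Counter(links).items():
                # P(e | d): the entity's share of all entity links in document d.
                feedback[e] += (w / z) * (c / len(links))
    # Query prior: uniform over the distinct entities linked in the query.
    qs = set(query_entities)
    prior = {e: 1.0 / len(qs) for e in qs}
    if not prior:
        mix = 0.0  # nothing linked in the query: rely on feedback only
    combined = {e: mix * prior.get(e, 0.0) + (1 - mix) * feedback.get(e, 0.0)
                for e in set(prior) | set(feedback)}
    total = sum(combined.values())
    return {e: v / total for e, v in combined.items()} if total > 0 else {}

# Usage: two well-scored feedback documents and one low-scored outlier.
docs = [(0.0, ["Barack_Obama", "White_House"]), (0.0, ["Barack_Obama"]), (-100.0, ["Noise"])]
weights = entity_feedback_weights(docs, ["Barack_Obama"])
# weights["Barack_Obama"] == 0.875, weights["White_House"] == 0.125
```

Entities that are never linked in the query can still receive weight through the feedback documents, which is one way of recovering from entity linking errors on short queries.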

Knowledge Sketches

Assuming the existence of a large corpus and a large general-purpose knowledge base, we want to support a user in exploring a question in terms of three facets: entities, pertinent relationships, and relevant text passages. We devise a solution that reasons about distributions over entities, relations, and documents in a unified manner. For instance, we can arrive at a prior distribution over entities by issuing a query against the knowledge base. The distribution over entities helps to identify relevant document passages. Applying Bayes' rule, we can then update the distribution over entities given the retrieved document passages. This is formalized in a generative model that includes factors comprising probabilistic retrieval models.

This work was presented at AKBC 2013.
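
The Bayes' rule update described above, from a prior over entities to a posterior given retrieved passages, can be sketched in a few lines. The sketch assumes the prior comes as normalized scores from a knowledge base query and that passages are conditionally independent given the entity; the passage likelihood used here (an add-mu smoothed unigram model of each entity's description text) and all names are hypothetical stand-ins, not the retrieval factors of the actual generative model.

```python
import math
from collections import Counter

def posterior_over_entities(prior, passages, passage_loglik):
    """Bayes' rule: P(e | passages) is proportional to P(e) * prod_p P(p | e)."""
    log_post = {e: math.log(p) + sum(passage_loglik(psg, e) for psg in passages)
                for e, p in prior.items() if p > 0}
    if not log_post:
        return {}
    # Normalize with log-sum-exp so that long passage lists do not underflow.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {e: math.exp(v - m) / z for e, v in log_post.items()}

def unigram_loglik(entity_texts, mu=1.0):
    """Hypothetical P(passage | entity): add-mu smoothed unigram model of each entity's text."""
    vocab = {w for text in entity_texts.values() for w in text.lower().split()}
    models = {}
    for e, text in entity_texts.items():
        toks = text.lower().split()
        models[e] = (Counter(toks), len(toks) + mu * len(vocab))

    def loglik(passage, e):
        counts, denom = models[e]
        # Words outside the entity vocabulary are simply skipped in this sketch.
        return sum(math.log((counts[w] + mu) / denom)
                   for w in passage.lower().split() if w in vocab)
    return loglik

# Usage: an ambiguous KB query for "jaguar" yields a uniform prior over two entities.
prior = {"Jaguar_Cars": 0.5, "Jaguar_(animal)": 0.5}
loglik = unigram_loglik({"Jaguar_Cars": "jaguar car engine",
                         "Jaguar_(animal)": "jaguar cat jungle"})
post = posterior_over_entities(prior, ["car engine"], loglik)
# post["Jaguar_Cars"] is about 0.8
```

Working in log space keeps the product over many passages numerically stable; the passage likelihood is passed in as a function so that a different retrieval model can be swapped in.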

Entity Linking

Entity linking refers to a problem setting where the algorithm is given a string in a document and has to predict which Wikipedia entity it refers to. Our solution involves a retrieval model that incorporates the string itself and the surrounding entity mentions to produce a ranking of entity candidates. We show that this model is an approximation to state-of-the-art models that optimize a joint assignment of mentions to entities. This solution can be further refined with supervised re-rankers, but it also provides reasonable performance “out-of-the-box”.

We participated with this solution in TAC KBP 2012 and TAC KBP 2013 (talk, poster). Also see our publication at OAIR 2013 (general talk, tech talk).

The code is available as part of the KB-Bridge project.
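
The idea of ranking candidates by both the mention string and its neighboring mentions can be illustrated with a toy scorer. It assumes each candidate entity comes with a list of name variants and some context text from its article, and it linearly combines character-trigram similarity to the names with the overlap between neighboring mentions and the entity context. The function names, the trigram/Jaccard similarity, and the weight `lam` are hypothetical; this is neither the published neighborhood relevance model nor the KB-Bridge API.

```python
def char_trigrams(s):
    # Pad so that short strings and word boundaries still produce trigrams.
    s = f"  {s.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_candidates(mention, neighbors, candidates, lam=0.5):
    """Rank KB candidates for a mention (hypothetical scoring, not the published model).

    mention:    the mention string to link
    neighbors:  other mention strings from the same document
    candidates: entity id -> {"names": [...], "context": "text from the entity's article"}
    lam:        weight on name similarity vs. neighborhood overlap (assumed)
    """
    neigh_toks = {t for n in neighbors for t in n.lower().split()}
    mention_grams = char_trigrams(mention)
    scored = []
    for e, info in candidates.items():
        # How well does the mention string match any name variant of the entity?
        name_sim = max((jaccard(mention_grams, char_trigrams(n)) for n in info["names"]),
                       default=0.0)
        # What fraction of the neighboring mention tokens occur in the entity's context?
        ctx = set(info["context"].lower().split())
        neigh = len(neigh_toks & ctx) / len(neigh_toks) if neigh_toks else 0.0
        scored.append((e, lam * name_sim + (1 - lam) * neigh))
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Usage: the name alone is ambiguous; the neighboring mentions decide.
candidates = {
    "Michael_Jordan": {"names": ["Michael Jordan", "Jordan"],
                       "context": "Chicago Bulls NBA basketball"},
    "Jordan_(country)": {"names": ["Jordan", "Hashemite Kingdom of Jordan"],
                         "context": "Amman Middle East"},
}
ranked = rank_candidates("Jordan", ["Chicago Bulls", "NBA"], candidates)
# ranked[0] == ("Michael_Jordan", 1.0)
```

Returning a full ranking rather than a single answer leaves room for a supervised re-ranker on top.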

Entity Tracking and Retrieval

We monitor a stream of news and social documents for stories involving one or more target entities. To this end, we tap into the symmetry of our Entity Linking approach to both retrieve relevant documents (KB to text) and entity-link them (text to KB) with the same underlying model. This requires integrating low-level NLP algorithms into a retrieval framework.

We participated with this solution in TREC KBA 2012 and TREC KBA 2013. A paper on time-aware IR-based evaluation was published at TAIA 2013. We used these time-aware evaluation methods to analyze our KBA 2013 results, which we presented in our 2013 talk at TREC.
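
One simple way to make an evaluation time-aware is to split the judged stream into time slices and compute a metric per slice, which shows whether a system's quality changes as the stream evolves. The sketch below does this with F1 over document ids; the slice length, the input format, and the choice of F1 are assumptions for illustration, not the metrics defined in the TAIA 2013 paper.

```python
from collections import defaultdict

def f1_per_time_slice(stream, relevant, retrieved, slice_len):
    """Hypothetical per-slice F1 for a filtering system on a document stream.

    stream:    iterable of (timestamp, doc_id) for all judged documents
    relevant:  set of doc ids judged relevant for the target entity
    retrieved: set of doc ids the system emitted
    slice_len: slice length in the same unit as the timestamps (assumed)
    Returns slice index -> F1, or None when a slice has no relevant documents.
    """
    buckets = defaultdict(set)
    for ts, doc in stream:
        buckets[int(ts // slice_len)].add(doc)
    result = {}
    for b in sorted(buckets):
        docs = buckets[b]
        rel = docs & relevant
        ret = docs & retrieved
        if not rel:
            result[b] = None  # recall is undefined without relevant documents
            continue
        tp = len(rel & ret)
        p = tp / len(ret) if ret else 0.0
        r = tp / len(rel)
        result[b] = 2 * p * r / (p + r) if p + r else 0.0
    return result

# Usage: four documents in two slices of length 10.
res = f1_per_time_slice([(0, "a"), (1, "b"), (10, "c"), (11, "d")],
                        relevant={"a", "c"}, retrieved={"a", "b", "d"}, slice_len=10)
```

Here the system does reasonably in the first slice and misses the relevant document in the second, a difference that a single aggregate score over the whole stream would hide.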

… and more …

I also work on “senti-PRF”, a pseudo-relevance feedback approach that optimizes retrieval for opinionated questions, published at CIKM 2013.

Relatedly, I am interested in “vague” Question Answering, such as questions asking for opinions or advice, or research questions. Here I work with both general-purpose data sets and biomedical question answering.

I am still interested in unsupervised algorithms for identifying shared aspects and quantifying influence in social networks. Work on symmetric networks is published at ICWSM 2012 (code & supplement) and on asymmetric networks at ICML 2007 (talk, supplement).

Other work revolved around localizing bugs in software, published at NIPS 2009 (supplement, project page).

Further, I am working on a scalable MCMC inference framework “bayes-stack”, available on GitHub.

My PhD thesis was mainly focused on topic models and other generative models for data with link structure.

Serving on PhD Committees

Open Source Releases

  • Strepsirrhini, a modular, composable toolkit in Scala for retrieval, reranking, and expansion with and without entity annotations. Laura Dietz, 2014.

  • Riffle, open hardware and software for a water-quality sensor, including data analysis software. Benjamin Gamari, Don Blair, Laura Dietz, 2014.

  • Stream-Eval, an evaluation framework for time-aware evaluation of cumulative citation recommendation systems. Laura Dietz, Jeffrey Dalton, Krisztian Balog, 2013.

  • KB-Bridge, a framework for entity linking. Jeffrey Dalton, Laura Dietz, 2013.

  • Hphoton and photon-tools (overview, walkthrough), open source hardware and software for single-molecule fluorescence analysis. Benjamin Gamari, Laura Dietz, Lori Goldner, 2013. (Received OSSI Award 2013 from UMass ICB3)

  • Bayes-Stack, a framework for inference on probabilistic graphical models. Laura Dietz, Benjamin Gamari, 2012.

  • Tikz-Bayesnet, open source LaTeX add-on / TikZ package for graphical model diagrams. Laura Dietz, 2010. (Forked and continued by Jaakko Luttinen, 2012.)

Selected Publications & Talks

  • Dalton, Jeffrey; Dietz, Laura; Allan, James: Entity Query Feature Expansion using Knowledge Base Links. In Proceedings of the 37th Annual International ACM SIGIR Conference, Gold Coast, Queensland, Australia, July 6-11, 2014. .pdf, appendix

  • Dietz, Laura; Dalton, Jeffrey; Croft, W. Bruce: A Graphical Model for Entity-based Document Retrieval. Poster at New England Machine Learning Day, Boston, USA, May 13, 2014. .pdf, .svg

  • Dietz, Laura: Tutorial on Entity Linking. American University of Beirut. March 21st, 2014. Watch the video: .ogv or on youtube, Slides: .svg or .pdf.

  • Dalton, Jeffrey; Dietz, Laura: UMass CIIR at TAC KBP 2013 Entity Linking: Query Expansion using Urban Dictionary. Text Analysis Conference (TAC), Gaithersburg, MD, USA, November 19-20, 2013. .pdf

  • Dietz, Laura; Dalton, Jeffrey: UMass at TREC 2013 Knowledge Base Acceleration Track: Bi-directional Entity Linking and Time-aware Evaluation. Text Retrieval Conference (TREC), Gaithersburg, MD, USA, November 20-22, 2013. .pdf

  • Dietz, Laura and Dalton, Jeffrey: Query-specific Knowledge Sketches: A Joint Retrieval Model for Text, Entities, and Relations. CIIR Technical Report, 2013.

  • Dietz, Laura; Wang, Ziqi; Huston, Samuel; Croft, W. Bruce: Retrieving Opinions from Discussion Forums. Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2013. .pdf

  • Dietz, Laura; Dalton, Jeffrey; Balog, Krisztian: Time-aware Evaluation of Cumulative Citation Recommendation Systems. Proceedings of SIGIR 2013 Workshop on Time-aware Information Access, TAIA, 2013. .pdf, code & supplement

  • Dalton, Jeffrey; Dietz, Laura: Constructing Query-Specific Knowledge Bases. Proceedings of the CIKM Workshop on Automated Knowledge Base Construction, 2013. .pdf

  • Dalton, Jeffrey; Dietz, Laura: A Neighborhood Relevance Model for Entity Linking. Proceedings of the 10th International Conference in the RIAO series (OAIR), 2013. .pdf

  • Dietz, Laura: A Neighborhood-Relevance Model for Entity Linking. Mt Holyoke College, South Hadley, MA, USA, 20th of February, 2013. talk (view in Web browser!)

  • Dietz, Laura: A Neighborhood-Relevance Model for Entity Linking. Machine Learning and Friends Lunch Talk, University of Massachusetts, MA, USA, 14th of February, 2013. talk (view in Web browser!)

  • Gamari, Ben; Dietz, Laura: Bayes-stack. In: Twenty-third edition of Haskell Communities and Activities Report, November, 2012. .html, Code, Project

  • Dietz, Laura; Dalton, Jeffrey: Across-Document Neighborhood Expansion: UMass at TAC KBP 2012 Entity Linking. In: Proceedings of the Text Analysis Conference (TAC), 2012. .pdf, talk, poster

  • Dalton, Jeffrey; Dietz, Laura: Bi-directional Linkability From Wikipedia to Documents and Back Again: UMass at TREC 2012 Knowledge Base Acceleration Track. In: Proceedings of Text REtrieval Conference (TREC), 2012. .pdf

  • Dietz, Laura; Gamari, Benjamin; Guiver, John; Snelson, Edward; Herbrich, Ralf: De-Layering Social Networks with Shared Tastes of Friendships. In: Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media (ICWSM), 2012. .pdf, Code & Supplement

  • Konietzny, Sebastian; Dietz, Laura; McHardy, Alice: Inferring functional modules of protein families with probabilistic topic models. In: BMC Bioinformatics, vol. 12, no. 1, 141+, 2011 .html

  • Dietz, Laura: Exploiting Graph-Structured Data in Generative Probabilistic Models. PhD Thesis, January 2011. Max Planck Institute for Informatics and Saarland University, 2011 .pdf

  • Dietz, Laura: Inferring Shared Interests from Social Networks. In: NIPS Workshop on Computational Social Science and the Wisdom of Crowd: Text and Beyond, 2010. .pdf

  • Dietz, Laura: Directed Factor Graph Notation for Generative Models. Technical Report, 2010 .pdf – TIKZ macros and algorithms module: .zip - Thanks to Jaakko Luttinen for creating an improved version at !

  • Dietz, Laura: Modeling Shared Tastes in Online Communities. In: NIPS Workshop on Applications for Topic Models: Text and Beyond, 2009 .pdf

  • Dietz, Laura; Dallmeier, Valentin; Zeller, Andreas; Scheffer, Tobias: Localizing Bugs in Program Executions with Graphical Models. In: Advances in Neural Information Processing Systems, 2009. .pdf, supplement, project

  • Dietz, Laura; Bickel, Steffen; Scheffer, Tobias: Unsupervised Prediction of Citation Influences. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, Oregon, USA, June 2007. .pdf, Watch the Talk, project


Recent Positions

August 2012 - present: Post-doctoral researcher at Center for Intelligent Information Retrieval (CIIR), University of Massachusetts (CIIR, Bruce Croft)

October 2010 - August 2012: Post-doctoral researcher at University of Massachusetts (IESL, Andrew McCallum).

January 2008 - January 2011: PhD Student at Max-Planck-Institute for Informatics (Databases and Information Systems, Prof. Gerhard Weikum), Saarbruecken

January 2007 - December 2008: PhD Student at Max-Planck-Institute for Informatics (Machine Learning, Prof. Tobias Scheffer), Saarbruecken

October 2006 - December 2006: PhD Scholarship at Knowledge Management Group (Prof. Tobias Scheffer), Humboldt University, Berlin

December 2002 - September 2006: Research Associate at Concert Division and I-Info Division, Fraunhofer Institute for Publication and Information Systems (IPSI), Darmstadt


I review for venues ranging from natural language processing (ACL, EMNLP, NAACL) and machine learning (ICML, NIPS, UAI) to information retrieval (SIGIR, CIKM) and data mining (KDD, CIKM).


  • Playing Board Games
  • Knitting
  • Origami, Paper crafts
  • Sketching / Painting
  • Tap Dancing
  • Organizing BBQs