Hamed Bonab, James Allan, and Ramesh Sitaraman
This dataset is based on the 200 queries of the Cross-Language Evaluation Forum (CLEF) 2000-2003 campaigns for the bilingual ad-hoc retrieval tracks (http://catalog.elra.info/en-us/repository/browse/ELRA-E0008/). The Swahili and Somali queries are translations of the English queries in topic set C001-C200. We hired a translation organization to translate the title and description of each topic into Somali and Swahili. For more information, please refer to the published paper.
Bonab, H., Allan, J., and Sitaraman, R., "Simulating CLIR Translation Resource Scarcity using High-resource Languages," in Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR), Santa Clara, CA, USA, October 2-5, 2019, pp. 129-136.
Link to the paper: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1357
Link to the dataset: https://ciir.cs.umass.edu/downloads/ictir19_simulate_low_resource/ictir1...