Index of /downloads/Antique/tf-ranking

Name	Last modified	Size

test.tfrecords	2019-07-08 10:48	12M
train.tfrecords	2019-07-08 10:49	149M
vocab.txt	2019-07-08 10:49	226K
ELWC/	2019-10-25 10:09	-
EIE/	2019-10-25 10:10	-
antique_test_seq_64_..>	2021-07-10 22:50	676K
antique_train_seq_64..>	2021-07-10 22:50	8.3M

TensorFlow models and datasets for ANTIQUE.

Introduction

This folder contains the ANTIQUE dataset in a format compatible with TensorFlow, and with TensorFlow Ranking in particular.

TF-Ranking

TF-Ranking is a library for solving large-scale ranking problems using deep learning. TF-Ranking can handle heterogeneous dense and sparse features, and scales up to millions of data points. For more details, see the GitHub repository or the technical paper published on arXiv.

Data Format

Protocol buffers (protobufs) are extensible structures suitable for storing data in a serialized format, either locally or in a distributed manner, which makes them a good fit for representing ranking data.

Ranking data usually consists of features for each of the examples being sorted. In addition, features related to the query, user, or session are also useful for ranking. We refer to these as context features, since they are independent of the individual examples.

We use the popular tf.Example proto to represent the features for the context and for each of the examples. The **ExampleListWithContext** (ELWC) protocol buffer stores the context as a tf.Example proto and the list of examples to be ranked as a list of tf.Example protos.

The ExampleListWithContext protocol buffer is defined in TensorFlow Serving, as tensorflow.serving.ExampleListWithContext.

We also support a new format for ranking data, Example in Example (EIE), to store context as a serialized tf.Example proto and the list of examples to be ranked as a list of serialized tf.Example protos.

Data in ELWC and EIE Formats

This folder contains train and test files in ELWC format, stored as TFRecord files. Similar train and test files are available in EIE format. The vocabulary file, vocab.txt, contains the frequent tokens present in the queries and documents.
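A TFRecord file is a sequence of length-prefixed, checksummed byte strings; in the ELWC files each record should hold one serialized ExampleListWithContext. In TensorFlow you would normally read them with tf.data.TFRecordDataset. As a dependency-free sketch of the framing, assuming uncompressed files, the following pure-Python reader and writer implement the record layout: a uint64 length, the masked CRC-32C of the length, the data, and the masked CRC-32C of the data.

```python
import os
import struct
import tempfile


def _make_crc_table() -> list:
    """256-entry table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ (0x82F63B78 if c & 1 else 0)
        table.append(c)
    return table


_CRC_TABLE = _make_crc_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def _masked_crc(data: bytes) -> int:
    """TFRecord checksums are CRC-32C rotated right by 15 bits plus a constant."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path: str, records) -> None:
    """Write each bytes record as: length, crc(length), data, crc(data)."""
    with open(path, "wb") as f:
        for data in records:
            header = struct.pack("<Q", len(data))  # uint64 little-endian length
            f.write(header)
            f.write(struct.pack("<I", _masked_crc(header)))
            f.write(data)
            f.write(struct.pack("<I", _masked_crc(data)))


def read_records(path: str):
    """Yield the raw bytes of each record, verifying both checksums."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            (header_crc,) = struct.unpack("<I", f.read(4))
            data = f.read(length)
            if len(data) != length:
                raise ValueError("truncated record data")
            (data_crc,) = struct.unpack("<I", f.read(4))
            if header_crc != _masked_crc(header) or data_crc != _masked_crc(data):
                raise ValueError("checksum mismatch; file is corrupt or compressed")
            yield data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.tfrecords")
        write_records(path, [b"first", b"second"])
        print(list(read_records(path)))  # [b'first', b'second']
```

Iterating `read_records("test.tfrecords")` would yield raw serialized protos that still need to be parsed, for example with the TensorFlow Serving protos. The pure-Python checksum loop is slow, so for the 149M training file the TensorFlow reader is the practical choice.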

TF-Ranking Demo on ANTIQUE

A TF-Ranking client for handling sparse features in the ANTIQUE dataset is presented as a Colaboratory notebook, an interactive Python environment. The demo is available at git.io/tf-ranking-demo.

The colab notebook demonstrates how to:

Use sparse/embedding features
Use tensorflow.serving.ExampleListWithContext as input data
Process data in TFRecord format
Use TensorBoard in Colab with the Estimator API
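The sparse features in the demo are token lists that must be mapped to integer ids through a vocabulary before an embedding lookup; in TensorFlow this is roughly what tf.feature_column.categorical_column_with_vocabulary_file followed by tf.feature_column.embedding_column does. Below is a pure-Python sketch of the idea, assuming vocab.txt holds one token per line. Out-of-vocabulary tokens are hashed into extra buckets (with zlib.crc32 here, not TensorFlow's hash function), and a randomly initialized, untrained table stands in for learned embeddings. All function names are hypothetical.

```python
import random
import zlib


def load_vocab(lines) -> dict:
    """Assign ids 0..n-1 to tokens, assuming one token per line."""
    vocab = {}
    for line in lines:
        token = line.strip()
        if token and token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def token_ids(tokens, vocab: dict, num_oov_buckets: int = 1) -> list:
    """Known tokens keep their id; unknown ones hash into ids n..n+buckets-1."""
    n = len(vocab)
    return [
        vocab[t] if t in vocab else n + zlib.crc32(t.encode("utf-8")) % num_oov_buckets
        for t in tokens
    ]


def make_embeddings(num_ids: int, dim: int, seed: int = 0) -> list:
    """Randomly initialized (untrained) embedding table, one row per id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_ids)]


def mean_embedding(ids, table: list) -> list:
    """Average the rows for a bag of ids (a 'mean' combiner)."""
    dim = len(table[0])
    if not ids:
        return [0.0] * dim
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]


if __name__ == "__main__":
    # In practice: with open("vocab.txt", encoding="utf-8") as f: vocab = load_vocab(f)
    vocab = load_vocab(["why", "is", "the", "sky", "blue"])  # hypothetical tokens
    ids = token_ids(["why", "is", "grass", "green"], vocab, num_oov_buckets=10)
    table = make_embeddings(len(vocab) + 10, dim=8)
    print(ids[:2], len(mean_embedding(ids, table)))  # [0, 1] 8
```

Reserving a few hash buckets instead of a single unknown id lets rare words that are missing from the vocabulary still receive somewhat distinct embeddings, at the cost of occasional collisions.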

Citation

This data and the Colaboratory notebook are publicly available for research purposes. If you find them useful, please cite the following article in addition to the original dataset:

Rama Kumar Pasumarthi, Sebastian Bruch, Xuanhui Wang, Cheng Li, Michael Bendersky, Marc Najork, Jan Pfeifer, Nadav Golbandi, Rohan Anil, Stephan Wolf. TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank. KDD 2019.

In bibtex format:

@inproceedings{TensorflowRankingKDD2019,
   author = {Rama Kumar Pasumarthi and Sebastian Bruch and Xuanhui Wang and Cheng Li and Michael Bendersky and Marc Najork and Jan Pfeifer and Nadav Golbandi and Rohan Anil and Stephan Wolf},
   title = {TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank},
   booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
   year = {2019},
   pages = {(to appear)},
   location = {Anchorage, AK}}