diazf / Trec Data
Scripts to download and standardize TREC query and document sets
Stars: ✭ 42
Projects that are alternatives to or similar to Trec Data
Ananas
This is an Arduino-based program for a step motor controller, Ananas.
Stars: ✭ 38 (-9.52%)
Mutual labels: makefile
Exopenwrt
Extended OpenWrt repository. Note: Latest dnscrypt-proxy merged to upstream (Designated Driver).
Stars: ✭ 39 (-7.14%)
Mutual labels: makefile
Acris Download
Download NYC real estate transaction data and drop it in a database
Stars: ✭ 38 (-9.52%)
Mutual labels: makefile
Lakka Libreelec
Lakka is a lightweight Linux distribution that transforms a small computer into a full blown game console.
Stars: ✭ 1,007 (+2297.62%)
Mutual labels: makefile
Docker Unix 1st Ed
A Docker image that drops you into 1st Edition Unix
Stars: ✭ 37 (-11.9%)
Mutual labels: makefile
Avian Pack
Avian all-inclusive. Everything needed to build Avian with (or without) Android classpath.
Stars: ✭ 36 (-14.29%)
Mutual labels: makefile
The Ooc Language
📘 The definitive manual on the ooc programming language
Stars: ✭ 38 (-9.52%)
Mutual labels: makefile
Coreos Nvidia
Yet another NVIDIA driver container for Container Linux (aka CoreOS)
Stars: ✭ 36 (-14.29%)
Mutual labels: makefile
Cloverleaf
A hydrodynamics mini-app to solve the compressible Euler equations in 2D, using an explicit, second-order method.
Stars: ✭ 39 (-7.14%)
Mutual labels: makefile
Docker Bitcoin Regtest
A way to experiment with Bitcoin.
Stars: ✭ 35 (-16.67%)
Mutual labels: makefile
Twemoji Color Font
Twitter Unicode 13 emoji color OpenType-SVG font for Linux/MacOS/Windows
Stars: ✭ 1,006 (+2295.24%)
Mutual labels: makefile
trec-data
A simple package to download and standardize TREC experiment data. It generates or includes:
- standardized title, description, and narrative queries
- qrels for the full corpus and popular "news-only" subsets
- standard stopword list (Indri)
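A typical use of the stopword list is to strip function words from queries before retrieval. The sketch below assumes the list is plain text with one word per line; this README does not give the file's name or format, so both helpers (`load_stopwords`, `remove_stopwords`) are hypothetical and the inline sample stands in for the real file.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stopword set, assuming one word per line (blank lines ignored)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(query: str, stopwords: Set[str]) -> List[str]:
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [term for term in query.lower().split() if term not in stopwords]


if __name__ == "__main__":
    # Stand-in for the real list; in practice pass an open file handle instead.
    sample = ["a", "the", "of", "in", ""]
    stop = load_stopwords(sample)
    print(remove_stopwords("The impact of oil spills in the Gulf", stop))
    # -> ['impact', 'oil', 'spills', 'gulf']
```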
The build process will download and process qrels from NIST and other servers for the following datasets:
- trec12: topics 51-200 associated with all documents on Tipster Disks 1 and 2.
- trec12-news: topics 51-200 associated with only news documents on Tipster Disks 1 and 2. This only includes the AP, WSJ, and Ziff-Davis documents in the qrels.
- trec45: the Robust 2004 topics associated with documents on TREC Disks 4 and 5 except the Congressional Record.
- trec45-news: the Robust 2004 topics associated with only news documents on TREC Disks 4 and 5. This only includes the FBIS, FT, and LA Times documents in the qrels.
- nyt: the Common Core 2017 topics associated with all documents in the New York Times Annotated Corpus.
- msmarco: the topics associated with documents in the MS MARCO dataset.
- msmarco-docs: the topics associated with documents in the MS MARCO document ranking dataset.
- mq: 60k unjudged queries associated with the Million Query Track.
- aol: ~7.5M unique unjudged queries associated with a filtered version of the AOL Query Log. Please be prepared to deal with the ethical issues raised in using this dataset.
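The news-only subsets keep judgments only for documents from particular sources. One way to express that kind of filter is by document-ID prefix. This is a minimal sketch, not the repository's build logic: it assumes four-column TREC qrels lines (topic, iteration, DOCNO, relevance) and the usual TREC DOCNO prefixes (AP, WSJ, ZF on Tipster Disks 1 and 2; FBIS, FT, LA on Disks 4 and 5). `NEWS_PREFIXES` and `filter_news_qrels` are hypothetical names, and the sample lines are invented for illustration, not real judgments.

```python
from typing import Iterable, Iterator

# Assumed DOCNO prefixes for the news sources (e.g. "AP880212-0001",
# "WSJ870324-0001", "FBIS3-1", "FT911-3", "LA010189-0001").
# These are not taken from this repository's scripts.
NEWS_PREFIXES = {
    "trec12-news": ("AP", "WSJ", "ZF"),
    "trec45-news": ("FBIS", "FT", "LA"),
}


def filter_news_qrels(lines: Iterable[str], subset: str) -> Iterator[str]:
    """Yield only qrels lines whose DOCNO (third column) is from a news source."""
    prefixes = NEWS_PREFIXES[subset]
    for line in lines:
        fields = line.split()
        # str.startswith accepts a tuple and matches any of its prefixes.
        if len(fields) == 4 and fields[2].startswith(prefixes):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Invented example lines: one FBIS, one Federal Register, one LA Times document.
    sample = [
        "301 0 FBIS3-10082 1",
        "301 0 FR940104-0-00001 0",
        "301 0 LA010189-0001 0",
    ]
    for kept in filter_news_qrels(sample, "trec45-news"):
        print(kept)
```

Prefix matching keeps the sketch short; a stricter version could match each source's full DOCNO pattern with a regular expression.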
For each TREC-style set of queries (trec12, trec45, and nyt), we generate title, description, and narrative queries. The MS MARCO datasets only have title queries.
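NIST distributes TREC topics as SGML-like `<top>` blocks with `<num>`, `<title>`, `<desc>`, and `<narr>` fields, so deriving title, description, and narrative queries comes down to extracting those fields. The sketch below assumes that raw layout (with optional "Topic:", "Description:", and "Narrative:" labels). It is not the repository's own conversion code: `parse_topics` is a hypothetical helper, and its output dictionary is not necessarily the format written to queries/. The sample topic is synthetic.

```python
import re
from typing import Dict, List

# A field runs from its opening tag up to the next tag (e.g. <desc>, <con>)
# or the end of the <top> block. Tags other than these four are skipped.
_FIELD_RE = re.compile(r"<(num|title|desc|narr)>(.*?)(?=<[a-z/]+>|\Z)", re.S)

# Labels that may precede the field text in raw topic files.
_LABELS = {"num": "Number:", "title": "Topic:", "desc": "Description:", "narr": "Narrative:"}


def parse_topics(text: str) -> List[Dict[str, str]]:
    """Return one dict per <top> block with num/title/desc/narr fields."""
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, re.S):
        topic: Dict[str, str] = {}
        for tag, body in _FIELD_RE.findall(block):
            body = " ".join(body.split())  # collapse newlines and runs of spaces
            label = _LABELS[tag]
            if body.startswith(label):
                body = body[len(label):].strip()
            topic[tag] = body
        if "num" in topic:
            topic["num"] = str(int(topic["num"]))  # "051" -> "51"
        topics.append(topic)
    return topics


if __name__ == "__main__":
    sample = (
        "<top>\n<num> Number: 051\n<title> Topic: Example Title\n"
        "<desc> Description:\nExample description text.\n"
        "<narr> Narrative:\nExample narrative text.\n</top>\n"
    )
    for t in parse_topics(sample):
        print(t["num"], "|", t["title"], "|", t["desc"], "|", t["narr"])
```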
Datasets
- qrels/: relevance judgments for test collections
- queries/: queries for test collections
- qlogs/: queries with no associated relevance judgments
- misc/: miscellaneous data for experiments
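The relevance judgments in qrels/ can be loaded into a nested dictionary for evaluation. This sketch assumes the files use the standard whitespace-separated TREC layout (topic, iteration, DOCNO, relevance grade), which this README does not state explicitly; `load_qrels` and `relevant_docs` are illustrative names, not part of the repository, and the sample lines are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map query id -> {docno: relevance grade} from TREC-format qrels lines."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docno, rel = fields
        qrels[qid][docno] = int(rel)
    return dict(qrels)


def relevant_docs(qrels: Dict[str, Dict[str, int]], qid: str, min_grade: int = 1) -> Set[str]:
    """Documents judged at or above min_grade for one query (empty if unjudged)."""
    return {doc for doc, grade in qrels.get(qid, {}).items() if grade >= min_grade}


if __name__ == "__main__":
    # Invented example lines; with a real file, pass an open file handle instead.
    sample = ["1 0 DOC-A 1", "1 0 DOC-B 0", "2 0 DOC-C 2", ""]
    q = load_qrels(sample)
    print(relevant_docs(q, "1"))  # {'DOC-A'}
```

Using a minimum grade keeps the helper usable for both binary and graded judgments.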
Building the Dataset
make
Dependencies
Related
Citation
@online{diaz:trec-data,
author = {Diaz, Fernando},
title = {trec-data},
year = {2018},
url = {https://github.com/diazf/trec-data}
}
Notes
Thanks to Hamed Zamani and Mostafa Dehghani for help with AOL processing logic.