google-research-datasets / query-wellformedness

Licence: other

25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.

Projects that are alternatives of or similar to query-wellformedness

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+352.5%)

Mutual labels: search-engine, information-retrieval

Relevancyfeedback

Dice.com's relevancy feedback solr plugin created by Simon Hughes (Dice). Contains request handlers for doing MLT style recommendations, conceptual search, semantic search and personalized search

Stars: ✭ 19 (-76.25%)

Mutual labels: search-engine, information-retrieval

Lucene Solr

Apache Lucene and Solr open-source search software

Stars: ✭ 4,217 (+5171.25%)

Mutual labels: search-engine, information-retrieval

patzilla

PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.

Stars: ✭ 71 (-11.25%)

Mutual labels: search-engine, information-retrieval

Sf1r Lite

Search Formula-1——A distributed high performance massive data engine for enterprise/vertical search

Stars: ✭ 158 (+97.5%)

Mutual labels: search-engine, information-retrieval

see

Search Engine in Erlang

Stars: ✭ 27 (-66.25%)

Mutual labels: search-engine, information-retrieval

Resin

Hardware-accelerated vector-based search engine. Available as a HTTP service or as an embedded library.

Stars: ✭ 529 (+561.25%)

Mutual labels: search-engine, information-retrieval

Dan Jurafsky Chris Manning Nlp

My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.

Stars: ✭ 124 (+55%)

Mutual labels: information-retrieval, nlp-machine-learning

Rated Ranking Evaluator

Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures

Stars: ✭ 134 (+67.5%)

Mutual labels: search-engine, information-retrieval

Haystack

🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Stars: ✭ 3,409 (+4161.25%)

Mutual labels: search-engine, information-retrieval

evildork

Evildork targeting your fiancee👁️

Stars: ✭ 46 (-42.5%)

Mutual labels: search-engine, information-retrieval

Aquiladb

Drop in solution for Decentralized Neural Information Retrieval. Index latent vectors along with JSON metadata and do efficient k-NN search.

Stars: ✭ 222 (+177.5%)

Mutual labels: search-engine, information-retrieval

lucene

Apache Lucene open-source search software

Stars: ✭ 1,009 (+1161.25%)

Mutual labels: search-engine, information-retrieval

Search Engine

A math-aware search engine.

Stars: ✭ 278 (+247.5%)

Mutual labels: search-engine, information-retrieval

solr

Apache Solr open-source search software

Stars: ✭ 651 (+713.75%)

Mutual labels: search-engine, information-retrieval

Pisa

PISA: Performant Indexes and Search for Academia

Stars: ✭ 489 (+511.25%)

Mutual labels: search-engine, information-retrieval

kex

Kex is a python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.

Stars: ✭ 46 (-42.5%)

Mutual labels: information-retrieval, nlp-machine-learning

ake-datasets

Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.

Stars: ✭ 125 (+56.25%)

Mutual labels: information-retrieval, nlp-machine-learning

Vectorsinsearch

Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015

Stars: ✭ 71 (-11.25%)

Mutual labels: search-engine, information-retrieval

Bm25

A Python implementation of the BM25 ranking function.

Stars: ✭ 159 (+98.75%)

Mutual labels: search-engine, information-retrieval

Query-wellformedness Dataset

http://goo.gl/language/query-wellformedness

Description

Google's query wellformedness dataset was created by crowdsourcing well-formedness annotations for 25,100 queries from the Paralex corpus. Each query was annotated by five raters, each of whom gave a binary (1/0) rating of whether the query is well-formed. For further details, please read our paper: Identifying Well-formed Natural Language Questions.

For each query, we provide the average of the five binary judgements as its wellformedness score. The following are some examples of queries in the dataset:

Query	Wellformedness rating
Which form of government is still in place in greece ?	1.0
Population of owls just in north america ?	0.0
Is johnny depp a celtic fan ?	0.8
Where did Roald Dahl live in his teenaged years ?	0.6
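
As a minimal sketch of how a score of this kind is derived, the snippet below averages five binary judgements for one query. The function name `wellformedness_score` and the sample judgement lists are hypothetical illustrations, not part of the dataset or its tooling; the released files contain only the final averages, not the individual judgements.

```python
def wellformedness_score(judgements):
    """Average a list of binary (0/1) rater judgements.

    Hypothetical helper for illustration; the dataset ships only
    the resulting averages.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    if any(j not in (0, 1) for j in judgements):
        raise ValueError("judgements must be 0 or 1")
    return sum(judgements) / len(judgements)


# Made-up judgement lists: four of five raters, then three of five,
# marking a query as well-formed.
print(wellformedness_score([1, 1, 1, 1, 0]))  # 0.8
print(wellformedness_score([1, 1, 1, 0, 0]))  # 0.6
```

With five raters, the score can only take the values 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0.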

The dataset is divided into three files, train.tsv, dev.tsv and test.tsv, each containing rated queries. The file sizes are as follows:

File	No. of queries
train.tsv	17,500
dev.tsv	3,750
test.tsv	3,850

Examples

Each file contains one example per line, with the following tab-separated columns:

Column	Content	Example
1	Query	The European Union includes how many ?
2	Wellformedness rating	0.2
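
The two-column layout above can be read with Python's standard `csv` module. This is a minimal sketch, assuming the files have no header row and use UTF-8 encoding (neither is stated here); the function name `read_examples` is hypothetical.

```python
import csv
import io


def read_examples(lines):
    """Parse tab-separated (query, rating) lines into a list of tuples.

    Assumes no header row. Rows without exactly two fields are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside queries as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    examples = []
    for row in reader:
        if len(row) != 2:
            continue
        query, rating = row
        examples.append((query.strip(), float(rating)))
    return examples


# In-memory sample using the example row from the table above.
sample = io.StringIO("The European Union includes how many ?\t0.2\n")
print(read_examples(sample))
# [('The European Union includes how many ?', 0.2)]

# For a local copy of the data (the path is an assumption):
# with open("train.tsv", newline="", encoding="utf-8") as f:
#     train = read_examples(f)
```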

Reference

If you use or discuss this dataset in your work, please cite our paper:

@InProceedings{FaruquiDas2018,
  title = {{Identifying Well-formed Natural Language Questions}},
  author = {Faruqui, Manaal and Das, Dipanjan},
  booktitle = {Proc. of EMNLP},
  year = {2018}
}

License

The Query-wellformedness dataset is licensed under CC BY-SA 4.0. Any third party content or data is provided “As Is” without any warranty, express or implied.

Contact

If you have a technical question regarding the dataset or publication, please create an issue in this repository.
