
synapse-developpement / DiscEval

Licence: other
Discourse Based Evaluation of Language Understanding

Programming Languages

Jupyter Notebook, Python, Shell

Projects that are alternatives of or similar to DiscEval

GLUE-bert4keras
GLUE benchmark code based on bert4keras
Stars: ✭ 59 (+227.78%)
Mutual labels:  glue, bert, natural-language-understanding
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+13372.22%)
Mutual labels:  benchmark, glue, bert
Chineseglue
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 1,548 (+8500%)
Mutual labels:  glue, datasets, bert
Awesome Semantic Segmentation
🤘 awesome-semantic-segmentation
Stars: ✭ 8,831 (+48961.11%)
Mutual labels:  benchmark, evaluation
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (+22.22%)
Mutual labels:  benchmark, bert
FewCLUE
FewCLUE: a few-shot learning evaluation benchmark for Chinese
Stars: ✭ 251 (+1294.44%)
Mutual labels:  benchmark, bert
word-benchmarks
Benchmarks for intrinsic word embeddings evaluation.
Stars: ✭ 45 (+150%)
Mutual labels:  benchmark, evaluation
Nas Benchmark
"NAS evaluation is frustratingly hard", ICLR2020
Stars: ✭ 126 (+600%)
Mutual labels:  benchmark, evaluation
Evalne
Source code for EvalNE, a Python library for evaluating Network Embedding methods.
Stars: ✭ 67 (+272.22%)
Mutual labels:  benchmark, evaluation
Hpatches Benchmark
Python & Matlab code for local feature descriptor evaluation with the HPatches dataset.
Stars: ✭ 129 (+616.67%)
Mutual labels:  benchmark, evaluation
Blue benchmark
BLUE benchmark consists of five different biomedicine text-mining tasks with ten corpora.
Stars: ✭ 159 (+783.33%)
Mutual labels:  benchmark, natural-language-understanding
Indonlu
The first-ever vast natural language processing benchmark for Indonesian Language. We provide multiple downstream tasks, pre-trained IndoBERT models, and a starter code! (AACL-IJCNLP 2020)
Stars: ✭ 198 (+1000%)
Mutual labels:  benchmark, datasets
KLUE
📖 Korean NLU Benchmark
Stars: ✭ 420 (+2233.33%)
Mutual labels:  benchmark, bert
KAREN
KAREN: Unifying Hatespeech Detection and Benchmarking
Stars: ✭ 18 (+0%)
Mutual labels:  benchmark, bert
Superpixel Benchmark
An extensive evaluation and comparison of 28 state-of-the-art superpixel algorithms on 5 datasets.
Stars: ✭ 275 (+1427.78%)
Mutual labels:  benchmark, evaluation
CBLUE
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+2005.56%)
Mutual labels:  benchmark, evaluation
Evo
Python package for the evaluation of odometry and SLAM
Stars: ✭ 1,373 (+7527.78%)
Mutual labels:  benchmark, evaluation
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Stars: ✭ 13,870 (+76955.56%)
Mutual labels:  evaluation, datasets
AIODrive
Official Python/PyTorch Implementation for "All-In-One Drive: A Large-Scale Comprehensive Perception Dataset with High-Density Long-Range Point Clouds"
Stars: ✭ 32 (+77.78%)
Mutual labels:  evaluation, datasets
bert extension tf
BERT Extension in TensorFlow
Stars: ✭ 29 (+61.11%)
Mutual labels:  bert, natural-language-understanding

PragmEval: A Pragmatics-Centered Evaluation Framework for Natural Language Understanding

This is a data/code release accompanying the following paper:

  • Title: "A Pragmatics-Centered Evaluation Framework for Natural Language Understanding"
  • Authors: Damien Sileo, Tim Van de Cruys, Camille Pradel and Philippe Muller
  • Accepted at LREC2022
  • https://arxiv.org/abs/1907.08672

Contents

PragmEval is a compilation of 11 evaluation datasets with a focus on discourse. It can be used to evaluate English Natural Language Understanding models, or as a source of auxiliary training tasks for NLP models.

While the idea of meaning as use permeates NLP, it is not clear that current evaluations fully account for that aspect. Previous evaluation frameworks have no clear way to assess how models deal with implicatures or with different kinds of speech acts, and they arguably focus on semantics (Natural Language Inference or Semantic Similarity) rather than use. We propose a discourse-centered evaluation with a focus on meaning as use.

dataset | categories | example | class | #train
PDTB | discourse relation | it was censorship / it was outrageous | conjunction | 13k
STAC | discourse relation | what ? / i literally lost | question-answer-pair | 11k
GUM | discourse relation | do not drink / if underage in your country | condition | 2k
Emergent | stance | a meteorite landed in nicaragua. / small meteorite hits managua | for | 2k
SarcasmV2 | presence of sarcasm | don't quit your day job / [...] i was going to sell this joke. [...] | sarcasm | 9k
SwitchBoard | speech act | well , a little different , actually , | hedge | 19k
MRDA | speech act | yeah that 's that 's that 's what i meant . | acknowledge-answer | 14k
Verifiability | verifiability | I've been a physician for 20 years. | verifiable-experiential | 6k
Persuasion | C/E/P/S/S/R | Co-operation is essential for team work / lions hunt in a team | low specificity | 0.6k
Squinky | I/I/F | boo ya. | uninformative, high implicature, informal | 4k
EmoBank | V/A/D | I wanted to be there.. | low valence, high arousal, low dominance | 5k

Instructions

Recommended usage

from datasets import load_dataset
dataset = load_dataset('pragmeval','gum')
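
To get a quick look at what was loaded, you can inspect the returned object as below. This is only a minimal sketch: the exact column names (e.g. sentence1 / sentence2 / label) are an assumption and can differ between tasks.

print(dataset)                     # DatasetDict listing the available splits and their sizes
print(dataset['train'][0])         # one example, typically two text fields and a label
print(dataset['train'].features)   # label names for the GUM discourse relations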

Evaluate your model with this script: https://colab.research.google.com/drive/1sg--LF4z7XR1wxAOfp0-3d4J6kQ9nj_A?usp=sharing
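
If you prefer to evaluate locally rather than through the Colab script, a fine-tuning loop along the following lines should work with the Hugging Face Trainer. This is only a sketch: the model name, hyperparameters, split names, and the sentence1/sentence2/label column names are illustrative assumptions, not the exact setup from the paper.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

task = 'gum'  # any PragmEval task name
dataset = load_dataset('pragmeval', task)
num_labels = dataset['train'].features['label'].num_classes

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased',
                                                           num_labels=num_labels)

def encode(batch):
    # most PragmEval tasks are sentence-pair classification; single-input tasks
    # would only pass batch['sentence1'] (column names are an assumption)
    return tokenizer(batch['sentence1'], batch['sentence2'],
                     truncation=True, padding='max_length', max_length=128)

encoded = dataset.map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='pragmeval_out', num_train_epochs=2,
                           per_device_train_batch_size=32),
    train_dataset=encoded['train'],
    eval_dataset=encoded['validation'],
)
trainer.train()
print(trainer.evaluate())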

Building from source:

git clone https://github.com/synapse-developpement/PragmEval.git

The preprocessed datasets are available in the pragmeval folder in tsv format.
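
The TSV files can also be read directly, for instance with pandas. The path and file layout below are assumptions (they depend on where the repository was cloned and which task you pick), so check the columns before relying on them.

import pandas as pd

# hypothetical path: <clone dir>/pragmeval/<task>/train.tsv
train = pd.read_csv('PragmEval/pragmeval/gum/train.tsv', sep='\t')
print(train.columns)   # inspect the column layout first
print(train.head())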

Run bash get_data.bash in the data folder to download the datasets from their original sources.

Run the Make PragmEval 1.0 notebook, after specifying pragmeval_base_path in its third cell, to perform preprocessing and export the datasets.

Citation

Accepted at LREC2022

@misc{sileo2022pragmeval,
      title={A Pragmatics-Centered Evaluation Framework for Natural Language Understanding}, 
      author={Damien Sileo and Tim Van-de-Cruys and Camille Pradel and Philippe Muller},
      year={2022},
      eprint={1907.08672},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

For further information, you can contact:

damien dot sileo at gmail dot com
