phosseini / CREST

Licence: other
A Causal Relation Schema for Text

Programming Languages

Jupyter Notebook, Python, Perl

Projects that are alternatives to or similar to CREST

Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+1389.47%)
Mutual labels:  relation-extraction, natural-language-understanding
Relation-Classification
Relation Classification - SEMEVAL 2010 task 8 dataset
Stars: ✭ 46 (+142.11%)
Mutual labels:  semeval, relation-extraction
VERSE
Vancouver Event and Relation System for Extraction
Stars: ✭ 13 (-31.58%)
Mutual labels:  relation-extraction, bionlp
WSDM-Cup-2019
[ACM-WSDM] 3rd place solution at WSDM Cup 2019, Fake News Classification on Kaggle.
Stars: ✭ 62 (+226.32%)
Mutual labels:  natural-language-understanding
Recurrent Interaction Network EMNLP2020
Here is the code for the paper "Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations", accepted at EMNLP 2020.
Stars: ✭ 13 (-31.58%)
Mutual labels:  relation-extraction
OpenNRE for Chinese
OpenNRE for Chinese open relation extraction task in pytorch
Stars: ✭ 32 (+68.42%)
Mutual labels:  relation-extraction
adversarial-relation-classification
Unsupervised domain adaptation method for relation extraction
Stars: ✭ 18 (-5.26%)
Mutual labels:  relation-extraction
causal-learn
Causal Discovery for Python. Translation and extension of the Tetrad Java code.
Stars: ✭ 428 (+2152.63%)
Mutual labels:  causality
TOEFL-QA
A question answering dataset for machine comprehension of spoken content
Stars: ✭ 61 (+221.05%)
Mutual labels:  natural-language-understanding
huner
Named Entity Recognition for biomedical entities
Stars: ✭ 44 (+131.58%)
Mutual labels:  bionlp
Causal-Deconvolution-of-Networks
Causal Deconvolution of Networks by Algorithmic Generative Models
Stars: ✭ 25 (+31.58%)
Mutual labels:  causality
MetaLifelongLanguage
Repository containing code for the paper "Meta-Learning with Sparse Experience Replay for Lifelong Language Learning".
Stars: ✭ 21 (+10.53%)
Mutual labels:  relation-extraction
Discovery
Mining Discourse Markers for Unsupervised Sentence Representation Learning
Stars: ✭ 48 (+152.63%)
Mutual labels:  natural-language-understanding
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (+68.42%)
Mutual labels:  semeval
MemNet ABSA
No description or website provided.
Stars: ✭ 20 (+5.26%)
Mutual labels:  semeval
ntua-slp-semeval2018
Deep-learning models of NTUA-SLP team submitted in SemEval 2018 tasks 1, 2 and 3.
Stars: ✭ 79 (+315.79%)
Mutual labels:  semeval
turingadvice
Evaluating Machines by their Real-World Language Use
Stars: ✭ 23 (+21.05%)
Mutual labels:  natural-language-understanding
IE Paper Notes
Paper notes for Information Extraction, including Relation Extraction (RE), Named Entity Recognition (NER), Entity Linking (EL), Event Extraction (EE), Named Entity Disambiguation (NED).
Stars: ✭ 14 (-26.32%)
Mutual labels:  relation-extraction
Manhattan-LSTM
Keras and PyTorch implementations of the MaLSTM model for computing Semantic Similarity.
Stars: ✭ 28 (+47.37%)
Mutual labels:  natural-language-understanding
DiagnoseRE
Source code and dataset for the CCKS2021 paper "On Robustness and Bias Analysis of BERT-based Relation Extraction"
Stars: ✭ 23 (+21.05%)
Mutual labels:  relation-extraction

CREST: A Causal Relation Schema for Text 🚀

CREST is a machine-readable format/schema created to help researchers working on causal/counterfactual relation extraction and commonsense causal reasoning use and leverage the scattered data resources around these topics more easily. CREST-formatted data are stored as pandas DataFrames.

How to convert dataset(s) to CREST:

  • Clone this repository and go to the /CREST directory.
  • Install the requirements: pip install -r requirements.txt
  • Download spaCy's model: python -m spacy download en_core_web_sm
  • Run /crest/convert.py:
    • python convert.py -i: prints the full list of currently supported datasets
    • python convert.py [DATASET_ID_0] ... [DATASET_ID_n] [OUTPUT_FILE_NAME]
      • DATASET_ID_*: id of a dataset (ids are listed in the table of data resources below).
      • OUTPUT_FILE_NAME: name of the output file; it must have the .xlsx extension. A sketch for loading the converted output back into pandas follows this list.
  • Examples:
    • Converting datasets 1 and 2: python convert.py 1 2 output.xlsx
    • Converting dataset 5: python convert.py 5 output.xlsx
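The converted .xlsx file can be loaded back into a pandas DataFrame for downstream use. A minimal sketch, assuming an output file named output.xlsx produced by convert.py as above (adjust the file name, and sheet_name/index_col if your output layout differs):

import pandas as pd

# Load the CREST-formatted output produced by convert.py (assumed file name).
df = pd.read_excel("output.xlsx")

# Inspect the CREST fields described in the next section (original_id, span1,
# span2, signal, context, idx, label, direction, source, split).
print(df.columns.tolist())
print(df[["label", "direction", "source"]].head())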

CREST format

Each relation in a CREST-formatted DataFrame has the following fields/values:

  • original_id: the id of a relation in the original dataset, if such an id exists.
  • span1: a list of strings of the first span/argument of the relation.
  • span2: a list of strings of the second span/argument of the relation.
  • signal: a list of strings of signals/markers of the relation in context, if any.
  • context: a text string of the context in which the relation appears.
  • idx: indices of span1, span2, and signal tokens/spans in context, stored in 3 lines, each line in the form span_type start_1:end_1 ... start_n:end_n. For example, if span1 has multiple tokens/spans with start:end indices 2:5 and 10:13, span1's line in idx is span1 2:5 10:13. Indices are sorted by the start index of each token/span.
  • label: label of the relation; 0: non-causal, 1: causal.
  • direction: direction between span1 and span2. 0: span1 => span2, 1: span1 <= span2, -1: not-specified
  • source: id of the source dataset (ids are listed in a table below)
  • split: the split to which the relation belongs in the original dataset; 0: train, 1: dev, 2: test. If the original dataset does not specify a split for a relation, we assign it to the train split by default.

Note: The reason we save a list of strings instead of a single string for span1, span2, and signal is that these text spans may contain multiple non-consecutive sub-spans in context.
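As an illustration of the fields above, here is a minimal sketch that parses an idx string into per-span offset lists and reconstructs the span strings from context. The relation, context, and offsets are invented for illustration, and the start:end values are assumed to be character offsets into context:

def parse_idx(idx_value):
    """Parse a CREST idx string into {span_type: [(start, end), ...]}.

    Each line has the form: span_type start_1:end_1 ... start_n:end_n
    """
    spans = {}
    for line in idx_value.strip().splitlines():
        span_type, *pairs = line.split()
        spans[span_type] = [tuple(int(i) for i in p.split(":")) for p in pairs]
    return spans

# Invented example: "Heavy rain" (span1) caused (signal) "flooding" (span2).
context = "Heavy rain caused flooding in the valley."
idx = "span1 0:10\nspan2 18:26\nsignal 11:17"

spans = parse_idx(idx)
span1 = [context[s:e] for s, e in spans["span1"]]    # ['Heavy rain']
span2 = [context[s:e] for s, e in spans["span2"]]    # ['flooding']
signal = [context[s:e] for s, e in spans["signal"]]  # ['caused']

Because each span is a list of fragments, a span with multiple non-consecutive sub-spans (as in the 2:5 10:13 example above) simply yields a list with more than one string.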

Available Data Resources

List of data resources already converted to CREST format:

Id | Data resource | Samples | Causal | Non-causal | Document
---|---------------|---------|--------|------------|---------
1 | SemEval 2007 Task 4 | 1,529 | 114 | 1,415 | -
2 | SemEval 2010 Task 8 | 10,717 | 1,331 | 9,386 | -
3 | EventCausality | 583 | 583 | - | Paper
4 | Causal-TimeBank | 318 | 318 | - | Paper
5 | EventStoryLine v1.5 | 2,608 | 2,608 | - | Paper
6 | CaTeRS | 2,502 | 308 | 2,194 | Paper
7 | BECauSE v2.1 ⚠️ | 729 | 554 | 175 | Paper
8 | Choice of Plausible Alternatives (COPA) | 2,000 | 1,000 | 1,000 | Paper
9 | The Penn Discourse Treebank (PDTB) 3.0 ⚠️ | 7,991 | 7,991 | - | Manual
10 | BioCause Corpus | 844 | 844 | - | Paper
11 | Temporal and Causal Reasoning (TCR) | 172 | 172 | - | Paper
12 | Benchmark Corpus for Adverse Drug Effects | 5,671 | 5,671 | - | Paper
13 | SemEval 2020 Task 5 :atom: | 5,501 | 5,501 | - | Paper

⚠️ The data is either not publicly available or only partially available. You can still use CREST for conversion if you have full access to these datasets.

:atom:  Counterfactual Relations

CREST conversion

We provide helper methods to convert CREST-formatted data to popular formats and annotation schemes, mainly formats used across relation extraction/classification tasks. Below is the list of formats for which CREST converter methods are already available:

  • brat: we provide helper methods for two-way conversion between CREST data frames and brat (see the example here). brat is a popular web-based annotation tool that has been used for a variety of relation extraction NLP tasks. We use brat for two main reasons: 1) better visualization of causal and non-causal relations and their arguments, and 2) modifying existing annotations and adding new ones to the provided context when needed. The repository's sample conversion of a CREST-formatted relation to brat is taken from the CaTeRS dataset; a minimal sketch of the brat standoff output follows this list.

  • TACRED: TACRED is a large-scale relation extraction dataset. We convert samples from CREST to the TACRED format since TACRED-formatted data can easily be used as input to many transformer-based language models (e.g., for relation classification/extraction). You can find an example of converting CREST-formatted data to TACRED in this notebook.
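For reference, the sketch below shows what writing a single CREST relation out in brat standoff format could look like. It is an illustration of the target format rather than the repository's own converter: the entity and relation type names are assumptions, and offsets are taken to be character offsets into context (the spans dict has the same shape as the output of the parse_idx sketch earlier).

def to_brat(context, spans, causal):
    """Render one CREST relation as a (text, annotations) pair in brat standoff.

    spans maps 'span1'/'span2'/'signal' to [(start, end), ...]; the text goes
    in a .txt file and the annotation lines in a matching .ann file.
    """
    ann_lines, tids = [], {}
    for i, span_type in enumerate(("span1", "span2", "signal"), start=1):
        offsets = spans.get(span_type, [])
        if not offsets:
            continue
        # Discontinuous spans are separated by ';' in brat standoff offsets.
        offset_str = ";".join(f"{s} {e}" for s, e in offsets)
        text = " ".join(context[s:e] for s, e in offsets)
        ann_lines.append(f"T{i}\t{span_type.capitalize()} {offset_str}\t{text}")
        tids[span_type] = f"T{i}"
    if "span1" in tids and "span2" in tids:
        label = "Causal" if causal else "NonCausal"
        ann_lines.append(f"R1\t{label} Arg1:{tids['span1']} Arg2:{tids['span2']}")
    return context, "\n".join(ann_lines)

txt, ann = to_brat(
    "Heavy rain caused flooding in the valley.",
    {"span1": [(0, 10)], "span2": [(18, 26)], "signal": [(11, 17)]},
    causal=True,
)
print(ann)
# Printed annotation lines (tab-separated):
# T1    Span1 0 10     Heavy rain
# T2    Span2 18 26    flooding
# T3    Signal 11 17   caused
# R1    Causal Arg1:T1 Arg2:T2

The actual helper methods in the repository may use different type names and file layouts; see the linked brat example for the exact conventions.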

How you can contribute:

  • Are there any related datasets you don’t see in the list? Let us know or feel free to submit a Pull Request (PR); we actively check PRs and appreciate them ☺️
  • Is there a well-known or widely used machine-readable format you think should be added? Let us know and we can add the helper methods for conversion, or submit a PR yourself; we appreciate it.

How to cite CREST?

For now, please cite our arXiv paper:

@article{hosseini2021predicting,
  title={Predicting Directionality in Causal Relations in Text},
  author={Hosseini, Pedram and Broniatowski, David A and Diab, Mona},
  journal={arXiv preprint arXiv:2103.13606},
  year={2021}
}