phueb / CHILDES-SRL

Licence: other
Research code for generating semantic role labels for CHILDES


CHILDES-SRL

A corpus of semantic role labels auto-generated for 5M words of American-English child-directed speech.

Purpose

The purpose of this repository is to:

  • host the CHILDES-SRL corpus, and code to generate it, and
  • suggest recipes for training BERT on CHILDES-SRL for classifying token spans into semantic role arguments.

Inspiration and code for a BERT-based semantic role labeler come from the AllenNLP toolkit. An SRL demo can be found here.

The code is for research purposes only.

Data

There are two manually annotated ("human-based") datasets, named after the year of their release:

  • data/pre_processed/human-based-2018_srl.txt
  • data/pre_processed/human-based-2008_srl.txt

The 2018 dataset is an extended version of the 2008 dataset and additionally includes SRL annotation for prepositions.

Further, this repository contains SRL labels generated by an automatic SRL tagger applied to a custom corpus of approximately 5M words of American-English child-directed language. The utterances alone can be found in data/pre_processed/childes-20191206_mlm.txt; the file containing both utterances and SRL annotation is data/pre_processed/childes-20191206_srl.txt.

History

  • 2008: The BabySRL project started as a collaboration between Cynthia Fisher, Dan Roth, Michael Connor and Yael Gertner, whose published work is available here.

  • 2016: The most recent work prior to this project can be found here.

  • 2019: Under the supervision of Cynthia Fisher at the Department of Psychology at UIUC, explorations into the ability of BERT to perform SRL tagging began. In particular, we experimented with joint training on SRL and MLM. The joint training procedure is similar to what is proposed in https://arxiv.org/pdf/1901.11504.pdf.

  • 2020 (Summer): After we found little benefit from jointly training BERT on SRL and MLM using CHILDES, a new line of research into the grammatical capabilities of RoBERTa began. Development moved here.

Generating the CHILDES-SRL corpus

To annotate 5M words of child-directed speech with a semantic role tagger trained with AllenNLP, execute data_tools/make_srl_training_data_from_model.py.

To generate a corpus of human-labeled semantic roles for a small section of CHILDES, execute data_tools/make_srl_training_data_from_human.py.
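
Conceptually, the model-based script pairs each utterance with the tagger's per-token BIO output. A minimal sketch of that pairing step is shown below; the `format_srl_line` helper and the `|||`-separated line layout are illustrative assumptions, not necessarily the repository's actual format:

```python
def format_srl_line(tokens, verb_index, bio_tags):
    """Join tokens, the predicate index, and BIO role tags into one line.

    The layout (words ||| verb index ||| tags) is hypothetical and only
    illustrates the kind of record an SRL training corpus contains.
    """
    assert len(tokens) == len(bio_tags), "expect one BIO tag per token"
    return (
        " ".join(tokens)
        + " ||| " + str(verb_index)
        + " ||| " + " ".join(bio_tags)
    )

# Example: "chased" (index 2) is the predicate of a simple transitive clause.
line = format_srl_line(
    ["the", "cat", "chased", "the", "mouse"],
    2,
    ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1"],
)
```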

Quality of auto-generated tags

How well does the AllenNLP SRL tagger perform on the 2008 human-annotated CHILDES SRL data? Below are per-label F1 scores comparing its output with that of trained human annotators.

      ARG-A1 f1= 0.00
      ARG-A4 f1= 0.00
     ARG-LOC f1= 0.00
        ARG0 f1= 0.95
        ARG1 f1= 0.93
        ARG2 f1= 0.79
        ARG3 f1= 0.44
        ARG4 f1= 0.80
    ARGM-ADV f1= 0.70
    ARGM-CAU f1= 0.84
    ARGM-COM f1= 0.00
    ARGM-DIR f1= 0.48
    ARGM-DIS f1= 0.68
    ARGM-EXT f1= 0.38
    ARGM-GOL f1= 0.00
    ARGM-LOC f1= 0.68
    ARGM-MNR f1= 0.68
    ARGM-MOD f1= 0.78
    ARGM-NEG f1= 0.99
    ARGM-PNC f1= 0.03
    ARGM-PPR f1= 0.00
    ARGM-PRD f1= 0.15
    ARGM-PRP f1= 0.39
    ARGM-RCL f1= 0.00
    ARGM-REC f1= 0.00
    ARGM-TMP f1= 0.84
      ARGRG1 f1= 0.00
      R-ARG0 f1= 0.00
      R-ARG1 f1= 0.00
  R-ARGM-CAU f1= 0.00
  R-ARGM-LOC f1= 0.00
  R-ARGM-TMP f1= 0.00
     overall f1= 0.88
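
Scores like those above can be computed by exact matching of predicted against gold labeled spans. A minimal sketch of span-level per-label F1, assuming spans are represented as `(label, start, end)` tuples (the repository's actual scoring code may differ):

```python
def per_label_f1(gold_spans, pred_spans):
    """Compute span-level F1 per role label.

    gold_spans / pred_spans: sets of (label, start, end) tuples.
    A predicted span counts as correct only if both its label and its
    token boundaries match a gold span exactly.
    """
    scores = {}
    labels = {label for label, _, _ in gold_spans | pred_spans}
    for label in labels:
        g = {s for s in gold_spans if s[0] == label}
        p = {s for s in pred_spans if s[0] == label}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        scores[label] = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
    return scores

# Toy example: the ARG1 boundary is off by one token, so ARG1 scores 0.
gold = {("ARG0", 0, 1), ("ARG1", 3, 4)}
pred = {("ARG0", 0, 1), ("ARG1", 3, 5)}
scores = per_label_f1(gold, pred)
```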

Compatibility

Tested on Ubuntu 16.04 with Python 3.6 and torch==1.2.0.
