phueb / CHILDES-SRL

Licence: other
Research code for generating semantic role labels for CHILDES


CHILDES-SRL

A corpus of semantic role labels auto-generated for 5M words of American-English child-directed speech.

Purpose

The purpose of this repository is to:

  • host the CHILDES-SRL corpus, and code to generate it, and
  • suggest recipes for training BERT on CHILDES-SRL for classifying token spans into semantic role arguments.

Inspiration and code for a BERT-based semantic role labeler come from the AllenNLP toolkit. An SRL demo can be found here.

The code is for research purposes only.

Data

There are two manually annotated ("human-based") datasets, named after the year of their release:

  • data/pre_processed/human-based-2018_srl.txt
  • data/pre_processed/human-based-2008_srl.txt

The 2018 dataset is an extended version of the 2008 dataset and additionally includes SRL annotation for prepositions.

Further, this repository contains SRL labels generated by an automatic SRL tagger applied to a custom corpus of approximately 5M words of American-English child-directed language. The utterances alone can be found in data/pre_processed/childes-20191206_mlm.txt; the file containing both utterances and SRL annotation is data/pre_processed/childes-20191206_srl.txt.

History

  • 2008: The BabySRL project started as a collaboration between Cynthia Fisher, Dan Roth, Michael Connor and Yael Gertner, whose published work is available here.

  • 2016: The most recent work prior to this project can be found here.

  • 2019: Under the supervision of Cynthia Fisher at the Department of Psychology at UIUC, explorations into the ability of BERT to perform SRL tagging began. In particular, we experimented with joint training on SRL and MLM. The joint training procedure is similar to what is proposed in https://arxiv.org/pdf/1901.11504.pdf.

  • 2020 (Summer): After we found little benefit from jointly training BERT on SRL and MLM using CHILDES, a new line of research into the grammatical capabilities of RoBERTa began. Development moved here.

Generating the CHILDES-SRL corpus

To annotate 5M words of child-directed speech with a semantic role tagger trained with AllenNLP, execute data_tools/make_srl_training_data_from_model.py.

To generate a corpus of human-labeled semantic roles for a small section of CHILDES, execute data_tools/make_srl_training_data_from_human.py.
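
Conceptually, the model-based script pairs each utterance with the tagger's per-token BIO output. A minimal sketch of that pairing step is shown below; the `format_srl_line` helper and the `|||`-separated line layout are illustrative assumptions, not necessarily the repository's actual format:

```python
def format_srl_line(tokens, verb_index, bio_tags):
    """Join tokens, the predicate index, and BIO role tags into one line.

    The layout (words ||| verb index ||| tags) is hypothetical and only
    illustrates the kind of record an SRL training corpus contains.
    """
    assert len(tokens) == len(bio_tags), "expect one BIO tag per token"
    return (
        " ".join(tokens)
        + " ||| " + str(verb_index)
        + " ||| " + " ".join(bio_tags)
    )

# Example: "chased" (index 2) is the predicate of a simple transitive clause.
line = format_srl_line(
    ["the", "cat", "chased", "the", "mouse"],
    2,
    ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1"],
)
```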

Quality of auto-generated tags

How well does the AllenNLP SRL tagger perform on the 2008 human-annotated CHILDES SRL data? Below are per-label F1 scores comparing its output with that of trained human annotators.

      ARG-A1 f1= 0.00
      ARG-A4 f1= 0.00
     ARG-LOC f1= 0.00
        ARG0 f1= 0.95
        ARG1 f1= 0.93
        ARG2 f1= 0.79
        ARG3 f1= 0.44
        ARG4 f1= 0.80
    ARGM-ADV f1= 0.70
    ARGM-CAU f1= 0.84
    ARGM-COM f1= 0.00
    ARGM-DIR f1= 0.48
    ARGM-DIS f1= 0.68
    ARGM-EXT f1= 0.38
    ARGM-GOL f1= 0.00
    ARGM-LOC f1= 0.68
    ARGM-MNR f1= 0.68
    ARGM-MOD f1= 0.78
    ARGM-NEG f1= 0.99
    ARGM-PNC f1= 0.03
    ARGM-PPR f1= 0.00
    ARGM-PRD f1= 0.15
    ARGM-PRP f1= 0.39
    ARGM-RCL f1= 0.00
    ARGM-REC f1= 0.00
    ARGM-TMP f1= 0.84
      ARGRG1 f1= 0.00
      R-ARG0 f1= 0.00
      R-ARG1 f1= 0.00
  R-ARGM-CAU f1= 0.00
  R-ARGM-LOC f1= 0.00
  R-ARGM-TMP f1= 0.00
     overall f1= 0.88
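
Scores like those above can be computed by exact matching of predicted against gold labeled spans. A minimal sketch of span-level per-label F1, assuming spans are represented as `(label, start, end)` tuples (the repository's actual scoring code may differ):

```python
def per_label_f1(gold_spans, pred_spans):
    """Compute span-level F1 per role label.

    gold_spans / pred_spans: sets of (label, start, end) tuples.
    A predicted span counts as correct only if both its label and its
    token boundaries match a gold span exactly.
    """
    scores = {}
    labels = {label for label, _, _ in gold_spans | pred_spans}
    for label in labels:
        g = {s for s in gold_spans if s[0] == label}
        p = {s for s in pred_spans if s[0] == label}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        scores[label] = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
    return scores

# Toy example: the ARG1 boundary is off by one token, so ARG1 scores 0.
gold = {("ARG0", 0, 1), ("ARG1", 3, 4)}
pred = {("ARG0", 0, 1), ("ARG1", 3, 5)}
scores = per_label_f1(gold, pred)
```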

Compatibility

Tested on Ubuntu 16.04 with Python 3.6 and torch==1.2.0.
