
emorynlp / Character Mining

Licence: other
Mining individual characters in multiparty dialogue

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Character Mining

Chinese Xlnet
Pre-Trained Chinese XLNet (pre-trained XLNet model for Chinese)
Stars: ✭ 1,213 (+1262.92%)
Mutual labels:  natural-language-processing
Simplednn
SimpleDNN is a machine learning lightweight open-source library written in Kotlin designed to support relevant neural network architectures in natural language processing tasks
Stars: ✭ 81 (-8.99%)
Mutual labels:  natural-language-processing
Semantic Texual Similarity Toolkits
Semantic Textual Similarity (STS) measures the degree of equivalence in the underlying semantics of paired snippets of text.
Stars: ✭ 87 (-2.25%)
Mutual labels:  natural-language-processing
Practical 3
Oxford Deep NLP 2017 course - Practical 3: Text Classification with RNNs
Stars: ✭ 78 (-12.36%)
Mutual labels:  natural-language-processing
Typenovel
A simple markup language to write novel with types.
Stars: ✭ 80 (-10.11%)
Mutual labels:  natural-language-processing
Scanrefer
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Stars: ✭ 84 (-5.62%)
Mutual labels:  natural-language-processing
Abigsurvey
A collection of 500+ survey papers on Natural Language Processing (NLP) and Machine Learning (ML)
Stars: ✭ 1,203 (+1251.69%)
Mutual labels:  natural-language-processing
Spark Nlp Models
Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (-1.12%)
Mutual labels:  natural-language-processing
Spacy Graphql
🤹‍♀️ Query spaCy's linguistic annotations using GraphQL
Stars: ✭ 81 (-8.99%)
Mutual labels:  natural-language-processing
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (+1326.97%)
Mutual labels:  natural-language-processing
Deepmoji
State-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc.
Stars: ✭ 1,215 (+1265.17%)
Mutual labels:  natural-language-processing
Opennmt Tf
Neural machine translation and sequence learning using TensorFlow
Stars: ✭ 1,223 (+1274.16%)
Mutual labels:  natural-language-processing
Practical Open
Oxford Deep NLP 2017 course - Open practical
Stars: ✭ 84 (-5.62%)
Mutual labels:  natural-language-processing
Text Dependency Parser
🏄 Dependency parsing, NLP, natural language processing
Stars: ✭ 78 (-12.36%)
Mutual labels:  natural-language-processing
Spf
Cornell Semantic Parsing Framework
Stars: ✭ 87 (-2.25%)
Mutual labels:  natural-language-processing
Multimodal Toolkit
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
Stars: ✭ 78 (-12.36%)
Mutual labels:  natural-language-processing
Greek Bert
A Greek edition of BERT pre-trained language model
Stars: ✭ 84 (-5.62%)
Mutual labels:  natural-language-processing
Virtual Assistant
A linux based Virtual assistant on Artificial Intelligence in C
Stars: ✭ 88 (-1.12%)
Mutual labels:  natural-language-processing
Neural kbqa
Knowledge Base Question Answering using memory networks
Stars: ✭ 87 (-2.25%)
Mutual labels:  natural-language-processing
Turkish Bert Nlp Pipeline
Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.
Stars: ✭ 85 (-4.49%)
Mutual labels:  natural-language-processing

Character Mining

The Character Mining project challenges machine comprehension on multiparty dialogue. Its objective is to infer explicit and implicit contexts about individual characters through their conversations. This open-source project, led by the Emory NLP research group, provides resources for several tasks, each described on its own task page.

We welcome feedback and contributions from the community. Most of our annotation is crowdsourced, so errors are to be expected. Please open a pull request if you wish to fix errors in our datasets.

Dataset

Our dataset is based on the popular TV show Friends. Transcripts for all 10 seasons of the show are provided, along with manual and crowdsourced annotation for parts of the show. All text data are available as JSON files; please visit the individual task pages to retrieve datasets specifically designed for those tasks.

Statistics

Each season consists of episodes; each episode is divided into scenes; each scene comprises utterances; and each utterance is a list of sentences, each of which is split into tokens.

| Season ID | Episodes | Scenes | Utterances | Sentences | Tokens | Speakers |
|---|---|---|---|---|---|---|
| s01 | 24 | 326 | 5,968 | 10,790 | 81,453 | 107 |
| s02 | 24 | 293 | 5,747 | 9,337 | 81,910 | 107 |
| s03 | 25 | 348 | 6,495 | 10,858 | 90,753 | 108 |
| s04 | 24 | 338 | 6,318 | 10,889 | 87,289 | 100 |
| s05 | 24 | 311 | 6,220 | 11,133 | 83,907 | 107 |
| s06 | 25 | 350 | 6,458 | 11,496 | 90,384 | 112 |
| s07 | 24 | 332 | 6,314 | 11,340 | 84,974 | 94 |
| s08 | 24 | 288 | 6,220 | 11,714 | 86,164 | 107 |
| s09 | 24 | 302 | 6,322 | 11,831 | 93,773 | 99 |
| s10 | 18 | 219 | 5,247 | 9,345 | 69,493 | 78 |
| Total | 236 | 3,107 | 61,309 | 108,733 | 850,100 | 700 |
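
The season, episode, scene, utterance, sentence, token hierarchy described above can be walked with a few nested loops to produce counts like those in the table. The following is a minimal sketch rather than the project's own tooling: the sample data is invented, and every key name (`episodes`, `scenes`, `utterances`, `speakers`, `tokens`) is an assumption about the JSON layout that should be checked against the actual files.

```python
from collections import Counter

# Invented sample mirroring the hierarchy described in the text.
# All key names are assumptions, not confirmed by this document.
season = {
    "season_id": "s01",
    "episodes": [
        {
            "episode_id": "s01_e01",
            "scenes": [
                {
                    "scene_id": "s01_e01_c01",
                    "utterances": [
                        {
                            "utterance_id": "s01_e01_c01_u001",
                            "speakers": ["Speaker A"],
                            # One inner list of tokens per sentence.
                            "tokens": [
                                ["Hello", "there", "."],
                                ["How", "are", "you", "?"],
                            ],
                        },
                        {
                            "utterance_id": "s01_e01_c01_u002",
                            "speakers": ["Speaker B"],
                            "tokens": [["Fine", "."]],
                        },
                    ],
                }
            ],
        }
    ],
}


def count_levels(season: dict) -> Counter:
    """Count episodes, scenes, utterances, sentences, tokens, and distinct speakers."""
    counts = Counter()
    speakers = set()
    for episode in season["episodes"]:
        counts["episodes"] += 1
        for scene in episode["scenes"]:
            counts["scenes"] += 1
            for utterance in scene["utterances"]:
                counts["utterances"] += 1
                speakers.update(utterance["speakers"])
                for sentence in utterance["tokens"]:
                    counts["sentences"] += 1
                    counts["tokens"] += len(sentence)
    counts["speakers"] = len(speakers)
    return counts


stats = count_levels(season)
print(dict(stats))
```

With a real season file, the same function could be applied to the result of `json.load`, provided the keys match the assumed names.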

Some utterances include action notes. In the following example, extracted from s01_e01_c01_u028, the speaker is talking to Ross, which is indicated by the action note:

"transcript": "Let me get you some coffee.",
"transcript_with_note": "(to Ross) Let me get you some coffee.",
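
One way to work with these two fields is to separate the parenthesised notes from the spoken text and to decode the utterance ID. The sketch below is illustrative only: it assumes that action notes always appear in parentheses, as in the example above, and that the four parts of an ID such as `s01_e01_c01_u028` stand for season, episode, scene, and utterance, following the hierarchy described earlier. The function names are hypothetical.

```python
import re

# Assumption: an action note is any parenthesised span, as in "(to Ross)".
NOTE_PATTERN = re.compile(r"\(([^()]*)\)")
# Assumption: IDs read as season, episode, scene, utterance.
ID_PATTERN = re.compile(r"s(\d+)_e(\d+)_c(\d+)_u(\d+)")


def split_notes(transcript_with_note: str) -> tuple[list[str], str]:
    """Return (action notes, spoken text with the notes removed)."""
    notes = [note.strip() for note in NOTE_PATTERN.findall(transcript_with_note)]
    spoken = NOTE_PATTERN.sub(" ", transcript_with_note)
    # Collapse the whitespace left behind by the removed notes.
    spoken = " ".join(spoken.split())
    return notes, spoken


def parse_utterance_id(utterance_id: str) -> dict[str, int]:
    """Decode an ID such as 's01_e01_c01_u028' into its numeric parts."""
    match = ID_PATTERN.fullmatch(utterance_id)
    if match is None:
        raise ValueError(f"unexpected utterance ID format: {utterance_id!r}")
    season, episode, scene, utterance = (int(part) for part in match.groups())
    return {"season": season, "episode": episode, "scene": scene, "utterance": utterance}


notes, spoken = split_notes("(to Ross) Let me get you some coffee.")
parsed = parse_utterance_id("s01_e01_c01_u028")
print(notes, spoken, parsed)
```

Note that this regex would also treat any parenthesised text inside the spoken line as a note, so the provided `transcript` field is the safer source when it is available.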

The following table shows the statistics when action notes are included:

| Season ID | Utterances | Sentences | Tokens |
|---|---|---|---|
| s01 | 6,626 | 12,088 | 100,773 |
| s02 | 6,048 | 10,565 | 97,763 |
| s03 | 7,267 | 12,288 | 117,912 |
| s04 | 7,119 | 12,811 | 116,703 |
| s05 | 7,082 | 13,540 | 118,509 |
| s06 | 7,235 | 13,506 | 120,471 |
| s07 | 7,019 | 13,363 | 116,341 |
| s08 | 6,845 | 13,321 | 109,984 |
| s09 | 6,653 | 13,548 | 119,090 |
| s10 | 5,479 | 11,029 | 93,390 |
| Total | 67,373 | 126,059 | 1,110,936 |
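
To see how much the action notes add, the Total rows of the two tables can be compared directly. The short sketch below uses only the totals reported above; the function name `note_overhead` is hypothetical.

```python
# Totals copied from the two statistics tables.
without_notes = {"utterances": 61_309, "sentences": 108_733, "tokens": 850_100}
with_notes = {"utterances": 67_373, "sentences": 126_059, "tokens": 1_110_936}


def note_overhead(base: dict[str, int], extended: dict[str, int]) -> dict[str, tuple[int, float]]:
    """For each level, return (absolute increase, percentage increase over base)."""
    result = {}
    for level, base_count in base.items():
        added = extended[level] - base_count
        result[level] = (added, round(100 * added / base_count, 1))
    return result


overhead = note_overhead(without_notes, with_notes)
for level, (added, percent) in overhead.items():
    print(f"{level}: +{added:,} ({percent}%)")
```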

Documentation

References

Contact
