
emorynlp / Character Mining

Licence: other

Mining individual characters in multiparty dialogue



Character Mining

The Character Mining project challenges machine comprehension on multiparty dialogue. The objective of this project is to infer explicit and implicit contexts about individual characters through their conversations. This is an open-source project led by the Emory NLP research group that provides resources for the following tasks:

Character Identification (since May 2016).
Emotion Detection (since May 2017).
Reading Comprehension (since May 2018).
Question Answering (since May 2019).
Personality Detection (since Sep 2019).

We welcome feedback and contributions from the community. Most of our annotations are crowdsourced, so errors are to be expected. Please make a pull request if you wish to fix errors in our datasets.

Dataset

Our dataset is based on the popular TV show called Friends. Transcripts for all 10 seasons of the show as well as manual and crowdsourced annotation for subparts of the show are provided. All text data are available in the JSON files; please visit the individual task pages to retrieve datasets specifically designed for those tasks.

Statistics

Each season consists of episodes, each episode is divided into scenes, each scene comprises utterances, and each utterance is a list of sentences that are split into tokens.

Season ID	Episodes	Scenes	Utterances	Sentences	Tokens	Speakers
s01	24	326	5,968	10,790	81,453	107
s02	24	293	5,747	9,337	81,910	107
s03	25	348	6,495	10,858	90,753	108
s04	24	338	6,318	10,889	87,289	100
s05	24	311	6,220	11,133	83,907	107
s06	25	350	6,458	11,496	90,384	112
s07	24	332	6,314	11,340	84,974	94
s08	24	288	6,220	11,714	86,164	107
s09	24	302	6,322	11,831	93,773	99
s10	18	219	5,247	9,345	69,493	78
Total	236	3,107	61,309	108,733	850,100	700
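
The hierarchy described above can be walked with a few nested loops. The following is a minimal Python sketch, not the project's own loader (see load_json.ipynb for that). Only `transcript` is a field name confirmed on this page; the keys `episodes`, `scenes`, `utterances`, and `tokens`, the tokenization shown, and the second utterance in the sample are assumptions made for illustration and should be checked against the actual JSON files.

```python
# Tiny in-memory sample standing in for one season file.
# ASSUMPTIONS: the key names "episodes", "scenes", "utterances", and "tokens"
# are illustrative; verify them against the real JSON before relying on them.
SAMPLE_SEASON = {
    "season_id": "s01",
    "episodes": [
        {
            "episode_id": "s01_e01",
            "scenes": [
                {
                    "scene_id": "s01_e01_c01",
                    "utterances": [
                        {
                            # Utterance quoted in this README; tokenization is assumed.
                            "utterance_id": "s01_e01_c01_u028",
                            "transcript": "Let me get you some coffee.",
                            "tokens": [["Let", "me", "get", "you", "some", "coffee", "."]],
                        },
                        {
                            # Made-up placeholder, not taken from the dataset.
                            "utterance_id": "placeholder",
                            "transcript": "Thanks. I appreciate it.",
                            "tokens": [["Thanks", "."], ["I", "appreciate", "it", "."]],
                        },
                    ],
                }
            ],
        }
    ],
}


def count_units(season):
    """Count episodes, scenes, utterances, sentences, and tokens in one season."""
    counts = {"episodes": 0, "scenes": 0, "utterances": 0, "sentences": 0, "tokens": 0}
    for episode in season["episodes"]:
        counts["episodes"] += 1
        for scene in episode["scenes"]:
            counts["scenes"] += 1
            for utterance in scene["utterances"]:
                counts["utterances"] += 1
                # An utterance is a list of sentences; a sentence is a list of tokens.
                for sentence in utterance["tokens"]:
                    counts["sentences"] += 1
                    counts["tokens"] += len(sentence)
    return counts


print(count_units(SAMPLE_SEASON))
```

For a real season file, `SAMPLE_SEASON` would be replaced by the result of `json.load` on that file, provided the key names match.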

Some utterances include action notes. In the following example, extracted from s01_e01_c01_u028, the speaker is talking to Ross, which is indicated by the action note:

"transcript": "Let me get you some coffee.",
"transcript_with_note": "(to Ross) Let me get you some coffee.",

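As a rough illustration of how the two fields relate, the Python sketch below separates a leading action note from the spoken text and splits an utterance ID into its parts. It assumes that notes appear in parentheses at the start of `transcript_with_note`, as in the example above, and that the ID prefixes `s`, `e`, `c`, and `u` stand for season, episode, scene, and utterance; both are inferences from this page rather than documented guarantees, and the function names are hypothetical.

```python
import re

# Matches a parenthesized note at the very start of the text, e.g. "(to Ross) ".
# ASSUMPTION: notes elsewhere in the text would need a broader pattern.
LEADING_NOTE = re.compile(r"^\((?P<note>[^)]*)\)\s*")


def split_leading_note(transcript_with_note):
    """Return (note, remaining_text); note is None when no leading note is found."""
    match = LEADING_NOTE.match(transcript_with_note)
    if match is None:
        return None, transcript_with_note
    return match.group("note"), transcript_with_note[match.end():]


def parse_utterance_id(utterance_id):
    """Split an ID such as 's01_e01_c01_u028' into numeric parts.

    ASSUMPTION: the four underscore-separated fields are season, episode,
    scene, and utterance, each prefixed by a single letter.
    """
    season, episode, scene, utterance = utterance_id.split("_")
    return {
        "season": int(season[1:]),
        "episode": int(episode[1:]),
        "scene": int(scene[1:]),
        "utterance": int(utterance[1:]),
    }


utterance = {
    "transcript": "Let me get you some coffee.",
    "transcript_with_note": "(to Ross) Let me get you some coffee.",
}
note, text = split_leading_note(utterance["transcript_with_note"])
print(note)
print(text == utterance["transcript"])
print(parse_utterance_id("s01_e01_c01_u028"))
```
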
The following table shows the statistics when action notes are included:

Season ID	Utterances	Sentences	Tokens
s01	6,626	12,088	100,773
s02	6,048	10,565	97,763
s03	7,267	12,288	117,912
s04	7,119	12,811	116,703
s05	7,082	13,540	118,509
s06	7,235	13,506	120,471
s07	7,019	13,363	116,341
s08	6,845	13,321	109,984
s09	6,653	13,548	119,090
s10	5,479	11,029	93,390
Total	67,373	126,059	1,110,936
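
Comparing the totals of the two tables gives a sense of how much of the text comes from action notes. The short Python sketch below only does arithmetic on the totals reported above; the dictionary and function names are hypothetical.

```python
# Totals copied from the two tables: without and with action notes.
WITHOUT_NOTES = {"utterances": 61_309, "sentences": 108_733, "tokens": 850_100}
WITH_NOTES = {"utterances": 67_373, "sentences": 126_059, "tokens": 1_110_936}


def note_contribution(without, with_notes):
    """Per unit, return (count added by notes, share of the with-notes total)."""
    result = {}
    for unit, base in without.items():
        added = with_notes[unit] - base
        result[unit] = (added, added / with_notes[unit])
    return result


for unit, (added, share) in note_contribution(WITHOUT_NOTES, WITH_NOTES).items():
    print(f"{unit}: +{added:,} ({share:.1%} of the total with notes)")
```

By these totals, action notes account for 260,836 tokens, a little under a quarter of all tokens when notes are included.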

Documentation

How to retrieve information from the JSON files: load_json.ipynb.

References

Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering. Changmao Li and Jinho D. Choi. In Proceedings of the Conference of the Association for Computational Linguistics, ACL'20, 2020.
Modeling Personality with Attentive Networks and Contextual Embeddings. Hang Jiang, Xianzhe Zhang, and Jinho D. Choi. In Proceedings of the AAAI Student Abstract and Poster Program, AAAI:SAP'20, 2020 (poster).
FriendsQA: Open-Domain Question Answering on TV Show Transcripts. Zhengzhe Yang and Jinho D. Choi. In Proceedings of the Annual Conference of the ACL Special Interest Group on Discourse and Dialogue, SIGDIAL'19, 2019 (slides).
They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking. Ethan Zhou and Jinho D. Choi. In Proceedings of the 27th International Conference on Computational Linguistics, COLING'18, 2018 (slides).
SemEval 2018 Task 4: Character Identification on Multiparty Dialogues. Jinho D. Choi and Henry Y. Chen. In Proceedings of the International Workshop on Semantic Evaluation, SemEval'18, 2018 (slides).
Challenging Reading Comprehension on Daily Conversation: Passage Completion on Multiparty Dialog. Kaixin Ma, Tomasz Jurczyk, and Jinho D. Choi. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL'18, 2018 (poster, source).
Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks. Sayyed Zahiri and Jinho D. Choi. In The AAAI Workshop on Affective Content Analysis, AFFCON'18, 2018.
Cross-domain Document Retrieval: Matching between Conversational and Formal Writings. Tomasz Jurczyk and Jinho D. Choi. In Proceedings of the EMNLP Workshop on Building Linguistically Generalizable NLP Systems, BLGNLP'17, 2017 (slides).
Robust Coreference Resolution and Entity Linking on Dialogues: Character Identification on TV Show Transcripts. Henry Y. Chen, Ethan Zhou, and Jinho D. Choi. In Proceedings of the 21st Conference on Computational Natural Language Learning, CoNLL'17, 2017 (slides).
Text-based Speaker Identification on Multiparty Dialogues Using Multi-document Convolutional Neural Networks. Kaixin Ma, Catherine Xiao, and Jinho D. Choi. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, ACL:SRW'17, 2017 (poster).
Character Identification on Multiparty Conversation: Identifying Mentions of Characters in TV Shows. Henry Y. Chen and Jinho D. Choi. In Proceedings of the 17th Annual SIGdial Meeting on Discourse and Dialogue, SIGDIAL'16, 2016 (poster).

Contact

Jinho D. Choi.
