napsternxg / TwitterNER

License: GPL-3.0
Twitter named entity extraction for WNUT 2016 http://noisy-text.github.io/2016/ner-shared-task.html

Programming Languages

Jupyter Notebook, Python, Roff, Makefile

Projects that are alternatives to or similar to TwitterNER

watchman
Watchman: An open-source social-media event-detection system
Stars: ✭ 18 (-86.57%)
Mutual labels:  social-media, named-entity-recognition
meta-coronavirus-dataset
MetaCOVID: META-Coronavirus dataset repository
Stars: ✭ 37 (-72.39%)
Mutual labels:  social-media
ner-tagger-dynet
See http://github.com/onurgu/joint-ner-and-md-tagger This repository is basically a Bi-LSTM based sequence tagger in both Tensorflow and Dynet which can utilize several sources of information about each word unit like word embeddings, character based embeddings and morphological tags from an FST to obtain the representation for that specific wor…
Stars: ✭ 23 (-82.84%)
Mutual labels:  named-entity-recognition
fastai sequence tagging
sequence tagging for NER for ULMFiT
Stars: ✭ 21 (-84.33%)
Mutual labels:  named-entity-recognition
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (-32.09%)
Mutual labels:  named-entity-recognition
zapread.com
Website for zapread.com
Stars: ✭ 19 (-85.82%)
Mutual labels:  social-media
PersianNER
Named-Entity Recognition in Persian Language
Stars: ✭ 48 (-64.18%)
Mutual labels:  named-entity-recognition
nlp-cheat-sheet-python
NLP Cheat Sheet, Python, spacy, LexNPL, NLTK, tokenization, stemming, sentence detection, named entity recognition
Stars: ✭ 69 (-48.51%)
Mutual labels:  named-entity-recognition
Twitter
[READ ONLY] Subtree split of the SocialiteProviders/Twitter Provider (see SocialiteProviders/Providers)
Stars: ✭ 21 (-84.33%)
Mutual labels:  social-media
LinkedIn Scraper
🙋 A Selenium-based automated program that scrapes profile data, stores it in CSV, follows the profiles, and saves them as PDF.
Stars: ✭ 25 (-81.34%)
Mutual labels:  social-media
sequence labeling tf
Sequence Labeling in Tensorflow
Stars: ✭ 18 (-86.57%)
Mutual labels:  named-entity-recognition
felfele
Decentralized social application that respects your privacy
Stars: ✭ 30 (-77.61%)
Mutual labels:  social-media
Devise-Omniauth-Multiple-Providers
Devise Multiple Omniauth Providers
Stars: ✭ 34 (-74.63%)
Mutual labels:  social-media
big-data-upf
RECSM-UPF Summer School: Social Media and Big Data Research
Stars: ✭ 21 (-84.33%)
Mutual labels:  social-media
Nallagram
Nallagram is an open source social networking platform where users can share their views on various topics and interact among people in which they create, share, and/or exchange information and ideas in virtual communities and networks.
Stars: ✭ 30 (-77.61%)
Mutual labels:  social-media
namaco
Character Based Named Entity Recognition.
Stars: ✭ 41 (-69.4%)
Mutual labels:  named-entity-recognition
social-media-hacker-list
Growing list of apps and tools for enhancing social media experiences.
Stars: ✭ 198 (+47.76%)
Mutual labels:  social-media
molminer
Python library and command-line tool for extracting compounds from scientific literature. Written in Python.
Stars: ✭ 38 (-71.64%)
Mutual labels:  named-entity-recognition
CrossNER
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
Stars: ✭ 87 (-35.07%)
Mutual labels:  named-entity-recognition
banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla" accepted in Findings of the Annual Conference of the North American Chap…
Stars: ✭ 186 (+38.81%)
Mutual labels:  named-entity-recognition

TwitterNER

Twitter named entity extraction for the WNUT 2016 shared task (http://noisy-text.github.io/2016/ner-shared-task.html). This repository accompanies the corresponding workshop paper at WNUT, COLING 2016, titled "Semi-supervised Named Entity Recognition in noisy-text" by Shubhanshu Mishra and Jana Diesner.

Please cite as:

@inproceedings{mishra-diesner-2016-semi,
    title = "Semi-supervised Named Entity Recognition in noisy-text",
    author = "Mishra, Shubhanshu  and
      Diesner, Jana",
    booktitle = "Proceedings of the 2nd Workshop on Noisy User-generated Text ({WNUT})",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-3927",
    pages = "203--212",
}

Model architecture

Installation

pip install -r requirements.txt
cd data
wget http://nlp.stanford.edu/data/glove.twitter.27B.zip
unzip glove.twitter.27B.zip
cd ..
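
The archive downloaded above holds GloVe vectors in a plain-text format: one token per line followed by its space-separated float components. As a rough sketch of how such a file can be read (this is not the loader used by this repository; the function name load_glove_vectors and the sample numbers are made up for illustration):

```python
import io
from typing import Dict, List, TextIO


def load_glove_vectors(handle: TextIO) -> Dict[str, List[float]]:
    """Parse GloVe's text format: a token, then its vector components."""
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # First field is the token, the rest are the float components.
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny in-memory sample with invented values, standing in for a real file.
sample = io.StringIO("chicago 0.1 -0.2 0.3\nflorida 0.4 0.5 -0.6\n")
glove_sample = load_glove_vectors(sample)
print(glove_sample["chicago"])
```

To read one of the unzipped files instead, pass an open file handle (opened with encoding="utf-8") in place of the in-memory sample. Which vector dimensionality the model expects is not stated here, so check the notebooks before picking a file.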

Usage

$ cd NoisyNLP
$ python
>>> from run_ner import TwitterNER
>>> from twokenize import tokenizeRawTweetText
>>> ner = TwitterNER()
>>> tweet = "Beautiful day in Chicago! Nice to get away from the Florida heat."
>>> tokens = tokenizeRawTweetText(tweet)
>>> ner.get_entities(tokens)
[(3, 4, 'LOCATION'), (11, 12, 'LOCATION')]
>>> " ".join(tokens[3:4])
'Chicago'
>>> " ".join(tokens[11:12])
'Florida'
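
In the session above, each entity is a (start, end, label) triple in which start and end are token indices and end is exclusive, so tokens[start:end] recovers the mention. A small helper can apply this to every span at once. This is a sketch that runs without the model: spans_to_mentions is a hypothetical helper, not part of the NoisyNLP API, and the token list is written out by hand to match the indices shown above.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, exclusive end index, label)


def spans_to_mentions(tokens: List[str], spans: List[Span]) -> List[Tuple[str, str]]:
    """Turn token-index spans into (mention text, label) pairs."""
    return [(" ".join(tokens[start:end]), label) for start, end, label in spans]


# Hand-written tokens, assumed to match what the tokenizer produced above.
tokens = ["Beautiful", "day", "in", "Chicago", "!", "Nice", "to", "get",
          "away", "from", "the", "Florida", "heat", "."]
spans = [(3, 4, "LOCATION"), (11, 12, "LOCATION")]

mentions = spans_to_mentions(tokens, spans)
print(mentions)  # [('Chicago', 'LOCATION'), ('Florida', 'LOCATION')]
```

With the model loaded, the same helper could be called on the output of ner.get_entities(tokens).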

Data download

The dataset used in this repository can be downloaded from https://github.com/aritter/twitter_nlp/tree/master/data/annotated/wnut16
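
Shared-task NER data of this kind is commonly distributed in a CoNLL-style layout: one token and its tag per line, separated by a tab, with a blank line between tweets. The reader below is a minimal sketch under that assumption; verify the layout against the downloaded files before relying on it. The function name read_conll and the sample tags are illustrative, not taken from this repository.

```python
import io
from typing import List, TextIO, Tuple

TaggedToken = Tuple[str, str]  # (token, tag)


def read_conll(handle: TextIO) -> List[List[TaggedToken]]:
    """Read token<TAB>tag lines, where a blank line ends a sequence."""
    sequences: List[List[TaggedToken]] = []
    current: List[TaggedToken] = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            # Blank line: close the current tweet, if any.
            if current:
                sequences.append(current)
                current = []
            continue
        token, tag = line.rsplit("\t", 1)
        current.append((token, tag))
    if current:  # file may not end with a blank line
        sequences.append(current)
    return sequences


# Invented two-tweet sample in the assumed layout.
sample = io.StringIO("Beautiful\tO\nday\tO\nin\tO\nChicago\tB-geo-loc\n\nNice\tO\n")
sequences = read_conll(sample)
print(len(sequences), sequences[0][3])
```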

Submitted Solution [ST]

See Word2Vec.ipynb for details on the original submitted solution for the task.

Improved model

See Run Experiments.ipynb for the details on the improved system. See Run Experiment.ipynb for the details on the improved system with test data.

Using the API

The final system is packaged as an API in the NoisyNLP folder. More updates will be made to the API in upcoming days. See Run Experiment.ipynb for API usage.

Downloading Gazetteers

See Updated Gazetteers.ipynb, Extra Gazetteers.ipynb, Download Wikidata.ipynb

Generating word clusters

See Gen new clusters.ipynb

Data Pre-processing

See Data preprocessing.ipynb

Preliminary comparison with RNN models

See KerasCharRNN.ipynb and KerasWordRNN.ipynb

Acknowledgements

  • George Cooper - Making the model available as a Python library.