
ruanchaves / hashformers

Licence: MIT License
Hashformers is a framework for hashtag segmentation with transformers.

Programming languages: Jupyter Notebook, Python

Projects that are alternatives to or similar to hashformers

sentistrength id
Sentiment Strength Detection in Bahasa Indonesia
Stars: ✭ 32 (+77.78%)
Mutual labels:  sentiment-analysis, sentiment-polarity, sentiment-classification
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+17727.78%)
Mutual labels:  sentiment-analysis, transformers, sentiment-classification
Tia
Your Advanced Twitter stalking tool
Stars: ✭ 98 (+444.44%)
Mutual labels:  twitter, sentiment-analysis, sentiment-classification
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (+33.33%)
Mutual labels:  sentiment-analysis, transformers, sentiment-classification
Senti4SD
An emotion-polarity classifier specifically trained on developers' communication channels
Stars: ✭ 41 (+127.78%)
Mutual labels:  sentiment-analysis, sentiment-polarity, sentiment-classification
arabic-sentiment-analysis
Sentiment Analysis in Arabic tweets
Stars: ✭ 64 (+255.56%)
Mutual labels:  sentiment-analysis, sentiment-classification
Text tone analyzer
A system that analyzes the sentiment (tone) of texts and utterances.
Stars: ✭ 15 (-16.67%)
Mutual labels:  sentiment-analysis, sentiment-classification
Dataset-Sentimen-Analisis-Bahasa-Indonesia
This repository is a collection of datasets for Indonesian-language sentiment analysis. If you use the datasets in this repository for research, please cite the journal articles associated with those datasets. The available datasets have been used in several studies, and the results have been published…
Stars: ✭ 38 (+111.11%)
Mutual labels:  sentiment-analysis, sentiment-classification
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+716.67%)
Mutual labels:  sentiment-analysis, transformers
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit that supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+738.89%)
Mutual labels:  transformers, word-segmentation
ML2017FALL
Machine Learning (EE 5184) in NTU
Stars: ✭ 66 (+266.67%)
Mutual labels:  sentiment-analysis, sentiment-classification
pysentimiento
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
Stars: ✭ 274 (+1422.22%)
Mutual labels:  sentiment-analysis, transformers
german-sentiment
A data set and model for german sentiment classification.
Stars: ✭ 37 (+105.56%)
Mutual labels:  sentiment-analysis, sentiment-classification
german-sentiment-lib
An easy to use python package for deep learning-based german sentiment classification.
Stars: ✭ 33 (+83.33%)
Mutual labels:  sentiment-analysis, sentiment-classification
NewsMTSC
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.
Stars: ✭ 54 (+200%)
Mutual labels:  sentiment-analysis, sentiment-classification
brand-sentiment-analysis
Scripts utilizing Heartex platform to build brand sentiment analysis from the news
Stars: ✭ 21 (+16.67%)
Mutual labels:  sentiment-analysis, sentiment-classification
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (+77.78%)
Mutual labels:  sentiment-analysis, sentiment-classification
wink-sentiment
Accurate and fast sentiment scoring of phrases with #hashtags, emoticons :) & emojis 🎉
Stars: ✭ 51 (+183.33%)
Mutual labels:  sentiment-analysis, sentiment-classification
sentiment-thermometer
Measure the sentiment towards a word, name or sentence on social networks
Stars: ✭ 56 (+211.11%)
Mutual labels:  twitter, sentiment-analysis
sentiment analysis dict
sentiment analysis, text classification, dictionary-based, python, classification
Stars: ✭ 111 (+516.67%)
Mutual labels:  sentiment-analysis, sentiment-classification


Hashtag segmentation is the task of automatically inserting the missing spaces between the words in a hashtag.
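To make the size of the task concrete, the sketch below enumerates every way of inserting spaces into a hashtag: a hashtag body of n characters has n - 1 gaps, so there are 2^(n-1) candidate segmentations. The toy vocabulary and the "fewest known words" rule in `toy_segment` are hypothetical illustrations only, not how hashformers chooses a segmentation; hashformers scores candidates with language models instead, and the paper describes its actual search procedure.

```python
from itertools import product

def candidate_segmentations(hashtag: str) -> list[str]:
    """Return every way of inserting spaces between the characters of a hashtag."""
    text = hashtag.lstrip("#")
    if not text:
        return [""]
    candidates = []
    # Each of the len(text) - 1 gaps between characters either gets a space or not.
    for gaps in product((False, True), repeat=len(text) - 1):
        words = [text[0]]
        for char, split_here in zip(text[1:], gaps):
            if split_here:
                words.append(char)
            else:
                words[-1] += char
        candidates.append(" ".join(words))
    return candidates

# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"ice", "cold", "my", "old", "phone", "sucks"}

def toy_segment(hashtag: str) -> str:
    """Pick the candidate with the fewest words that uses only known words."""
    valid = [c for c in candidate_segmentations(hashtag)
             if all(word in TOY_VOCAB for word in c.split())]
    return min(valid, key=lambda c: len(c.split())) if valid else hashtag.lstrip("#")

print(len(candidate_segmentations("#icecold")))  # 64
print(toy_segment("#icecold"))                   # ice cold
```

Brute force is only feasible for short hashtags, since "#myoldphonesucks" already has 16,384 candidates; this exponential growth is why practical segmenters search the space selectively rather than listing it.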

Hashformers applies Transformer models to hashtag segmentation. It is built on top of the transformers library and the lm-scorer and mlm-scoring packages.
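The core idea behind scoring with a causal language model can be sketched without any model download: each candidate segmentation is treated as a sentence, its log-probability is summed word by word, and the most probable candidate wins. In the sketch below, a hypothetical hand-written bigram table (`TOY_BIGRAMS`) stands in for the sentence scores that a model such as GPT-2 would produce through lm-scorer; the numbers are made up for illustration.

```python
import math

# Hypothetical bigram log-probabilities, log P(word | previous word).
# A real setup would obtain sentence scores from a causal language model instead.
TOY_BIGRAMS = {
    ("<s>", "ice"): math.log(0.01),
    ("ice", "cold"): math.log(0.2),
    ("<s>", "i"): math.log(0.05),
    ("i", "ce"): math.log(1e-6),
    ("ce", "cold"): math.log(1e-4),
}
UNSEEN_LOGPROB = math.log(1e-8)  # crude floor for bigrams missing from the table

def sentence_logprob(sentence: str) -> float:
    """Sum log P(w_i | w_{i-1}) over the words, starting from a <s> marker."""
    previous = "<s>"
    total = 0.0
    for word in sentence.split():
        total += TOY_BIGRAMS.get((previous, word), UNSEEN_LOGPROB)
        previous = word
    return total

candidates = ["icecold", "ice cold", "i ce cold"]
best = max(candidates, key=sentence_logprob)
print(best)  # ice cold
```

A raw sum of log-probabilities tends to favor candidates with fewer words, so treat this as an illustration of the scoring principle rather than a complete scoring scheme.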

Try it right now on Google Colab.

Paper: Zero-shot hashtag segmentation for multilingual sentiment analysis

Basic usage

from hashformers import WordSegmenter

ws = WordSegmenter(
    segmenter_model_name_or_path="gpt2",
    reranker_model_name_or_path="bert-base-uncased",
    use_reranker=True
)

segmentations = ws.segment([
    "#myoldphonesucks",
    "#latinosinthedeepsouth",
    "#weneedanationalpark",
    "#LandoftheLost",
    "#icecold",
    "#Heartbreaker",
    "#TheRiseGuys"
])

print(segmentations)

# ['my old phone sucks',
# 'latinos in the deep south',
# 'we need a national park',
# 'Land of the Lost',
# 'ice cold',
# 'Heartbreaker',
# 'The Rise Guys']
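The example above pairs a segmenter model (`gpt2`) with a reranker model (`bert-base-uncased`). One plausible reading of that two-stage setup, sketched below under stated assumptions, is a cascade: the segmenter shortlists its `top_k` best candidates and the reranker picks among them. The function `segment_then_rerank` and both score tables are hypothetical, with made-up numbers; they do not reproduce the library's internals.

```python
def segment_then_rerank(candidates, segmenter_score, reranker_score, top_k=3):
    """Keep the top_k candidates by segmenter_score, then return the one reranker_score prefers."""
    shortlist = sorted(candidates, key=segmenter_score, reverse=True)[:top_k]
    return max(shortlist, key=reranker_score)

# Hypothetical scores (higher is better) standing in for a causal-LM segmenter
# and a masked-LM reranker; the numbers are invented for illustration.
SEGMENTER_SCORES = {"the rise guys": -9.1, "the riseguys": -8.7,
                    "therise guys": -12.0, "theriseguys": -15.0}
RERANKER_SCORES = {"the rise guys": -20.5, "the riseguys": -31.0,
                   "therise guys": -40.2, "theriseguys": -55.0}

print(segment_then_rerank(list(SEGMENTER_SCORES), SEGMENTER_SCORES.get,
                          RERANKER_SCORES.get, top_k=2))
# the rise guys
```

Here the segmenter alone would prefer "the riseguys", and the reranker overturns that choice within the shortlist; with `top_k=1` the reranker has nothing to choose from, which corresponds to running without a reranker.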

Installation

Installation steps are described in this notebook. A Docker image is coming soon.

Examples

Applications of hashtag segmentation to tweet sentiment analysis and to the automatic translation of tweets can be found in the examples folder.
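A typical preprocessing step for such applications is to replace every hashtag in a tweet with its segmentation before passing the text to a sentiment classifier or translator. The sketch below assumes a `segment` callable that takes a list of hashtags and returns a list of strings, the same shape as the `segment()` call shown under Basic usage; `fake_segment` is a hypothetical stand-in so the example runs without downloading a model.

```python
import re

HASHTAG_PATTERN = re.compile(r"#\w+")

def expand_hashtags(tweet, segment):
    """Replace each hashtag in a tweet with its segmentation.

    `segment` maps a list of hashtags to a list of segmented strings.
    """
    hashtags = HASHTAG_PATTERN.findall(tweet)
    if not hashtags:
        return tweet
    mapping = dict(zip(hashtags, segment(hashtags)))
    return HASHTAG_PATTERN.sub(lambda match: mapping[match.group(0)], tweet)

# Hypothetical stand-in segmenter for illustration; a real pipeline might pass ws.segment.
def fake_segment(hashtags):
    lookup = {"#icecold": "ice cold", "#myoldphonesucks": "my old phone sucks"}
    return [lookup.get(tag, tag.lstrip("#")) for tag in hashtags]

print(expand_hashtags("Drinks are #icecold but #myoldphonesucks", fake_segment))
# Drinks are ice cold but my old phone sucks
```

Collecting all hashtags first and segmenting them in one call keeps the expensive model step batched, which matters when processing many tweets.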

Contributing

Pull requests are welcome! We need to improve the documentation and code quality of this repository. Implementing more sophisticated ensembling techniques would also be valuable. Read our paper for more details on the inner workings of our framework.
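As a reference point for contributors exploring ensembling, the sketch below shows a simple baseline: rescale the segmenter and reranker scores to [0, 1] per hashtag, then pick the candidate that maximizes a weighted sum controlled by `alpha`. This is a generic linear-interpolation ensemble chosen for illustration, not necessarily the method described in the paper; the function names and example scores are hypothetical, and both score dicts are assumed to share the same candidates.

```python
def min_max_normalize(scores):
    """Rescale a dict of scores to [0, 1]; if all scores are equal they map to 0."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {key: (value - low) / span if span else 0.0 for key, value in scores.items()}

def interpolate(segmenter_scores, reranker_scores, alpha=0.5):
    """Return the candidate maximizing alpha * segmenter + (1 - alpha) * reranker."""
    seg = min_max_normalize(segmenter_scores)
    rer = min_max_normalize(reranker_scores)
    combined = {c: alpha * seg[c] + (1 - alpha) * rer[c] for c in seg}
    return max(combined, key=combined.get)

# Hypothetical scores (higher is better), invented for illustration.
segmenter = {"ice cold": -6.0, "icecold": -5.0, "i ce cold": -20.0}
reranker = {"ice cold": -10.0, "icecold": -30.0, "i ce cold": -40.0}

print(interpolate(segmenter, reranker, alpha=0.5))  # ice cold
print(interpolate(segmenter, reranker, alpha=1.0))  # icecold
```

Normalizing first keeps one model's score scale from dominating the other; `alpha` would typically be tuned on a held-out set of annotated hashtags.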

Citation

@misc{rodrigues2021zeroshot,
      title={Zero-shot hashtag segmentation for multilingual sentiment analysis}, 
      author={Ruan Chaves Rodrigues and Marcelo Akira Inuzuka and Juliana Resplande Sant'Anna Gomes and Acquila Santos Rocha and Iacer Calixto and Hugo Alexandre Dantas do Nascimento},
      year={2021},
      eprint={2112.03213},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}