shibing624 / pysenti

Licence: Apache-2.0 license
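Chinese Sentiment Classification Tool. Sentiment polarity classification based on the HowNet, Tsinghua, and BosonNLP sentiment dictionaries. Easy to extend, usable as a baseline method, and works out of the box.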

pysenti

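Chinese Sentiment Classification Tool for Python: a tool for analyzing the sentiment polarity of Chinese text.

pysenti performs sentiment polarity analysis based on rules and dictionaries. It is highly extensible and can serve as a baseline method for exploratory work.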

Question

How do you analyze the sentiment polarity (orientation) of a text?

Solution

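Rule-based approach

  1. For Chinese sentiment polarity analysis, split the text into paragraphs and then into words, and use a sentiment lexicon to label each word's polarity: positive, neutral, or negative.
  2. Use sentence structure (conjunctions, negation words, adverbs, punctuation, and so on) to weight each sentiment word's polarity, then take the weighted sum to obtain the text's sentiment polarity score.
  3. Advantages: generalizes well, the rules are highly extensible, and the method applies across all domains.
  4. Disadvantages: rule dictionaries are hard to collect, expert-assigned weights have limitations, and accuracy within a single domain is lower than that of model-based methods.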
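The weighted-sum idea in steps 1 and 2 can be sketched in a few lines. This is a minimal illustration, not pysenti's actual implementation: the lexicon entries, the adverb weights, and the function names `score_clause` and `score_text` are all made up for the example, and the input is assumed to be already segmented into words (for instance by a tool such as jieba).

```python
# Toy lexicons, invented for illustration; real dictionaries are far larger.
SENTIMENT = {"好吃": 1.5, "难吃": -3.0, "伟大": 1.2}  # word -> polarity value
ADVERBS = {"很": 1.5, "非常": 2.0}                    # degree adverbs scale the next sentiment word
DENIALS = {"不", "没有"}                              # negation words flip the sign


def score_clause(tokens):
    """Return the weighted sum of sentiment values in one pre-segmented clause."""
    total, weight = 0.0, 1.0
    for tok in tokens:
        if tok in ADVERBS:
            weight *= ADVERBS[tok]
        elif tok in DENIALS:
            weight *= -1.0
        elif tok in SENTIMENT:
            total += weight * SENTIMENT[tok]
            weight = 1.0  # modifiers apply only to the next sentiment word
    return total


def score_text(clauses):
    """Sum clause scores; `clauses` is a list of token lists."""
    return sum(score_clause(tokens) for tokens in clauses)


print(score_clause(["土豆丝", "很", "好吃"]))  # adverb amplifies a positive word
print(score_clause(["土豆丝", "不", "好吃"]))  # negation flips a positive word
```

Words missing from every lexicon contribute nothing here, which is one reason dictionary coverage matters so much for this family of methods.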

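Model-based approach

  1. Any common NLP text classification model will do, including classical models (LR, SVM, XGBoost, etc.) and deep models (TextCNN, Bi-LSTM, BERT, etc.).
  2. Advantages: high precision and recall within a single domain.
  3. Disadvantages: not general-purpose, labeled samples are hard to collect, and extensibility is weak.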

Feature

Rules

Model

  • Bayes text classification model
  • The training samples come from product reviews and are labeled as either positive or negative.
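To make the Bayes classifier idea concrete, here is a small multinomial naive Bayes written from scratch. It is only a sketch under stated assumptions: the class name `NaiveBayes`, the Laplace smoothing parameter `alpha`, and the four toy training reviews are invented for the example and say nothing about how pysenti's bundled model is trained or stored.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over pre-segmented tokens, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, tokens):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[tok] + self.alpha) / denom)
            log_scores[label] = logp
        # Normalise in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {label: math.exp(s - top) / z for label, s in log_scores.items()}


# Toy product reviews, already segmented into words (hypothetical data).
docs = [["很", "好吃"], ["质量", "很", "好"], ["很", "难吃"], ["质量", "差"]]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(docs, labels)
print(model.predict_proba(["好吃"]))
```

With only two classes, the returned dictionary has the same shape as a positive/negative probability pair.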

Demo

http://42.193.145.218/product/sentiment_classify/

Install

  • Fully automatic install: pip3 install pysenti
  • Semi-automatic install:
git clone https://github.com/shibing624/pysenti.git
cd pysenti
python3 setup.py install

Usage

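Rule-based method

import pysenti

# Sample sentences: "Apple is a great company",
# "The shredded potatoes are delicious", "The shredded potatoes taste awful"
texts = ["苹果是一家伟大的公司",
         "土豆丝很好吃",
         "土豆丝很难吃"]
for text in texts:
    result = pysenti.classify(text)  # dict with the overall score and per-word details
    print(text, result['score'], result)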

output:

苹果是一家伟大的公司 3.4346924811096997 {'score': 3.4346924811096997, 'sub_clause0': {'score': 3.4346924811096997, 'sentiment': [{'key': '苹果', 'adverb': [], 'denial': [], 'value': 1.37846341627, 'score': 1.37846341627}, {'key': '是', 'adverb': [], 'denial': [], 'value': -0.252600480826, 'score': -0.252600480826}, {'key': '一家', 'adverb': [], 'denial': [], 'value': 1.48470161748, 'score': 1.48470161748}, {'key': '伟大', 'adverb': [], 'denial': [], 'value': 1.14925252286, 'score': 1.14925252286}, {'key': '的', 'adverb': [], 'denial': [], 'value': 0.0353323193687, 'score': 0.0353323193687}, {'key': '公司', 'adverb': [], 'denial': [], 'value': -0.360456914043, 'score': -0.360456914043}], 'conjunction': []}}
土豆丝很好吃 2.294311221077 {'score': 2.294311221077, 'sub_clause0': {'score': 2.294311221077, 'sentiment': [{'key': '土豆丝', 'adverb': [], 'denial': [], 'value': 0.294892711165, 'score': 0.294892711165}, {'key': '很', 'adverb': [], 'denial': [], 'value': 0.530242664632, 'score': 0.530242664632}, {'key': '好吃', 'adverb': [], 'denial': [], 'value': 1.46917584528, 'score': 1.46917584528}], 'conjunction': []}}
土豆丝很难吃 -2.381874203563 {'score': -2.381874203563, 'sub_clause0': {'score': -2.381874203563, 'sentiment': [{'key': '土豆丝', 'adverb': [], 'denial': [], 'value': 0.294892711165, 'score': 0.294892711165}, {'key': '很', 'adverb': [], 'denial': [], 'value': 0.530242664632, 'score': 0.530242664632}, {'key': '难吃', 'adverb': [], 'denial': [], 'value': -3.20700957936, 'score': -3.20700957936}], 'conjunction': []}}

score: a positive value indicates positive sentiment; a negative value indicates negative sentiment.
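The result dictionary can be post-processed with plain Python. The helpers below are a sketch that relies only on the structure visible in the sample output: the names `polarity` and `word_scores` and the `neutral_band` parameter are hypothetical, and the assumption that clause keys always start with `sub_clause` is inferred from the `sub_clause0` key shown above rather than from the library's documentation.

```python
def polarity(result, neutral_band=0.0):
    """Map the overall score to a label; scores within the band count as neutral."""
    score = result["score"]
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def word_scores(result):
    """Collect (word, score) pairs from every clause entry in the result."""
    pairs = []
    for key, clause in result.items():
        if key.startswith("sub_clause"):  # assumed naming, based on 'sub_clause0'
            pairs.extend((w["key"], w["score"]) for w in clause["sentiment"])
    return pairs


# Abridged from the documented output for "土豆丝很好吃".
sample = {
    "score": 2.294311221077,
    "sub_clause0": {
        "score": 2.294311221077,
        "sentiment": [
            {"key": "很", "adverb": [], "denial": [], "value": 0.530242664632, "score": 0.530242664632},
            {"key": "好吃", "adverb": [], "denial": [], "value": 1.46917584528, "score": 1.46917584528},
        ],
        "conjunction": [],
    },
}
print(polarity(sample), word_scores(sample))
```

A nonzero `neutral_band` is one way to avoid labeling near-zero scores as confidently positive or negative.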

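Model-based method

from pysenti import model_classifier

# Same sample sentences as in the rule-based example
texts = ["苹果是一家伟大的公司",
         "土豆丝很好吃",
         "土豆丝很难吃"]
for text in texts:
    result = model_classifier.classify(text)  # dict of positive/negative probabilities
    print(text, result)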

output:

苹果是一家伟大的公司 {'positive_prob': 0.682, 'negative_prob': 0.318}
土豆丝很好吃 {'positive_prob': 0.601, 'negative_prob': 0.399}
土豆丝很难吃 {'positive_prob': 0.283, 'negative_prob': 0.717}
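The probability pair can be turned into a label with an explicit margin, so that near-ties are flagged rather than forced into a class. This is an illustrative sketch: the function name `label_with_margin` and the `min_margin` parameter are hypothetical, and only the `positive_prob` and `negative_prob` keys come from the output above.

```python
def label_with_margin(result, min_margin=0.2):
    """Return 'positive' or 'negative', or 'uncertain' when the two
    probabilities differ by less than min_margin."""
    pos, neg = result["positive_prob"], result["negative_prob"]
    if abs(pos - neg) < min_margin:
        return "uncertain"
    return "positive" if pos > neg else "negative"


# Values copied from the documented output above.
print(label_with_margin({"positive_prob": 0.682, "negative_prob": 0.318}))
print(label_with_margin({"positive_prob": 0.283, "negative_prob": 0.717}))
```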

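Lazy loading

pysenti uses lazy loading: the statements "import pysenti" and "from pysenti import rule_classifier" do not load the dictionaries immediately. Loading starts only when the dictionaries are first needed. If you prefer, you can also initialize pysenti manually.

import pysenti
pysenti.rule_classifier.init()  # manual initialization (optional)

Because loading is deferred, you can also change the path of the main dictionary: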

pysenti.rule_classifier.init('data/sentiment_dict.txt')
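The general pattern behind lazy loading can be sketched as follows. This is a generic illustration, not pysenti's source code: the class `LazyLexicon`, its `loader` callable, and the `score` method are hypothetical names, and the loader stands in for whatever actually reads a dictionary file.

```python
import threading


class LazyLexicon:
    """Holds a word -> score mapping that is loaded on first use, not at import time."""

    def __init__(self, loader):
        self._loader = loader          # callable returning a dict (stand-in for file parsing)
        self._data = None
        self._lock = threading.Lock()

    def init(self, loader=None):
        """Load now, optionally switching to a different source first."""
        with self._lock:
            if loader is not None:
                self._loader = loader
            self._data = dict(self._loader())

    def score(self, word):
        if self._data is None:         # first lookup triggers the load
            self.init()
        return self._data.get(word, 0.0)


lexicon = LazyLexicon(lambda: {"好吃": 1.5})
print(lexicon.score("好吃"))              # the load happens here, on first use
lexicon.init(lambda: {"难吃": -3.0})      # explicit re-initialisation with another source
print(lexicon.score("难吃"))
```

Deferring the load keeps import time short and gives callers a chance to point the classifier at a different dictionary before any data is read.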

Command line

Usage example: python -m pysenti news.txt > news_result.txt

The -d option replaces the default dictionary with DICT, -u adds USER_DICT on top of the default or custom dictionary, and -a outputs sentence-level and word-level sentiment details. If no filename is given, standard input is used.

Output of the --help option:

$> python -m pysenti --help

usage: python3 -m pysenti [options] filename

pysenti command line interface.

positional arguments:
  filename              input file

optional arguments:
  -h, --help            show this help message and exit
  -d DICT, --dict DICT  use DICT as dictionary
  -u USER_DICT, --user-dict USER_DICT
                        use USER_DICT together with the default dictionary or
                        DICT (if specified)
  -a, --output-all      output text sentiment score and word sentiment info
  -V, --version         show program's version number and exit

If no filename specified, use STDIN instead.
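For readers who want to build a similar interface, the help text above can be approximated with the standard argparse module. This is a reconstruction from the help output only, not pysenti's actual source: the function name `build_parser` is made up, and the version string is a placeholder because the real one is not shown here.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the help text shown above."""
    parser = argparse.ArgumentParser(
        prog="python3 -m pysenti",
        usage="%(prog)s [options] filename",
        description="pysenti command line interface.",
        epilog="If no filename specified, use STDIN instead.",
    )
    parser.add_argument("-d", "--dict", help="use DICT as dictionary")
    parser.add_argument(
        "-u", "--user-dict",
        help="use USER_DICT together with the default dictionary or DICT (if specified)",
    )
    parser.add_argument(
        "-a", "--output-all", action="store_true",
        help="output text sentiment score and word sentiment info",
    )
    # Placeholder version string; the real value is not part of the help output.
    parser.add_argument("-V", "--version", action="version", version="(version unknown)")
    # nargs="?" makes the filename optional so that STDIN can be used instead.
    parser.add_argument("filename", nargs="?", help="input file")
    return parser


args = build_parser().parse_args(["-a", "-d", "my_dict.txt", "news.txt"])
print(args.dict, args.user_dict, args.output_all, args.filename)
```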
