SrinidhiRaghavan / AI-Sentiment-Analysis-on-IMDB-Dataset

Licence: other

Sentiment Analysis using Stochastic Gradient Descent on 50,000 Movie Reviews Compiled from the IMDB Dataset

Labels

sentiment-analysis artificial-intelligence nlp-machine-learning imdb-sentiment-analysis

Projects that are alternatives to, or similar to, AI-Sentiment-Analysis-on-IMDB-Dataset

💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA

Stars: ✭ 19 (-65.45%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Awesome Sentiment Analysis

Repository with everything necessary for sentiment analysis and related areas

Stars: ✭ 459 (+734.55%)

Mutual labels: sentiment-analysis, nlp-machine-learning

SentimentAnalysis

Sentiment Analysis: Deep Bi-LSTM+attention model

Stars: ✭ 32 (-41.82%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Customer satisfaction analysis

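An opinion-mining project based on user-generated content (UGC) from online homestay (B&B) listings, covering data mining and NLP processing, including data collection, topic extraction, and sentiment analysis. The goal is to overcome inconsistencies between user ratings and review text and to evaluate online homestay satisfaction in real time, with online review collection and visual sentiment analysis. It provides a Baidu Maps POI query entry point that supports automated batch queries of POI information; builds an LDA topic-clustering model on an online homestay corpus, using each topic's central words to derive the corresponding topic attribute dictionary; uses user ratings as labels and performs sentiment analysis with the character-level TextCNN bundled with litNlp, treating the sentiment class probability distribution as the sentiment trend; and finally displays homestay satisfaction across regions as a POI heat map. See the link for software versions.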

Stars: ✭ 262 (+376.36%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Stars: ✭ 143 (+160%)

Mutual labels: sentiment-analysis, nlp-machine-learning

sentiment-analysis-of-tweets-in-russian

Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.

Stars: ✭ 51 (-7.27%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Text mining resources

Resources for learning about Text Mining and Natural Language Processing

Stars: ✭ 358 (+550.91%)

Mutual labels: sentiment-analysis, nlp-machine-learning

brand-sentiment-analysis

Scripts utilizing Heartex platform to build brand sentiment analysis from the news

Stars: ✭ 21 (-61.82%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Dan Jurafsky Chris Manning Nlp

My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.

Stars: ✭ 124 (+125.45%)

Mutual labels: sentiment-analysis, nlp-machine-learning

📓 Long(er) text representation and classification using Doc2Vec embeddings

Stars: ✭ 92 (+67.27%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Text Classification Keras

📚 Text classification library with Keras

Stars: ✭ 53 (-3.64%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Datastories Semeval2017 Task4

Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".

Stars: ✭ 184 (+234.55%)

Mutual labels: sentiment-analysis, nlp-machine-learning

Pytorch Sentiment Neuron

Stars: ✭ 178 (+223.64%)

Mutual labels: sentiment-analysis, nlp-machine-learning

paribhasha.herokuapp.com/

Stars: ✭ 21 (-61.82%)

Mutual labels: sentiment-analysis, nlp-machine-learning

An emotion-polarity classifier specifically trained on developers' communication channels

Stars: ✭ 41 (-25.45%)

Mutual labels: sentiment-analysis

DeepLearningReading

Deep Learning and Machine Learning mini-projects. Current Project: Deepmind Attentive Reader (rc-data)

Stars: ✭ 78 (+41.82%)

Mutual labels: nlp-machine-learning

spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Stars: ✭ 55 (+0%)

Mutual labels: sentiment-analysis

Long-term analysis of emotion, age, and sentiment using Lifeslice and text records.

Stars: ✭ 23 (-58.18%)

Mutual labels: sentiment-analysis

deep-semantic-code-search

Deep Semantic Code Search aims to explore a joint embedding space for code and description vectors and then use it for a code search application

Stars: ✭ 63 (+14.55%)

Mutual labels: nlp-machine-learning

⚽ Notebook built to analyze the Sentibol case

Stars: ✭ 18 (-67.27%)

Mutual labels: sentiment-analysis


AI-Sentiment-Analysis-on-IMDB-Dataset

Introduction

With large volumes of online review data available (Amazon, IMDB, etc.), sentiment analysis has become increasingly important. This project builds a sentiment classifier that labels the polarity of a piece of text as either positive or negative.

Getting the Dataset

The "Large Movie Review Dataset"(*) is used for this project. It is compiled from 50,000 IMDB reviews, with no more than 30 reviews per movie. Positive and negative reviews are equal in number: a negative review has a score of 4 or less out of 10, and a positive review has a score of 7 or more out of 10. Neutral reviews are not included. The 50,000 reviews are divided evenly into training and test sets of 25,000 each.

The training dataset is stored in the archive aclImbdb.tar. It can also be downloaded from: http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz.
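A minimal sketch of fetching and unpacking the archive with the Python standard library, assuming the URL above is still served and that the tarball unpacks to an aclImdb/ folder. The helper name download_imdb is hypothetical and is not part of driver_3.py.

```python
import tarfile
import urllib.request
from pathlib import Path

# Download location given in this README.
IMDB_URL = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"


def download_imdb(url: str = IMDB_URL, dest: str = ".") -> Path:
    """Download the tarball (unless already present) and extract it into dest.

    Hypothetical helper for illustration; not from the original repository.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]  # e.g. aclImdb_v1.tar.gz
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        # The "data" filter (Python 3.12+ and some backports) rejects unsafe paths.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir
```

Skipping the download when the archive already exists avoids re-fetching roughly 80 MB on every run.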

The test dataset is stored in the folder named 'test'.

Data Preprocessing

The training data in the aclImdb folder has two subdirectories: pos/ for positive texts and neg/ for negative ones. Only these two directories are used. The first task is to combine both into a single CSV file, "imdb_tr.csv", which is the output of this preprocessing step. The CSV file has three columns: "row_number", "text", and "polarity". The "text" column contains review texts from the aclImdb dataset, and the "polarity" column contains sentiment labels: 1 for positive and 0 for negative. In addition, common English stopwords should be removed; an English stopwords reference list ('stopwords.en') is included with the code.
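A minimal sketch of this preprocessing step with pandas, assuming the layout aclImdb/train/{pos,neg}/*.txt and a stopwords file with one word per line. The names remove_stopwords and imdb_data_preprocess come from this README's function list, but the bodies, the extra load_stopwords helper (hypothetical), and the whitespace tokenization are illustrative assumptions, not the repository's actual code.

```python
from pathlib import Path

import pandas as pd


def load_stopwords(path: str) -> set:
    """Read a stopwords file with one word per line (assumed format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {w.strip().lower() for w in lines if w.strip()}


def remove_stopwords(sentence: str, stopwords: set) -> str:
    """Drop every whitespace-separated token whose lowercase form is a stopword."""
    return " ".join(w for w in sentence.split() if w.lower() not in stopwords)


def imdb_data_preprocess(inpath: str, outpath: str = "./", name: str = "imdb_tr.csv",
                         stopwords: frozenset = frozenset()) -> pd.DataFrame:
    """Combine pos/ and neg/ reviews into one CSV with row_number, text, polarity."""
    rows = []
    for folder, label in (("pos", 1), ("neg", 0)):
        for txt in sorted(Path(inpath, folder).glob("*.txt")):
            text = txt.read_text(encoding="utf-8")
            rows.append((remove_stopwords(text, stopwords), label))
    df = pd.DataFrame(rows, columns=["text", "polarity"])
    df.insert(0, "row_number", range(len(df)))  # explicit row index column
    df.to_csv(Path(outpath, name), index=False)
    return df
```

For example, remove_stopwords("The cat and the hat", {"the", "and"}) returns "cat hat".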

Data Representations Used

Unigram, bigram, and TF-IDF (both the unigram and the bigram representations are also used with TF-IDF weighting)
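These representations can be sketched with scikit-learn's CountVectorizer and TfidfTransformer. Treating "bigram" as ngram_range=(1, 2) (unigrams plus bigrams), applying TF-IDF to an existing count matrix, and keeping all other defaults are assumptions; the original code may configure the vectorizers differently. The function names echo this README's function list, but the bodies are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer


def unigram_process(data):
    """Fit a unigram (single-word) count vectorizer on a list of texts."""
    return CountVectorizer().fit(data)


def bigram_process(data):
    """Fit a count vectorizer on unigrams and bigrams (assumed ngram_range=(1, 2))."""
    return CountVectorizer(ngram_range=(1, 2)).fit(data)


def tfidf_process(counts):
    """Fit a TF-IDF transformer on a document-term count matrix.

    Works for both unigram and bigram counts; rows are L2-normalized by default.
    """
    return TfidfTransformer().fit(counts)
```

Usage: X = unigram_process(train_texts).transform(train_texts) gives unigram counts, and tfidf_process(X).transform(X) gives the unigram TF-IDF matrix; the bigram variants follow the same pattern.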

Algorithmic Overview

In this project, we train a Stochastic Gradient Descent (SGD) classifier. SGD is used instead of batch gradient descent because batch gradient descent must process every data point to compute each update, which becomes prohibitively expensive when the dataset is extremely large. SGD instead estimates the gradient from a single randomly chosen example (or a small random subset) at each step, and in practice it typically reaches a solution of comparable quality at a fraction of the cost per update. This is the central idea of SGD, and it is particularly handy for text data, since text corpora are often huge.
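To make the idea concrete, here is a from-scratch sketch of SGD for a linear classifier with logistic loss, written with NumPy. It illustrates the technique only: the loss, learning rate, and epoch count are assumptions, and the project lists scikit-learn among its libraries, so it presumably relies on a library classifier rather than a hand-written loop like this one. Each update uses the gradient of a single example instead of the full dataset.

```python
import numpy as np


def sgd_logistic(X, y, lr=0.1, epochs=20, seed=0):
    """Train weights w and bias b by stochastic gradient descent on logistic loss.

    X: (n_samples, n_features) float array; y: array of 0/1 labels.
    Each parameter update uses the gradient of one example only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):  # visit examples in a fresh random order
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # predicted P(y = 1)
            grad = p - y[i]  # derivative of logistic loss w.r.t. the score
            w -= lr * grad * X[i]
            b -= lr * grad
    return w, b


def predict(X, w, b):
    """Return 1 where the linear score is positive, else 0."""
    return (X @ w + b > 0).astype(int)
```

Batch gradient descent would average grad * X[i] over all n examples before each update; here the cost per update is independent of n, which is what makes SGD practical for large text corpora.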

A good description of this algorithm can be found at: https://en.wikipedia.org/wiki/Stochastic_gradient_descent.

Functions used in driver_3.py

imdb_data_preprocess : Reads the neg and pos folders in aclImdb/train and creates an imdb_tr.csv file in the required format

remove_stopwords : Takes a sentence and the stopwords as inputs and returns the sentence without any stopwords

unigram_process : Takes the data to be fit as input and returns a fitted unigram vectorizer

bigram_process : Takes the data to be fit as input and returns a fitted bigram vectorizer

tfidf_process : Takes the data to be fit as input and returns a fitted TF-IDF vectorizer

retrieve_data : Takes a CSV file as the input and returns the corresponding arrays of labels and data as output

stochastic_descent : Applies stochastic gradient descent to the training data and returns the predicted labels

accuracy : Computes the accuracy, as a percentage, given the training and test labels

write_txt : Writes the given data to a text file
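A possible sketch of the remaining helpers, assuming imdb_tr.csv has the "text" and "polarity" columns described above and that scikit-learn's SGDClassifier performs the SGD step. The hinge loss, the signatures, the true-vs-predicted form of accuracy, and the one-label-per-line output format are all assumptions and may not match driver_3.py.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier


def retrieve_data(name="imdb_tr.csv", train=True):
    """Read a CSV and return (texts, labels); labels is None when train=False."""
    df = pd.read_csv(name)
    texts = df["text"].fillna("").tolist()
    labels = df["polarity"].to_numpy() if train else None
    return texts, labels


def stochastic_descent(Xtrain, Ytrain, Xtest, seed=0):
    """Fit a linear classifier with SGD (hinge loss assumed) and predict test labels."""
    clf = SGDClassifier(loss="hinge", random_state=seed)
    clf.fit(Xtrain, Ytrain)
    return clf.predict(Xtest)


def accuracy(Ytrue, Ypred):
    """Percentage of positions where the two label arrays agree."""
    return 100.0 * np.mean(np.asarray(Ytrue) == np.asarray(Ypred))


def write_txt(data, name):
    """Write one 0/1 label per line, e.g. to unigram.output (assumed format)."""
    with open(name, "w", encoding="utf-8") as f:
        f.write("\n".join(str(int(x)) for x in data) + "\n")
```

With a fitted vectorizer, a run might chain these as: texts, y = retrieve_data(); X = vec.transform(texts); preds = stochastic_descent(X, y, X_test); write_txt(preds, "unigram.output").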

Environment

Language: Python 3

Libraries: scikit-learn, pandas

How to Execute?

Run python driver_3.py

Results

Output files are:

unigram.output

unigramtfidf.output

bigram.output

bigramtfidf.output

In these files, 1 denotes a positive label and 0 a negative label.

Stars: ✭ 55
