linanqiu / Word2vec Sentiments

Tutorial for Sentiment Analysis using Doc2Vec in gensim (or "getting 87% accuracy in sentiment analysis in under 100 lines of code")

Sentiment Analysis using Doc2Vec

Word2Vec is dope. In short, it takes in a corpus and churns out a vector for each word in it. What's so special about these vectors, you ask? Well, similar words are near each other. Furthermore, these vectors represent how we use the words. For example, v_man - v_woman is approximately equal to v_king - v_queen, illustrating the relationship that "man is to woman as king is to queen". This process, in NLP voodoo, is called word embedding. These representations have been applied widely. This is made even more awesome by the introduction of Doc2Vec, which represents not only words but entire sentences and documents. Imagine being able to represent an entire sentence using a fixed-length vector and then running all your standard classification algorithms on it. Isn't that amazing?
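
To make the "man is to woman as king is to queen" relationship concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are hypothetical and hand-picked so that the arithmetic works out; real Word2Vec vectors are learned from a corpus and typically have hundreds of dimensions. The helper names (`sub`, `add`, `cosine`, `analogy`) are made up for this illustration and are not part of any library.

```python
import math

# Hypothetical toy vectors, chosen by hand for illustration only.
# Dimension 0 ~ "is a person", 1 ~ "female", 2 ~ "royal".
vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.3, 0.1],
}

def sub(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Element-wise a + b."""
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to v_b - v_a + v_c, skipping the inputs."""
    target = add(sub(vectors[b], vectors[a]), vectors[c])
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# v_man - v_woman and v_king - v_queen point in the same direction here.
direction_similarity = cosine(
    sub(vectors["man"], vectors["woman"]),
    sub(vectors["king"], vectors["queen"]),
)
print(direction_similarity)

# "man is to woman as king is to ...?"
print(analogy("man", "woman", "king"))
```

With learned embeddings the two difference vectors are only approximately parallel, so the match is a nearest-neighbour search rather than an exact equality.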

However, Word2Vec documentation is shit. The C-code is nigh unreadable (700 lines of highly optimized, and sometimes weirdly optimized code). I personally spent a lot of time untangling Doc2Vec and crashing into ~50% accuracies due to implementation mistakes. This tutorial aims to help other users get off the ground using Word2Vec for their own research. We use Word2Vec for sentiment analysis by attempting to classify the Cornell IMDB movie review corpus (http://www.cs.cornell.edu/people/pabo/movie-review-data/). The specific data set used is available for download at http://ai.stanford.edu/~amaas/data/sentiment/.

Show Me The Code

The IPython Notebook (code + tutorial) can be found in word2vec-sentiments.ipynb

The code that just runs Doc2Vec and saves the model as imdb.d2v can be found in run.py. It should be useful for running on computer clusters.

What Does This Repo Contain

  • test-neg.txt, test-pos.txt, train-neg.txt, train-pos.txt, train-unsup.txt: Training and testing data. Explained in more detail in the notebook.
  • word2vec-sentiment.ipynb: The notebook (code + tutorial)
  • run.py: Just the code
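
Doc2Vec needs every document to carry a unique tag, so the data files above have to be turned into (words, tags) pairs before training. The sketch below shows one way to do that with the standard library only. It assumes each file holds one review per line, and the tag prefix `TRAIN_POS` and the helper names `tag_lines` and `demo` are hypothetical choices for this example; see the notebook for how the data is actually prepared.

```python
import os
import tempfile

def tag_lines(path, prefix):
    """Yield (words, tags) pairs, one per non-empty line of `path`.

    Assumes one document per line. Tags look like PREFIX_0, PREFIX_1, ...
    so that every document gets a unique label.
    """
    index = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            words = line.lower().split()
            if not words:
                continue  # skip blank lines
            yield words, [f"{prefix}_{index}"]
            index += 1

def demo():
    """Write a tiny stand-in for train-pos.txt and tag its lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train-pos.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("A wonderful little film\n\nLoved every minute\n")
        return list(tag_lines(path, "TRAIN_POS"))

documents = demo()
for words, tags in documents:
    print(tags, words)
```

Each pair has roughly the shape that gensim's `TaggedDocument(words, tags)` takes, so the output of `tag_lines` could be wrapped in that type before being handed to a Doc2Vec model.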

License

Copyright (c) 2015 Linan Qiu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.