sindbach / doc2vec_pymongo

Licence: other
Machine learning prediction of movies genres using Gensim's Doc2Vec and PyMongo - (Python, MongoDB)

Projects that are alternatives of or similar to doc2vec pymongo

Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-16.67%)
Mutual labels:  doc2vec, doc2vec-model
Mongoengine
MongoEngine is a Python Object-Document Mapper for working with MongoDB. Documentation is available at https://mongoengine-odm.readthedocs.io - there is currently a tutorial, a user guide, and an API reference.
Stars: ✭ 3,632 (+9988.89%)
Mutual labels:  pymongo
axiol
🚀 An advanced Python Discord bot for everyone
Stars: ✭ 39 (+8.33%)
Mutual labels:  pymongo
Python-MongoDB-Example
A Live working Example Application of Python, Qt, PySide2, MongoDB, PyMongo, QTreeView, QAbstractTableModel
Stars: ✭ 41 (+13.89%)
Mutual labels:  pymongo
Tieba-Birthday-Spider
Baidu Tieba birthday crawler: scrapes the birthdays of forum members and automatically sends greetings on the corresponding dates
Stars: ✭ 28 (-22.22%)
Mutual labels:  pymongo
buscaimoveis
Aggregator of listings for real estate for sale
Stars: ✭ 15 (-58.33%)
Mutual labels:  pymongo
Deep Learning Machine Learning Stock
Stock for Deep Learning and Machine Learning
Stars: ✭ 240 (+566.67%)
Mutual labels:  prediction
RVM-MATLAB
MATLAB code for Relevance Vector Machine using SB2_Release_200.
Stars: ✭ 38 (+5.56%)
Mutual labels:  prediction
flask-admin-boilerplate
Flask Admin Boilerplate with MongoDB
Stars: ✭ 63 (+75%)
Mutual labels:  pymongo
Deploy-ML-model
No description or website provided.
Stars: ✭ 57 (+58.33%)
Mutual labels:  pymongo
quart-motor
Motor support for Quart applications
Stars: ✭ 14 (-61.11%)
Mutual labels:  pymongo
megadlbot oss
Megatron was a telegram file management bot that helped a lot of users, specially movie channel managers to upload their files to telegram by just providing a link to it. The project initially started as roanuedhuru_bot which lately retired and came back as Megatron which was a side project of the famous Maldivian Telegram community - @baivaru u…
Stars: ✭ 151 (+319.44%)
Mutual labels:  pymongo
ask-hadith
🔎 A Hadith search engine
Stars: ✭ 33 (-8.33%)
Mutual labels:  pymongo
pymongo inmemory
A mongo mocking library with an ephemeral MongoDB running in memory.
Stars: ✭ 25 (-30.56%)
Mutual labels:  pymongo
python-neuron
Neuron class provides LNU, QNU, RBF, MLP, MLP-ELM neurons
Stars: ✭ 38 (+5.56%)
Mutual labels:  prediction
iHealth crawler
Content crawler for the iHealth project (a medical consultation crawler based on Python and MongoDB)
Stars: ✭ 24 (-33.33%)
Mutual labels:  pymongo
mongu
🌱 Yet another Python Object-Document Mapper on top of PyMongo. It's lightweight, intuitive to use and easy to understand.
Stars: ✭ 15 (-58.33%)
Mutual labels:  pymongo
OLX Scraper
📻 An OLX Scraper using Scrapy + MongoDB. It Scrapes recent ads posted regarding requested product and dumps to NOSQL MONGODB.
Stars: ✭ 15 (-58.33%)
Mutual labels:  pymongo
Data-Science
Using Kaggle Data and Real World Data for Data Science and prediction in Python, R, Excel, Power BI, and Tableau.
Stars: ✭ 15 (-58.33%)
Mutual labels:  prediction
GPT2-Telegram-Chatbot
GPT-2 Telegram Chat bot
Stars: ✭ 67 (+86.11%)
Mutual labels:  prediction

Predicting movie genres with PyMongo and Doc2Vec.

Tested with:

  • Python v3.9
  • PyMongo v4.1.1
  • MongoDB v5.0
  • Gensim v4.1

Data

A very small data set is provided with this repository for example purposes. It consists of two JSON files that are ready to import into a MongoDB deployment.

To import the files into MongoDB you can use mongoimport:

mongoimport --db topics --collection movies --file ./data/training.json
mongoimport --db topics --collection test --file ./data/test.json 
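
After the import, a quick check from Python can confirm that both collections are populated. The following is a minimal sketch, not part of this repository: `summarise` and `main` are hypothetical helper names, and the connection string assumes a local MongoDB deployment on the default port.

```python
def summarise(collection):
    """Return the document count and the sorted field names of one sample document.

    Works with any object exposing PyMongo's Collection.count_documents()
    and Collection.find_one().
    """
    total = collection.count_documents({})
    sample = collection.find_one() or {}
    return total, sorted(sample.keys())


def main():
    # Imported here so summarise() can be used without a live server.
    from pymongo import MongoClient

    # Assumption: a local deployment listening on the default port.
    client = MongoClient("mongodb://localhost:27017")
    db = client["topics"]
    for name in ("movies", "test"):
        total, fields = summarise(db[name])
        print(f"{name}: {total} documents, fields: {fields}")

# Call main() once MongoDB is running and both files have been imported.
```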

Custom Data

To use your own data, you need a MongoDB collection whose documents follow the structure of this example:

{
  "_id": ObjectId("57ff3452b62f007fe3d033b9"),
  "Title": "Circle",
  "Plot": "In a massive, mysterious chamber, fifty strangers awaken to find themselves ...",
  "Actors": "Michael Nardelli, Allegra Masters, Molly Jackson, Jordi Vilasuso",
  "Year": "2015",
  "Genre": "Drama, Horror, Mystery",  
  "Language": "English",
  ...
}

You can either construct the documents yourself, or fetch existing information from movie sites.
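
Before building a model from custom data, it can help to check that each document carries the fields shown above. The sketch below is illustrative only: the helper names are hypothetical, the list of required fields is an assumption based on the fields this README says the model uses plus Genre, and treating "N/A" as empty assumes OMDb-style placeholder values.

```python
REQUIRED_FIELDS = ("Title", "Plot", "Actors", "Genre")  # assumed minimum


def missing_fields(doc):
    """List the required fields that are absent, empty, or set to 'N/A'."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = str(doc.get(field, "")).strip()
        if value in ("", "N/A"):  # assumption: 'N/A' marks a missing value
            missing.append(field)
    return missing


def split_genres(genre_field):
    """Turn 'Drama, Horror, Mystery' into ['Drama', 'Horror', 'Mystery']."""
    return [g.strip() for g in genre_field.split(",") if g.strip()]


if __name__ == "__main__":
    doc = {
        "Title": "Circle",
        "Plot": "In a massive, mysterious chamber, fifty strangers awaken ...",
        "Actors": "Michael Nardelli, Allegra Masters",
        "Genre": "Drama, Horror, Mystery",
    }
    print(missing_fields(doc), split_genres(doc["Genre"]))
```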

The example data was collected from the MovieLens Latest Datasets. The file ./ml-latest-small/links.csv maps each movieId to external identifiers, including the IMDb ID, which can be used to fetch the related movie information from omdbapi.com. You need to register and activate an API key. The site provides 1000 API calls per day for free.

Use the build_dataset.py script as an example of how to fetch more movie data from omdbapi.com. The script outputs a JSON file that can be imported into MongoDB using mongoimport.
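
The fetching step can be sketched roughly as follows. This is not the code of build_dataset.py: the function names are hypothetical, and the sketch assumes that links.csv has an imdbId column and that the API key is passed in by the caller. It writes one JSON document per line, which mongoimport accepts without extra flags.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_ENDPOINT = "http://www.omdbapi.com/"


def read_imdb_ids(csv_text):
    """Yield IMDb IDs such as 'tt0114709' from the text of links.csv."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the imdbId column holds only the numeric part of the ID.
        yield "tt" + row["imdbId"].zfill(7)


def omdb_url(imdb_id, api_key):
    """Build the lookup URL for one movie, requesting the full plot."""
    query = urlencode({"i": imdb_id, "plot": "full", "apikey": api_key})
    return OMDB_ENDPOINT + "?" + query


def fetch_movies(csv_path, api_key, out_path, limit=1000):
    """Fetch up to `limit` movies and write them as newline-delimited JSON."""
    with open(csv_path, newline="") as f:
        ids = list(read_imdb_ids(f.read()))[:limit]
    with open(out_path, "w") as out:
        for imdb_id in ids:
            with urlopen(omdb_url(imdb_id, api_key)) as resp:
                movie = json.load(resp)
            if movie.get("Response") == "True":  # skip failed lookups
                out.write(json.dumps(movie) + "\n")

# Example call (performs network requests, needs a valid key):
# fetch_movies("./ml-latest-small/links.csv", "YOUR_API_KEY", "movies.json")
```

The default limit of 1000 mirrors the free daily quota mentioned above.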

Building a Model

The prediction model uses each movie's Title, Plot and Actors fields. You can create a Doc2Vec model file with the modeller.py command-line script; see modeller.py --help for more information. The example command below reads from the topics database and the movies collection, and creates a model file called example.model:

./modeller.py --db topics --coll movies --model example.model
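
For readers unfamiliar with Gensim, training a Doc2Vec model on such documents might look roughly like the sketch below. It is not the code of modeller.py. Using the genres as document tags is an assumption inferred from the sample output further down, where the most similar entries are genre names. The tokenizer, the function names and the hyperparameters are illustrative.

```python
import re

TEXT_FIELDS = ("Title", "Plot", "Actors")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def to_tagged_pairs(docs):
    """Yield (tokens, genres) for every document that has at least one genre."""
    for doc in docs:
        text = " ".join(doc.get(field, "") for field in TEXT_FIELDS)
        genres = [g.strip() for g in doc.get("Genre", "").split(",") if g.strip()]
        if genres:
            yield tokenize(text), genres


def train(docs, model_path, vector_size=100, epochs=20):
    """Train a Doc2Vec model tagged by genre and save it to model_path."""
    # Imported lazily so the helpers above work without Gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=w, tags=t) for w, t in to_tagged_pairs(docs)]
    model = Doc2Vec(vector_size=vector_size, min_count=2, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    model.save(model_path)
    return model

# Example call, assuming a PyMongo collection named movies:
# train(movies.find(), "example.model")
```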

Use the Model

Provide the generated Doc2Vec model file as input to analyser.py to predict the genres of one or more movies. See analyser.py --help for more information. The example command below reads documents from the topics database and the test collection, and predicts their genres using example.model:

./analyser.py --db topics --coll test --limit 3 --model example.model
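
The prediction step can be sketched as inferring a vector for the unseen movie and looking up the closest tag vectors. As before, this is not the code of analyser.py: the function names and tokenizer are hypothetical, and it assumes the model was trained with genres as tags. Because Gensim's infer_vector is stochastic, similarity scores can differ slightly between runs.

```python
import re

TEXT_FIELDS = ("Title", "Plot", "Actors")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def predict_genres(model, doc, topn=3):
    """Return the topn (tag, similarity) pairs for one movie document."""
    text = " ".join(doc.get(field, "") for field in TEXT_FIELDS)
    vector = model.infer_vector(tokenize(text))
    # model.dv holds the vectors of the training tags (assumed to be genres).
    return model.dv.most_similar([vector], topn=topn)


def load_and_predict(model_path, docs, topn=3):
    """Load a saved model and predict genres for each document."""
    # Imported lazily so predict_genres() can be used without Gensim installed.
    from gensim.models.doc2vec import Doc2Vec

    model = Doc2Vec.load(model_path)
    return [(doc.get("Title"), predict_genres(model, doc, topn)) for doc in docs]

# Example call, assuming a PyMongo collection named test:
# load_and_predict("example.model", test.find().limit(3))
```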

Output example:

INFO : Title: Terminator Genisys
INFO : Plots: When John Connor (Jason Clarke), leader of the human resistance, sends Sgt. Kyle Reese (Jai Courtney) back to 1984 to protect Sarah Connor (Emilia Clarke) and safeguard the future, an unexpected turn of events creates a fractured timeline. Now, Sgt. Reese finds himself in a new and unfamiliar version of the past, where he is faced with unlikely allies, including the Guardian (Arnold Schwarzenegger), dangerous new enemies, and an unexpected new mission: To reset the future...
INFO : Actual Genres: [u'Action', u'Adventure', u'Sci-Fi']
INFO : precomputing L2-norms of doc weight vectors
INFO : Most similar:  [(u'Adventure', 0.5624773502349854), (u'Action', 0.5235205292701721), (u'Animation', 0.5159382820129395)]
INFO :   