Medha11 / Twitter-Trends

Licence: other
Twitter Trends is a web-based application that automatically detects and analyzes emerging topics in real time through hashtags and user mentions in tweets. As a major microblogging service, Twitter is a reliable source for trend detection. The project involved extracting live-streamed tweets, processing them to find top hashtags and user …

Programming Languages

Python
JavaScript
CSS
HTML
Batchfile


Plan

  • First, the extractor extracts tweets as usual.
  • Tweets are cleaned and dumped into MongoDB.
  • Aggregation is done for the whole day.
  • Based on the aggregation, the top 100 entities are found and the tweets for each entity are grouped into one collection.
  • Before the tweets are dumped into the collection, sentiment analysis is performed on them.
  • LDA is performed using each of the 100 collections as a separate document. If 100 documents are too few, the big documents can be split into smaller ones.
  • Each tweet is then processed individually to find the topic to which it belongs.
  • For each topic, the URLs that seem most relevant are extracted.
  • Webpages corresponding to the URLs are downloaded and parsed.
  • A portion of the main content can be displayed after extraction.
  • The graph is approximated as usual, but the time span is still to be decided.
  • The graph, related tweets and summaries of the URLs, along with the hyperlinks, are displayed for each topic on the portal.
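
The aggregation and grouping steps above can be illustrated with a minimal sketch. This is not the project's actual code: it assumes that "entities" means hashtags and user mentions (as in the project description), uses an in-memory list in place of MongoDB, and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: hashtags (#topic) and user mentions (@user) are the "entities".
ENTITY_PATTERN = re.compile(r"[#@]\w+")


def extract_entities(text):
    """Return the lowercased hashtags and mentions found in one tweet."""
    return [match.lower() for match in ENTITY_PATTERN.findall(text)]


def top_entities(tweets, limit=100):
    """Count each entity once per tweet and return the `limit` most common."""
    counts = Counter()
    for text in tweets:
        counts.update(set(extract_entities(text)))
    return [entity for entity, _ in counts.most_common(limit)]


def group_by_entity(tweets, entities):
    """Map each selected entity to the list of tweets that contain it."""
    wanted = set(entities)
    groups = defaultdict(list)
    for text in tweets:
        for entity in set(extract_entities(text)) & wanted:
            groups[entity].append(text)
    return dict(groups)


# Tiny made-up sample standing in for one day of cleaned tweets.
sample = [
    "Loving the new #Python release, thanks @gvanrossum",
    "#python tips for beginners",
    "Weekend plans? #travel",
]
top = top_entities(sample, limit=2)
groups = group_by_entity(sample, top)
```

Each value in `groups` plays the role of one per-entity collection, which could then be fed to sentiment analysis and LDA as described in the plan.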

Workflow

  • Control of the engine starts with manager.py
  • manager.py makes use of multiprocess and subprocess to spawn the extractor, preprocessor and postprocessor as separate processes
  • config.py in the utilities package stores tuning parameters such as 'alarm' times, file limits, etc.
  • Refer to this .ppt for further information.
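
As a rough illustration of the process layout described above, the sketch below starts each stage as its own child process with the standard subprocess module. The script names (extractor.py and so on) are assumptions made for the example and may not match the files in the repository; the real manager.py may be organised differently.

```python
import subprocess
import sys

# Hypothetical stage names; the real engine's script names may differ.
STAGES = ["extractor", "preprocessor", "postprocessor"]


def build_command(stage):
    """Return the command line that would launch one stage with this interpreter."""
    return [sys.executable, f"{stage}.py"]


def spawn_all(commands):
    """Start every command as a separate child process and return the handles."""
    return [subprocess.Popen(command) for command in commands]


def wait_all(processes):
    """Block until every child process exits and return their exit codes."""
    return [process.wait() for process in processes]
```

A manager script could then call `wait_all(spawn_all([build_command(s) for s in STAGES]))` from the directory that contains the stage scripts.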

Dataset

  • Download the dataset(s) from the Drive folder
    • The full_dataset.rar archive contains all 2 million tweets
    • Optionally, you can download parts of this dataset from the Parts folder; each archive (dataset*.rar) contains 200,000 tweets
    • Each .json file contains 10,000 tweets
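
Once the archives are unpacked, the .json files can be read with a few lines of Python. The exact file layout is not documented here, so this sketch assumes each file is either a single JSON array or one JSON object per line, and handles both; `load_tweets` and `load_folder` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Load tweets from a file holding a JSON array or one JSON object per line."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to line-delimited JSON, skipping blank lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_folder(folder):
    """Yield the tweets from every .json file in a folder, in file-name order."""
    for path in sorted(Path(folder).glob("*.json")):
        yield from load_tweets(path)
```

Because `load_folder` is a generator, the tweets are read one file at a time and the full dataset never has to sit in memory at once.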

init

  • Clone the git repository
  • Run python_path.bat to add the PYTHONPATH environment variable. This needs to be done only once
  • Make the necessary changes in the config.py file in engine\utilities
  • Run python init.py in Command Prompt to start the engine
  • To stop, close all Command Prompt and Python windows
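
If imports fail after setup, it can help to confirm that PYTHONPATH really contains the expected directory. The helper below is a small sketch for that check; it is not part of the repository, and which directory python_path.bat actually adds is an assumption you would need to verify against the script.

```python
import os


def is_on_pythonpath(directory, pythonpath=None):
    """Return True if `directory` is one of the entries in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = os.path.normcase(os.path.abspath(directory))
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(
        os.path.normcase(os.path.abspath(entry)) == target for entry in entries
    )
```

For example, `is_on_pythonpath(".")` run from the cloned repository reports whether the current directory is listed in the PYTHONPATH of that Command Prompt session.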

Portal

  • The portal folder is the Django project for the web portal
  • Create a database called 'trends'
  • In the settings file, change the password for the MySQL root user if it is different
  • Run python manage.py createsuperuser to create an admin user
  • Create some top trends using the admin site. I have included a screenshot of the UI after creating some sample topics (with ranks). Clicking a topic redirects to its details page (see screenshots).
  • The homepage can be opened at 127.0.0.1:8000 or localhost:8000
  • The TopTrends model has a topic object and a rank object. It will be extended to include graphs and other details once the implementation is done.
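
For reference, a Django database setting for a local MySQL database named 'trends' typically looks like the sketch below. The host, port and the TRENDS_DB_PASSWORD environment variable are assumptions for illustration; the project's own settings file may hard-code different values.

```python
import os

# Sketch of the DATABASES entry in the Django settings file.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "trends",                      # database created in the step above
        "USER": "root",
        # Hypothetical variable name; avoids committing the password to git.
        "PASSWORD": os.environ.get("TRENDS_DB_PASSWORD", ""),
        "HOST": "127.0.0.1",                   # assumed local MySQL server
        "PORT": "3306",                        # MySQL's default port
    }
}
```

Reading the password from the environment is a design choice of this sketch, not something the project requires; editing the value directly in the settings file works as well.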