thepanacealab / SMMT

Licence: GPL-3.0 license
Social Media Mining Toolkit (SMMT) main repository

Programming Languages

python
shell

Social Media Mining Toolkit (SMMT)

The tools collected here are designed to facilitate the acquisition, preprocessing, and initial exploration of social media data (mostly Twitter for now). This centralized repository depends on other widely available libraries that need to be installed.

The toolkit is organized into three categories, each in its own folder:

1. Data Acquisition Tools: Utilities to gather data from social media sites
2. Data Preprocessing Tools: Utilities to parse social media 'raw' data and to separate by terms
3. Data Annotation and Standardization Tools: Utilities to make automatic NER annotations on preprocessed tweets, plugins to use popular annotation tools and NER systems

Usage

  1. Install dependencies (below)
  2. Clone repository
  3. Make sure you have your Twitter API keys handy if you are gathering any Twitter data
  4. Each tool and its usage are described in the README file in the folder for its category.

All the libraries used in this toolkit can be installed using the following command.

sh requirements.sh

Note: If you would like to set up headless browsing automation tasks, please install the additional dependencies given below.

Dependencies and versions used

  1. Python 3+

  2. spaCy v2.2

pip install spacy
python -m spacy download en
python -m spacy download en_core_web_sm

  3. Twarc: pip install twarc

  4. Tweepy v3.8.0: pip install tweepy

  5. argparse v3.2: pip install argparse

  6. xtract v0.1a3: pip install xtract
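
After installing, it can help to confirm which of the packages listed above are actually present. The following is a minimal sketch, not part of SMMT; it assumes Python 3.8+ (for importlib.metadata), and the function name installed_versions is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the pip commands listed above.
REQUIRED = ["spacy", "twarc", "tweepy", "xtract"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

The reported versions can then be compared by eye against the versions listed above.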

NOTE: If you are using the scraping utility, install the following dependencies. These dependencies are needed for the headless browsing automation tasks (no need to have a screen open for them). Configuration of these items is very finicky but there is plenty of documentation online.

  1. Xvfb: sudo yum install Xvfb

  2. Firefox: sudo yum install firefox

  3. selenium: pip install -U selenium

  4. pyvirtualdisplay v0.25: pip install pyvirtualdisplay

  5. GeckoDriver v0.26.0: sudo yum install jq

and then use the provided utility:

bash SMMT/data_acquisition/geckoDriverInstall.sh
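
Because the headless setup is finicky, a quick check that the required executables are on the PATH can save some debugging. This is a rough sketch using only the standard library; the executable names are assumptions based on the packages listed above and may differ on your system.

```python
import shutil

# Assumed executable names for the headless-browsing dependencies.
HEADLESS_TOOLS = ("Xvfb", "firefox", "geckodriver")


def find_tools(names=HEADLESS_TOOLS):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```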

If you still have issues or the Firefox window is popping up through your X11, follow this: https://www.tienle.com/2016/09-20/run-selenium-firefox-browser-centos.html

Twitter Keys

This is a very important step: without Twitter API keys, none of the software that uses the Twitter API will work.
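
One way to fail early when keys are missing is to load them from environment variables before running any tool. The sketch below is an illustration only: the variable names are hypothetical, and the SMMT tools may expect keys to be supplied in a different way (check the README in each tool folder).

```python
import os

# Hypothetical environment variable names, chosen for this example only.
KEY_NAMES = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(env=None):
    """Return the four keys as a dict, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter API keys: " + ", ".join(missing))
    return {name: env[name] for name in KEY_NAMES}
```

Passing a plain dict as env makes the function easy to test without touching the real environment.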

How to cite our work:

If you used SMMT and liked it, please cite the following paper:

R Tekumalla and JM Banda. "Social Media Mining Toolkit (SMMT)". Genomics & Informatics, 18, (2), 2020. https://doi.org/10.5808/GI.2020.18.2.e16

Social Media Mining Toolkit (SMMT) Extra Information

Data Acquisition Tools:

  1. Twitter hydration tool - This script hydrates tweet IDs provided by others, that is, it retrieves the full tweet objects for those IDs.
  2. Twitter gathering tool - This script lets users specify hashtags and capture new tweets containing them from the Twitter stream.
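
Hydration usually starts from a plain list of tweet IDs that is sent to the API in batches. The sketch below covers only that preparation step and is not SMMT's implementation. The function names are hypothetical, and the batch size of 100 is an assumption based on the per-request limit of Twitter's tweet lookup endpoint; adjust it if the API you use differs.

```python
from itertools import islice


def read_tweet_ids(lines):
    """Yield cleaned tweet IDs, skipping blank and non-numeric lines."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id.isdigit():
            yield tweet_id


def batched(ids, size=100):
    """Yield lists of at most `size` IDs (100 is an assumed lookup limit)."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch would then be passed to whichever client performs the lookup, for example Twarc or Tweepy from the dependency list.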

Data Preprocessing Tools:

  1. Twitter JSON extraction tool - While seemingly trivial, most biomedical researchers do not want to work with JSON objects. This tool will take the fields the researcher wants and output a simple to use CSV file created from the provided data.
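
The idea behind the extraction step can be sketched with the standard library alone. This is an illustration, not the SMMT tool itself: it assumes one JSON tweet object per input line and top-level field names such as id_str and text, and the function name tweets_to_csv is hypothetical.

```python
import csv
import io
import json


def tweets_to_csv(json_lines, fields, out):
    """Write the chosen top-level fields of each JSON tweet as a CSV row."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Fields absent from a tweet become empty cells.
        writer.writerow({field: tweet.get(field, "") for field in fields})


if __name__ == "__main__":
    sample = ['{"id_str": "1", "text": "hello", "lang": "en"}']
    buffer = io.StringIO()
    tweets_to_csv(sample, ["id_str", "text"], buffer)
    print(buffer.getvalue())
```

Nested fields (for example, user information) would need extra handling that this sketch leaves out.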

Data Annotation and Standardization Tools:

  1. spaCy dictionary-based annotation pipeline - This is the tool that will require the most work during the hackathon. This pipeline will be available as a service as well, with the user providing their dictionaries and feeding data directly.
  2. Dictionary generation tool - This tool will transform ontologies or provided dictionary files into spaCy-compliant dictionaries to use with the previous pipeline.
  3. Manual annotation hooks for tools such as the brat annotation tool
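
To illustrate what dictionary-based annotation does, here is a simplified sketch that uses regular expressions instead of spaCy, so it is not the pipeline described above. The tab-separated "id, term" dictionary format and the function names are assumptions made for this example.

```python
import re


def load_dictionary(lines):
    """Parse 'id<TAB>term' lines into a {lowercased term: id} mapping."""
    terms = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and parts[1].strip():
            terms[parts[1].strip().lower()] = parts[0].strip()
    return terms


def annotate(text, terms):
    """Return (start, end, matched text, term id) for each dictionary hit."""
    if not terms:
        return []
    # Longest terms first so multi-word terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Word boundaries assume terms begin and end with word characters.
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return [
        (m.start(), m.end(), m.group(0), terms[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

The character offsets in each result are the kind of span information that annotation tools such as brat typically work with.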

This work was conceptualized for, and mostly carried out at, the Biomedical Linked Annotation Hackathon 6 in Tokyo, Japan.


We are very grateful for the support on this work.

Proposed functionality of SMMT V1.0

Architecture
