
schedutron / chirps

License: MIT
Twitter bot powering @arichduvet

Programming Languages

Python
TSQL

Projects that are alternatives of or similar to chirps

Socialreaper
Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 338 (+865.71%)
Mutual labels:  twitter, scraping
Social Media Profiles Regexs
📇 Extract social media profiles and more with regular expressions
Stars: ✭ 324 (+825.71%)
Mutual labels:  twitter, scraping
schedule-tweet
Schedules tweets using TweetDeck
Stars: ✭ 14 (-60%)
Mutual labels:  twitter, scraping
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+585.71%)
Mutual labels:  twitter, scraping
feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
Stars: ✭ 23 (-34.29%)
Mutual labels:  scraping
chesf
CHeSF is the Chrome Headless Scraping Framework, very early-stage (alpha) code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-48.57%)
Mutual labels:  scraping
rubium
Rubium is a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby
Stars: ✭ 65 (+85.71%)
Mutual labels:  scraping
ogpParser
Open Graph Protocol Parser for Node.js
Stars: ✭ 43 (+22.86%)
Mutual labels:  scraping
Scraper-Projects
🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (-28.57%)
Mutual labels:  scraping
kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…
Stars: ✭ 474 (+1254.29%)
Mutual labels:  scraping
ferenda
Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (-37.14%)
Mutual labels:  scraping
document-dl
Command line program to download documents from web portals.
Stars: ✭ 14 (-60%)
Mutual labels:  scraping
subscene scraper
Library to download subtitles from subscene.com
Stars: ✭ 14 (-60%)
Mutual labels:  scraping
go-scrapy
Web crawling and scraping framework for Golang
Stars: ✭ 17 (-51.43%)
Mutual labels:  scraping
web-clipper
Easily download the main content of a web page in html, markdown, and/or epub format from command line.
Stars: ✭ 15 (-57.14%)
Mutual labels:  scraping
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+48.57%)
Mutual labels:  scraping
proxi
Proxy pool that finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (-8.57%)
Mutual labels:  scraping
AngleParse
HTML parsing and processing tool for PowerShell.
Stars: ✭ 35 (+0%)
Mutual labels:  scraping
internet-affordability
🌍 Dataset that shows the Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-62.86%)
Mutual labels:  scraping
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (+8.57%)
Mutual labels:  scraping

chirps

Twitter bot powering www.twitter.com/arichduvet

Uses @sixohsix's Python Twitter Tools library for posting and other actions. Scraping is done with the help of Kenneth Reitz's requests module and some rudimentary regular expressions.
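
For a flavor of that approach, here is a minimal sketch of a requests-plus-regex scraper (the URL and pattern are hypothetical, not the project's actual code):

import re
import requests

def scrape_headlines(url="https://example.com/news"):
    """Fetch a page and yield headline strings found via a crude regex."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Rudimentary pattern: grab the text inside <h2>...</h2> tags.
    for match in re.finditer(r"<h2[^>]*>(.*?)</h2>", response.text, re.DOTALL):
        yield match.group(1).strip()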

I presented a poster on this project at PyCon US 2019; my friend Parth presented it at EuroPython 2019, as I could not attend due to an ongoing internship:

Chirps: A Twitter Bot Framework Written in Python


Prerequisites

This bot framework is built in Python, so make sure Python 3.x is installed on your system. Once Python is installed, create a virtual environment in the root directory of this repo using the following command:

$ python3 -m venv bot

Then activate this virtual environment using:

$ source bot/bin/activate

(On Windows, run bot\Scripts\activate instead; see the venv documentation for details.)

Now install the dependencies using the following command:

$ pip install -r requirements.txt

You will also need a PostgreSQL database; a free hosted option is ElephantSQL. Once you've set up an empty database, save its URL (it will be needed while running init_script below).
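
The URL will look something like this (all values hypothetical):

postgres://username:password@hostname:5432/dbname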

For bot deployment, this framework uses Heroku, so you'll also need a Heroku account.

Setting It Up

After creating a new app on the Heroku dashboard, install the Heroku CLI on your machine. Then use the following commands to add Heroku as a remote for this repository:

$ heroku login
<enter your Heroku credentials>
...

$ heroku git:remote -a <your Heroku app name>

(You can set up a GitHub-based pipeline on Heroku instead, but that is beyond the scope of this README.)

Now create a new branch named "deploy" and check it out:

$ git checkout -b deploy

Remove the chirps/credentials.py and chirps/screen_name.py entries from the .gitignore file. The file should now look like:

[.gitignore]
.DS_Store
.env
bot/
.vscode/
chirps/__pycache__/

Next, run the bot initialization script and enter the required information very carefully:

$ python -m chirps.init_script <your database URL>

The bot setup is essentially complete once this script runs successfully. Now you just need to "tune" the bot's options in the Heroku Procfile. Create a file named "Procfile" in the root of this repository and set the configuration options to suit your needs (the bracketed label below marks the file and is not part of its contents):

[Procfile]
worker: python3 -m chirps.main --rate=300 --fav --retweet --follow --follow_limit=6000 --scrape scrape_thenewstack get_tech_news

For example, the above Procfile says: tweet every 5 minutes (300 seconds); like (favorite), retweet, and follow tweets that contain the keywords specified in init_script.py; keep following people who tweet about those keywords until your following count reaches 6000; and use the scraper functions scrape_thenewstack() and get_tech_news() to aggregate content for the bot to tweet. You can build your own tweeter functions (usually scrapers) in scrapers.py; they should return or yield strings, which your bot will tweet. Other parameters can be tuned to your requirements.
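
As a hypothetical illustration (the function name, endpoint, and response shape below are made up; the framework only requires that the function return or yield strings):

[scrapers.py] (hypothetical addition)
import requests

def get_quotes():
    """Yield tweet-ready strings from a (made-up) quotes API."""
    data = requests.get("https://api.example.com/quotes", timeout=10).json()
    for item in data:
        tweet = f'"{item["text"]}" - {item["author"]}'
        if len(tweet) <= 280:  # stay within Twitter's character limit
            yield tweet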

Finally, deploy your bot using the following command:

$ git push heroku deploy:master

Once the deployment completes, "switch on" the bot as follows:

$ heroku ps:scale worker=1

Now your bot should be up and running!
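
To confirm the worker started cleanly, you can tail the Heroku logs:

$ heroku logs --tail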


If you want to dig deeper into the codebase and learn more about the implementation of the "generator-of-generators" function in chirps/functions.py, see my tutorial on DigitalOcean, which explains that part in detail.
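
In rough terms, the pattern round-robins over several tweet-yielding generators so that content sources take turns. A simplified sketch of the idea (not the project's actual code):

def round_robin(*tweeter_funcs):
    """Take one tweet from each tweet-yielding generator in turn,
    dropping a source once it is exhausted."""
    generators = [func() for func in tweeter_funcs]
    while generators:
        for gen in list(generators):  # iterate over a copy so removal is safe
            try:
                yield next(gen)
            except StopIteration:
                generators.remove(gen)  # this source ran dry

# e.g. round_robin(scrape_thenewstack, get_tech_news) would interleave
# items from both scrapers.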
