dmuhs / Pastebin Scraper

Live-scraping Pastebin to fight boredom.

pastebin-scraper

This is a multithreaded scraping script for Pastebin. It scrapes the main site for new pastes, downloads their raw content, and processes them according to a user-defined output format.
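
The flow described above can be pictured as a small producer/consumer setup. This is only a sketch and not the project's actual code: fetch_raw, download_worker and run_pipeline are hypothetical names, and the network request is replaced by a stand-in so the example runs offline.

```python
import queue
import threading


def fetch_raw(paste_id):
    """Hypothetical stand-in for the HTTP request that downloads raw paste content."""
    return f"raw content of {paste_id}"


def download_worker(jobs, results):
    """Take paste IDs off the job queue until a None sentinel arrives."""
    while True:
        paste_id = jobs.get()
        if paste_id is None:
            break
        results.put((paste_id, fetch_raw(paste_id)))


def run_pipeline(paste_ids, download_workers=2):
    """Fan paste IDs out to worker threads and collect (id, content) pairs."""
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=download_worker, args=(jobs, results))
        for _ in range(download_workers)
    ]
    for thread in threads:
        thread.start()
    for paste_id in paste_ids:
        jobs.put(paste_id)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so every thread exits
    for thread in threads:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Results arrive in whatever order the workers finish, so sort them if order matters.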

WHY?

Fun.

Installation

The usual dance.

pip install -r requirements.txt

Define all required settings in settings.ini. If you choose a database output, make sure the respective connector is installed. At the moment, MySQL (via pymysql) and SQLite (via the connector built into the Python 3 standard library) are supported.

Also note that the file output creates a subdirectory named output and dumps every paste into it as a separate file.
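
To illustrate the file output, here is a minimal sketch of writing one file per paste into an output subdirectory. The function name write_paste, the .txt suffix and the use of the paste ID as file name are assumptions for illustration; the scraper's real naming scheme may differ.

```python
from pathlib import Path


def write_paste(paste_id, content, base_dir="."):
    """Write a single paste into <base_dir>/output/<paste_id>.txt (assumed layout)."""
    out_dir = Path(base_dir) / "output"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the subdirectory on first use
    target = out_dir / f"{paste_id}.txt"
    target.write_text(content, encoding="utf-8")
    return target
```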

Settings

ini is a highly underrated file format. Here is what each settings parameter actually does.

GENERAL

  • PasteLimit: Stop after having scraped n pastes. Set to 0 for indefinite scraping
  • PBLink: URL to Pastebin or another equivalent site
  • DownloadWorkers: Number of workers that download the raw paste content and further process it
  • NewPasteCheckInterval: Time to wait before checking the main site for new pastes again
  • IPBlockedWaitTime: Time to wait until checking the main site again after the scraper's IP has been blocked
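
As a rough sketch of how such a section could be read with Python's standard configparser module: the values in SAMPLE are made up for illustration, the helper load_general is hypothetical, and the unit of the two wait times is not stated here, so they are simply read as integers.

```python
import configparser

# Illustrative values only; they are not the project's defaults.
SAMPLE = """
[GENERAL]
PasteLimit = 0
PBLink = https://pastebin.com
DownloadWorkers = 2
NewPasteCheckInterval = 60
IPBlockedWaitTime = 3600
"""


def load_general(text):
    """Parse the GENERAL section into a plain dict with typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["GENERAL"]
    return {
        "paste_limit": general.getint("PasteLimit"),
        "pb_link": general.get("PBLink"),
        "download_workers": general.getint("DownloadWorkers"),
        "new_paste_check_interval": general.getint("NewPasteCheckInterval"),
        "ip_blocked_wait_time": general.getint("IPBlockedWaitTime"),
    }
```

For a real file, parser.read("settings.ini") would replace read_string.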

LOGGING

  • RotationLog: Location of log file that contains debug output
  • MaxRotationSize: Size in bytes before another log file is created
  • RotationBackupCount: Maximum number of log files to keep
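
These three settings line up naturally with the standard library's RotatingFileHandler. The sketch below assumes that mapping; the function build_logger, the logger name and the log format are hypothetical and not taken from the project.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(rotation_log, max_rotation_size, rotation_backup_count):
    """Create a debug logger that rotates its file once it reaches the size limit."""
    logger = logging.getLogger("pastebin-scraper-sketch")  # assumed logger name
    logger.setLevel(logging.DEBUG)
    handler = RotatingFileHandler(
        rotation_log,
        maxBytes=max_rotation_size,         # MaxRotationSize
        backupCount=rotation_backup_count,  # RotationBackupCount
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```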

STDOUT / FILE

  • Enable: Enable formatted stdout output of paste data
  • ContentDisplayLimit: Maximum number of characters to show before content is cut off (0 to display all)
  • ShowName: Display the paste name
  • ShowLang: Display the paste language
  • ShowLink: Display the complete paste link
  • ShowData: Display the raw paste content
  • DataEncoding: Encoding of the raw paste data
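
A minimal sketch of how the display switches and ContentDisplayLimit might combine. The paste dictionary keys, the labels and the function format_paste are assumptions for illustration; DataEncoding is left out because the data is treated as already-decoded text here.

```python
def format_paste(paste, show_name=True, show_lang=True, show_link=True,
                 show_data=True, content_display_limit=0):
    """Build the text for one paste, honouring the Show* switches."""
    lines = []
    if show_name:
        lines.append(f"Name: {paste['name']}")
    if show_lang:
        lines.append(f"Lang: {paste['lang']}")
    if show_link:
        lines.append(f"Link: {paste['link']}")
    if show_data:
        data = paste["data"]
        if content_display_limit > 0:  # 0 means "display all"
            data = data[:content_display_limit]
        lines.append(data)
    return "\n".join(lines)
```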

MYSQL

  • Enable: Enable MySQL output
  • TableName: Main table name to insert data into
  • Host: MySQL server host
  • Port: MySQL server port
  • Username: MySQL server user
  • Password: User password

SQLITE

  • Enable: Enable SQLite output
  • Filename: Filename the db should be saved as (usually ends with .db)
  • TableName: Main table name to insert data into
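
For orientation, a sketch of what a SQLite output could do with Filename and TableName, using the built-in sqlite3 module. The column layout (name, lang, link, data) and the function store_paste are assumptions; the scraper's actual schema may look different.

```python
import sqlite3


def store_paste(filename, table_name, paste):
    """Insert one paste into the given table, creating the table if needed."""
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    conn = sqlite3.connect(filename)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table_name} "
                "(name TEXT, lang TEXT, link TEXT, data TEXT)"
            )
            conn.execute(
                f"INSERT INTO {table_name} (name, lang, link, data) "
                "VALUES (?, ?, ?, ?)",
                (paste["name"], paste["lang"], paste["link"], paste["data"]),
            )
    finally:
        conn.close()
```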

If you use this thing for some cool data analysis or even research, let me know if I can help!

Inspiration for this scraper was taken from here.
