Tabula: Tabula is a tool for liberating data tables trapped inside PDF files
Stars: ✭ 5,420 (+12504.65%)
Babler: Data Collection System For NLP/Speech Recognition
Stars: ✭ 21 (-51.16%)
Coronadatascraper: COVID-19 Coronavirus data scraped from government and curated data sources.
Stars: ✭ 372 (+765.12%)
TorScrapper: A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (-44.19%)
Instagram Scraper: Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
Stars: ✭ 903 (+2000%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-65.12%)
Comic Dl: Comic-dl is a command line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, comic naver and many more.
Stars: ✭ 365 (+748.84%)
humanparser: Parse a human name string into salutation, first name, middle name, last name, suffix.
Stars: ✭ 78 (+81.4%)
Gazpacho: 🥫 The simple, fast, and modern web scraping library
Stars: ✭ 525 (+1120.93%)
dust: Archive web pages with all relevant assets or save them as a single HTML file
Stars: ✭ 19 (-55.81%)
pomp: Screen scraping and web crawling framework
Stars: ✭ 61 (+41.86%)
Configs: Public, free-to-use repository with digger configs for scraping / extracting data from various e-commerce websites and online stores
Stars: ✭ 37 (-13.95%)
scrapy facebooker: Collection of Scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-48.84%)
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+9381.4%)
Facebook data analyzer: Analyze your copy of your Facebook data with Ruby. Download the zip file from Facebook and get info about friends ranking by message, vocabulary, contacts, friends-added statistics and more
Stars: ✭ 515 (+1097.67%)
powerapi-scala: PowerAPI is a middleware toolkit for building software-defined power meters
Stars: ✭ 70 (+62.79%)
homeassistant-powercalc: Custom component to calculate estimated power consumption of lights and other appliances
Stars: ✭ 261 (+506.98%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+1734.88%)
image-collector: Download images from Google Image Search
Stars: ✭ 38 (-11.63%)
Linkedin: LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy
Stars: ✭ 309 (+618.6%)
naos: 📉 Uptime and error monitoring CLI
Stars: ✭ 30 (-30.23%)
Nickjs: Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)
Stars: ✭ 494 (+1048.84%)
kuwala: Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+1002.33%)
top-github-scraper: Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-6.98%)
Clean Text: 🧹 Python package for text cleaning
Stars: ✭ 284 (+560.47%)
ferenda: Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (-48.84%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+11148.84%)
internet-affordability: 🌍 Dataset that shows the Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-69.77%)
Lambdasoup: Functional HTML scraping and rewriting with CSS in OCaml
Stars: ✭ 280 (+551.16%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (-11.63%)
Imagescraper: ✂️ High performance, multi-threaded image scraper
Stars: ✭ 630 (+1365.12%)
document-dl: Command line program to download documents from web portals.
Stars: ✭ 14 (-67.44%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+7234.88%)
go-scrapy: Web crawling and scraping framework for Golang
Stars: ✭ 17 (-60.47%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+960.47%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+20.93%)
instagram explorer: 📷 An app to scrape Instagram posts and analyze data.
Stars: ✭ 17 (-60.47%)
sg-food-ml: This script is used to scrape images from the Internet to classify 5 common noodle "mee" dishes in Singapore: Wanton Mee, Bak Chor Mee, Lor Mee, Prawn Mee and Mee Siam.
Stars: ✭ 18 (-58.14%)
Auto Cpufreq: Automatic CPU speed & power optimizer for Linux
Stars: ✭ 843 (+1860.47%)
torchestrator: Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (-25.58%)
jazz: The Scripting Engine that Combines Speed, Safety, and Simplicity
Stars: ✭ 132 (+206.98%)
scavenger: Scrape and take screenshots of dynamic and static webpages
Stars: ✭ 14 (-67.44%)
Mechanize: Mechanize is a Ruby library that makes automated web interaction easy.
Stars: ✭ 4,158 (+9569.77%)
Scrapping: Mastering the art of scrapping 🎓
Stars: ✭ 24 (-44.19%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (+37.21%)
copycat: A PHP Scraping Class
Stars: ✭ 70 (+62.79%)
Newcrawler: Free Web Scraping Tool with Java
Stars: ✭ 589 (+1269.77%)
scraper: Node.js web scraper. Contains a command line, Docker container, Terraform module and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSDOM.
Stars: ✭ 37 (-13.95%)
X Cube Usb Pd: USB-C Power Delivery Firmware for STM32 microcontroller (ARM Cortex M0 & M4)
Stars: ✭ 41 (-4.65%)
Usb Esp: How to make a tiny USB powered ESP-12S
Stars: ✭ 39 (-9.3%)
Scrapy Cluster: This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster.
Stars: ✭ 921 (+2041.86%)
Lookyloo: Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Stars: ✭ 381 (+786.05%)