Tabula: Tabula is a tool for liberating data tables trapped inside PDF files
Stars: ✭ 5,420 (+8933.33%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features along with page caching and geosearch.
Stars: ✭ 15 (-75%)
humanparser: Parse a human name string into salutation, first name, middle name, last name, suffix.
Stars: ✭ 78 (+30%)
Scrapy Cluster: This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster.
Stars: ✭ 921 (+1435%)
dust: Archive web pages with all relevant assets or save them as a single HTML file
Stars: ✭ 19 (-68.33%)
Katana: A Python tool for Google hacking
Stars: ✭ 355 (+491.67%)
scrapy facebooker: Collection of Scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-63.33%)
Gazpacho: 🥫 The simple, fast, and modern web scraping library
Stars: ✭ 525 (+775%)
shup: A POSIX shell script to parse HTML
Stars: ✭ 28 (-53.33%)
Autoscraper: A smart, automatic, fast, and lightweight web scraper for Python
Stars: ✭ 4,077 (+6695%)
image-collector: Download images from Google Image Search
Stars: ✭ 38 (-36.67%)
naos: 📉 Uptime and error monitoring CLI
Stars: ✭ 30 (-50%)
kuwala: Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+690%)
Facebook data analyzer: Analyze your copy of your Facebook data with Ruby. Download the zip file from Facebook and get information about friends ranked by messages, vocabulary, contacts, friends-added statistics, and more
Stars: ✭ 515 (+758.33%)
top-github-scraper: Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-33.33%)
Linkedin: LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
Stars: ✭ 309 (+415%)
Webhere: HTML scraping for Objective-C.
Stars: ✭ 16 (-73.33%)
ferenda: Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (-63.33%)
internet-affordability: 🌍 Dataset that shows Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-78.33%)
Nickjs: Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)
Stars: ✭ 494 (+723.33%)
gunaydin: Your good mornings ☀️
Stars: ✭ 16 (-73.33%)
Clean Text: 🧹 Python package for text cleaning
Stars: ✭ 284 (+373.33%)
chesf: CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-70%)
Mtnt: Code for the collection and analysis of the MTNT dataset
Stars: ✭ 48 (-20%)
rubium: Rubium is a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby
Stars: ✭ 65 (+8.33%)
Lambdasoup: Functional HTML scraping and rewriting with CSS in OCaml
Stars: ✭ 280 (+366.67%)
ogpParser: Open Graph Protocol Parser for Node.js
Stars: ✭ 43 (-28.33%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+7961.67%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs with headless Chrome and Puppeteer, among other tools.
Stars: ✭ 3,154 (+5156.67%)
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+18.33%)
Imagescraper: ✂️ High performance, multi-threaded image scraper
Stars: ✭ 630 (+950%)
Scrapping: Mastering the art of scraping 🎓
Stars: ✭ 24 (-60%)
instagram explorer: 📷 An app to scrape Instagram posts and analyze data.
Stars: ✭ 17 (-71.67%)
copycat: A PHP Scraping Class
Stars: ✭ 70 (+16.67%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+660%)
scrap: Scraping Facebook with JavaScript.
Stars: ✭ 25 (-58.33%)
jazz: The Scripting Engine that Combines Speed, Safety, and Simplicity
Stars: ✭ 132 (+120%)
Instagram-to-discord: Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!
Stars: ✭ 113 (+88.33%)
Configs: Public, free-to-use repository of digger configs for scraping and extracting data from various e-commerce websites and online stores
Stars: ✭ 37 (-38.33%)
ha-multiscrape: Home Assistant custom component for scraping multiple values (HTML, XML, or JSON) from a single HTTP request, with a separate sensor/attribute for each value. Supports (login) form-submit functionality.
Stars: ✭ 103 (+71.67%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (-1.67%)
zcrawl: An open source web crawling platform
Stars: ✭ 21 (-65%)
Mechanize: Mechanize is a Ruby library that makes automated web interaction easy.
Stars: ✭ 4,158 (+6830%)
scrapman: Retrieve real (with JavaScript executed) HTML code from a URL, ultra fast, with support for loading multiple pages in parallel
Stars: ✭ 21 (-65%)
scraper: Node.js web scraper. Contains a command line, Docker container, Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSDOM.
Stars: ✭ 37 (-38.33%)
puppeteer-botcheck: 🕵️‍♂️ Bot detection tests for Puppeteer. Hide and seek!
Stars: ✭ 42 (-30%)
Newcrawler: Free Web Scraping Tool with Java
Stars: ✭ 589 (+881.67%)
memes-api: API for scraping common meme sites
Stars: ✭ 17 (-71.67%)
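Awesome Python Primer: An index of quality Chinese-language resources for self-taught Python beginners, including books, documentation, and videos, suited to web crawling, web development, data analysis, and machine learning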
Stars: ✭ 57 (-5%)
Artoo: artoo.js - the client-side scraping companion.
Stars: ✭ 1,029 (+1615%)
Pypatent: Search for and retrieve US Patent and Trademark Office Patent Data
Stars: ✭ 31 (-48.33%)
Undetected Chromedriver: Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
Stars: ✭ 365 (+508.33%)