Linkedin Profile Scraper: 🕵️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-96.67%)
Webster: A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (-92.9%)
Squidwarc: Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head.
Stars: ✭ 125 (-97.56%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (-5.69%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (-98.85%)
Jvppeteer: Headless Chrome for Java (Java crawler)
Stars: ✭ 193 (-96.24%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (-38.51%)
Awesome Puppeteer: A curated list of awesome Puppeteer resources.
Stars: ✭ 1,728 (-66.31%)
Colly: Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+202.89%)
Phantomas: Headless Chromium-based web performance metrics collector and monitoring tool
Stars: ✭ 2,191 (-57.28%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (-84.62%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-91.42%)
Cuprite: Headless Chrome/Chromium driver for Capybara
Stars: ✭ 743 (-85.51%)
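Puppeteer Deep: Puppeteer, Headless Chrome; scraping the book 《es6标准入门》, automatically posting articles to Juejin, and site performance analysis; advanced crawlers, automated UI testing, performance analysis.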
Stars: ✭ 1,033 (-79.86%)
Url To Pdf Api: Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
Stars: ✭ 6,544 (+27.59%)
Sasila: A flexible, friendly crawler framework
Stars: ✭ 286 (-94.42%)
Flaresolverr: Proxy server to bypass Cloudflare protection
Stars: ✭ 241 (-95.3%)
Cdp4j: cdp4j - Chrome DevTools Protocol for Java
Stars: ✭ 232 (-95.48%)
Secret Agent: The web browser that's built for scraping.
Stars: ✭ 151 (-97.06%)
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (-20.51%)
Phpchrometopdf: A slim PHP wrapper around google-chrome to convert a URL to PDF or to take screenshots; easy to use, with a clean OOP interface
Stars: ✭ 127 (-97.52%)
Crawlergo: A powerful dynamic crawler for web vulnerability scanners
Stars: ✭ 1,088 (-78.79%)
Geziyor: Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (-75.71%)
Newspaper: News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+125.09%)
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (-96.14%)
Html Pdf Chrome: HTML to PDF converter via Chrome/Chromium
Stars: ✭ 629 (-87.74%)
diffbot-php-client: [Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (-98.97%)
Ferrum: Headless Chrome Ruby API
Stars: ✭ 1,009 (-80.33%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-94.6%)
Serverless Chrome: 🌐 Run headless Chrome/Chromium on AWS Lambda
Stars: ✭ 2,625 (-48.82%)
Puppeteer Extra: 💯 Teach Puppeteer new tricks through plugins.
Stars: ✭ 3,397 (-33.77%)
Puppetron: Puppeteer (Headless Chrome Node API)-based rendering solution.
Stars: ✭ 429 (-91.64%)
Goose Parser: Universal scraping tool, which allows you to extract data using multiple environments
Stars: ✭ 211 (-95.89%)
Ruiji.net: Crawler framework, distributed crawler extractor
Stars: ✭ 220 (-95.71%)
puppet-master: Puppeteer as a service hosted on Saasify.
Stars: ✭ 25 (-99.51%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (-87.58%)
throughout: 🎪 End-to-end testing made simple (using Jest and Puppeteer)
Stars: ✭ 16 (-99.69%)
Dataflowkit: Extract structured data from web sites. Web site scraping.
Stars: ✭ 456 (-91.11%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (-99.01%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-98.99%)
Pychromeless: Python Lambda Chrome Automation (naming pending)
Stars: ✭ 219 (-95.73%)
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+725.56%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable to your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-98.05%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (-97.6%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-99.71%)
Spidy: The simple, easy to use command line web crawler.
Stars: ✭ 257 (-94.99%)
Weibo terminator workflow: Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!
Stars: ✭ 259 (-94.95%)
Playwright Go: Playwright for Go, a browser automation library to control Chromium, Firefox and WebKit with a single API.
Stars: ✭ 272 (-94.7%)
lightnovel epub: 🍭 epub generator for (light) novels. Supported sites: 轻之国度, 轻小说文库
Stars: ✭ 89 (-98.26%)
Androidchromium: Chrome browser for Android, from the Chromium open-source project
Stars: ✭ 2,911 (-43.24%)
Rcrawler: An R web crawler and scraper
Stars: ✭ 274 (-94.66%)