Webster: A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+264%)
Linkedin Profile Scraper: 🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+71%)
puppet-master: Puppeteer as a service hosted on Saasify.
Stars: ✭ 25 (-75%)
Awesome Puppeteer: A curated list of awesome Puppeteer resources.
Stars: ✭ 1,728 (+1628%)
Cdp4j: cdp4j - Chrome DevTools Protocol for Java
Stars: ✭ 232 (+132%)
Squidwarc: Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (+25%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+3054%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (-41%)
Instagram Bot: An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (+38%)
Marionette: Selenium alternative for Crystal. Browser manipulation without the Java overhead.
Stars: ✭ 119 (+19%)
tees: Universal test framework for front-end with WebDriver, Puppeteer and Enzyme
Stars: ✭ 23 (-77%)
page-modeller: ⚙️ Browser DevTools extension for modelling web pages for automation.
Stars: ✭ 66 (-34%)
Webdrivermanager: WebDriverManager (Copyright © 2015-2021) is a project created and maintained by Boni Garcia and licensed under the terms of the Apache 2.0 License.
Stars: ✭ 1,808 (+1708%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+23%)
N2h4: A tool for collecting Naver news.
Stars: ✭ 177 (+77%)
Pdf downloader: A Scrapy Spider for downloading PDF files from a webpage.
Stars: ✭ 18 (-82%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (+537%)
softest: Recording browser interactions and generating test scripts.
Stars: ✭ 225 (+125%)
Holiday Cn: 📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements.
Stars: ✭ 157 (+57%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+4737%)
Massivedl: Download a large list of files concurrently
Stars: ✭ 141 (+41%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+340%)
Arachnid: Powerful web scraping framework for Crystal
Stars: ✭ 68 (-32%)
Nutch: Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+2177%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+689%)
Isp Data Pollution: ISP Data Pollution to Protect Private Browsing History with Obfuscation
Stars: ✭ 425 (+325%)
clusteer: Clusteer is a Puppeteer wrapper written for Laravel, with the super-power of parallelizing pages across multiple browser instances.
Stars: ✭ 81 (-19%)
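BaiduSpider: The project has moved to https://github.com/BaiduSpider/BaiduSpider. A crawler that scrapes Baidu search results. It currently supports Baidu web search, image search, Zhidao (Q&A) search, video search, news search, Wenku (document library) search, Jingyan (how-to) search, and Baike (encyclopedia) search.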
Stars: ✭ 29 (-71%)
Newspaper: News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+11445%)
Sasila: A flexible, friendly crawler framework.
Stars: ✭ 286 (+186%)
Scrapy Selenium: Scrapy middleware to handle JavaScript pages using Selenium
Stars: ✭ 550 (+450%)
Crawler: Go process used to crawl websites
Stars: ✭ 147 (+47%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+356%)
aws-pdf-textract-pipeline: 🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript
Stars: ✭ 141 (+41%)
Spidermon: Scrapy extension for monitoring spider execution.
Stars: ✭ 309 (+209%)
Instagram-Like-Comment-Bot: 📷 An Instagram bot written in Python using Selenium on Google Chrome. It will go through posts in hashtag(s) and like and comment on them.
Stars: ✭ 53 (-47%)
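Bhban rpa: Example code for the book "Work Automation That Finishes Six Months of Work in One Day" (생능출판사, 2020). The examples are written for people who have never learned Python and cover a range of work-automation areas, from Excel to design, macros, and crawling.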
Stars: ✭ 124 (+24%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+177%)
Memorious: Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+148%)
Spidy: The simple, easy to use command line web crawler.
Stars: ✭ 257 (+157%)
pupflare: A web page proxy that sends requests through Chromium (Puppeteer). It can be used to bypass Cloudflare anti-bot / anti-DDoS protection from any application (such as curl).
Stars: ✭ 183 (+83%)
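Skycaiji: Skycaiji (蓝天采集器) is a free crawler for collecting and publishing data. Built with PHP and MySQL, it can be deployed on a cloud server, can scrape almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.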
Stars: ✭ 1,514 (+1414%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-32%)
flink-crawler: Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-52%)
img-cli: An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL
Stars: ✭ 15 (-85%)
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+42243%)
Colly: Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+15435%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core with Entity Framework Core output. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+0%)
talospider: talospider - A simple, lightweight scraping micro-framework
Stars: ✭ 57 (-43%)