Apify SDK

The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+3054%)

Mutual labels: crawling, puppeteer

bots-zoo

No description or website provided.

Stars: ✭ 59 (-41%)

Mutual labels: crawling, puppeteer

Instagram Bot

An Instagram bot developed using the Selenium Framework

Stars: ✭ 138 (+38%)

Mutual labels: crawling, selenium-webdriver

Marionette

Selenium alternative for Crystal. Browser manipulation without the Java overhead.

Stars: ✭ 119 (+19%)

Mutual labels: selenium-webdriver, puppeteer

tees

Universal test framework for front-end with WebDriver, Puppeteer and Enzyme

Stars: ✭ 23 (-77%)

Mutual labels: selenium-webdriver, puppeteer

page-modeller

⚙️ Browser DevTools extension for modelling web pages for automation.

Stars: ✭ 66 (-34%)

Mutual labels: selenium-webdriver, puppeteer

Webdrivermanager

Stars: ✭ 1,808 (+1708%)

Mutual labels: selenium-webdriver, geckodriver

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (+23%)

Mutual labels: crawling, puppeteer

Python Crawling Tutorial

Python crawling tutorial

Stars: ✭ 57 (-43%)

Mutual labels: crawling

N2h4

A tool for collecting Naver news.

Stars: ✭ 177 (+77%)

Mutual labels: crawling

Pdf downloader

A Scrapy Spider for downloading PDF files from a webpage.

Stars: ✭ 18 (-82%)

Mutual labels: crawling

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+537%)

Mutual labels: crawling

softest

Recording Browser Interactions And Generating Test Scripts.

Stars: ✭ 225 (+125%)

Mutual labels: puppeteer

Holiday Cn

📅🇨🇳 Chinese statutory holiday data, scraped automatically every day from State Council announcements.

Stars: ✭ 157 (+57%)

Mutual labels: crawling

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+483%)

Mutual labels: crawling

Ferret

Declarative web scraping

Stars: ✭ 4,837 (+4737%)

Mutual labels: crawling

Massivedl

Download a large list of files concurrently

Stars: ✭ 141 (+41%)

Mutual labels: crawling

Crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Stars: ✭ 440 (+340%)

Mutual labels: crawling

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-32%)

Mutual labels: crawling

Nutch

Apache Nutch is an extensible and scalable web crawler

Stars: ✭ 2,277 (+2177%)

Mutual labels: crawling

Crawling Projects

Web scraping and automation using Python

Stars: ✭ 49 (-51%)

Mutual labels: crawling

web-scraping

Web scraping using Puppeteer

Stars: ✭ 21 (-79%)

Mutual labels: puppeteer

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (+689%)

Mutual labels: crawling

Isp Data Pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscation

Stars: ✭ 425 (+325%)

Mutual labels: crawling

clusteer

Clusteer is a Puppeteer wrapper written for Laravel, with the super-power of parallelizing pages across multiple browser instances.

Stars: ✭ 81 (-19%)

Mutual labels: puppeteer

BaiduSpider

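The project has moved to https://github.com/BaiduSpider/BaiduSpider. A crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search, and Baike search.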

Stars: ✭ 29 (-71%)

Mutual labels: crawling

macaca-puppeteer

Macaca puppeteer driver

Stars: ✭ 39 (-61%)

Mutual labels: puppeteer

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+11445%)

Mutual labels: crawling

Sasila

A flexible, friendly crawler framework.

Stars: ✭ 286 (+186%)

Mutual labels: crawling

Scrapy Selenium

Scrapy middleware to handle javascript pages using selenium

Stars: ✭ 550 (+450%)

Mutual labels: crawling

Crawler

Go process used to crawl websites

Stars: ✭ 147 (+47%)

Mutual labels: crawling

Dataflowkit

Extract structured data from websites. Website scraping.

Stars: ✭ 456 (+356%)

Mutual labels: crawling

aws-pdf-textract-pipeline

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS textract. Built with AWS CDK + TypeScript

Stars: ✭ 141 (+41%)

Mutual labels: puppeteer

Spidermon

Scrapy extension for monitoring spider execution.

Stars: ✭ 309 (+209%)

Mutual labels: crawling

Instagram-Like-Comment-Bot

📷 An Instagram bot written in Python using Selenium on Google Chrome. It will go through posts in hashtag(s) and like and comment on them.

Stars: ✭ 53 (-47%)

Mutual labels: selenium-webdriver

Stopstalk Deployment

Stop stalking and start StopStalking 😉

Stars: ✭ 276 (+176%)

Mutual labels: crawling

Bhban rpa

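Example code for the book "6개월 치 업무를 하루 만에 끝내는 업무 자동화" ("Work Automation That Finishes Six Months of Work in a Day"; 생능출판사, 2020). The examples are written for people who have never learned Python and cover a range of work-automation areas, from Excel to design, macros, and crawling.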

Stars: ✭ 124 (+24%)

Mutual labels: crawling

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (+177%)

Mutual labels: crawling

Memorious

Distributed crawling framework for documents and structured data.

Stars: ✭ 248 (+148%)

Mutual labels: crawling

Corpuscrawler

Crawler for linguistic corpora

Stars: ✭ 127 (+27%)

Mutual labels: crawling

Spidy

The simple, easy to use command line web crawler.

Stars: ✭ 257 (+157%)

Mutual labels: crawling

pupflare

A webpage proxy that sends requests through Chromium (Puppeteer). It can be used to bypass Cloudflare anti-bot / anti-DDoS protection from any application (such as curl).

Stars: ✭ 183 (+83%)

Mutual labels: puppeteer

Skycaiji

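Skycaiji (蓝天采集器) is free crawler software for collecting and publishing data. Built with PHP and MySQL, it can be deployed on a cloud server, can collect almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.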

Stars: ✭ 1,514 (+1414%)

Mutual labels: crawling

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (-32%)

Mutual labels: crawling

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-52%)

Mutual labels: crawling

img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

Stars: ✭ 15 (-85%)

Mutual labels: crawling

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Stars: ✭ 42,343 (+42243%)

Mutual labels: crawling

popular restaurants from officials

A project to find genuinely good restaurants by analyzing the business expense records of Seoul city officials.

Stars: ✭ 22 (-78%)

Mutual labels: crawling

Colly

Elegant Scraper and Crawler Framework for Golang

Stars: ✭ 15,535 (+15435%)

Mutual labels: crawling

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (+0%)

Mutual labels: crawling

serverless-instagram-crawler

Serverless Instagram hashtag crawler with Lambda and DynamoDB

Stars: ✭ 33 (-67%)

Mutual labels: crawling

talospider

talospider - A simple, lightweight scraping micro-framework

Stars: ✭ 57 (-43%)

Mutual labels: crawling

1-60 of 459 similar projects
