
2266 open source projects that are alternatives to or similar to Colly

Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-98.9%)
Mutual labels:  crawler, spider, scraper, scraping, crawling
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-97.17%)
Mutual labels:  crawler, spider, scraper, scraping, crawling
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (-91.98%)
Mutual labels:  crawler, spider, scraper, scraping
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-98.22%)
Mutual labels:  crawler, spider, scraping, crawling
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-99.67%)
Mutual labels:  scraper, spider, scraping, crawling
bots-zoo
No description or website provided.
Stars: ✭ 59 (-99.62%)
Mutual labels:  crawler, scraper, scraping, crawling
Sasila
A flexible, friendly crawler framework
Stars: ✭ 286 (-98.16%)
Mutual labels:  crawler, scraping, crawling, framework
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (-94.92%)
Mutual labels:  crawler, scraper, scraping, crawling
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (-98.73%)
Mutual labels:  crawler, scraping, crawling, framework
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (-66.98%)
Mutual labels:  crawler, scraper, scraping, crawling
Ferret
Declarative web scraping
Stars: ✭ 4,837 (-68.86%)
Mutual labels:  crawler, scraper, scraping, crawling
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+172.57%)
Mutual labels:  crawler, scraping, crawling, framework
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (-95.78%)
Mutual labels:  crawler, spider, scraper
Awesome Python Primer
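An index of high-quality Chinese-language resources for self-taught Python beginners, including books, documentation, and videos, covering crawlers, web development, data analysis, and machine learning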
Stars: ✭ 57 (-99.63%)
Mutual labels:  crawler, spider, scraping
crawler-chrome-extensions
Chrome extensions commonly used by crawler developers
Stars: ✭ 53 (-99.66%)
Mutual labels:  scraper, spider, scraping
Skycaiji
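Skycaiji is free crawler software for collecting and publishing data, built with PHP and MySQL. It can be deployed on a cloud server, can collect almost any type of web page, integrates seamlessly with a wide range of CMS site builders, publishes data in real time without requiring login, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.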
Stars: ✭ 1,514 (-90.25%)
Mutual labels:  crawler, spider, crawling
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (-99.66%)
Mutual labels:  scraper, scraping, crawling
Newspaper
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (-25.68%)
Mutual labels:  crawler, scraper, crawling
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-99.9%)
Mutual labels:  crawler, scraper, scraping
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-99.69%)
Mutual labels:  crawler, spider, crawling
arachnod
High performance crawler for Nodejs
Stars: ✭ 17 (-99.89%)
Mutual labels:  crawler, scraper, spider
Querylist
🕷️ The progressive PHP crawler framework! An elegant, progressive PHP data collection framework.
Stars: ✭ 2,392 (-84.6%)
Mutual labels:  crawler, spider, scraper
proxycrawl-python
ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (-99.67%)
Mutual labels:  scraper, scraping, crawling
Freshonions Torscraper
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Stars: ✭ 348 (-97.76%)
Mutual labels:  crawler, spider, scraper
Xcrawler
A fast, concise, and powerful PHP crawler framework
Stars: ✭ 344 (-97.79%)
Mutual labels:  crawler, spider, scraper
Webster
A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (-97.66%)
Mutual labels:  crawler, spider, crawling
Creeper
🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (-95.09%)
Mutual labels:  crawler, spider, framework
Fbcrawl
A Facebook crawler
Stars: ✭ 536 (-96.55%)
Mutual labels:  crawler, spider, scraper
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (-96.25%)
Mutual labels:  crawler, scraping, crawling
Newcrawler
Free Web Scraping Tool with Java
Stars: ✭ 589 (-96.21%)
Mutual labels:  crawler, spider, scraping
Goose Parser
Universal scraping tool, which allows you to extract data using multiple environments
Stars: ✭ 211 (-98.64%)
Mutual labels:  crawler, scraper, scraping
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (-93.41%)
Mutual labels:  spider, scraper, scraping
Crawler
A high performance web crawler in Elixir.
Stars: ✭ 781 (-94.97%)
Mutual labels:  crawler, spider, scraper
Scrapyrt
HTTP API for Scrapy spiders
Stars: ✭ 637 (-95.9%)
Mutual labels:  crawler, scraper, crawling
Scrapit
Scraping scripts for various websites.
Stars: ✭ 25 (-99.84%)
Mutual labels:  crawler, spider, scraper
Dataflowkit
Extract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (-97.06%)
Mutual labels:  scraper, scraping, crawling
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-99.86%)
Mutual labels:  scraper, spider, scraping
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (-99.76%)
Mutual labels:  spider, scraping, crawling
Autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (-73.76%)
Mutual labels:  crawler, scraper, scraping
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-99.56%)
Mutual labels:  crawler, spider, crawling
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-99.36%)
Mutual labels:  crawler, scraping, crawling
Not Your Average Web Crawler
A web crawler (for bug hunting) that gathers more than you can imagine.
Stars: ✭ 107 (-99.31%)
Mutual labels:  crawler, spider, scraper
Awesome Crawler
A collection of awesome web crawlers and spiders in different languages
Stars: ✭ 4,793 (-69.15%)
Mutual labels:  crawler, spider, scraper
Gosint
OSINT Swiss Army Knife
Stars: ✭ 401 (-97.42%)
Mutual labels:  crawler, spider, scraper
Avbook
Adult video management system; crawler for avmoo, javbus, and javlibrary; online Japanese adult video library; adult video magnet link database
Stars: ✭ 8,133 (-47.65%)
Mutual labels:  crawler, spider, scraper
Webmagic
A scalable web crawler framework for Java.
Stars: ✭ 10,186 (-34.43%)
Mutual labels:  crawler, scraping, framework
Goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distribution-friendly Golang crawler framework.
Stars: ✭ 190 (-98.78%)
Mutual labels:  crawler, spider, scraper
Seleniumcrawler
An example using Selenium webdrivers for Python and the Scrapy framework to create a web scraper that crawls an ASP site
Stars: ✭ 117 (-99.25%)
Mutual labels:  scraper, scraping
Decryptlogin
APIs for logging in to some websites by using requests.
Stars: ✭ 1,861 (-88.02%)
Mutual labels:  crawler, spider
Free proxy website
A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies
Stars: ✭ 119 (-99.23%)
Mutual labels:  crawler, spider
Pspider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Stars: ✭ 1,611 (-89.63%)
Mutual labels:  crawler, spider
Examples Of Web Crawlers
Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, Douban, and QQ.
Stars: ✭ 10,724 (-30.97%)
Mutual labels:  crawler, spider
Awesome Puppeteer
A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (-88.88%)
Mutual labels:  scraping, crawling
Crawlab Lite
Lite version of Crawlab, a crawler management platform.
Stars: ✭ 122 (-99.21%)
Mutual labels:  crawler, spider
Digger
Digger is a powerful and flexible web crawler implemented in pure Go
Stars: ✭ 130 (-99.16%)
Mutual labels:  crawler, spider
Weibo Topic Spider
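Weibo super-topic crawler with word frequency statistics, sentiment analysis, and simple classification; now also includes data crawled from the pneumonia super topic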
Stars: ✭ 128 (-99.18%)
Mutual labels:  crawler, spider
Mm131
Image crawler for the MM131 website 🚨
Stars: ✭ 129 (-99.17%)
Mutual labels:  crawler, spider
Bilibili member crawler
Bilibili user crawler. Yay, it's a crawler!
Stars: ✭ 115 (-99.26%)
Mutual labels:  crawler, spider
Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (-99.2%)
Mutual labels:  crawler, crawling
Udemycoursegrabber
Your will to enroll in Udemy course is here, but the money isn't? Search no more! This python program searches for your desired course in more than [insert big number here] websites, compares the last updated date, and gives you the download link of the latest one back, but you also have the choice to see the other ones as well!
Stars: ✭ 137 (-99.12%)
Mutual labels:  scraper, scraping