Top 615 crawler open source projects

Crawler For Github Trending

🕷️ A Node.js crawler for GitHub Trending.

✭ 172

javascript crawler

Proxy IP pool for Python crawlers (proxy pool)

✭ 13,964

python redis proxy flask crawler spider crawl proxypool ssdb

Web crawling framework based on asyncio.

✭ 2,002

python crawler asyncio spider aiohttp uvloop

Crawl some pictures for fun

✭ 169

python crawler spider

Sitemap Generator Crawler

Script that generates a sitemap by crawling a given URL

✭ 169

crawler xml seo sitemap

Douyin crawler (TikTok crawler): Douyin data collection API and watermark removal for Douyin videos. 100% success rate; no server and no proxy IP required.

✭ 169

Bitextor generates translation memories from multilingual websites.

✭ 168

python crawler translation tokenizer dictionaries crawl wget

Scrapingoutsourcing

ScrapingOutsourcing focuses on sharing crawler code, aiming to add one new crawler each week.

✭ 164

julia docker crawler spider scrapy requests appium

Polite, slim and concurrent web crawler.

✭ 1,962

go crawler robots-txt

Alternative for VK Audio API

✭ 160

audio music crawler scraper mp3 api-server vk

JavaScript reverse-engineering research

✭ 159

javascript python3 html reverse-engineering crawler spider

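A distributed crawler platform that helps you manage and develop crawlers. It ships with a built-in set of crawler definition rules (templates): you can use the templates to define crawlers quickly, or use the platform as a framework and develop crawlers by hand. (A hobby project, updated whenever something about it gets annoying.)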

✭ 158

go golang crawler spider

DownZemAll! is a download manager for Windows, macOS and Linux

✭ 157

crawler qt streaming download youtube-dl webextensions youtube-downloader download-manager video-downloader magnet-link torrent-client

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

✭ 1,961

C# cross-platform crawler spider unit-testing parsing netcore web-crawler netcore2 pluggable spiders csharp-library abot netstandard20 netcore3 javascript-renderer netstandard21 abot-nuget netsta

Instagram Scraper

Scrapes media, likes, followers, tags and all metadata. Inspired by instagram-php-scraper, bot

✭ 2,209

python bot crawler instagram scraper scrape ig igramscraper

An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.

✭ 2,055

PHP javascript shell crawler concurrency guzzle

Weibo wordcloud

Scrapes Weibo data by keyword, then generates a word cloud

✭ 154

python search crawler weibo

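Python crawlers in practice: simulated login for major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping and Taobao. If you like it, please star ❤️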

✭ 2,129

python HTML TSQL crawler spider selenium scrapy taobao crawl splash geek scrapy-crawler meituan dianping pyppeteer

Dynamic meta tags in your AngularJS single page application

✭ 152

javascript crawler angularjs seo opengraph meta-tags

A lite distributed Java spider framework :-)

✭ 151

java crawler distributed-systems spider rabbitmq distributed

📢 Ptt article notification bot! Notifies you of Ptt articles in real time

✭ 150

go crawler chatbot telegram-bot messenger-bot

Dxy Covid 19 Crawler

Real-time crawler and API for the 2019 novel coronavirus outbreak (COVID-19/2019-nCoV)

✭ 1,865

python crawler realtime-api 2019-ncov

CoCrawler is a versatile web crawler built using modern tools and concurrency.

✭ 148

python python3 crawler concurrency screenshot aiohttp

Dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern JavaScript websites

✭ 1,853

go Makefile Dockerfile react vue angular vuejs reactjs crawler ssr spa puppeteer server-side-rendering seo chrome-devtools chrome-headless seo-optimization dynamic-rendering

Some crawler code

✭ 147

python2 jupyter-notebook crawler

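A simple, easy-to-use, efficient and opinionated open source .NET HTTP request framework! It can be used to build crawlers, make API requests, and more.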

✭ 146

crawler net-core

Th Music Video Generator

Touhou Project random music video generator/player that crawls images and videos from websites to generate music videos.

✭ 146

javascript web crawler

Enjoy driving on a Javascriptive (originally Pythonic) way to Japanese AV!

✭ 147

javascript react crawler

Go process used to crawl websites

✭ 147

go golang docker crawler crawling

Python Dcdownloader

A fully asynchronous batch comic downloader (crawler) for Dongman Zhijia (dmzj), written in Python

✭ 146

python crawler downloader

Indonesian Nlp Resources

Data resources for Indonesian-language NLP

✭ 143

nlp dataset crawler sentiment-analysis named-entity-recognition corpus pos-tagging

🔥 Shadowsocks account crawler

✭ 145

python crawler shadowsocks

Youtube Projects

This repository contains all the code I use in my YouTube tutorials.

✭ 144

javascript python html css chrome-extension algorithms google crawler youtube website jquery-plugin scraper project easy webscraping

Crawler China Mainland Universities

Crawler for the list of universities in mainland China

✭ 143

javascript nodejs data crawler spider china university school

Google Play Scraper

Google Play scraper for Python inspired by <facundoolano/google-play-scraper>

✭ 143

python crawler scraper

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

✭ 142

Crawls all CS:GO skins from a website.

✭ 139

python crawler steam

Amazonbigspider

😱 Fully automatic distributed Amazon spider | Distributed product collection and selection across four international Amazon sites | username: admin, password: adminadmin

✭ 140

golang crawler spider amazon amazon-web-services

An Instagram bot developed using the Selenium Framework

✭ 138

python python3 bot automation crawler instagram selenium selenium-webdriver instagram-api crawling

An Open Source Search Engine

✭ 139

search crawler search-engine

[Crawler framework (Golang)] An awesome concurrent crawler (spider) framework in Go. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.

✭ 1,745

go crawler spider pipeline schedule

Koreanewscrawler

A news crawler built to collect large volumes of news data.

✭ 138

A multithreaded Python crawler that fetches information from Zhihu user profile pages.

✭ 137

python jupyter-notebook crawler matplotlib requests

This repository is no longer maintained.

✭ 137

python bot crawler instagram scraper instagram-api instagram-client

4chan Downloader

Python3 script to continuously download all images/webms of multiple 4chan threads simultaneously - without installation

✭ 136

python python3 crawler downloader download download-manager

Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.

✭ 134

go golang crawler

News, full-text, and article metadata extraction in Python 3. Advanced docs:

✭ 11,545

python crawler scraper news crawling news-aggregator

All in one tool for Information Gathering, Vulnerability Scanning and Crawling. A must have tool for all penetration testers

Image crawler for the MM131 website 🚨

✭ 129

python crawler spider

Digger is a powerful and flexible web crawler implemented in pure Go

✭ 130

go crawler spider

Weibo Topic Spider

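Weibo super topic crawler with Weibo word frequency statistics, sentiment analysis and simple classification; now also includes data crawled from the pneumonia super topic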

✭ 128

python crawler spider weibo topic

Kuaishou Crawler

As you can see, a kuaishou crawler

✭ 126

Sina Weibo Album Downloader

Multithreaded download of all HD photos / pictures from someone's Sina Weibo album.

✭ 125

python crawler weibo

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

✭ 125

javascript chrome crawler puppeteer headless-chrome crawling chrome-headless

Font obfuscation service

✭ 125

GUI based offensive penetration testing tool (Open Source)

✭ 124

python docker security dockerfile open-source gui parser tool crawler penetration-testing cluster cybersecurity offensive-security sql-injection sniffing

Lite version of Crawlab, a crawler management platform

✭ 122

vue crawler spider scrapy platform web-crawler

Skill Share Crawler Dl

Download Skill Share videos by ID or by class

✭ 122

javascript nodejs crawler download videos

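A Scrapy-based QQ Music crawler (QQ Music Spider) that crawls song information, lyrics, featured comments and more. It also shares a music corpus of more than 490,000 items from the top 6,400 mainland Chinese, Hong Kong and Taiwanese artists on QQ Music.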

✭ 120

python music crawler scrapy

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

✭ 1,611

python crawler spider multiprocessing multi-threading web-crawler proxies python-spider web-spider

61-120 of 615 crawler projects