A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+215.38%)

Mutual labels: crawler

Weibo Topic Spider

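Weibo super-topic crawler with Weibo word-frequency statistics, sentiment analysis, and simple classification; newly added crawled data from the pneumonia super topic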

Stars: ✭ 128 (-38.46%)

Mutual labels: crawler

Price Monitor

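JD.com product price monitor: tracks the prices of user-specified products and sends email/WeChat alerts when a price drops. Tech: Python crawler / IP proxy pool / JS API crawling / Selenium page crawling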

Stars: ✭ 634 (+204.81%)

Mutual labels: crawler

IpProxyPool

An IP proxy pool implemented in Golang. Technologies involved: go gorm proxy proxypool ip crawler mysql viper cobra

Stars: ✭ 36 (-82.69%)

Mutual labels: proxypool

Sina Weibo Album Downloader

Multithreaded downloader for all HD photos / pictures in someone's Sina Weibo album.

Stars: ✭ 125 (-39.9%)

Mutual labels: crawler

Weibo wordcloud

Crawls Weibo data by keyword, then generates a word cloud

Stars: ✭ 154 (-25.96%)

Mutual labels: crawler

Awesome Python Primer

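An index of high-quality Chinese-language resources for self-taught Python beginners, covering books / docs / videos, suited to crawling / Web / data analysis / machine learning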

Stars: ✭ 57 (-72.6%)

Mutual labels: crawler

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (+183.17%)

Mutual labels: crawler

Querylist

🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

Stars: ✭ 2,392 (+1050%)

Mutual labels: crawler

Douyin

API of DouYin for Humans, used to crawl popular videos and music

Stars: ✭ 580 (+178.85%)

Mutual labels: crawler

Fontobfuscator

Font obfuscation service

Stars: ✭ 125 (-39.9%)

Mutual labels: crawler

Filemasta

A search application to explore, discover and share online files

Stars: ✭ 571 (+174.52%)

Mutual labels: crawler

mpapi

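🐤 Mini-program API compatibility plugin: write once, run on multiple platforms. Supports WeChat Mini Programs, Alipay Mini Programs, Baidu Smart Mini Programs, and ByteDance Mini Programs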

Stars: ✭ 40 (-80.77%)

Mutual labels: baidu

Xxl Crawler

A distributed web crawler framework (XXL-CRAWLER).

Stars: ✭ 561 (+169.71%)

Mutual labels: crawler

Crawlab Lite

Lite version of Crawlab, a crawler management platform

Stars: ✭ 122 (-41.35%)

Mutual labels: crawler

Headless Chrome Crawler

Distributed crawler powered by Headless Chrome

Stars: ✭ 5,129 (+2365.87%)

Mutual labels: crawler

Videoserver

A video resource backend built with Node.js, based on Express and a crawler

Stars: ✭ 200 (-3.85%)

Mutual labels: crawler

Scrapy Redis

Redis-based components for Scrapy.

Stars: ✭ 4,998 (+2302.88%)

Mutual labels: crawler

Qqmusicspider

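A Scrapy-based QQ Music crawler (QQ Music Spider) that crawls song information, lyrics, featured comments, and more; it also shares a music corpus of 490,000+ items from the top 6,400 mainland China and Hong Kong/Taiwan artists on QQ Music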

Stars: ✭ 120 (-42.31%)

Mutual labels: crawler

deepspeech.mxnet

An MXNet implementation of Baidu's DeepSpeech architecture

Stars: ✭ 82 (-60.58%)

Mutual labels: baidu

Haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Stars: ✭ 4,993 (+2300.48%)

Mutual labels: crawler

Tiebamanager

(Abandoned) Baidu Tieba moderation tool that automatically scans posts and handles rule-violating ones

Stars: ✭ 119 (-42.79%)

Mutual labels: crawler

Scan T

A new Python-based crawler with additional functions, including network fingerprint search

Stars: ✭ 504 (+142.31%)

Mutual labels: crawler

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (-4.81%)

Mutual labels: crawler

News feed

🐨 Real-time monitoring of news about 1,000 Chinese companies

Stars: ✭ 491 (+136.06%)

Mutual labels: crawler

Free proxy website

A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies

Stars: ✭ 119 (-42.79%)

Mutual labels: crawler

Free proxy pool

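Crawls free proxy IP websites and aggregates the results into your own proxy pool. The key parts are validating each proxy's availability and anonymity, and removing duplicates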

Stars: ✭ 66 (-68.27%)

Mutual labels: proxypool

Picacomic downloader

Downloader for PicaComic (哔咔漫画) favorites

Stars: ✭ 57 (-72.6%)

Mutual labels: crawler

Scrapedin

LinkedIn Scraper (currently working 2020)

Stars: ✭ 453 (+117.79%)

Mutual labels: crawler

Docs

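Source code for "Data Collection: From Getting Started to Giving Up" (《数据采集从入门到放弃》). Contents: introduction to crawlers, the job market, and crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK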

Stars: ✭ 118 (-43.27%)

Mutual labels: crawler

Bookcorpus

Crawl BookCorpus

Stars: ✭ 443 (+112.98%)

Mutual labels: crawler

Arachnid

Crawl all unique internal links found on a given website, and extract SEO-related information - supports JavaScript-based sites

Stars: ✭ 224 (+7.69%)

Mutual labels: crawler

Python3 Spider

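Python crawlers in practice: simulated login to major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️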

Stars: ✭ 2,129 (+923.56%)

Mutual labels: crawler

Leetcode Ranking Search

Leetcode Contest Ranking Searcher

Stars: ✭ 51 (-75.48%)

Mutual labels: crawler

Html2article

HTML web page main-content extraction

Stars: ✭ 441 (+112.02%)

Mutual labels: crawler

Runoob Pdf

Crawls the Runoob (菜鸟教程) tutorial website and converts it to PDF__python_crawer_by_chrome

Stars: ✭ 430 (+106.73%)

Mutual labels: crawler

lcg-php

PHP scraper and submitter for Baidu's Laici Dog (莱茨狗); adds an API with a 99% recognition rate

Stars: ✭ 11 (-94.71%)

Mutual labels: baidu

Iclr2020 Openreviewdata

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

Stars: ✭ 426 (+104.81%)

Mutual labels: crawler

Opensearchserver

Open-source Enterprise Grade Search Engine Software

Stars: ✭ 408 (+96.15%)

Mutual labels: crawler

Google Group Crawler

Get (almost) original messages from google group archives. Your data is yours.

Stars: ✭ 190 (-8.65%)

Mutual labels: crawler

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (+92.79%)

Mutual labels: crawler

Memex Explorer

Viewers for statistics and dashboarding of Domain Search Engine data

Stars: ✭ 115 (-44.71%)

Mutual labels: crawler

Bilili

🍻 bilibili video (including bangumi) and danmaku downloader

Stars: ✭ 379 (+82.21%)

Mutual labels: crawler

terminal-translate

a terminal-translate tool

Stars: ✭ 73 (-64.9%)

Mutual labels: baidu

Images Web Crawler

This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web, download images, rename / resize / convert the images, and merge folders.

Stars: ✭ 51 (-75.48%)

Mutual labels: crawler

Ngmeta

Dynamic meta tags in your AngularJS single page application

Stars: ✭ 152 (-26.92%)

Mutual labels: crawler

Lyrics Crawler

Get the lyrics for the song currently playing on Spotify

Stars: ✭ 49 (-76.44%)

Mutual labels: crawler

Jianso movie

🎬 Movie resource crawler and movie image scraping script; Flask|Nginx|wsgi