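Source code for "Data Collection: From Beginner to Giving Up". Contents: introduction to crawlers, the job market, and crawler-engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK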

Stars: ✭ 118 (+42.17%)

Mutual labels: crawler, scrapy

Woid

Simple news aggregator displaying top stories in real time

Stars: ✭ 204 (+145.78%)

Mutual labels: news, crawler

Vault

Swiss army knife for hackers

Stars: ✭ 346 (+316.87%)

Mutual labels: crawler, scrapy

Marmot

💐Marmot | Web Crawler/HTTP protocol Download Package 🐭

Stars: ✭ 186 (+124.1%)

Mutual labels: crawler, scrapy

Py3 scripts

Life is short, *****.

Stars: ✭ 5 (-93.98%)

Mutual labels: crawler, scrapy

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (+545.78%)

Mutual labels: crawler, scrapy

Scrapy Crawlera

Crawlera middleware for Scrapy

Stars: ✭ 281 (+238.55%)

Mutual labels: crawler, scrapy

N2h4

A tool for collecting Naver news

Stars: ✭ 177 (+113.25%)

Mutual labels: news, crawler

Scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

Stars: ✭ 1,322 (+1492.77%)

Mutual labels: crawler, scrapy

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (+20.48%)

Mutual labels: crawler, scrapy

Python3 Spider

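Python crawlers in practice: simulated login to major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️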

Stars: ✭ 2,129 (+2465.06%)

Mutual labels: crawler, scrapy

Scrapy Redis

Redis-based components for Scrapy.

Stars: ✭ 4,998 (+5921.69%)

Mutual labels: crawler, scrapy

Scrapy Examples

Some Scrapy and web.py examples

Stars: ✭ 71 (-14.46%)

Mutual labels: crawler, scrapy

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (+13809.64%)

Mutual labels: news, crawler

Ttbot

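Toutiao bot that supports user login, follow, unfollow, fetching followings and followers, posting articles, posting Wukong Q&A, likes, comments, and collecting various types of news; implemented with the Toutiao web API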

Stars: ✭ 338 (+307.23%)

Mutual labels: news, crawler

ptt-web-crawler

Crawler for the web version of PTT

Stars: ✭ 20 (-75.9%)

Mutual labels: crawler, scrapy

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+602.41%)

Mutual labels: crawler, scrapy

Icrawler

A multi-thread crawler framework with many builtin image crawlers provided.

Stars: ✭ 629 (+657.83%)

Mutual labels: crawler, scrapy

Qqmusicspider

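Scrapy-based QQ Music crawler (QQ Music Spider) that scrapes song information, lyrics, featured comments, and more; it also shares a music corpus of 490,000+ items from the top 6,400 mainland Chinese, Hong Kong, and Taiwanese artists on QQ Music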

Stars: ✭ 120 (+44.58%)

Mutual labels: crawler, scrapy

News Please

news-please - an integrated web crawler and information extractor for news that just works.

Stars: ✭ 969 (+1067.47%)

Mutual labels: news, crawler

Crawlab Lite

Lite version of Crawlab, a crawler management platform

Stars: ✭ 122 (+46.99%)

Mutual labels: crawler, scrapy

Hotnewsanalysis

Analysis of trending news topics using text mining techniques

Stars: ✭ 93 (+12.05%)

Mutual labels: news, crawler

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+667.47%)

Mutual labels: crawler, scrapy

Tsec

Crawler for Taiwan exchange-listed and OTC-listed stocks (Taiwan Stock Exchange Crawler)

Stars: ✭ 327 (+293.98%)

Mutual labels: taiwan, crawler

Filesensor

Crawler-based dynamic detection tool for sensitive files

Stars: ✭ 227 (+173.49%)

Mutual labels: crawler, scrapy

Ruiji.net

crawler framework, distributed crawler extractor

Stars: ✭ 220 (+165.06%)

Mutual labels: crawler, scrapy

Crawler

Crawler, HTTP proxy, simulated login!

Stars: ✭ 106 (+27.71%)

Mutual labels: crawler, scrapy

Terpene Profile Parser For Cannabis Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Stars: ✭ 63 (-24.1%)

Mutual labels: crawler, scrapy

Github Spider

Crawler for analyzing GitHub repositories and users

Stars: ✭ 190 (+128.92%)

Mutual labels: crawler, scrapy

Haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Stars: ✭ 4,993 (+5915.66%)

Mutual labels: crawler, scrapy

Scrapy Azuresearch Crawler Samples

Scrapy as a Web Crawler for Azure Search Samples

Stars: ✭ 20 (-75.9%)

Mutual labels: crawler, scrapy

Crawlab

Distributed web crawler admin platform for managing spiders, regardless of language or framework

Stars: ✭ 8,392 (+10010.84%)

Mutual labels: crawler, scrapy

Python Testing Crawler

A crawler for automated functional testing of a web application

Stars: ✭ 68 (-18.07%)

Mutual labels: crawler

Capturer

Capture pictures from websites such as Sina, Lofter, Huaban, and so on

Stars: ✭ 76 (-8.43%)

Mutual labels: scrapy

Zhihuvapi

Play with Zhihu elegantly

Stars: ✭ 67 (-19.28%)

Mutual labels: crawler

News

Cloud Native Daily Digest: a daily report on cloud native technology

Stars: ✭ 67 (-19.28%)

Mutual labels: news

Wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

Stars: ✭ 1,220 (+1369.88%)

Mutual labels: crawler

Pycontw2013tutorial

Python Conference Taiwan 2013 Tutorial

Stars: ✭ 75 (-9.64%)

Mutual labels: taiwan

Tracker Radar Collector

🕸 Modular, multithreaded, puppeteer-based crawler

Stars: ✭ 67 (-19.28%)

Mutual labels: crawler

Lxspider

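A collection of crawler examples, including but not limited to Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, Baidu Index, VIP and Wanfang, Zlibraty, Oalib, novels, tender sites, procurement sites, and Xiaohongshu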

Stars: ✭ 60 (-27.71%)

Mutual labels: crawler

Crawler examples

Some classic web crawler projects.

Stars: ✭ 74 (-10.84%)

Mutual labels: crawler

Devtwitter

Bringing dev.to headlines to your Twitter browsing experience.

Stars: ✭ 66 (-20.48%)

Mutual labels: news

Feyz

Mind-opening content

Stars: ✭ 64 (-22.89%)

Mutual labels: news

Newspaper

An aggregated newspaper app containing news from 10+ local news publishers in Hong Kong. Made with ❤

Stars: ✭ 82 (-1.2%)

Mutual labels: news

Swiftlinkpreview

It makes a preview from a URL, grabbing all the information such as title, relevant texts and images.

Stars: ✭ 1,216 (+1365.06%)

Mutual labels: crawler

Useful Tools

A list of useful tools and programs for developers, DevOps and SysAdmins

Stars: ✭ 74 (-10.84%)

Mutual labels: news

Taobao duoshou

Collects Taobao data with Scrapy and displays it with Flask

Stars: ✭ 63 (-24.1%)

Mutual labels: scrapy

Bee University

Project that collects university admission cutoff scores for 2014-2018 and analyzes the data

Stars: ✭ 73 (-12.05%)

Mutual labels: crawler

Tumblr Crawler

Easily download all the photos/videos from specified Tumblr blogs.

Stars: ✭ 1,118 (+1246.99%)

Mutual labels: crawler

Hproxy

hproxy - Asynchronous IP proxy pool for crawlers that aims to make getting a proxy as convenient as possible.

Stars: ✭ 62 (-25.3%)

Mutual labels: crawler

Puppeteer Walker

a puppeteer walker 🕷 🕸

Stars: ✭ 78 (-6.02%)

Mutual labels: crawler

Goscraper

Golang pkg to quickly return a preview of a webpage (title/description/images)

Stars: ✭ 72 (-13.25%)

Mutual labels: crawler

Boj Autocommit

When you solve a problem on Baekjoon Online Judge, it automatically commits and pushes to the remote repository.

Stars: ✭ 60 (-27.71%)

Mutual labels: crawler

1-60 of 847 similar projects