Crawls zhihu, jobbole and lagou with Scrapy and uses Elasticsearch + Django to build a search engine website. README_zh.md covers the implementation roadmap, distributed crawling, and strategies for coping with anti-crawling measures.

✭ 34

python elasticsearch distributed-systems crawler django scrapy

advanced-php-crawler

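Crawler for Sina blog articles and the wenku8 light novel library. It can download and save images and build an ebook in one click. A must-have tool for Kindle readers!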

✭ 26

PHP crawler gitbook kindle calibre sina

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Spider

💫 Spider is a PHP library with easy module integration for crawling websites and scraping information.

✭ 14

PHP Dockerfile crawler symfony command-line seo-optimization scrapper

INMET-API-temperature

Crawler for meteorological data from INMET conventional weather stations (BDMEP)

✭ 32

python crawler scraper temperatura inmet base-de-dados

All-IT-eBooks-Spider

[Updated] A simple python crawler for my tutorial blog at http://www.jianshu.com/p/8fb5bc33c78e

✭ 53

python crawler

iranian-calendar-events

Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website

✭ 28

javascript events crawler persian jalali-calendar jalali iranian

php-crawler

🕷️ A simple crawler (spider) written in PHP just for fun, with zero dependencies

✭ 39

PHP crawler spider

crawl

Lightweight library for scalable crawlers in Go.

✭ 20

go Makefile crawler crawl

crawler

Crawler framework for Node.js

✭ 42

typescript nodejs crawler

DouyuBarrage-Pro

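(Latest for 2020) Version 2 of a Douyu danmaku (bullet comment) scraping and visualization management platform. Features include danmaku scraping, real-time visualization of danmaku sending rate, scrape-history queries, danmaku download, custom keyword statistics, loyal-fan statistics, automatic capture of highlight moments, and word clouds of high-frequency danmaku. Take off~~~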

✭ 139

typescript HTML crawler data-visualization danmu management-system douyu douyutv barrage

vietnam-ecommerce-crawler

Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs

✭ 28

python crawler scrapy

asyncpy

A lightweight asynchronous coroutine-based web crawler framework built with asyncio and aiohttp

✭ 86

python crawler aiohttp asyncio scrapy asyncpy

grapy

Grapy, a fast high-level web crawling framework for Python 3.3 or later, based on asyncio.

✭ 18

python crawler spider

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

✭ 53

chrome-extension crawler scraper awesome spider scraping crawl awesome-list chrome-extensions

googleplay api

Google Play Unofficial Python 3 API Library

✭ 21

python android crawler playstore googleplay googleplay-api

rolling-news

Fetch rolling news

✭ 44

python crawler news sina rolling-news

netease-music-cracker

🎵 Convert cache files to MP3 files

✭ 406

python Batchfile crawler regex mp3

podcastcrawler

PHP library to find podcasts

✭ 40

PHP crawler podcast crawling itunes podcast-reader mp3-files itunes-podcast-feed itunes-api

Web-Iota

Iota is a web scraper which can find all of the images and links/suburls on a webpage

✭ 60

python crawler osint spider scrapy osint-python

ytpriv

YT metadata exporter

✭ 28

go json crawler youtube csv big-data video datascience

sponge

sponge is a command-line website crawler and link downloader

✭ 37

kotlin website crawler downloader links sponge command-line wtfpl crawl-pages website-crawler link-downloader crawling-sites file-downloader

web-crawler

Python Web Crawler with Selenium and PhantomJS

✭ 19

python Roff crawler scraper phantomjs webcrawler

frisbee

Collect email addresses by crawling search engine results.

✭ 29

python Batchfile Makefile crawler automation osint emails penetration-testing harvester

crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

✭ 70

PHP HTML middleware crawler web-scraping automated-testing crawling-framework web-search

findmeaflat

Get notified of new listings on popular German real estate portals.

✭ 21

javascript Makefile Dockerfile crawler telegram

scrapy helper

Dynamically configurable crawler

✭ 84

CSS javascript python HTML Smarty crawler spider dynamic scrapy

nasty

NASTY Advanced Search Tweet Yielder

✭ 50

python Makefile crawler twitter

crawler

Nodejs crawler for cnbeta.com

✭ 18

javascript nodejs crawler

python-crawler

Python Crawler

✭ 69

python crawler python-crawler

lopez

Crawling and scraping the Web for fun and profit

✭ 20

rust PLpgSQL shell crawler scraper seo web-scraping

actor-youtube-scraper

Apify actor to scrape YouTube search results. You can set the maximum number of videos to scrape per page as well as the date from which to start scraping.

✭ 20

javascript Dockerfile search crawler youtube apifier apify pupetteer

Amazon-Price-Alert

Price tracker for Amazon

✭ 83

python crawler amazon switch price-tracker

crawler-client

Crawler development tools built on Electron's webview

✭ 14

javascript HTML electron jquery crawler

NEEA-TOEFL-Testseat-Crawler

NEEA TOEFL test seat crawler

✭ 18

python crawler toefl-ibt

diskover-community

Diskover Community Edition - Open source file indexer, file search engine and data management and analytics powered by Elasticsearch

PyTse

TseTmc Crawler

✭ 40

python crawler stock stock-prices tse tsetmc

lezhin-comics-downloader

📥 Downloader for lezhin comics

✭ 30

java groovy crawler scraper downloader selenium webtoon lezhin webtoon-crawler webtoon-downloader lezhin-scraper webtoon-scraper lezhin-downloader

jd-autobuy

Python crawler: automatic JD.com (Jingdong) login and online flash-sale purchasing of products

✭ 1,262

python crawler scraper jingdong

qr-pirate

Crawl QR codes from search engines and look for Bitcoin private keys

✭ 58

python shell crawler bitcoin qrcode qr-code cryptocurrency bitcoin-wallet qrcode-reader private-key

CrawlerSamples

Sample Puppeteer + AngleSharp crawler console apps, written in C# 7.1 and built with .NET Core.

✭ 36

C# crawler dotnetcore headless anglesharp headless-browsers headless-chrome chsarp headless-chromium puppeteer

ctrip spider

Scraping practice (ctrip)

✭ 77

python javascript crawler cookie ctrip eleven

pdf-crawler

SimFin's open source PDF crawler

✭ 100

python pdf crawler crawling selenium-webdriver geckodriver puppeteer pdf-crawler

gscholar-citations-crawler

Crawl all your citations from Google Scholar

✭ 43

python TeX Makefile crawler research citation publications google-scholar

541-600 of 615 crawler projects