robotstxtrobots.txt file parsing and checking for R
Stars: ✭ 65 (+242.11%)
Goribot[Crawler/Scraper for Golang]🕷A lightweight distributed friendly Golang crawler framework.一个轻量的分布式友好的 Golang 爬虫框架。
Stars: ✭ 190 (+900%)
scrapy facebookerCollection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (+15.79%)
OpenScraperAn open source webapp for scraping: towards a public service for webscraping
Stars: ✭ 80 (+321.05%)
FbcrawlA Facebook crawler
Stars: ✭ 536 (+2721.05%)
OLX Scraper📻 An OLX Scraper using Scrapy + MongoDB. It Scrapes recent ads posted regarding requested product and dumps to NOSQL MONGODB.
Stars: ✭ 15 (-21.05%)
SpydanA web spider for shodan.io without using the Developer API.
Stars: ✭ 30 (+57.89%)
Mimo-CrawlerA web crawler that uses Firefox and js injection to interact with webpages and crawl their content, written in nodejs.
Stars: ✭ 22 (+15.79%)
CrawlerA high performance web crawler in Elixir.
Stars: ✭ 781 (+4010.53%)
metacritic apiPHP Metacritic API - Mirrored by my GitLab
Stars: ✭ 31 (+63.16%)
scrapy-adminA django admin site for scrapy
Stars: ✭ 44 (+131.58%)
SeekerSeeker - another job board aggregator.
Stars: ✭ 16 (-15.79%)
wget-luaWget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+173.68%)
elves🎊 Design and implement of lightweight crawler framework.
Stars: ✭ 322 (+1594.74%)
scrapy-distributedA series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (+100%)
FunpyspidersearchengineWord2vec 千人千面 个性化搜索 + Scrapy2.3.0(爬取数据) + ElasticSearch7.9.1(存储数据并提供对外Restful API) + Django3.1.1 搜索
Stars: ✭ 782 (+4015.79%)
newsembleAPI for fetching data from news websites.
Stars: ✭ 42 (+121.05%)
hk0weatherWeb scraper project to collect the useful Hong Kong weather data from HKO website
Stars: ✭ 49 (+157.89%)
python-spiderpython爬虫小项目【持续更新】【笔趣阁小说下载、Tweet数据抓取、天气查询、网易云音乐逆向、天天基金网查询、微博数据抓取(生成cookie)、有道翻译逆向、企查查免登陆爬虫、大众点评svg加密破解、B站用户爬虫、拉钩免登录爬虫、自如租房字体加密、知乎问答
Stars: ✭ 45 (+136.84%)
ARGUSARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+257.89%)
PttImageSpiderPTT 圖片下載器 (抓取整個看板的圖片,並用文章標題作為資料夾的名稱 ) (使用Scrapy)
Stars: ✭ 16 (-15.79%)
ip proxy poolGenerating spiders dynamically to crawl and check those free proxy ip on the internet with scrapy.
Stars: ✭ 39 (+105.26%)
RcrawlerAn R web crawler and scraper
Stars: ✭ 274 (+1342.11%)
Douban CrawlerUno Crawler por https://douban.com
Stars: ✭ 13 (-31.58%)
Java Spider一个基于webmagic框架二次开发的java爬虫框架实战,已实现能爬取腾讯,搜狐,今日头条(单独集成功能)等资讯内容,配合elasticsearch框架用法,实现了自动爬虫,已投入线上生产使用。
Stars: ✭ 276 (+1352.63%)
LinkedinLinkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy
Stars: ✭ 309 (+1526.32%)
SpidrA versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+3352.63%)
163Music163music spider by scrapy.
Stars: ✭ 60 (+215.79%)
devsearchA web search engine built with Python which uses TF-IDF and PageRank to sort search results.
Stars: ✭ 52 (+173.68%)
aliexscrapeGet Aliexpress product details in JSON
Stars: ✭ 80 (+321.05%)
ScrapyrtHTTP API for Scrapy spiders
Stars: ✭ 637 (+3252.63%)
NScrapyNScrapy is a .net core corss platform Distributed Spider Framework which provide an easy way to write your own Spider
Stars: ✭ 88 (+363.16%)
IcrawlerA multi-thread crawler framework with many builtin image crawlers provided.
Stars: ✭ 629 (+3210.53%)
newspaperjsNews extraction and scraping. Article Parsing
Stars: ✭ 59 (+210.53%)
Python Spider豆瓣电影top250、斗鱼爬取json数据以及爬取美女图片、淘宝、有缘、CrawlSpider爬取红娘网相亲人的部分基本信息以及红娘网分布式爬取和存储redis、爬虫小demo、Selenium、爬取多点、django开发接口、爬取有缘网信息、模拟知乎登录、模拟github登录、模拟图虫网登录、爬取多点商城整站数据、爬取微信公众号历史文章、爬取微信群或者微信好友分享的文章、itchat监听指定微信公众号分享的文章
Stars: ✭ 615 (+3136.84%)
Freshonions TorscraperFresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Stars: ✭ 348 (+1731.58%)
toutiao今日头条科技新闻接口爬虫
Stars: ✭ 17 (-10.53%)
Instagram-Scraper-2021Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).
Stars: ✭ 57 (+200%)
arachnodHigh performance crawler for Nodejs
Stars: ✭ 17 (-10.53%)
Happy Spiders🔧 🔩 🔨 收集整理了爬虫相关的工具、模拟登陆技术、代理IP、scrapy模板代码等内容。
Stars: ✭ 261 (+1273.68%)
AlltheplacesA set of spiders and scrapers to extract location information from places that post their location on the internet.
Stars: ✭ 277 (+1357.89%)
allitebooks.comDownload all the ebooks with indexed csv of "allitebooks.com"
Stars: ✭ 24 (+26.32%)
Xcrawler快速、简洁且强大的PHP爬虫框架
Stars: ✭ 344 (+1710.53%)
XidelCommand line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Stars: ✭ 335 (+1663.16%)
AutoscraperA Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+21357.89%)
GosintOSINT Swiss Army Knife
Stars: ✭ 401 (+2010.53%)
CrawlyCrawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+2215.79%)
TikTokDownloader PyWebIO🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音|TikTok数据爬取工具,支持API调用,在线批量解析及下载。
Stars: ✭ 919 (+4736.84%)
bing-ip2hostsbingip2hosts is a Bing.com web scraper that discovers websites by IP address
Stars: ✭ 99 (+421.05%)
Elves🎊 Design and implement of lightweight crawler framework.
Stars: ✭ 315 (+1557.89%)