298 open source projects that are alternatives to or similar to Pdf_downloader

HTTP API for Scrapy spiders

Stars: ✭ 637 (+3438.89%)

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (+277.78%)

Mutual labels: crawling, scrapy

scrapy-distributed

A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.

Stars: ✭ 38 (+111.11%)

Mutual labels: crawling, scrapy

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (+583.33%)

Mutual labels: crawling, scrapy

Scrapy Selenium

Scrapy middleware to handle JavaScript pages using Selenium

Stars: ✭ 550 (+2955.56%)

Mutual labels: scrapy, crawling

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (+455.56%)

Mutual labels: scrapy, crawling

scrapy-fieldstats

A Scrapy extension to log item coverage when the spider shuts down

Stars: ✭ 17 (-5.56%)

Mutual labels: crawling, scrapy

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+3138.89%)

Mutual labels: scrapy, crawling

Post Tuto Deployment

Build and deploy a machine learning app from scratch 🚀

Stars: ✭ 368 (+1944.44%)

Mutual labels: scrapy

Vault

Swiss Army knife for hackers

Stars: ✭ 346 (+1822.22%)

Mutual labels: scrapy

Elves

🎊 Design and implementation of a lightweight crawler framework.

Stars: ✭ 315 (+1650%)

Mutual labels: scrapy

Advanced Web Scraping Tutorial

The Zipru scraper developed in the Advanced Web Scraping Tutorial.

Stars: ✭ 384 (+2033.33%)

Mutual labels: scrapy

Headless Chrome Crawler

Distributed crawler powered by Headless Chrome

Stars: ✭ 5,129 (+28394.44%)

Mutual labels: crawling

Awesome Scrapy

A curated list of awesome packages, articles, and other cool resources from the Scrapy community.

Stars: ✭ 360 (+1900%)

Mutual labels: scrapy

Webhubbot

Python + Scrapy + MongoDB. 5 million data items per day! 💥 The world's largest website.

Stars: ✭ 5,427 (+30050%)

Mutual labels: scrapy

Scrapy Redis

Redis-based components for Scrapy.

Stars: ✭ 4,998 (+27666.67%)

Mutual labels: scrapy

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy

Stars: ✭ 309 (+1616.67%)

Mutual labels: scrapy

Scrapy Crawlera

Crawlera middleware for Scrapy

Stars: ✭ 281 (+1461.11%)

Mutual labels: scrapy

Alltheplaces

A set of spiders and scrapers to extract location information from places that post their location on the internet.

Stars: ✭ 277 (+1438.89%)

Mutual labels: scrapy

Funpyspidersearchengine

Word2vec personalized search (tailored to each user) + Scrapy 2.3.0 (data crawling) + Elasticsearch 7.9.1 (data storage and external RESTful API) + Django 3.1.1 search

Stars: ✭ 782 (+4244.44%)

Mutual labels: scrapy

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (+1438.89%)

Mutual labels: crawling

Scrapy Fake Useragent

Random User-Agent middleware based on fake-useragent

Stars: ✭ 520 (+2788.89%)

Mutual labels: scrapy

Apify Js

Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

Stars: ✭ 3,154 (+17422.22%)

Mutual labels: crawling

Tieba spider

Baidu Tieba crawler (based on Scrapy and MySQL)

Stars: ✭ 257 (+1327.78%)

Mutual labels: scrapy

Ferret

Declarative web scraping

Stars: ✭ 4,837 (+26772.22%)

Mutual labels: crawling

tripadvisor-scraper

TripAdvisor scraper

Stars: ✭ 63 (+250%)

Mutual labels: scrapy

Files

Docs and files for ScrapydWeb, Scrapyd, Scrapy, and other projects

Stars: ✭ 390 (+2066.67%)

Mutual labels: scrapy

Wechatsogou

WeChat Official Account crawler interface based on Sogou WeChat Search

Stars: ✭ 5,220 (+28900%)

Mutual labels: scrapy

E Commerce Crawlers

🚀 Collection of e-commerce website crawlers: Taobao, JD.com, Amazon, and more

Stars: ✭ 377 (+1994.44%)

Mutual labels: scrapy

Tweetscraper

TweetScraper is a simple crawler/spider for Twitter Search that does not use the API

Stars: ✭ 694 (+3755.56%)

Mutual labels: scrapy

Webster

A reliable high-level web crawling & scraping framework for Node.js.

Stars: ✭ 364 (+1922.22%)

Mutual labels: crawling

Spider python

Python crawlers

Stars: ✭ 557 (+2994.44%)

Mutual labels: scrapy

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (+4283.33%)

Mutual labels: crawling

ip proxy pool

Dynamically generates spiders with Scrapy to crawl and check free proxy IPs on the internet.

Stars: ✭ 39 (+116.67%)

Mutual labels: scrapy

Spidermon

Scrapy extension for monitoring spider execution.

Stars: ✭ 309 (+1616.67%)

Mutual labels: crawling

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (+2877.78%)

Mutual labels: scrapy

Sasila

A flexible, friendly crawler framework

Stars: ✭ 286 (+1488.89%)

Mutual labels: crawling

Faster Than Requests

Faster requests on Python 3

Stars: ✭ 639 (+3450%)

Mutual labels: scrapy

Stopstalk Deployment

Stop stalking and start StopStalking 😉

Stars: ✭ 276 (+1433.33%)

Mutual labels: crawling

Haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Stars: ✭ 4,993 (+27638.89%)

Mutual labels: scrapy

Seeker

Seeker - another job board aggregator.

Stars: ✭ 16 (-11.11%)

Mutual labels: scrapy

Python Spider

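Douban Movie Top 250; crawling JSON data and pictures of pretty girls from Douyu; Taobao; Youyuan; CrawlSpider crawling of partial basic profile information from the Hongniang matchmaking site, plus distributed crawling of Hongniang with Redis storage; small crawler demos; Selenium; crawling Dmall; Django API development; crawling Youyuan site information; simulated Zhihu login; simulated GitHub login; simulated Tuchong login; crawling the entire Dmall store site; crawling WeChat Official Account article history; crawling articles shared in WeChat groups or by WeChat friends; itchat monitoring of articles shared by specified WeChat Official Accounts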

Stars: ✭ 615 (+3316.67%)

Mutual labels: scrapy

Dataflowkit

Extract structured data from websites. Website scraping.

Stars: ✭ 456 (+2433.33%)

Mutual labels: crawling

PttImageSpider

PTT image downloader (grabs the images from an entire board and uses article titles as folder names) (uses Scrapy)

Stars: ✭ 16 (-11.11%)

Mutual labels: scrapy

Happy Spiders

🔧 🔩 🔨 A curated collection of crawler-related tools, simulated-login techniques, proxy IPs, Scrapy template code, and more.