
alaminopu / Pdf_downloader

License: GPL-3.0
A Scrapy Spider for downloading PDF files from a webpage.

Programming Languages

python

Projects that are alternatives to or similar to Pdf_downloader

Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but aims to be extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+455.56%)
Mutual labels:  scrapy, crawling
Scrapyrt
HTTP API for Scrapy spiders
Stars: ✭ 637 (+3438.89%)
Mutual labels:  scrapy, crawling
Scrapy Selenium
Scrapy middleware to handle javascript pages using selenium
Stars: ✭ 550 (+2955.56%)
Mutual labels:  scrapy, crawling
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+583.33%)
Mutual labels:  crawling, scrapy
scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (-5.56%)
Mutual labels:  crawling, scrapy
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+277.78%)
Mutual labels:  crawling, scrapy
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (+111.11%)
Mutual labels:  crawling, scrapy
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+3138.89%)
Mutual labels:  scrapy, crawling
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+28394.44%)
Mutual labels:  crawling
Webhubbot
Python + Scrapy + MongoDB. 5 million records per day! 💥 Crawls the world's largest website.
Stars: ✭ 5,427 (+30050%)
Mutual labels:  scrapy
Spider python
Python web crawlers
Stars: ✭ 557 (+2994.44%)
Mutual labels:  scrapy
Wechatsogou
A crawler interface for WeChat official accounts, based on Sogou WeChat search
Stars: ✭ 5,220 (+28900%)
Mutual labels:  scrapy
Tweetscraper
TweetScraper is a simple crawler/spider for Twitter Search that does not use the API
Stars: ✭ 694 (+3755.56%)
Mutual labels:  scrapy
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+4283.33%)
Mutual labels:  crawling
Funpyspidersearchengine
Personalized search powered by Word2vec + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage, exposed via a RESTful API) + Django 3.1.1 (search)
Stars: ✭ 782 (+4244.44%)
Mutual labels:  scrapy
Faster Than Requests
Faster requests on Python 3
Stars: ✭ 639 (+3450%)
Mutual labels:  scrapy
Scrapy Redis
Redis-based components for Scrapy.
Stars: ✭ 4,998 (+27666.67%)
Mutual labels:  scrapy
Fbcrawl
A Facebook crawler
Stars: ✭ 536 (+2877.78%)
Mutual labels:  scrapy
Icrawler
A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (+3394.44%)
Mutual labels:  scrapy
House Renting
Possibly the best practices of Scrapy 🕷, applied to renting a house 🏡
Stars: ✭ 741 (+4016.67%)
Mutual labels:  scrapy

Scrapy PDF Downloader

A Scrapy Spider for downloading PDF files from a webpage.
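
For orientation, a minimal spider of this kind might look like the sketch below. This is not the contents of pdf_downloader.py, just an illustration of the technique; the start URL and the downloads directory are assumptions.

import os

import scrapy


class PdfSpider(scrapy.Spider):
    # Minimal sketch: crawl one page, follow every link ending in .pdf,
    # and write each response body to disk.
    name = "pdf_sketch"
    start_urls = ["https://example.com/reports"]  # assumption: the page to scrape

    def parse(self, response):
        # Follow every anchor whose href ends in .pdf.
        for href in response.css("a::attr(href)").getall():
            if href.lower().endswith(".pdf"):
                yield response.follow(href, callback=self.save_pdf)

    def save_pdf(self, response):
        # Name the file after the last path segment of the URL.
        os.makedirs("downloads", exist_ok=True)
        path = os.path.join("downloads", response.url.split("/")[-1])
        with open(path, "wb") as f:
            f.write(response.body)
        self.logger.info("Saved %s", path)

A self-contained spider file like this runs the same way as the project's spiders: scrapy runspider with the file name.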

Installation

  1. Create a virtualenv - see the virtualenv documentation for instructions
  2. Activate the virtualenv - source path/to/bin/activate
  3. Run pip install -r requirements.txt (a full command sequence is sketched below)

Note: Skip this section if you are running with Docker.
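
For example, on a Unix-like system the three steps above might look like this (the environment name env is an assumption):

virtualenv env
source env/bin/activate
pip install -r requirements.txt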

Run

Download PDFs from a webpage:

scrapy runspider pdf_downloader.py

Download Humble Bundle PDF/EPUB:

scrapy runspider download_humblebundle.py
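
Both spiders are self-contained files, so they are run with scrapy runspider rather than from a full Scrapy project. Scrapy's global -s flag can override any setting on the command line, for example to make the logs quieter:

scrapy runspider pdf_downloader.py -s LOG_LEVEL=INFO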

Run using Docker

Download PDFs from a webpage:

docker-compose run download

Download Humble Bundle PDF/EPUB

docker-compose run download_humblebundle
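
These commands assume a docker-compose.yml that defines one service per spider. The file is not shown in this README, but a minimal sketch might look like the following (the build context is an assumption; the service names match the commands above):

version: "3"
services:
  download:
    build: .
    command: scrapy runspider pdf_downloader.py
  download_humblebundle:
    build: .
    command: scrapy runspider download_humblebundle.py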
