alaminopu / Pdf_downloader
Licence: gpl-3.0
A Scrapy Spider for downloading PDF files from a webpage.
Stars: ✭ 18
Programming Languages
python
Projects that are alternatives to or similar to Pdf downloader
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core that writes its output through Entity Framework Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+455.56%)
Mutual labels: scrapy, crawling
Scrapy Selenium
Scrapy middleware to handle javascript pages using selenium
Stars: ✭ 550 (+2955.56%)
Mutual labels: scrapy, crawling
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+583.33%)
Mutual labels: crawling, scrapy
scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (-5.56%)
Mutual labels: crawling, scrapy
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+277.78%)
Mutual labels: crawling, scrapy
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (+111.11%)
Mutual labels: crawling, scrapy
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+3138.89%)
Mutual labels: scrapy, crawling
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+28394.44%)
Mutual labels: crawling
Webhubbot
Python + Scrapy + MongoDB. 5 million data items per day! 💥 The world's largest website.
Stars: ✭ 5,427 (+30050%)
Mutual labels: scrapy
Tweetscraper
TweetScraper is a simple crawler/spider for Twitter Search without using API
Stars: ✭ 694 (+3755.56%)
Mutual labels: scrapy
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+4283.33%)
Mutual labels: crawling
Funpyspidersearchengine
Word2vec personalized search (results tailored to each user) + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 search
Stars: ✭ 782 (+4244.44%)
Mutual labels: scrapy
Icrawler
A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (+3394.44%)
Mutual labels: scrapy
House Renting
Possibly the best practice of Scrapy 🕷 and renting a house 🏡
Stars: ✭ 741 (+4016.67%)
Mutual labels: scrapy
Scrapy PDF Downloader
A Scrapy Spider for downloading PDF files from a webpage.
Installation
- Create a virtualenv (see "How to create virtualenv").
- Activate the virtualenv:
source path/to/bin/activate
- Install the dependencies:
pip install -r requirements.txt
Note: Skip this section if you are running with Docker.
Run
scrapy runspider pdf_downloader.py
scrapy runspider download_humblebundle.py
Run using docker
docker-compose run download
Download Humble Bundle PDF/EPUB
docker-compose run download_humblebundle
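The spider's source is not reproduced on this page. As a rough illustration of the core step such a spider has to perform, finding the PDF links on a page before downloading them, here is a minimal sketch that uses only the Python standard library rather than Scrapy. The names `PdfLinkParser` and `find_pdf_links` are hypothetical and are not taken from `pdf_downloader.py`; the project's actual logic may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf.

    Hypothetical helper for illustration; not part of the project.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Inspect only the path, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the PDF URLs found in `html`, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls
```

In a Scrapy spider the same filtering would typically happen inside the `parse` callback, with each matching URL handed to a follow-up request that writes the response body to disk.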