kjam / Wswp

Code for the second edition Web Scraping with Python book by Packt Publications

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Wswp

allitebooks.com
Download all the ebooks from allitebooks.com, with an indexed CSV
Stars: ✭ 24 (-78.57%)
Mutual labels:  scrapy, webscraping
Post Tuto Deployment
Build and deploy a machine learning app from scratch 🚀
Stars: ✭ 368 (+228.57%)
Mutual labels:  scrapy, selenium
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-39.29%)
Mutual labels:  scrapy, webscraping
hk0weather
Web scraper project to collect the useful Hong Kong weather data from HKO website
Stars: ✭ 49 (-56.25%)
Mutual labels:  scrapy, webscraping
Python Spider
Douban Top 250 movies; scraping JSON data and images from Douyu; Taobao; Youyuan; CrawlSpider scraping of basic profile data from the Hongniang matchmaking site, plus distributed crawling of Hongniang with Redis storage; small crawler demos; Selenium; scraping Duodian; a Django-built API; scraping Youyuan data; simulated logins for Zhihu, GitHub, and Tuchong; full-site scraping of the Duodian mall; scraping WeChat official-account article histories and articles shared in WeChat groups or by friends; itchat monitoring of articles shared by a given WeChat official account
Stars: ✭ 615 (+449.11%)
Mutual labels:  scrapy, selenium
XMQ-BackUp
Backup tool for XiaoMiQuan (小密圈): circles, topics, images, and files.
Stars: ✭ 22 (-80.36%)
Mutual labels:  selenium, scrapy
schedule-tweet
Schedules tweets using TweetDeck
Stars: ✭ 14 (-87.5%)
Mutual labels:  selenium, webscraping
python-crawler
A web crawler learning repository, suitable for complete beginners and friendly to newcomers
Stars: ✭ 37 (-66.96%)
Mutual labels:  selenium, scrapy
Pythonspidernotes
The essentials of getting started with web crawling in Python
Stars: ✭ 5,634 (+4930.36%)
Mutual labels:  scrapy, selenium
Scrapy Selenium
Scrapy middleware to handle javascript pages using selenium
Stars: ✭ 550 (+391.07%)
Mutual labels:  scrapy, selenium
image-crawler
An image scraper that scraps images from unsplash.com
Stars: ✭ 12 (-89.29%)
Mutual labels:  selenium, webscraping
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+814.29%)
Mutual labels:  scrapy, webscraping
chesf
CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-83.93%)
Mutual labels:  selenium, webscraping
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but aims to be easily extendable for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-10.71%)
Mutual labels:  scrapy, webscraping
InstaBot
Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-71.43%)
Mutual labels:  selenium, scrapy
Instagram-Scraper-2021
Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).
Stars: ✭ 57 (-49.11%)
Mutual labels:  selenium, webscraping
Sneakers Project
Using Selenium, Neha scraped data about 35 top selling sneakers of Nike and Adidas from stockx.com. She used this data to draw insights about sneaker resales.
Stars: ✭ 32 (-71.43%)
Mutual labels:  selenium, webscraping
non-api-fb-scraper
Scrape public Facebook posts from any group or user into a .csv file without needing to register for any API access
Stars: ✭ 40 (-64.29%)
Mutual labels:  selenium, webscraping
E Commerce Crawlers
🚀 A collection of e-commerce site crawlers: Taobao, JD, Amazon, and more
Stars: ✭ 377 (+236.61%)
Mutual labels:  scrapy, selenium
Mailinglistscraper
A python web scraper for public email lists.
Stars: ✭ 19 (-83.04%)
Mutual labels:  scrapy, webscraping

Web Scraping with Python

Welcome to the code repository for Web Scraping with Python, Second Edition! I hope you find the code and data here useful. If you have any questions, reach out to @kjam on Twitter or GitHub.

Code Structure

All of the code samples live in folders separated by chapter. Scripts are intended to be run from the code folder, so you can easily import from the chapter folders.

Code Examples

Not every code sample from the book is included here, but the majority of the finished scripts are. Even so, I encourage you to write out each code sample on your own and use these only as a reference.

Firefox Issues

Depending on your version of Firefox and Selenium, you may run into JavaScript errors. Here are some fixes:

  • Use an older version of Firefox
  • Upgrade Selenium to >=3.0.2 and download geckodriver. Make sure geckodriver can be found via your PATH variable; you can do this by adding its directory to PATH in your .bashrc or .bash_profile. (Wondering what these are? Please read Appendix C on learning the command line.)
  • Use PhantomJS with Selenium (change your browser line to webdriver.PhantomJS('path/to/your/phantomjs/installation'))
  • Use Chrome, Internet Explorer, or any other supported browser
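To sketch the second option: once the driver binary sits on your PATH, Selenium 3.x can launch Firefox without any explicit path. The small helper below (my own name, not from the book) just checks what is findable, which is a quick way to debug "driver not found" errors:

```python
from shutil import which

def locate_webdriver(candidates=("geckodriver", "chromedriver")):
    """Return (name, absolute path) for the first driver binary found on
    PATH, or None if none of the candidates are findable."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None

# With geckodriver on PATH, Selenium >= 3.0.2 can start Firefox directly:
#     from selenium import webdriver
#     driver = webdriver.Firefox()  # resolves geckodriver via PATH
```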

Feel free to reach out if you have any questions!

Issues with Module Import

Seeing ModuleNotFoundError for chp1? Try adding this snippet to the file:

import os
import sys
# Append the parent directory (the main code folder) to the import search path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))

What this does is append the parent directory (the main code folder) to your system path, which is where Python looks for imports. On some installations, that directory is not added automatically, so this code adds it explicitly.

Corrections?

If you find any issues in these code examples, feel free to submit an Issue or Pull Request. I appreciate your input!

First edition repository

If you are looking for the first edition's repository, you can find it here: Web Scraping with Python, First Edition

Questions?

Reach out to @kjam on Twitter or GitHub. @kjam is also often on freenode. :)
