kjam / Wswp

Code for the second edition Web Scraping with Python book by Packt Publications

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Wswp

allitebooks.com
Download all the ebooks from allitebooks.com, with an indexed CSV
Stars: ✭ 24 (-78.57%)
Mutual labels:  scrapy, webscraping
Post Tuto Deployment
Build and deploy a machine learning app from scratch 🚀
Stars: ✭ 368 (+228.57%)
Mutual labels:  scrapy, selenium
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-39.29%)
Mutual labels:  scrapy, webscraping
hk0weather
Web scraper project to collect the useful Hong Kong weather data from HKO website
Stars: ✭ 49 (-56.25%)
Mutual labels:  scrapy, webscraping
Python Spider
Douban Top 250 movies; scraping JSON data and images from Douyu; Taobao; Youyuan; CrawlSpider scraping of basic profile data from the Hongniang matchmaking site, plus distributed crawling of Hongniang with Redis storage; small crawler demos; Selenium; scraping Duodian; a Django-built API; scraping Youyuan data; simulated logins for Zhihu, GitHub, and Tuchong; full-site scraping of the Duodian mall; scraping WeChat official-account article histories and articles shared in WeChat groups or by friends; itchat monitoring of articles shared by a given WeChat official account
Stars: ✭ 615 (+449.11%)
Mutual labels:  scrapy, selenium
XMQ-BackUp
Backup tool for XiaoMiQuan (小密圈): circles, topics, images, and files.
Stars: ✭ 22 (-80.36%)
Mutual labels:  selenium, scrapy
schedule-tweet
Schedules tweets using TweetDeck
Stars: ✭ 14 (-87.5%)
Mutual labels:  selenium, webscraping
python-crawler
A web crawler learning repository, suitable for complete beginners and friendly to newcomers
Stars: ✭ 37 (-66.96%)
Mutual labels:  selenium, scrapy
Pythonspidernotes
The essentials of getting started with web crawling in Python
Stars: ✭ 5,634 (+4930.36%)
Mutual labels:  scrapy, selenium
Scrapy Selenium
Scrapy middleware to handle javascript pages using selenium
Stars: ✭ 550 (+391.07%)
Mutual labels:  scrapy, selenium
image-crawler
An image scraper that scraps images from unsplash.com
Stars: ✭ 12 (-89.29%)
Mutual labels:  selenium, webscraping
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+814.29%)
Mutual labels:  scrapy, webscraping
chesf
CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-83.93%)
Mutual labels:  selenium, webscraping
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but aims to be easily extendable for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-10.71%)
Mutual labels:  scrapy, webscraping
InstaBot
Simple and friendly Bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-71.43%)
Mutual labels:  selenium, scrapy
Instagram-Scraper-2021
Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).
Stars: ✭ 57 (-49.11%)
Mutual labels:  selenium, webscraping
Sneakers Project
Using Selenium, Neha scraped data about 35 top selling sneakers of Nike and Adidas from stockx.com. She used this data to draw insights about sneaker resales.
Stars: ✭ 32 (-71.43%)
Mutual labels:  selenium, webscraping
non-api-fb-scraper
Scrape public Facebook posts from any group or user into a .csv file without needing to register for any API access
Stars: ✭ 40 (-64.29%)
Mutual labels:  selenium, webscraping
E Commerce Crawlers
🚀 A collection of e-commerce site crawlers: Taobao, JD, Amazon, and more
Stars: ✭ 377 (+236.61%)
Mutual labels:  scrapy, selenium
Mailinglistscraper
A python web scraper for public email lists.
Stars: ✭ 19 (-83.04%)
Mutual labels:  scrapy, webscraping

Web Scraping with Python

Welcome to the code repository for Web Scraping with Python, Second Edition! I hope you find the code and data here useful. If you have any questions, reach out to @kjam on Twitter or GitHub.

Code Structure

All of the code samples live in folders separated by chapter. Scripts are intended to be run from the code folder, so you can easily import from the chapter folders.

Code Examples

Not every code sample from the book is included here, but the majority of the finished scripts are. Even so, I encourage you to write out each code sample on your own and use these only as a reference.

Firefox Issues

Depending on your version of Firefox and Selenium, you may run into JavaScript errors. Here are some fixes:

  • Use an older version of Firefox
  • Upgrade Selenium to >=3.0.2 and download geckodriver. Make sure geckodriver can be found via your PATH variable; you can do this by adding its directory to PATH in your .bashrc or .bash_profile. (Wondering what these are? Please read Appendix C on learning the command line.)
  • Use PhantomJS with Selenium (change your browser line to webdriver.PhantomJS('path/to/your/phantomjs/installation'))
  • Use Chrome, Internet Explorer, or any other supported browser
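To sketch the second option: once the driver binary sits on your PATH, Selenium 3.x can launch Firefox without any explicit path. The small helper below (my own name, not from the book) just checks what is findable, which is a quick way to debug "driver not found" errors:

```python
from shutil import which

def locate_webdriver(candidates=("geckodriver", "chromedriver")):
    """Return (name, absolute path) for the first driver binary found on
    PATH, or None if none of the candidates are findable."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None

# With geckodriver on PATH, Selenium >= 3.0.2 can start Firefox directly:
#     from selenium import webdriver
#     driver = webdriver.Firefox()  # resolves geckodriver via PATH
```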

Feel free to reach out if you have any questions!

Issues with Module Import

Seeing ModuleNotFoundError for chp1? Try adding this snippet to the file:

import os
import sys
# Append the parent directory (the main code folder) to the import search path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))

What this does is append the parent directory (the main code folder) to your system path, which is where Python looks for imports. On some installations, that directory is not added automatically, so this code adds it explicitly.

Corrections?

If you find any issues in these code examples, feel free to submit an Issue or Pull Request. I appreciate your input!

First edition repository

If you are looking for the first edition's repository, you can find it here: Web Scraping with Python, First Edition

Questions?

Reach out to @kjam on Twitter or GitHub. @kjam is also often on freenode. :)
