Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Stars: ✭ 153 (+183.33%)

Mutual labels: web-scraping, web-scraper

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+1114.81%)

Mutual labels: web-scraping, web-scraper

Juno crawler

Scrapy crawler to collect data on the back catalog of songs listed for sale.

Stars: ✭ 150 (+177.78%)

Mutual labels: scrapy, web-scraping

Scrapy Training

Scrapy Training companion code

Stars: ✭ 157 (+190.74%)

Mutual labels: scrapy, web-scraping

Scrapy Fake Useragent

Random User-Agent middleware based on fake-useragent

Stars: ✭ 520 (+862.96%)

Mutual labels: scrapy, web-scraping

Detect Cms

PHP Library for detecting CMS

Stars: ✭ 78 (+44.44%)

Mutual labels: web-scraping, web-scraper

Phpscraper

PHP Scraper - a highly opinionated web interface for PHP

Stars: ✭ 148 (+174.07%)

Mutual labels: web-scraping, web-scraper

Social Media Profile Scrapers

Fetch a user's data across social media

Stars: ✭ 60 (+11.11%)

Mutual labels: web-scraping, web-scraper

restaurant-finder-featureReviews

Build a Flask web application to help users retrieve key restaurant information and feature-based reviews, generated by applying a market-basket model (the Apriori algorithm) and NLP to user reviews.

Stars: ✭ 21 (-61.11%)

Mutual labels: web-scraping, scrapy

Daftlistings

A library that enables programmatic interaction with daft.ie. Daft.ie has nationwide coverage and contains about 80% of the total available properties in Ireland.

Stars: ✭ 86 (+59.26%)

Mutual labels: web-scraping, web-scraper

Html Metadata

MetaData html scraper and parser for Node.js (supports Promises and callback style)

Stars: ✭ 129 (+138.89%)

Mutual labels: web-scraping, web-scraper

Awesome Web Scraper

A collection of awesome web scrapers and crawlers.

Stars: ✭ 147 (+172.22%)

Mutual labels: scrapy, web-scraper

Scrape Linkedin Selenium

`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.

Stars: ✭ 239 (+342.59%)

Mutual labels: web-scraping, web-scraper

IMDB-Scraper

Scrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.

Stars: ✭ 37 (-31.48%)

Mutual labels: web-scraping, scrapy

scraping-ebay

Scraping eBay products using the Scrapy web crawling framework

Stars: ✭ 79 (+46.3%)

Mutual labels: web-scraping, scrapy

Basketball reference web scraper

NBA Stats API via Basketball Reference

Stars: ✭ 279 (+416.67%)

Mutual labels: web-scraping, web-scraper

City Scrapers

Scrape, standardize and share public meetings from local government websites

Stars: ✭ 220 (+307.41%)

Mutual labels: scrapy, web-scraping

Php Curl Class

PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs

Stars: ✭ 2,903 (+5275.93%)

Mutual labels: web-scraping, web-scraper

Project Tauro

A Router WiFi key recovery/cracking tool with a twist.

Stars: ✭ 52 (-3.7%)

Mutual labels: web-scraping, web-scraper

Snoop

Snoop: a reconnaissance tool based on open data (OSINT world)

Stars: ✭ 886 (+1540.74%)

Mutual labels: web-scraping

Webhubbot

Python + Scrapy + MongoDB. 5 million records per day! 💥 The world's largest website.

Stars: ✭ 5,427 (+9950%)

Mutual labels: scrapy

Articlespider

Source code for the imooc (慕课网) Python distributed crawler course, with long-term updates and maintenance

Stars: ✭ 40 (-25.93%)

Mutual labels: scrapy

Webmiddle

Node.js framework for modular web scraping and data extraction

Stars: ✭ 13 (-75.93%)

Mutual labels: web-scraping

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+1079.63%)

Mutual labels: scrapy

Icrawler

A multi-threaded crawler framework with many built-in image crawlers.

Stars: ✭ 629 (+1064.81%)

Mutual labels: scrapy

Voyages Sncf Api

A Scrapy spider that scrapes times and prices from Voyages Sncf. It uses scrapyrt to provide an API interface.

Stars: ✭ 7 (-87.04%)

Mutual labels: scrapy

Coolqlcool

Nextjs server to query websites with GraphQL

Stars: ✭ 623 (+1053.7%)

Mutual labels: web-scraping

Python Spider

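Douban Movie Top 250; scraping JSON data and pictures of beautiful women from Douyu; Taobao; Youyuan; using CrawlSpider to scrape basic profile information from the Hongniang matchmaking site, plus distributed crawling of Hongniang with storage in Redis; small crawler demos; Selenium; scraping Duodian; building API endpoints with Django; scraping Youyuan profile data; simulated logins for Zhihu, GitHub, and Tuchong; scraping the entire Duodian store site; scraping the article history of WeChat official accounts; scraping articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts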

Stars: ✭ 615 (+1038.89%)

Mutual labels: scrapy

Wescraper

Searches WeChat official account articles using Scrapy and Sogou

Stars: ✭ 46 (-14.81%)

Mutual labels: scrapy

App comments spider

Scrapes game reviews from Baidu Tieba, TapTap, the App Store, and official Weibo bloggers (based on redis_scrapy); the filter uses bloomfilter.

Stars: ✭ 38 (-29.63%)

Mutual labels: scrapy

Scrapy Cluster

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.

Stars: ✭ 921 (+1605.56%)

Mutual labels: scrapy

Pythonspidernotes

Getting started with Python web crawlers: the essentials edition

Stars: ✭ 5,634 (+10333.33%)

Mutual labels: scrapy

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+979.63%)

Mutual labels: scrapy

Letterboxd recommendations

Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username

Stars: ✭ 23 (-57.41%)

Mutual labels: web-scraping

Wechatsogou

A WeChat official account crawler interface based on Sogou WeChat search

Stars: ✭ 5,220 (+9566.67%)

Mutual labels: scrapy

Spider python

Python crawlers

Stars: ✭ 557 (+931.48%)

Mutual labels: scrapy

Actor Google Search Scraper

Apify actor that crawls Google Search result pages (SERPs) and extracts a list of organic results, ads, related queries and more. It supports selection of custom country, language and location.

Stars: ✭ 38 (-29.63%)

Mutual labels: web-scraping

Mailinglistscraper

A Python web scraper for public email lists.

Stars: ✭ 19 (-64.81%)

Mutual labels: scrapy

Scrapy Selenium

Scrapy middleware to handle JavaScript pages using Selenium

Stars: ✭ 550 (+918.52%)

Mutual labels: scrapy

Pythoncode Tutorials

The Python Code Tutorials

Stars: ✭ 544 (+907.41%)

Mutual labels: web-scraping

Pdf downloader

A Scrapy Spider for downloading PDF files from a webpage.

Stars: ✭ 18 (-66.67%)

Mutual labels: scrapy

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (+892.59%)

Mutual labels: scrapy

Scrapy Redis

Redis-based components for Scrapy.

Stars: ✭ 4,998 (+9155.56%)

Mutual labels: scrapy

Reptile

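🏀 Python 3 web crawlers in practice (some with detailed tutorials): Maoyan, Tencent Video, Douban, Yanzhao (postgraduate admissions site), Weibo, Biquge novels, Baidu Hot Topics, Bilibili, CSDN, NetEase Cloud Reading, Ali Literature, Baidu Stocks, Toutiao, WeChat official accounts, NetEase Cloud Music, Lagou, Youdao, Unsplash, Shixiseng, Autohome, League of Legends Box, Dianping, Lianjia, LPL schedule, typhoons, Fantasy Westward Journey and Onmyoji Cangbaoge, weather, Nowcoder, Baidu Wenku, bedtime stories, Zhihu, Wish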

Stars: ✭ 1,048 (+1840.74%)

Mutual labels: scrapy

Pixiv Crawler

A multi-purpose pixiv crawler built on the Scrapy framework

Stars: ✭ 46 (-14.81%)

Mutual labels: scrapy

Scrapymon

Simple Web UI for Scrapy spider management via Scrapyd

Stars: ✭ 35 (-35.19%)

Mutual labels: scrapy

Scrapy Finance

[OUTDATED] Scrapy spiders to crawl financial text data 📚 📜 for training word vectors 🚀