Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Stars: ✭ 153 (+992.86%)

Mutual labels: web-scraping

Wayback Machine Scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Stars: ✭ 230 (+1542.86%)

Mutual labels: web-scraping

web-poet

Web scraping Page Objects core library

Stars: ✭ 67 (+378.57%)

Mutual labels: web-scraping

Short Jokes Dataset

Python scripts for building 'Short Jokes' dataset, featured on Kaggle

Stars: ✭ 215 (+1435.71%)

Mutual labels: web-scraping

fs2-data

streaming data parsing and transformation library

Stars: ✭ 103 (+635.71%)

Mutual labels: xpath

Twitter Intelligence

Twitter Intelligence OSINT project that performs tracking and analysis of Twitter

Stars: ✭ 179 (+1178.57%)

Mutual labels: web-scraping

2017-summer-workshop

Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab)

Stars: ✭ 33 (+135.71%)

Mutual labels: web-scraping

Scrapy Training

Scrapy Training companion code

Stars: ✭ 157 (+1021.43%)

Mutual labels: web-scraping

gdns

Tools to work with the Google DNS over HTTPS API in R

Stars: ✭ 23 (+64.29%)

Mutual labels: r-cyber

cypress-xpath

Adds XPath command to Cypress test runner

Stars: ✭ 145 (+935.71%)

Mutual labels: xpath

Juno crawler

Scrapy crawler to collect data on the back catalog of songs listed for sale.

Stars: ✭ 150 (+971.43%)

Mutual labels: web-scraping

Sqrape

Simple Query Scraping with CSS and Go Reflection (MOVED to GitLab)

Stars: ✭ 144 (+928.57%)

Mutual labels: web-scraping

Html Metadata

HTML metadata scraper and parser for Node.js (supports Promises and callback style)

Stars: ✭ 129 (+821.43%)

Mutual labels: web-scraping

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (+4978.57%)

Mutual labels: web-scraping

A Programming language for Web Scraping

Stars: ✭ 14 (+0%)

Mutual labels: web-scraping

30 Days Of Python

Learn Python for the next 30 (or so) Days.

Stars: ✭ 1,748 (+12385.71%)

Mutual labels: web-scraping

Scrape Linkedin Selenium

`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.

Stars: ✭ 239 (+1607.14%)

Mutual labels: web-scraping

codepen-puppeteer

Use Puppeteer to download pens from Codepen.io as single HTML pages

Stars: ✭ 22 (+57.14%)

Mutual labels: web-scraping

Docbao

A tool for scanning Vietnamese online news sites and analyzing their keywords

Stars: ✭ 230 (+1542.86%)

Mutual labels: web-scraping

xpath2.js

xpath.js - Open source XPath 2.0 implementation in JavaScript (DOM agnostic)

Stars: ✭ 74 (+428.57%)

Mutual labels: xpath

Selenium Python Helium

Selenium-python but lighter: Helium is the best Python library for web automation.

Stars: ✭ 2,732 (+19414.29%)

Mutual labels: web-scraping

core

The complete web scraping toolkit for PHP.

Stars: ✭ 1,110 (+7828.57%)

Mutual labels: web-scraping

R Web Scraping Cheat Sheet

Guide, reference and cheat sheet on web scraping using rvest, httr and RSelenium.

Stars: ✭ 207 (+1378.57%)

Mutual labels: web-scraping

Stock-Market-Predictor

Stock Market Predictor with an LSTM network. Web scraping and analysis tools (OHLC, mean)

Stars: ✭ 28 (+100%)

Mutual labels: web-scraping

Bet On Sibyl

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

Stars: ✭ 190 (+1257.14%)

Mutual labels: web-scraping

crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

Stars: ✭ 70 (+400%)

Mutual labels: web-scraping

Grab

Web Scraping Framework

Stars: ✭ 2,147 (+15235.71%)

Mutual labels: web-scraping

vscode-xslt-tokenizer

VSCode extension for highlighting XSLT and XPath (up to 3.0/3.1)

Stars: ✭ 37 (+164.29%)

Mutual labels: xpath

Learnpythonforresearch

This repository provides everything you need to get started with Python for (social science) research.

Stars: ✭ 163 (+1064.29%)

Mutual labels: web-scraping

pdfbox

📄◻️ Create, Manipulate and Extract Data from PDF Files (R Apache PDFBox wrapper)

Stars: ✭ 46 (+228.57%)

Mutual labels: r-cyber

Netflix Clone

Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture

Stars: ✭ 156 (+1014.29%)

Mutual labels: web-scraping

saveddit

Bulk Downloader for Reddit

Stars: ✭ 130 (+828.57%)

Mutual labels: web-scraping

Helena

A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.

Stars: ✭ 151 (+978.57%)

Mutual labels: web-scraping

PythonScrapyBasicSetup

Basic setup with random user agents and IP addresses for Python Scrapy Framework.

Stars: ✭ 57 (+307.14%)

Mutual labels: web-scraping

Phpscraper

PHP Scraper - a highly opinionated web-interface for PHP

Stars: ✭ 148 (+957.14%)

Mutual labels: web-scraping

BookingScraper

🌎 🏨 Scrape Booking.com 🏨 🌎

Stars: ✭ 68 (+385.71%)

Mutual labels: web-scraping

Zillow

Zillow Scraper for Python using Selenium

Stars: ✭ 141 (+907.14%)

Mutual labels: web-scraping

panthro

An implementation of XPath 3.0 in Objective-C/Cocoa

Stars: ✭ 45 (+221.43%)

Mutual labels: xpath

Actor Page Analyzer

Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.

Stars: ✭ 124 (+785.71%)

Mutual labels: web-scraping

actor-content-checker

You can use this act to monitor any page's content and get a notification when content changes.

Stars: ✭ 16 (+14.29%)

Mutual labels: web-scraping

concurrent-web-scraping

Building a Concurrent Web Scraper with Python and Selenium

Stars: ✭ 28 (+100%)

Mutual labels: web-scraping

Ayakashi

⚡️ Ayakashi.io - The next generation web scraping framework