Wayback Machine Scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 230 (+1542.86%)
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices.
Stars: ✭ 90 (+542.86%)
City Scrapers
Scrape, standardize, and share public meeting information from local government websites.
Stars: ✭ 220 (+1471.43%)
restaurant-finder-featureReviews
Build a Flask web application that helps users retrieve key restaurant information and feature-based reviews (generated by applying a market-basket model, the Apriori algorithm, and NLP to user reviews).
Stars: ✭ 21 (+50%)
Short Jokes Dataset
Python scripts for building the 'Short Jokes' dataset, featured on Kaggle.
Stars: ✭ 215 (+1435.71%)
cl-torrents
Search torrents on popular trackers via CLI, readline, GUI, or web client. Tutorial and binaries available (issue tracker at https://gitlab.com/vindarel/cl-torrents/).
Stars: ✭ 83 (+492.86%)
Trump Lies
Tutorial: Web scraping in Python with Beautiful Soup
Stars: ✭ 201 (+1335.71%)
audiobookshelf
Self-hosted audiobook and podcast server
Stars: ✭ 1,316 (+9300%)
Twitter Intelligence
An OSINT project that tracks and analyzes Twitter.
Stars: ✭ 179 (+1178.57%)
selectorlib
A library that reads a YML file of XPath or CSS selectors and uses them to extract data from HTML pages.
Stars: ✭ 53 (+278.57%)
codechef-rank-comparator
A web application hosted on the Heroku cloud platform, built on web scraping in Python with the lxml library and XPath (XML Path Language).
Stars: ✭ 23 (+64.29%)
Python
Covers basic to advanced Python topics, practice questions, and logical problems in Python, plus web development using HTML, CSS, Bootstrap, jQuery, DOM, and Django 🚀🚀. 💥 🌈
Stars: ✭ 29 (+107.14%)
Web Scraping
Detailed web scraping tutorials for dummies, with financial data crawlers for Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, and SHFE, and news data crawlers for BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, and The Economist.
Stars: ✭ 153 (+992.86%)
automation-scripts
Simple scripts that I use to automate the boring things.
Stars: ✭ 14 (+0%)
Juno crawler
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Stars: ✭ 150 (+971.43%)
reapr
🕸→ℹ️ Reap Information from Websites
Stars: ✭ 14 (+0%)
Sqrape
Simple Query Scraping with CSS and Go Reflection (MOVED to GitLab)
Stars: ✭ 144 (+928.57%)
top-github-scraper
Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (+185.71%)
Html Metadata
Metadata HTML scraper and parser for Node.js (supports Promises and callback style)
Stars: ✭ 129 (+821.43%)
saveddit
Bulk downloader for Reddit
Stars: ✭ 130 (+828.57%)
30 Days Of Python
Learn Python for the next 30 (or so) days.
Stars: ✭ 1,748 (+12385.71%)
leetcode-compensation
Compensation analysis of posts scraped from leetcode.com/discuss/compensation. At present, reports have been generated only for Indian cities.
Stars: ✭ 83 (+492.86%)
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+10728.57%)
Stock-Market-Predictor
Stock market predictor built on an LSTM network. Web scraping and analysis tools (OHLC, mean)
Stars: ✭ 28 (+100%)
Scrapyd Cluster On Heroku
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
Stars: ✭ 106 (+657.14%)
sp-subway-scraper
🚆 This web scraper builds a dataset of São Paulo subway operation status.
Stars: ✭ 24 (+71.43%)
Pulsar
Turn large websites into tables and charts using simple SQL queries.
Stars: ✭ 100 (+614.29%)
scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (+557.14%)
Splashr
💦 Tools to Work with the 'Splash' JavaScript Rendering Service in R
Stars: ✭ 93 (+564.29%)
rreddit
𝐫⟋ Get Reddit data
Stars: ✭ 49 (+250%)
Humanoid
Node.js package to bypass Cloudflare's anti-bot JavaScript challenges
Stars: ✭ 88 (+528.57%)
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+4978.57%)
Rvest
Simple web scraping for R
Stars: ✭ 1,253 (+8850%)
Reader
Extract clean(er), readable text from web pages via Mercury Web Parser.
Stars: ✭ 75 (+435.71%)
codepen-puppeteer
Use Puppeteer to download pens from CodePen.io as single HTML pages
Stars: ✭ 22 (+57.14%)
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (+385.71%)
browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+407.14%)
Decapitated
Headless 'Chrome' Orchestration in R
Stars: ✭ 65 (+364.29%)
core
The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+7828.57%)
Instago
Download or access Instagram photos, videos, stories, story highlights, postlives, followings, and followers
Stars: ✭ 59 (+321.43%)
GSoC-Data-Analyser
Simple search for organisations that are participating, or have participated, in GSoC
Stars: ✭ 29 (+107.14%)
Project Tauro
A router Wi-Fi key recovery/cracking tool with a twist.
Stars: ✭ 52 (+271.43%)
reading-list
My reading list since January 1996. Commits include comments on what I read beginning in June 2015
Stars: ✭ 34 (+142.86%)
Webmiddle
Node.js framework for modular web scraping and data extraction
Stars: ✭ 13 (-7.14%)
lopez
Crawling and scraping the Web for fun and profit
Stars: ✭ 20 (+42.86%)
Youtube tutorials
Collection of scripts corresponding to LucidProgramming YouTube tutorials
Stars: ✭ 769 (+5392.86%)
scraping-ebay
Scraping eBay's products using the Scrapy web crawling framework
Stars: ✭ 79 (+464.29%)
Hi
A programming language for web scraping
Stars: ✭ 14 (+0%)
linkextractor
A Docker tutorial using a link extraction application example
Stars: ✭ 41 (+192.86%)
halfstaff
🇺🇸 Is the US flag at half-staff?
Stars: ✭ 22 (+57.14%)
actor-scraper
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
Stars: ✭ 83 (+492.86%)
librivox-catalog
LibriVox catalog and reader workflow application
Stars: ✭ 20 (+42.86%)
Linkedin-Client
Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (+200%)
unpaprd
An audiobook 🎧 📔 app made using Flutter
Stars: ✭ 73 (+421.43%)