wayback: ⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs
Stars: ✭ 52 (+271.43%)
htmlunit: 🕸🧰☕️ Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
Stars: ✭ 39 (+178.57%)
selectorlib: A library to read a YAML file with XPath or CSS selectors and extract data from HTML pages using them
Stars: ✭ 53 (+278.57%)
codechef-rank-comparator: Web application hosted on the Heroku cloud platform, based on web scraping in Python using the lxml library (XML Path Language).
Stars: ✭ 23 (+64.29%)
curlconverter: ➰ ➡️ ➖ Translate cURL command lines into parameters for use with httr or actual httr calls (R)
Stars: ✭ 86 (+514.29%)
City Scrapers: Scrape, standardize and share public meetings from local government websites
Stars: ✭ 220 (+1471.43%)
Trump Lies: A tutorial on web scraping in Python with Beautiful Soup
Stars: ✭ 201 (+1335.71%)
scrapy-wayback-machine: A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (+557.14%)
lopez: Crawling and scraping the Web for fun and profit
Stars: ✭ 20 (+42.86%)
Web Scraping: Detailed web scraping tutorials for dummies, with financial data crawlers for Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME and SHFE, and news data crawlers for BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune and The Economist
Stars: ✭ 153 (+992.86%)
Wayback Machine Scraper: A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 230 (+1542.86%)
web-poet: Web scraping Page Objects core library
Stars: ✭ 67 (+378.57%)
Short Jokes Dataset: Python scripts for building 'Short Jokes' dataset, featured on Kaggle
Stars: ✭ 215 (+1435.71%)
fs2-data: Streaming data parsing and transformation library
Stars: ✭ 103 (+635.71%)
Twitter Intelligence: An OSINT project that performs tracking and analysis of Twitter
Stars: ✭ 179 (+1178.57%)
2017-summer-workshop: Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab)
Stars: ✭ 33 (+135.71%)
gdns: Tools to work with the Google DNS over HTTPS API in R
Stars: ✭ 23 (+64.29%)
cypress-xpath: Adds XPath command to Cypress test runner
Stars: ✭ 145 (+935.71%)
Juno crawler: Scrapy crawler to collect data on the back catalog of songs listed for sale.
Stars: ✭ 150 (+971.43%)
Sqrape: Simple Query Scraping with CSS and Go Reflection (MOVED to GitLab)
Stars: ✭ 144 (+928.57%)
Html Metadata: Metadata HTML scraper and parser for Node.js (supports Promises and callback style)
Stars: ✭ 129 (+821.43%)
trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+4978.57%)
Hi: A Programming language for Web Scraping
Stars: ✭ 14 (+0%)
30 Days Of Python: Learn Python for the next 30 (or so) Days.
Stars: ✭ 1,748 (+12385.71%)
Scrape Linkedin Selenium: `scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Stars: ✭ 239 (+1607.14%)
codepen-puppeteer: Use Puppeteer to download pens from Codepen.io as single HTML pages
Stars: ✭ 22 (+57.14%)
Docbao: A tool for scanning and analyzing keywords on Vietnamese online news sites
Stars: ✭ 230 (+1542.86%)
xpath2.js: xpath.js - Open source XPath 2.0 implementation in JavaScript (DOM agnostic)
Stars: ✭ 74 (+428.57%)
Selenium Python Helium: Selenium-python, but lighter. Helium is the best Python library for web automation.
Stars: ✭ 2,732 (+19414.29%)
core: The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+7828.57%)
R Web Scraping Cheat Sheet: Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.
Stars: ✭ 207 (+1378.57%)
Stock-Market-Predictor: Stock Market Predictor with LSTM network. Web scraping and analyzing tools (OHLC, mean)
Stars: ✭ 28 (+100%)
Bet On Sibyl: Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Stars: ✭ 190 (+1257.14%)
crawlzone: Crawlzone is a fast asynchronous internet crawling framework for PHP.
Stars: ✭ 70 (+400%)
Grab: Web Scraping Framework
Stars: ✭ 2,147 (+15235.71%)
vscode-xslt-tokenizer: VSCode extension for highlighting XSLT and XPath (up to 3.0/3.1)
Stars: ✭ 37 (+164.29%)
Learnpythonforresearch: This repository provides everything you need to get started with Python for (social science) research.
Stars: ✭ 163 (+1064.29%)
pdfbox: 📄◻️ Create, Manipulate and Extract Data from PDF Files (R Apache PDFBox wrapper)
Stars: ✭ 46 (+228.57%)
Netflix Clone: Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture
Stars: ✭ 156 (+1014.29%)
saveddit: Bulk Downloader for Reddit
Stars: ✭ 130 (+828.57%)
Helena: A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.
Stars: ✭ 151 (+978.57%)
PythonScrapyBasicSetup: Basic setup with random user agents and IP addresses for Python Scrapy Framework.
Stars: ✭ 57 (+307.14%)
Phpscraper: PHP Scraper - a highly opinionated web-interface for PHP
Stars: ✭ 148 (+957.14%)
Zillow: Zillow Scraper for Python using Selenium
Stars: ✭ 141 (+907.14%)
panthro: An implementation of XPath 3.0 in Objective-C/Cocoa
Stars: ✭ 45 (+221.43%)
Actor Page Analyzer: Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
Stars: ✭ 124 (+785.71%)
actor-content-checker: You can use this act to monitor any page's content and get a notification when content changes.
Stars: ✭ 16 (+14.29%)
Ayakashi: ⚡️ Ayakashi.io - The next generation web scraping framework
Stars: ✭ 117 (+735.71%)
Dat8: General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+10728.57%)
Save For Offline: Android app for saving webpages for offline reading.
Stars: ✭ 114 (+714.29%)
Scrapyd Cluster On Heroku: Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
Stars: ✭ 106 (+657.14%)
Rod: A Devtools driver for web automation and scraping
Stars: ✭ 1,392 (+9842.86%)
UofT-Timetable-Generator: A web application that generates timetables for university students at the University of Toronto
Stars: ✭ 34 (+142.86%)
ioweb: Web Scraping Framework
Stars: ✭ 31 (+121.43%)
telenium: Automation for Kivy Application
Stars: ✭ 56 (+300%)