Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-93.48%)

Mutual labels: web-scraping

Zillow

Zillow Scraper for Python using Selenium

Stars: ✭ 141 (-38.7%)

Mutual labels: web-scraping

audiobooker

Audiobook scraper

Stars: ✭ 14 (-93.91%)

Mutual labels: web-scraping

Cascadia

Command-line CSS selector tool built on the Go cascadia package

Stars: ✭ 67 (-70.87%)

Mutual labels: web-scraping

sp-subway-scraper

🚆 This web scraper builds a dataset of São Paulo subway operation status

Stars: ✭ 24 (-89.57%)

Mutual labels: web-scraping

R Web Scraping Cheat Sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.

Stars: ✭ 207 (-10%)

Mutual labels: web-scraping

codechef-rank-comparator

Web application, hosted on the Heroku cloud platform, built on web scraping in Python with XPath (XML Path Language) via the lxml library.

Stars: ✭ 23 (-90%)

Mutual labels: web-scraping

Social Media Profile Scrapers

Fetch a user's data across social media platforms

Stars: ✭ 60 (-73.91%)

Mutual labels: web-scraping

GSoC-Data-Analyser

Simple search for organisations that are participating, or have participated, in GSoC

Stars: ✭ 29 (-87.39%)

Mutual labels: web-scraping

Actor Page Analyzer

Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.

Stars: ✭ 124 (-46.09%)

Mutual labels: web-scraping

restaurant-finder-featureReviews

Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).

Stars: ✭ 21 (-90.87%)

Mutual labels: web-scraping

Scrapy Craigslist

Web Scraping Craigslist's Engineering Jobs in NY with Scrapy

Stars: ✭ 54 (-76.52%)

Mutual labels: web-scraping

top-github-scraper

Scrape top GitHub repositories and users based on keywords

Stars: ✭ 40 (-82.61%)

Mutual labels: web-scraping

Learnpythonforresearch

This repository provides everything you need to get started with Python for (social science) research.

Stars: ✭ 163 (-29.13%)

Mutual labels: web-scraping

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (-77.39%)

Mutual labels: web-scraping

Actor Google Search Scraper

Apify actor that crawls Google Search result pages (SERPs) and extracts a list of organic results, ads, related queries and more. It supports selection of custom country, language and location.

Stars: ✭ 38 (-83.48%)

Mutual labels: web-scraping

scraping-ebay

Scraping eBay's products using the Scrapy web crawling framework

Stars: ✭ 79 (-65.65%)

Mutual labels: web-scraping

Ayakashi

⚡️ Ayakashi.io - The next generation web scraping framework

Stars: ✭ 117 (-49.13%)

Mutual labels: web-scraping

IMDB-Scraper

Scrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.

Stars: ✭ 37 (-83.91%)

Mutual labels: web-scraping

Snoop

Snoop: an open-source intelligence (OSINT) reconnaissance tool (OSINT world)

Stars: ✭ 886 (+285.22%)

Mutual labels: web-scraping

automation-scripts

Simple scripts that I'm using to automate the boring things.

Stars: ✭ 14 (-93.91%)

Mutual labels: web-scraping

Selenium Python Helium

Selenium-python but lighter: Helium is the best Python library for web automation.

Stars: ✭ 2,732 (+1087.83%)

Mutual labels: web-scraping

leetcode-compensation

Compensation analysis on the posts scraped from leetcode.com/discuss/compensation. At present, the reports have been generated only for Indian cities.

Stars: ✭ 83 (-63.91%)

Mutual labels: web-scraping

Letterboxd recommendations

Scraping publicly accessible Letterboxd data and using it to build a movie recommendation model that generates recommendations for a given Letterboxd username

Stars: ✭ 23 (-90%)

Mutual labels: web-scraping

rreddit

𝐫⟋ Get Reddit data

Stars: ✭ 49 (-78.7%)

Mutual labels: web-scraping

Save For Offline

Android app for saving webpages for offline reading.

Stars: ✭ 114 (-50.43%)

Mutual labels: web-scraping

extractnet

A Dragnet that also extracts the author, headline, date, and keywords from context

Stars: ✭ 52 (-77.39%)

Mutual labels: web-scraping

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+185.22%)

Mutual labels: web-scraping

Linkedin-Client

Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)

Stars: ✭ 42 (-81.74%)

Mutual labels: web-scraping

Netflix Clone

Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture

Stars: ✭ 156 (-32.17%)

Mutual labels: web-scraping

grailer

Web scraping tool for grailed.com

Stars: ✭ 30 (-86.96%)

Mutual labels: web-scraping

Coolqlcool

Next.js server to query websites with GraphQL

Stars: ✭ 623 (+170.87%)

Mutual labels: web-scraping

cl-torrents

Searching torrents on popular trackers - CLI, readline, GUI, web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)

Stars: ✭ 83 (-63.91%)

Mutual labels: web-scraping

Rod

A Devtools driver for web automation and scraping

Stars: ✭ 1,392 (+505.22%)

Mutual labels: web-scraping

selectorlib

A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages

Stars: ✭ 53 (-76.96%)

Mutual labels: web-scraping

Scrapy Fake Useragent

Random User-Agent middleware based on fake-useragent

Stars: ✭ 520 (+126.09%)

Mutual labels: web-scraping

Python

Covers Python from basic to advanced topics, practice questions, logic problems in Python, and web development using HTML, CSS, Bootstrap, jQuery, the DOM, and Django 🚀🚀. 💥 🌈

Stars: ✭ 29 (-87.39%)

Mutual labels: web-scraping

Bet On Sibyl

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

Stars: ✭ 190 (-17.39%)

Mutual labels: web-scraping

reapr

🕸→ℹ️ Reap Information from Websites

Stars: ✭ 14 (-93.91%)

Mutual labels: web-scraping

Rpa

UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++

Stars: ✭ 477 (+107.39%)

Mutual labels: web-scraping

saveddit

Bulk Downloader for Reddit

Stars: ✭ 130 (-43.48%)

Mutual labels: web-scraping

Sillynium

Automate the creation of Python Selenium Scripts by drawing coloured boxes on webpage elements

Stars: ✭ 100 (-56.52%)

Mutual labels: web-scraping

actor-content-checker

You can use this act to monitor any page's content and get a notification when content changes.

Stars: ✭ 16 (-93.04%)

Mutual labels: web-scraping

Awesome Web Scraping

List of libraries, tools and APIs for web scraping and data processing.

Stars: ✭ 4,510 (+1860.87%)

Mutual labels: web-scraping

BookingScraper

🌎 🏨 Scrape Booking.com 🏨 🌎

Stars: ✭ 68 (-70.43%)

Mutual labels: web-scraping

Helena

A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.

Stars: ✭ 151 (-34.35%)

Mutual labels: web-scraping

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+1672.61%)

Mutual labels: web-scraping

City Scrapers

Scrape, standardize and share public meetings from local government websites

Stars: ✭ 220 (-4.35%)

Mutual labels: web-scraping

Short Jokes Dataset

Python scripts for building 'Short Jokes' dataset, featured on Kaggle