Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+222.22%)
Humanoid: Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
Stars: ✭ 88 (-38.89%)
Phpscraper: PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+2.78%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs with (but not limited to) headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+2090.28%)
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (-50.69%)
PythonScrapyBasicSetup: Basic setup with random user agents and IP addresses for the Python Scrapy framework.
Stars: ✭ 57 (-60.42%)
selectorlib: A library to read a YAML file with XPath or CSS selectors and extract data from HTML pages using them
Stars: ✭ 53 (-63.19%)
raspagem-de-dados-fatec: 📓 Short course on web data scraping with Python, taught at the FATEC Jundiaí Technology Week
Stars: ✭ 22 (-84.72%)
Scrape Linkedin Selenium: `scrape_linkedin` is a Python package that lets you scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Stars: ✭ 239 (+65.97%)
Detect Cms: PHP library for detecting CMS
Stars: ✭ 78 (-45.83%)
trafilatura: Python & command-line tool to gather text on the Web (web crawling/scraping, extraction of text, metadata, and comments)
Stars: ✭ 711 (+393.75%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like page caching and geosearch.
Stars: ✭ 15 (-89.58%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+92.36%)
top-github-scraper: Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-72.22%)
ioweb: Web Scraping Framework
Stars: ✭ 31 (-78.47%)
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+2731.25%)
Cascadia: Command-line CSS selector using the Go cascadia package
Stars: ✭ 67 (-53.47%)
Pulsar: Turn large websites into tables and charts using simple SQL queries.
Stars: ✭ 100 (-30.56%)
Souqscraper: Simple scripts to level up your scraping skills, and source code for the Level UP playlist on YouTube
Stars: ✭ 118 (-18.06%)
Sillynium: Automate the creation of Python Selenium scripts by drawing coloured boxes on webpage elements
Stars: ✭ 100 (-30.56%)
Soupsieve: A modern CSS selector implementation for BeautifulSoup
Stars: ✭ 95 (-34.03%)
Torchbear: 🔥🐻 The Speakeasy Scripting Engine Which Combines Speed, Safety, and Simplicity
Stars: ✭ 128 (-11.11%)
Ayakashi: ⚡️ Ayakashi.io - The next generation web scraping framework
Stars: ✭ 117 (-18.75%)
Splashr: 💦 Tools to Work with the 'Splash' JavaScript Rendering Service in R
Stars: ✭ 93 (-35.42%)
Hockey Scraper: Python package for scraping NHL play-by-play and shift data
Stars: ✭ 93 (-35.42%)
Dat8: General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+952.78%)
Lipo: 👄 Free image manipulation API service built on top of Sharp (an alternative to Jimp, GraphicsMagick, ImageMagick, and PhantomJS)
Stars: ✭ 101 (-29.86%)
30 Days Of Python: Learn Python for the next 30 (or so) days.
Stars: ✭ 1,748 (+1113.89%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to meet your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-30.56%)
Bewitchment: Mod inspired by Witchery
Stars: ✭ 128 (-11.11%)
Grawler: Grawler is a tool written in PHP with a web interface that automates the use of Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (-31.94%)
Seleniumcrawler: An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site
Stars: ✭ 117 (-18.75%)
Nintendeals: Library with a set of tools for scraping information about Nintendo games and their prices across all regions (NA, EU and JP).
Stars: ✭ 94 (-34.72%)
Educative.io Downloader: 📖 A tool to download courses from educative.io for offline use. It uses your login credentials and downloads the course.
Stars: ✭ 139 (-3.47%)
Skywater Pdk: Open source process design kit for usage with SkyWater Technology Foundry's 130nm node.
Stars: ✭ 1,765 (+1125.69%)
Pastepwn: Python framework to scrape Pastebin pastes and analyze them
Stars: ✭ 87 (-39.58%)
Save For Offline: Android app for saving webpages for offline reading.
Stars: ✭ 114 (-20.83%)
Daftlistings: A library that enables programmatic interaction with daft.ie. Daft.ie has nationwide coverage and contains about 80% of the total available properties in Ireland.
Stars: ✭ 86 (-40.28%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-40.28%)
Billy: Legacy backend for Open States
Stars: ✭ 85 (-40.97%)
Zillow: Zillow Scraper for Python using Selenium
Stars: ✭ 141 (-2.08%)
Python And Oop: Object-Oriented Programming concepts in Python
Stars: ✭ 123 (-14.58%)
Gitpass: Open Source Your Password (Mismanagement)!
Stars: ✭ 113 (-21.53%)
Geziyor: Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+765.28%)
Rvest: Simple web scraping for R
Stars: ✭ 1,253 (+770.14%)
Floki: Floki is a simple HTML parser that enables search for nodes using CSS selectors.
Stars: ✭ 1,642 (+1040.28%)
Actor Page Analyzer: Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
Stars: ✭ 124 (-13.89%)
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+29304.86%)
Email Extractor: The main functionality is to extract all the emails from one or several URLs
Stars: ✭ 81 (-43.75%)
Viewstate: ASP.NET View State Decoder
Stars: ✭ 77 (-46.53%)
Webmagic: A scalable web crawler framework for Java.
Stars: ✭ 10,186 (+6973.61%)
Reader: Extract clean(er), readable text from web pages via Mercury Web Parser.
Stars: ✭ 75 (-47.92%)
Udemycoursegrabber: Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last-updated dates, and gives you the download link for the latest one, with the option to see the others as well!
Stars: ✭ 137 (-4.86%)
Htmlsql: htmlSQL is an experimental PHP library which allows you to access HTML values with an SQL-like syntax.
Stars: ✭ 120 (-16.67%)
Scrapyd Cluster On Heroku: Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
Stars: ✭ 106 (-26.39%)