papercutPapercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-99.63%)
PhpscraperPHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (-96.37%)
SpidrA versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (-83.91%)
CrawlyCrawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-89.21%)
RcrawlerAn R web crawler and scraper
Stars: ✭ 274 (-93.28%)
Youtube ProjectsThis repository contains all the code I use in my YouTube tutorials.
Stars: ✭ 144 (-96.47%)
Anime DlAnime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Stars: ✭ 190 (-95.34%)
PoliteBe nice on the web
Stars: ✭ 253 (-93.79%)
bots-zooNo description or website provided.
Stars: ✭ 59 (-98.55%)
GoapyGoal-Oriented Action Planning implementation in Python
Stars: ✭ 33 (-99.19%)
Linkedin Profile Scraper🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-95.81%)
Scrape Linkedin Selenium`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
Stars: ✭ 239 (-94.14%)
diffbot-php-client[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (-98.7%)
HuginnCreate agents that monitor and act on your behalf. Your agents are standing by!
Stars: ✭ 33,694 (+726.44%)
FerretDeclarative web scraping
Stars: ✭ 4,837 (+18.64%)
CollyElegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+281.04%)
GeziyorGeziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (-69.44%)
DotnetcrawlerDotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. This library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable to your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-97.55%)
Goose ParserUniversal scraping tool, which allows you to extract data using multiple environments
Stars: ✭ 211 (-94.82%)
Instagram ScraperScrapes media, likes, followers, tags and all metadata. Inspired by instagram-php-scraper,bot
Stars: ✭ 2,209 (-45.82%)
Gopa[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-93.21%)
scrapersscrapers for building your own image databases
Stars: ✭ 46 (-98.87%)
Lulu[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (-80.65%)
ha-multiscrapeHome Assistant custom component for scraping (HTML, XML or JSON) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
Stars: ✭ 103 (-97.47%)
RodA Devtools driver for web automation and scraping
Stars: ✭ 1,392 (-65.86%)
ScrappleA framework for creating semi-automatic web content extractors
Stars: ✭ 464 (-88.62%)
SillyniumAutomate the creation of Python Selenium Scripts by drawing coloured boxes on webpage elements
Stars: ✭ 100 (-97.55%)
iowebWeb Scraping Framework
Stars: ✭ 31 (-99.24%)
Apify JsApify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (-22.64%)
browser-poolA Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (-98.26%)
anime-scraper[partially working] Scrape and add anime episode stream URLs to uGet (Linux) or IDM (Windows) ~ Python3
Stars: ✭ 21 (-99.48%)
Lightnet🌓 Bringing pjreddie's DarkNet out of the shadows #yolo
Stars: ✭ 322 (-92.1%)
extractnetA Dragnet that also extracts author, headline, date, keywords from context
Stars: ✭ 52 (-98.72%)
wget-luaWget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-98.72%)
document-dlCommand line program to download documents from web portals.
Stars: ✭ 14 (-99.66%)
chesfCHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-99.56%)
Android-Web-ScraperAndroid Web Scraper is a simple library for Android web automation. You can perform web tasks in the background to fetch website data programmatically.
Stars: ✭ 38 (-99.07%)
ClaiCommand Line Artificial Intelligence or CLAI is an open-source project from IBM Research that aims to bring the power of AI to the command line interface.
Stars: ✭ 320 (-92.15%)
copycatA PHP Scraping Class
Stars: ✭ 70 (-98.28%)
OLX Scraper📻 An OLX Scraper using Scrapy + MongoDB. It scrapes recent ads posted for the requested product and dumps them to MongoDB (NoSQL).
Stars: ✭ 15 (-99.63%)
Mimo-CrawlerA web crawler that uses Firefox and JS injection to interact with webpages and crawl their content, written in Node.js.
Stars: ✭ 22 (-99.46%)
newspaperjsNews extraction and scraping. Article Parsing
Stars: ✭ 59 (-98.55%)
metacritic apiPHP Metacritic API - Mirrored by my GitLab
Stars: ✭ 31 (-99.24%)
top-github-scraperScrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-99.02%)
Scraper-Projects🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (-99.39%)
bing-ip2hostsbingip2hosts is a Bing.com web scraper that discovers websites by IP address
Stars: ✭ 99 (-97.57%)
newsembleAPI for fetching data from news websites.
Stars: ✭ 42 (-98.97%)
scrapy facebookerCollection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-99.46%)
sp-subway-scraper🚆This web scraper builds a dataset for São Paulo subway operation status
Stars: ✭ 24 (-99.41%)
arachnodHigh-performance crawler for Node.js
Stars: ✭ 17 (-99.58%)
ArtificioDeep Learning Computer Vision Algorithms for Real-World Use
Stars: ✭ 326 (-92%)
ZeiverA Scraper, Downloader, & Recorder for static open directories.
Stars: ✭ 14 (-99.66%)
Linkedin-ClientWeb scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (-98.97%)
Captcha-ToolsAll-in-one Python (and now Go!) module to help solve captchas with the Capmonster, 2captcha and Anticaptcha APIs!
Stars: ✭ 23 (-99.44%)