Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+385.96%)
Phpscraper: PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+159.65%)
browser-pool: A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+24.56%)
ioweb: Web Scraping Framework
Stars: ✭ 31 (-45.61%)
torchestrator: Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (-43.86%)
IMDB-Scraper: Scrapy project for scraping data from IMDB, with a movie dataset including data on 58,623 movies.
Stars: ✭ 37 (-35.09%)
Scrape Linkedin Selenium: `scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured JSON.
Stars: ✭ 239 (+319.3%)
papercut: Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-73.68%)
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+7052.63%)
selectorlib: A library to read a YML file with XPath or CSS selectors and extract data from HTML pages using them
Stars: ✭ 53 (-7.02%)
raspagem-de-dados-fatec: 📓 Short course on web data scraping with Python, taught at the Technology Week of FATEC Jundiaí
Stars: ✭ 22 (-61.4%)
Katana: A Python Tool For Google Hacking
Stars: ✭ 355 (+522.81%)
Sqrape: Simple Query Scraping with CSS and Go Reflection (MOVED to GitLab)
Stars: ✭ 144 (+152.63%)
Apify Js: Apify SDK - The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+5433.33%)
Detect Cms: PHP Library for detecting CMS
Stars: ✭ 78 (+36.84%)
TorScrapper: A Scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (-57.89%)
trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+1147.37%)
top-github-scraper: Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-29.82%)
Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+714.04%)
Humanoid: Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
Stars: ✭ 88 (+54.39%)
Googlescraper: A Python module to scrape several search engines (like Google, Yandex, Bing, DuckDuckGo, ...), including asynchronous networking support.
Stars: ✭ 2,363 (+4045.61%)
UofT-Timetable-Generator: A web application that generates timetables for university students at the University of Toronto
Stars: ✭ 34 (-40.35%)
Idt: Image Dataset Tool (idt) is a CLI tool designed to make the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.
Stars: ✭ 202 (+254.39%)
Jsonframe Cheerio: Simple multi-level scraper with JSON input/output for Cheerio
Stars: ✭ 196 (+243.86%)
Musoq: Use SQL on various data sources
Stars: ✭ 252 (+342.11%)
Anime Dl: Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Stars: ✭ 190 (+233.33%)
Panther: A browser testing and web crawling library for PHP and Symfony
Stars: ✭ 2,480 (+4250.88%)
wayback: ⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs
Stars: ✭ 52 (-8.77%)
Jikan Rest: The REST API for Jikan
Stars: ✭ 200 (+250.88%)
Whatsapp-Net: Generate a network graph of connections from your WhatsApp groups data
Stars: ✭ 75 (+31.58%)
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+247.37%)
List Of User Agents: List of major web + mobile browser user agent strings. +1 Bonus script to scrape :)
Stars: ✭ 247 (+333.33%)
Juriscraper: An API to scrape American court websites for metadata.
Stars: ✭ 194 (+240.35%)
Linkedin Profile Scraper: 🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+200%)
Memorious: Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+335.09%)
Requests Html: Pythonic HTML Parsing for Humans™
Stars: ✭ 12,268 (+21422.81%)
google-scraper: This class can retrieve search results from Google.
Stars: ✭ 33 (-42.11%)
Loconotion: 📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Stars: ✭ 237 (+315.79%)
Secret Agent: The web browser that's built for scraping.
Stars: ✭ 151 (+164.91%)
Xquery: Extract data or evaluate values from HTML/XML documents using XPath
Stars: ✭ 155 (+171.93%)
Serpscrap: SEO Python scraper to extract data from major search engine result pages. Extract data like URL, title, snippet, rich snippet and the type from search results for given keywords. Detect ads or make automated screenshots. You can also fetch the text content of URLs provided in search results or supplied by you. It's useful for SEO and business-related research tasks.
Stars: ✭ 153 (+168.42%)
onionfruit: OnionFruit™ Connect - Tor access client with country selection, bridge configuration, pluggable transports and experimental DNS support
Stars: ✭ 150 (+163.16%)
pickall: .NET agile and extensible web searching API
Stars: ✭ 25 (-56.14%)
garlicshare: Private and self-hosted file sharing over the Tor network, written in Golang
Stars: ✭ 110 (+92.98%)
Reaper: Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+321.05%)
Shadow Useragent: Pick the most common user-agents on the Internet 👻
Stars: ✭ 147 (+157.89%)
Fantasy Basketball: Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with a genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.
Stars: ✭ 146 (+156.14%)
Embed: Get info from any web service or page
Stars: ✭ 1,808 (+3071.93%)
Scrapysharp: reborn of https://bitbucket.org/rflechner/scrapysharp
Stars: ✭ 226 (+296.49%)
Educative.io Downloader: 📖 This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.
Stars: ✭ 139 (+143.86%)
Arachnid: Crawl all unique internal links found on a given website, and extract SEO-related information - supports JavaScript-based sites
Stars: ✭ 224 (+292.98%)
Udemycoursegrabber: Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last-updated dates, and gives you back the download link of the latest one, but you can also choose to see the others!
Stars: ✭ 137 (+140.35%)
Torchbear: 🔥🐻 The Speakeasy Scripting Engine Which Combines Speed, Safety, and Simplicity
Stars: ✭ 128 (+124.56%)
github-languages: Tiny little Ruby on Rails website that crawls through your public GitHub repos to find out what your favourite languages are.
Stars: ✭ 23 (-59.65%)