Top 229 scraping open source projects

List Of User Agents
List of major web + mobile browser user agent strings. +1 Bonus script to scrape :)
Musoq
Use SQL on various data sources
Memorious
Distributed crawling framework for documents and structured data.
Loconotion
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Scrape Linkedin Selenium
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
Scrapysharp
Revival of https://bitbucket.org/rflechner/scrapysharp
Arachnid
Crawl all unique internal links found on a given website and extract SEO-related information. Supports JavaScript-based sites.
Search Engine Parser
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Goose Parser
Universal scraping tool, which allows you to extract data using multiple environments
Colly
Elegant Scraper and Crawler Framework for Golang
Thal
Getting started with Puppeteer and Chrome Headless for Web Scraping
Transistor
Transistor, a Python web scraping framework for intelligent use cases.
Panther
A browser testing and web crawling library for PHP and Symfony
Googlescraper
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Idt
Image Dataset Tool (idt) is a CLI tool designed to turn the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Jsonframe Cheerio
Simple multi-level scraper with JSON input/output for Cheerio
Juriscraper
An API to scrape American court websites for metadata.
Anime Dl
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Linkedin Learning Downloader
Linkedin Learning videos downloader
Xquery
Extract data or evaluate value from HTML/XML documents using XPath
Serpscrap
SEO Python scraper to extract data from major search engine result pages. Extracts data such as URL, title, snippet, rich snippet, and type from search results for given keywords. Can detect ads or take automated screenshots. You can also fetch the text content of URLs provided in search results or supplied by you. It's useful for SEO and business-related research tasks.
Shadow Useragent
Pick the most common user-agents on the Internet 👻
Phpscraper
PHP Scraper - a highly opinionated web-interface for PHP
Fantasy Basketball
Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with a genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.
Sqrape
Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)
Embed
Get info from any web service or page
Educative.io Downloader
📖 This tool downloads courses from educative.io for offline use. It uses your login credentials to download the course.
Search Engine Google
🕷 Google client for SERPS
Udemycoursegrabber
Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last-updated dates, and gives you back the download link of the latest one, but you also have the choice to see the other ones as well!
Scan For Webcams
scan for webcams on the internet
Htmlsql
htmlSQL is an experimental PHP library which allows you to access HTML values using an SQL-like syntax.
Od Database
Distributed crawler, database and web frontend for public directories indexing
Souqscraper
Simple scripts to level up your scraping skills, and source code for the Level UP playlist on YouTube
Seleniumcrawler
An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Webmagic
A scalable web crawler framework for Java.
Laravel Bank Statements
Laravel package to collect your bank statement history. Currently supports parsing statement history from the BCA, Mandiri, BNI, and MUAMALAT e-banking websites.
D4n155
OWASP D4N155 - Intelligent and dynamic wordlist using OSINT
Languagepod101 Scraper
Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on dotnet core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Grawler
Grawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them in a file.
Nintendeals
Library with a set of tools for scraping information about Nintendo games and their prices across all regions (NA, EU and JP).
Humanoid
Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
Pastepwn
Python framework to scrape Pastebin pastes and analyze them
Billy
legacy backend for Open States
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Google Covid19 Mobility Reports
Data extraction of Google's COVID-19 Mobility Reports
Email Extractor
The main functionality is to extract all the emails from one or several URLs.
Detect Cms
PHP Library for detecting CMS