Top 229 scraping open source projects

List Of User Agents
List of major web + mobile browser user agent strings. +1 Bonus script to scrape :)
Use SQL on various data sources
Distributed crawling framework for documents and structured data.
📄 Python tool to turn pages into lightweight, customizable static websites
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
reborn of
Crawl all unique internal links found on a given website and extract SEO-related information. Supports JavaScript-based sites.
Search Engine Parser
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Goose Parser
Universal scraping tool, which allows you to extract data using multiple environments
Elegant Scraper and Crawler Framework for Golang
Getting started with Puppeteer and Chrome Headless for Web Scraping
Transistor, a Python web scraping framework for intelligent use cases.
A browser testing and web crawling library for PHP and Symfony
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
Image Dataset Tool (idt) is a cli tool designed to make the otherwise repetitive and slow task of creating image datasets into a fast and intuitive process.
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Jsonframe Cheerio
simple multi-level scraper json input/output for Cheerio
An API to scrape American court websites for metadata.
Anime Dl
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Linkedin Learning Downloader
Linkedin Learning videos downloader
Extract data or evaluate value from HTML/XML documents using XPath
SEO Python scraper to extract data from major search engine result pages. Extract data like URL, title, snippet, rich snippet and the type from search results for given keywords. Detect ads or make automated screenshots. You can also fetch the text content of URLs provided in search results or supplied by you. It's useful for SEO and business-related research tasks.
Shadow Useragent
Pick the most common user-agents on the Internet 👻
PHP Scraper - a highly opinionated web-interface for PHP
Fantasy Basketball
Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with a genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.
Simple Query Scraping with CSS and Go Reflection (MOVED to Gitlab)
Get info from any web service or page
Downloader
📖 A tool to download courses for offline usage. It uses your login credentials to download the course.
Search Engine Google
🕷 Google client for SERPS
Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course in more than [insert big number here] websites, compares the last updated dates, and gives you back the download link of the latest one, with the option to see the others as well!
Scan For Webcams
scan for webcams on the internet
htmlSQL is an experimental PHP library which allows you to access HTML values with an SQL-like syntax.
Od Database
Distributed crawler, database and web frontend for public directories indexing
Simple scripts to level up your scraping skills, and source code for the Level UP playlist on YouTube
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Scrapy, a fast high-level web crawling & scraping framework for Python.
A scalable web crawler framework for Java.
Laravel Bank Statements
Laravel package to collect your bank statement history. Currently supports parsing statement history from the BCA, Mandiri, BNI, and MUAMALAT e-banking websites.
OWASP D4N155 - Intelligent and dynamic wordlist using OSINT
Languagepod101 Scraper
Python scraper for Language Pods such as 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on dotnet core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable for your custom requirements. Medium link :
Grawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them in a file.
Library with a set of tools for scraping information about Nintendo games and their prices across all regions (NA, EU and JP).
Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
Python framework to scrape Pastebin pastes and analyze them
legacy backend for Open States
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Google Covid19 Mobility Reports
Data extraction of Google's COVID-19 Mobility Reports
Email Extractor
The main functionality is to extract all the emails from one or several URLs
Detect Cms
PHP Library for detecting CMS