
Top 111 webscraping open source projects

Crosslinked
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
R Web Scraping Cheat Sheet
Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.
Tradingview Data Scraper
Extract price and indicator data from TradingView charts to create ML datasets
Instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Falkor
Open Source web scraping API. Falkor turns web pages into queryable JSON
Decryptr
An extensible API for breaking captchas
Stardox
GitHub stargazer information-gathering tool
Tiktokbot
A TikTok bot that downloads trending TikTok videos and compiles them using FFmpeg
Ralger
ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.
Operating Systems Three Easy Pieces
Operating Systems: Three Easy Pieces, by Rezmi
Anirip
🎬 A Crunchyroll show/season ripper
Soup
Web Scraper in Go, similar to BeautifulSoup
Php Crawler
A PHP crawler that finds email addresses on the internet
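The core of an email-finding crawler is pattern matching over fetched pages. The sketch below is in Python rather than PHP and is not taken from the Php Crawler project; `extract_emails` is a hypothetical helper, and the regular expression is a deliberately simple approximation, not the full RFC 5322 address grammar.

```python
import re

# Simplified pattern: local part, "@", then a domain containing at least one dot.
# It misses some valid addresses and does not implement RFC 5322.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique email-like strings in order of first appearance.

    Matches are lower-cased before de-duplication (a simplification: the
    local part of an address may technically be case-sensitive).
    """
    seen: dict[str, None] = {}  # dict keeps insertion order
    for match in EMAIL_RE.findall(html):
        seen.setdefault(match.lower(), None)
    return list(seen)


page = '<a href="mailto:Info@Example.com">Info@Example.com</a> or sales@example.org'
print(extract_emails(page))  # ['info@example.com', 'sales@example.org']
```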
Nytcrossword
An exploration of New York Times crossword answers from 1994-2017, i.e. the Will Shortz era.
Wswp
Code for the second edition of the Web Scraping with Python book by Packt Publishing
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core that writes its output through Entity Framework Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is meant to be extended for your custom requirements. Medium article: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Imghash
Perceptual image hashing for Node.js
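Perceptual hashes map visually similar images to hashes that differ in only a few bits. As an illustration, here is the average hash (aHash), one of the simplest such schemes, in Python; it is not necessarily the algorithm Imghash implements, and it assumes the image has already been converted to grayscale and resized to 8x8 (e.g. with an image library). The function names are hypothetical.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of an 8x8 grayscale image given as rows of 0-255 values.

    Each of the 64 bits is 1 when the pixel is brighter than the mean
    of all pixels, so uniform brightness shifts barely change the hash.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grayscale image")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest similar images."""
    return bin(a ^ b).count("1")


# A left-to-right, top-to-bottom gradient and a slightly brighter copy of it.
gradient = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[p + 3 for p in row] for row in gradient]
print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # 0
```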
Udemy bot
An automation bot for free Udemy courses
Clock
A web-based visual task scheduling system, streamlined into a single binary: yes, just one binary solves all the problems!
Covid 19 jhu data web scrap and cleaning
This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/
Instago
Download/access photos, videos, stories, story highlights, postlives, and the following and follower lists of Instagram users
Fifa Fut Data
Web-scraping script that writes the data of all players from FutHead and FutBin to a CSV file or a DB
Sneakerbot App
App that scrapes the Foot Locker website to construct URLs for upcoming sneaker releases and adds the shoe to your cart if it is available. Uses Python and Selenium WebDriver. Note: Chrome and ChromeDriver must be installed, and ChromeDriver must be on the PATH.
Brokenlinkhijacker
A fast broken-link hijacking tool written in Python
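Broken-link hijacking starts from outbound links whose target domains no longer resolve, since an expired domain may be re-registrable by an attacker. The Python sketch below shows that first triage step only; it is not Brokenlinkhijacker's code, the helper names are hypothetical, and a failed DNS lookup is just a hint: whether a domain is actually available must be checked separately.

```python
import socket
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def external_hosts(html: str, base_url: str) -> set[str]:
    """Hostnames of http(s) links that point away from base_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    own_host = urlparse(base_url).hostname
    hosts = set()
    for link in parser.links:
        parsed = urlparse(urljoin(base_url, link))  # resolve relative links
        if parsed.scheme in ("http", "https") and parsed.hostname and parsed.hostname != own_host:
            hosts.add(parsed.hostname)
    return hosts


def resolves(host: str) -> bool:
    """True if a DNS lookup succeeds (requires network access)."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def hijack_candidates(html: str, base_url: str, resolver=resolves) -> list[str]:
    """External hosts that fail to resolve: candidates for manual review."""
    return sorted(h for h in external_hosts(html, base_url) if not resolver(h))
```

The `resolver` parameter lets you swap in a cached or offline lookup, which also keeps the logic testable without network access.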
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Configs
A public, free-to-use repository of digger configs for scraping/extracting data from various e-commerce websites and online stores
Redditsfinder
Archive a Reddit user's post history. Formatted overview of a profile, JSON containing every post, and picture downloads. Uses the Pushshift API.
Sig To Googlecalendar
A Python script to get class schedules from UFLA's SIG and convert them to a .CSV file for use in Google Calendar
Webscrapping
Web crawlers in R; web crawlers in Python; rvest; RCurl
Datadoubleconfirm
Simple datasets and notebooks for data visualization, statistical analysis and modelling - with write-ups here: http://projectosyo.wix.com/datadoubleconfirm.
Mailinglistscraper
A Python web scraper for public email lists.
Gazpacho
🥫 The simple, fast, and modern web scraping library
Suckit
Suck the InTernet
Morph
Take the hassle out of web scraping
Proxy requests
A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)
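A common pattern behind proxy-rotating clients is round-robin selection with a retry on the next proxy when one fails. The sketch below is not the Proxy requests class: it swaps in the standard library's `urllib.request` instead of the requests library so it has no third-party dependency, `RotatingProxyClient` is a hypothetical name, and the proxy addresses in the usage line are placeholders.

```python
import itertools
import urllib.request


class RotatingProxyClient:
    """Sends each GET through the next proxy in a round-robin cycle."""

    def __init__(self, proxies: list[str], timeout: float = 10.0) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self.timeout = timeout

    def next_proxy(self) -> str:
        return next(self._cycle)

    def get(self, url: str, attempts: int = 3) -> bytes:
        """Fetch url, moving on to the next proxy after a network error."""
        last_error = None
        for _ in range(attempts):
            proxy = self.next_proxy()
            # Route both http and https traffic through the chosen proxy.
            handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
            opener = urllib.request.build_opener(handler)
            try:
                with opener.open(url, timeout=self.timeout) as response:
                    return response.read()
            except OSError as exc:  # URLError and socket timeouts derive from OSError
                last_error = exc
        raise RuntimeError(f"all {attempts} attempts failed") from last_error


client = RotatingProxyClient(["http://203.0.113.10:8080", "http://203.0.113.11:8080"])
```

Scraped free proxies fail often, which is why the retry loop advances to a fresh proxy rather than retrying the same one.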
Xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
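To illustrate XPath-style extraction in general (this is not Xidel's syntax or engine), Python's standard `xml.etree.ElementTree` supports a small XPath subset. The helper name `extract_texts` and the sample catalog are hypothetical; note that ElementTree only parses well-formed XML, so real-world HTML usually needs a more lenient parser first.

```python
import xml.etree.ElementTree as ET


def extract_texts(xml_text: str, path: str) -> list[str]:
    """Return the text of every element matching path.

    path uses ElementTree's limited XPath subset (child steps, .//,
    [@attr='value'] predicates), not full XPath 3.0 as Xidel supports.
    """
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.findall(path)]


catalog = """<catalog>
  <book id="b1"><title>Dune</title><price>9.99</price></book>
  <book id="b2"><title>Emma</title><price>4.50</price></book>
</catalog>"""

print(extract_texts(catalog, "./book/title"))             # ['Dune', 'Emma']
print(extract_texts(catalog, ".//book[@id='b2']/price"))  # ['4.50']
```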
Rcrawler
An R web crawler and scraper
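Crawlers such as Rcrawler are built around a frontier of URLs visited breadth-first. The Python sketch below shows that loop under simplifying assumptions; it is not Rcrawler's implementation. `crawl` is a hypothetical name, fetching is delegated to a caller-supplied `fetch(url)` function, and it omits robots.txt handling (see `urllib.robotparser`) and politeness delays that a real crawler needs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def crawl(start_url: str, fetch, max_pages: int = 50) -> list[str]:
    """Breadth-first crawl restricted to start_url's host.

    fetch(url) returns the page HTML, or None if the page is unavailable.
    Returns the successfully fetched URLs in crawl order.
    """
    host = urlparse(start_url).hostname
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = _HrefParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing a dictionary's `get` method as `fetch` is enough to try it offline.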
web check
Script for checking changes in webpages
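One straightforward way to detect changes in a web page is to store a digest of its content and compare it on the next check. The listing does not say how web check works, so this Python sketch is only an illustration: `ChangeDetector` is a hypothetical name, fetching the page is left to the caller, and state is kept in memory rather than persisted.

```python
import hashlib


class ChangeDetector:
    """Remembers a SHA-256 digest per URL and reports when content changes."""

    def __init__(self, normalize_whitespace: bool = True) -> None:
        self._digests: dict[str, str] = {}
        self.normalize_whitespace = normalize_whitespace

    def check(self, url: str, content: str) -> bool:
        if self.normalize_whitespace:
            # Collapse runs of whitespace so reformatting alone is not reported.
            content = " ".join(content.split())
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._digests.get(url)
        self._digests[url] = digest
        # The first observation only records a baseline; it is not a change.
        return previous is not None and previous != digest
```

Hashing the whole page also flags incidental changes such as timestamps or ads; restricting the hash to the part of the page you care about reduces false alarms.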
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
amelia 2.0
An Artificial Intelligence Chat Bot and Service Provider written in Python and AIML.
anikimiapi
A simple, lightweight, statically typed Python 3 API wrapper for GogoAnime.
Utlyz-CLI
Lets you access your Facebook account from the command line and returns various information, such as the number of unread notifications, messages, or friend requests you have.
zimit
Make a ZIM file from any Web site and surf offline!
allitebooks.com
Download all the ebooks from "allitebooks.com", along with an indexed CSV
OkanimeDownloader
Scrape your favorite anime from Okanime.com effortlessly