A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.

✭ 231

kotlin hacktoberfest testing crawler dom scraper parse test-automation integration-testing html-parser jsoup

Annie

👾 Fast and simple video download library and CLI tool written in Go

✭ 16,369

go video crawler youtube downloader scraper bilibili qq tumblr download hacktoberfest youku iqiyi

Scrapysharp

A revival of https://bitbucket.org/rflechner/scrapysharp

✭ 226

csharp fsharp html dotnet scraper parsing scraping

Ruiji.net

crawler framework, distributed crawler extractor

✭ 220

crawler scraper netcore scrapy headless-chrome

Goose Parser

Universal scraping tool that lets you extract data using multiple environments

✭ 211

javascript nodejs docker parser browser crawler scraper parsing scraping phantomjs

Media Scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

✭ 206

python crawler twitter instagram scraper reddit pixiv tumblr

Tianyancha

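A pip-installable Tianyancha scraper API that saves the business registration information of one or more specified companies to Excel/JSON in one step. A batteries-included scraper API for Tianyancha, the best Chinese business data and investigation platform.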

✭ 206

python python3 data crawler pandas selenium scraper china business

Colly

Elegant Scraper and Crawler Framework for Golang

✭ 15,535

go HTML framework crawler spider scraper scraping crawling

Weibo terminater

The final Weibo crawler: scrape anything from Weibo (comments, post contents, followers, anything). The Terminator

✭ 2,295

python chatbot chinese scraper weibo corpus sina

Querylist

🕷️ The elegant, progressive PHP crawler framework!

✭ 2,392

PHP HTML crawler spider scraper querylist

Jsonframe Cheerio

Simple multi-level scraper with JSON input/output for Cheerio

✭ 196

javascript json scraper scraping selector frame

Jvppeteer

Headless Chrome for Java (Java crawler)

✭ 193

java chrome crawler scraper puppeteer chrome-headless

Node Ytdl Core

YouTube video downloader in JavaScript.

✭ 3,004

javascript node youtube scraper youtube-downloader video-downloader

Unfurl

Scraper for oEmbed, Twitter Cards and Open Graph metadata - fast and Promise-based ⚡️

✭ 193

typescript nodejs microservice slack scraper metadata micro embed meta-tags

Thepiratebay

💀 The Pirate Bay node.js client

✭ 191

typescript parser scraper torrent

Anime Dl

Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.

✭ 190

python web automation scraper anime scraping

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

✭ 190

go golang crawler spider scraper scrapy

Gmdb

GMDB is the ultra-simple, cross-platform Movie Library with Features (Search, Take Note, Watch Later, Like, Import, Learn, Instantly Torrent Magnet Watch)

✭ 189

go scraper search-engine torrent notebook note-taking movie movies netflix magnet imdb magnet-link

Docsearch Scraper

DocSearch - Scraper

✭ 188

python documentation scraper algolia

Unhtml.rs

A magic html parser

✭ 180

rust scraper html-parser

Instagram Crawler

Crawl Instagram photos, posts and videos for download.

✭ 178

ruby crawler instagram scraper gem rubygems instagram-scraper

Linkedin Profile Scraper

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.

✭ 171

typescript nodejs json crawler spider expressjs scraper puppeteer scraping linkedin crawling

Readablewebproxy

Rewriting web proxy and archival tool. At this point, it just tries to download all the things.

✭ 172

python scraper

Novel

A novel-reading website built on Laravel 5.2

✭ 172

javascript laravel book scraper novel

Scrape Twitter

🐦 Access Twitter data without an API key. [DEPRECATED]

✭ 166

javascript cli twitter scraper timeline streams tweets conversation

Scrapelib

⛏ a library for scraping things

✭ 164

python http scraper

Datmusic Api

Alternative for VK Audio API

✭ 160

audio music crawler scraper mp3 api-server vk

Opensanctions

An open database of international sanctions data, persons of interest and politically exposed persons

✭ 157

python database scraper journalism

Covid19 mobility

COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉

✭ 156

python3 jupyter-notebook google apple scraper reports

Instagram Scraper

Scrapes media, likes, followers, tags and all metadata. Inspired by instagram-php-scraper, bot

✭ 2,209

python bot crawler instagram scraper scrape ig igramscraper

Demeter

Demeter is a tool for scraping the calibre web ui

✭ 155

go hacktoberfest scraper download

Serpscrap

SEO Python scraper for extracting data from major search engine result pages. Extracts data such as URL, title, snippet, rich snippet and result type from search results for given keywords. It can detect ads or take automated screenshots. You can also fetch the text content of URLs found in the search results or supplied yourself. It's useful for SEO and business-related research tasks.

✭ 153

python search research scraper screenshot seo scraping

Nooverviewavailable.com

A survey of Apple developer documentation.

✭ 152

ruby scraper

Phpscraper

PHP Scraper - a highly opinionated web-interface for PHP

✭ 148

scraper scraping web-scraping web-scraper

Scraperwiki Python

ScraperWiki Python library for scraping and saving data

✭ 146

python scraper

Google2csv

Google2Csv: a simple Google scraper that saves the results to a CSV/XLSX/JSONL file

✭ 145

python jupyter-notebook tutorial google csv scraper

Youtube Projects

This repository contains all the code I use in my YouTube tutorials.

✭ 144

javascript python html css chrome-extension algorithms google crawler youtube website jquery-plugin scraper project easy webscraping

Google Play Scraper

Google Play scraper for Python inspired by <facundoolano/google-play-scraper>

✭ 143

python crawler scraper

Zillow

Zillow Scraper for Python using Selenium

✭ 141

python selenium scraper web-scraping chromedriver

Go Jd

Automatic JD.com login and automatic ordering of online products

✭ 139

go golang scraper

Bandcamp Scraper

A scraper for https://bandcamp.com

✭ 137

javascript hacktoberfest api scraper album product

Onegram

This repository is no longer maintained.

✭ 137

python bot crawler instagram scraper instagram-api instagram-client

Udemycoursegrabber

Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last-updated dates, and gives you back the download link of the latest one, though you can also choose to see the others!

✭ 137

python selenium scraper scraping udemy

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

✭ 11,545

python crawler scraper news crawling news-aggregator

Proxyscrape

Python library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).

✭ 134

python python3 proxy scraper

Scraper

A scraper that switches between normal mode and gentleman mode, built on Electron and React