A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.

Stars: ✭ 231 (-95.22%)

Mutual labels: hacktoberfest, crawler, scraper

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (-98.92%)

Mutual labels: scraper, scraping, crawling

Bat

A cat(1) clone with wings.

Stars: ✭ 30,833 (+537.44%)

Mutual labels: cli, hacktoberfest, tool

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (-98.9%)

Mutual labels: scraper, scraping, crawling

Phpinsights

🔰 Instant PHP quality checks from your console

Stars: ✭ 4,442 (-8.17%)

Mutual labels: cli, hacktoberfest, tool

Newspaper

News, full-text, and article metadata extraction in Python 3.

Stars: ✭ 11,545 (+138.68%)

Mutual labels: crawler, scraper, crawling

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-97.93%)

Mutual labels: crawler, scraping, crawling

D4n155

OWASP D4N155 - Intelligent and dynamic wordlist using OSINT

Stars: ✭ 105 (-97.83%)

Mutual labels: crawler, scraping, tool

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Stars: ✭ 198 (-95.91%)

Mutual labels: crawler, scraping, crawling

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (-87.95%)

Mutual labels: crawler, scraping, crawling

Goose Parser

Universal scraping tool, which allows you to extract data using multiple environments

Stars: ✭ 211 (-95.64%)

Mutual labels: crawler, scraper, scraping

Lambdacd

A library to define a continuous delivery pipeline in code

Stars: ✭ 655 (-86.46%)

Mutual labels: hacktoberfest, library, tool

Headlesschrome

A Go package for working with headless Chrome. Run interactive JavaScript commands on web pages with Go and Chrome.

Stars: ✭ 112 (-97.68%)

Mutual labels: cli, scraper, chrome

Fselect

Find files with SQL-like queries

Stars: ✭ 3,103 (-35.85%)

Mutual labels: cli, hacktoberfest, tool

Search Engine Parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions

Stars: ✭ 216 (-95.53%)

Mutual labels: cli, scraping, library

scrapman

Retrieve the real HTML (with JavaScript executed) from a URL, ultra fast, with support for loading multiple pages in parallel

Stars: ✭ 21 (-99.57%)

Mutual labels: scraper, scraping, scraping-websites

Musoq

Use SQL on various data sources

Stars: ✭ 252 (-94.79%)

Mutual labels: cli, scraping, tool

LeetCode

Currently contains scraped data from around 1500 problems on the site. More to follow.

Stars: ✭ 45 (-99.07%)

Mutual labels: data-mining, scraper, scraping-websites

Npkill

List any node_modules directories in your system, as well as the space they take up. You can then select which ones you want to erase to free up space.

Stars: ✭ 5,325 (+10.09%)

Mutual labels: cli, hacktoberfest, tool

Emuto

Manipulate JSON files

Stars: ✭ 180 (-96.28%)

Mutual labels: cli, data-mining, query-language

Mod Pbxproj

A Python module to manipulate Xcode projects

Stars: ✭ 959 (-80.17%)

Mutual labels: cli, hacktoberfest, library

gochanges

**[ARCHIVED]** website changes tracker 🔍

Stars: ✭ 12 (-99.75%)

Mutual labels: scraper, scraping, scraping-websites

Instascrape

🚀 A fast and lightweight utility and Python library for downloading posts, stories, and highlights from Instagram.

Stars: ✭ 76 (-98.43%)

Mutual labels: cli, scraper, library

document-dl

Command line program to download documents from web portals.

Stars: ✭ 14 (-99.71%)

Mutual labels: scraper, scraping, scraping-websites

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-99.69%)

Mutual labels: crawler, scraper, scraping

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

Stars: ✭ 277 (-94.27%)

Mutual labels: crawler, scraping, crawling

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (-74.24%)

Mutual labels: crawler, scraper, scraping

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stars: ✭ 125 (-97.42%)

Mutual labels: crawler, crawling, chrome

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (-86.83%)

Mutual labels: crawler, scraper, crawling

Jvppeteer

Headless Chrome for Java (Java crawler)

Stars: ✭ 193 (-96.01%)

Mutual labels: crawler, scraper, chrome

Spidermon

Scrapy extension for monitoring spider execution.

Stars: ✭ 309 (-93.61%)

Mutual labels: hacktoberfest, scraping, crawling

Cosmos

Hacktoberfest 2021 | World's largest Contributor driven code dataset | Algorithms that run our universe | Your personal library of every algorithm and data structure code that you will ever encounter

Stars: ✭ 12,936 (+167.44%)

Mutual labels: hacktoberfest, library, hacktoberfest2021

Instagram-to-discord

Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!

Stars: ✭ 113 (-97.66%)

Mutual labels: scraper, scraping, scraping-websites

Sasila

A flexible and friendly crawler framework

Stars: ✭ 286 (-94.09%)

Mutual labels: crawler, scraping, crawling

Dataflowkit

Extract structured data from websites. Website scraping.

Stars: ✭ 456 (-90.57%)

Mutual labels: scraper, scraping, crawling

Teachcode

A tool to develop and improve a student’s programming skills by introducing the earliest lessons of coding.

Stars: ✭ 325 (-93.28%)

Mutual labels: cli, hacktoberfest

Kiimagepager

KIImagePager is inspired by Foursquare's ImageSlideshow; the user can scroll through images loaded from the web

Stars: ✭ 324 (-93.3%)

Mutual labels: hacktoberfest, library

Launchpad

An open-source game launcher for your games

Stars: ✭ 322 (-93.34%)

Mutual labels: hacktoberfest, tool

Graphback

Graphback - Out of the box GraphQL server and client

Stars: ✭ 323 (-93.32%)

Mutual labels: cli, hacktoberfest

Fd

A simple, fast and user-friendly alternative to 'find'

Stars: ✭ 19,851 (+310.4%)

Mutual labels: cli, tool

Ack3

ack is a grep-like search tool optimized for source code.

Stars: ✭ 330 (-93.18%)

Mutual labels: cli, hacktoberfest

Super Productivity

To-do list & time tracker for programmers and other digital workers with Jira, Github, and Gitlab integration

Stars: ✭ 4,505 (-6.86%)

Mutual labels: hacktoberfest, hacktoberfest2021

Xcrawler

A fast, concise, and powerful PHP crawler framework

Stars: ✭ 344 (-92.89%)

Mutual labels: crawler, scraper

Org Formation Cli

Better than landingzones!

Stars: ✭ 471 (-90.26%)

Mutual labels: cli, tool

Askql

AskQL is a query language that can express any data request

Stars: ✭ 352 (-92.72%)

Mutual labels: hacktoberfest, query-language

Horusec

Horusec is an open source tool that improves identification of vulnerabilities in your project with just one command.

Stars: ✭ 311 (-93.57%)

Mutual labels: cli, hacktoberfest

Xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Stars: ✭ 335 (-93.07%)

Mutual labels: cli, scraper

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (-92.81%)

Mutual labels: crawler, scraper

Katana

A Python tool for Google hacking

Stars: ✭ 355 (-92.66%)

Mutual labels: scraper, scraping

Undetected Chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)