
pystardust / shup

Licence: GPL-3.0
A POSIX shell script to parse HTML

Programming Languages

shell
Makefile

Projects that are alternatives to or similar to shup

nyx
Lean linux and OSX process monitoring written in C
Stars: ✭ 24 (-14.29%)
Mutual labels:  posix
Captcha-Tools
All-in-one Python (and now Go!) module to help solve captchas with the Capmonster, 2captcha and Anticaptcha APIs!
Stars: ✭ 23 (-17.86%)
Mutual labels:  scraping
web-clipper
Easily download the main content of a web page in html, markdown, and/or epub format from command line.
Stars: ✭ 15 (-46.43%)
Mutual labels:  scraping
kfc
A terminal-emulator color palette setter written in POSIX C99.
Stars: ✭ 25 (-10.71%)
Mutual labels:  posix
ferenda
Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (-21.43%)
Mutual labels:  scraping
subscene scraper
Library to download subtitles from subscene.com
Stars: ✭ 14 (-50%)
Mutual labels:  scraping
gunaydin
Your good mornings ☀️
Stars: ✭ 16 (-42.86%)
Mutual labels:  scraping
image-collector
Download images from Google Image Search
Stars: ✭ 38 (+35.71%)
Mutual labels:  scraping
tonix
Tonix provides basic file system functionality, as well as an interactive shell with a Unix-style command line interface.
Stars: ✭ 20 (-28.57%)
Mutual labels:  posix
kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…
Stars: ✭ 474 (+1592.86%)
Mutual labels:  scraping
internet-affordability
🌍 Dataset that shows the Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-53.57%)
Mutual labels:  scraping
proxi
Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (+14.29%)
Mutual labels:  scraping
top-github-scraper
Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (+42.86%)
Mutual labels:  scraping
energymech
EnergyMech IRC Bot
Stars: ✭ 24 (-14.29%)
Mutual labels:  posix
naos
📉 Uptime and error monitoring CLI
Stars: ✭ 30 (+7.14%)
Mutual labels:  scraping
scrapy-distributed
A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (+35.71%)
Mutual labels:  scraping
feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
Stars: ✭ 23 (-17.86%)
Mutual labels:  scraping
chirps
Twitter bot powering @arichduvet
Stars: ✭ 35 (+25%)
Mutual labels:  scraping
Scraper-Projects
🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (-10.71%)
Mutual labels:  scraping
AngleParse
HTML parsing and processing tool for PowerShell.
Stars: ✭ 35 (+25%)
Mutual labels:  scraping

Shup

Simple HTML parser in shell.

  • Requires
    • POSIX shell
    • sed

Installation

To install shup, first adjust the Makefile to match your local setup if needed (by default, shup is installed into /usr/local/bin).

Then run the following command to install it:

sudo make install

To uninstall shup, just run:

sudo make uninstall
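Because the default install location is /usr/local/bin, that directory has to be on your PATH for the shup command to be found. A minimal POSIX shell check, where the helper name `dir_on_path` is purely illustrative and not part of shup:

```sh
#!/bin/sh
# dir_on_path DIR: succeed if DIR is one of the entries in $PATH.
# Hypothetical helper for illustration; not shipped with shup.
dir_on_path() {
    # Wrapping PATH in colons lets the first and last entries match too.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if dir_on_path /usr/local/bin; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is not on PATH; add it or change the install location in the Makefile"
fi
```

If you changed the install location in the Makefile, pass that directory instead.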

Usage

USAGE: shup [OPTIONS] ["FILTER1" "FILTER2" ...]
  -h                 show this help
  -v                 show version
  -r                 raw: last filter tag will not be shown
  -t                 text: no tags will be shown
  -o   "string"      specify output indentation

FILTER FORMAT: "<tagname>"  or  "<tagname>[<search string>]"
    the search string should be present in the tag line
  EXAMPLE
    to match all div tags
         shup "div"
    to match div tags with some string
         shup "div[Qynugf]"
    will match : <div class="Qynugf">

    The string could be present anywhere inside the tag's body <.>
    Patterns can be specified in the string using shell patterns
         shup "div[Qy?*[!h]f]"
 When no filters are applied, shup will only format the HTML

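The filter rule in the help text can be illustrated in plain POSIX shell: split "tagname[search string]" at the first `[`, check the tag name, then test the tag line with `case`, wrapping the search string in `*...*` so it may appear anywhere inside `<...>`. Because the search string is used as a shell pattern, `?` matches any single character, `*` any string, and `[!h]` any character except h. This is a hypothetical sketch of the documented behaviour (the function `filter_matches` is made up), not shup's actual implementation:

```sh
#!/bin/sh
# filter_matches FILTER TAGLINE
# Hypothetical helper (not part of shup): succeeds when TAGLINE, such as
# '<div class="Qynugf">', satisfies FILTER, such as "div" or "div[Qynugf]".
# POSIX sh has no 'local', so the variables below are global.
filter_matches() {
    filter=$1 line=$2
    case $filter in
        *\[*\])
            name=${filter%%\[*}   # tag name: everything before the first '['
            search=${filter#*\[}  # everything after the first '['
            search=${search%\]}   # drop the closing ']'
            ;;
        *)
            name=$filter
            search=
            ;;
    esac
    # The tag name must follow '<' directly and end at a space or '>'.
    case $line in
        "<$name "* | "<$name>"*) ;;
        *) return 1 ;;
    esac
    # $search is unquoted, so ?, * and [!h] act as shell patterns; the
    # surrounding * let the string appear anywhere inside the tag.
    case $line in
        *$search*) return 0 ;;
        *) return 1 ;;
    esac
}

filter_matches 'div[Qynugf]' '<div class="Qynugf">' && echo "match"
# prints: match
filter_matches 'div[Qy?*[!h]f]' '<div class="Qynuhf">' || echo "no match"
# prints: no match
```

In the second call, the only "f" after "Qyn" is preceded by "h", which `[!h]` rejects, so the tag does not match.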
Example

curl -s "www.gnu.org" | shup -r "body" "div[inner]" "ul" "li[[pP]hilo]" "a"
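The example feeds a whole web page into shup, whose only external dependency is sed. Real pages often pack many tags onto one line, so a line-oriented sed pipeline typically needs to split them first. As a rough illustration of that idea (an assumption about the approach, not shup's actual formatter; the function name `one_tag_per_line` is made up), this puts every tag on its own line using only POSIX sed:

```sh
#!/bin/sh
# one_tag_per_line: start a new line before every '<' read from stdin and drop
# the blank lines this creates. Illustrative only; not taken from shup.
one_tag_per_line() {
    # A backslash followed by a real newline is the portable way to
    # insert a newline in a POSIX sed replacement.
    sed 's/</\
</g' | sed '/^[[:space:]]*$/d'
}

printf '%s\n' '<ul><li><a href="/philosophy/">Philosophy</a></li></ul>' |
    one_tag_per_line
# prints:
# <ul>
# <li>
# <a href="/philosophy/">Philosophy
# </a>
# </li>
# </ul>
```

Two separate sed invocations are used because, after the first substitution, the new lines still sit in one pattern space; only a second pass sees them as individual lines that can be deleted when blank.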