Html Agility Pack

Html Agility Pack (HAP) is a free, open-source HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. It is a .NET code library that lets you parse "out of the web" HTML files.

Stars: ✭ 2,014 (+1391.85%)

Mutual labels: xpath, html-parser

Oga

Read-only mirror of https://gitlab.com/yorickpeterse/oga

Stars: ✭ 1,147 (+749.63%)

Mutual labels: parser, html-parser

Internettools

XPath/XQuery 3.1 interpreter for Pascal with compatibility modes for XPath 2.0/XQuery 1.0/3.0, custom and JSONiq extensions, XML/HTML parsers and classes for HTTP/S requests

Stars: ✭ 82 (-39.26%)

Mutual labels: parser, xpath

Hquery.php

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

Stars: ✭ 295 (+118.52%)

Mutual labels: parser, html-parser

Save For Offline

Android app for saving webpages for offline reading.

Stars: ✭ 114 (-15.56%)

Mutual labels: parser, html-parser

Html Parser

A PHP HTML parser, similar to PHP Simple HTML DOM Parser but several times faster.

Stars: ✭ 510 (+277.78%)

Mutual labels: parser, html-parser

Sax Wasm

The first streamable, fixed memory XML, HTML, and JSX parser for WebAssembly.

Stars: ✭ 89 (-34.07%)

Mutual labels: parser, html-parser

Lua Gumbo

Moved to https://gitlab.com/craigbarnes/lua-gumbo

Stars: ✭ 116 (-14.07%)

Mutual labels: parser, html-parser

Typin

Declarative framework for interactive CLI applications

Stars: ✭ 126 (-6.67%)

Mutual labels: parser

Csly

A C# embeddable lexer and parser generator (.NET Core)

Stars: ✭ 129 (-4.44%)

Mutual labels: parser

Gofeed

Parse RSS, Atom and JSON feeds in Go

Stars: ✭ 1,762 (+1205.19%)

Mutual labels: parser

Prowide Core

Model and parsers for all SWIFT MT (FIN) messages

Stars: ✭ 125 (-7.41%)

Mutual labels: parser

Babylon

PSA: moved into babel/babel as @babel/parser

Stars: ✭ 1,692 (+1153.33%)

Mutual labels: parser

Harser

Harser is a library for easily extracting data from HTML and building XPath expressions.

Installation

pip install harser

Examples

>>> from harser import Harser

>>> HTML = '''
    <html><body>
    <div class="header" id="id-header">
        <li class="nav-item" data-nav="first-item" href="/nav1">First item</li>
        <li class="nav-item" data-nav="second-item" href="/nav2">Second item</li>
        <li class="nav-item" data-nav="third-item" href="/nav3">Third item</li>
    </div>
    <div>First layer
        <h3>Lorem Ipsum</h3>
        <span>Dolor sit amet</span>
    </div>
    <div>Second layer</div>
    <div>Third layer
        <span class="text">first block</span>
        <span class="text">second block</span>
        <span>third block</span>
    </div>
    <span>fourth layer</span>
    <img />
    <div class="footer" id="id-foobar" foobar="ab bc cde">
        <h3 some-attr="hey">
            <span id="foobar-span">foo ter</span>
        </h3>
    </div>
    </body></html>
'''

>>> harser = Harser(HTML)

>>> harser.find('div', class_='header').children(class_='nav-item').find('text').extract()
# Or just
# harser.find(class_='nav-item').find('text').extract()
['First item', 'Second item', 'Third item']

>>> harser.find(class_='nav-item').get_attr('href').extract()
['/nav1', '/nav2', '/nav3']

# These two calls are equivalent:
>>> harser.find('div', class_='header', id='id-header')
>>> harser.find('div', attrs={'class': 'header', 'id': 'id-header'})

>>> harser.find(id__contains='bar').get_attr('class').extract()
['footer']

>>> harser.find(href__not_contains='2').find('text').extract()
['First item', 'Third item']

>>> harser.find(attrs={'data-nav__contains': 'second'}).next_siblings().find('text').extract()
['Third item']

>>> harser.find('li').parent().next_siblings(filters={'text__contains': 'Second'}).clean_extract()
['<div>Second layer</div>']

>>> harser.find('h3', filters={'[email protected]__starts_with': 'foo'}).get_attr('some-attr').extract()
['hey']
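
The `__contains`, `__not_contains` and `__starts_with` suffixes used above read like lookups appended to an attribute name. As a rough illustration of how such keys could map onto XPath predicates, here is a minimal sketch. The `LOOKUPS` table and the `build_predicate` function are hypothetical names invented for this example; they are not part of Harser, and Harser's real translation may differ.

```python
# Hypothetical helper, NOT part of Harser's API. It only illustrates how
# "name__lookup" keys could be translated into XPath 1.0 predicates.
LOOKUPS = {
    "contains": "contains(@{attr}, {value})",
    "not_contains": "not(contains(@{attr}, {value}))",
    "starts_with": "starts-with(@{attr}, {value})",
}


def build_predicate(key: str, value: str) -> str:
    """Turn e.g. ('href__not_contains', '2') into an XPath predicate."""
    # Split on the first double underscore: attribute name, then lookup.
    attr, sep, lookup = key.partition("__")
    literal = "'" + value + "'"  # assumes the value contains no single quote
    if not sep:
        # No lookup suffix: plain equality test on the attribute.
        return f"[@{attr}={literal}]"
    template = LOOKUPS[lookup]  # raises KeyError for an unknown lookup
    return "[" + template.format(attr=attr, value=literal) + "]"


print(build_predicate("id__contains", "bar"))        # [contains(@id, 'bar')]
print(build_predicate("href__not_contains", "2"))    # [not(contains(@href, '2'))]
print(build_predicate("data-nav__contains", "second"))
print(build_predicate("class", "header"))            # [@class='header']
```

Splitting on the first `__` keeps hyphenated attribute names such as `data-nav` intact, which is presumably why those are passed through `attrs={...}` rather than as keyword arguments.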

>>> harser.find('div').children('h3').xpath
'//descendant::div/h3'
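
To see what an expression like `//descendant::div/h3` selects, it can help to evaluate an equivalent path outside Harser. The sketch below uses only the standard library's `xml.etree.ElementTree` on a trimmed, well-formed excerpt of the HTML above. It assumes the markup parses as XML, and because ElementTree supports only a subset of XPath (no `descendant::` axis), it substitutes `.//div/h3`, which matches the same elements in this excerpt. This is not how Harser itself evaluates queries.

```python
import xml.etree.ElementTree as ET

# A trimmed, well-formed excerpt of the HTML from the examples above.
SNIPPET = """
<html><body>
  <div>First layer
    <h3>Lorem Ipsum</h3>
    <span>Dolor sit amet</span>
  </div>
  <div>Second layer</div>
  <div class="footer" id="id-foobar">
    <h3 some-attr="hey"><span id="foobar-span">foo ter</span></h3>
  </div>
</body></html>
"""

root = ET.fromstring(SNIPPET.strip())

# './/div/h3' stands in for '//descendant::div/h3':
# every <h3> that is a direct child of any <div> in the document.
matches = root.findall(".//div/h3")

# Collect the full text content and the 'some-attr' attribute of each match.
texts = ["".join(h3.itertext()).strip() for h3 in matches]
attrs = [h3.get("some-attr") for h3 in matches]

print(texts)  # ['Lorem Ipsum', 'foo ter']
print(attrs)  # [None, 'hey']
```

Real-world HTML is often not well-formed XML, so a tolerant HTML parser is usually needed in practice; the strict parser here is only adequate because the excerpt is clean.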

Support the project

Please contact Michael Sinov if you want to support the Harser project.

Stars: ✭ 135