kostya / Myhtml
Licence: MIT
Fast HTML5 parser with CSS selectors for the Crystal language
MyHTML
Fast HTML5 parser (a Crystal binding for lexborisov's awesome myhtml and Modest libraries). This shard is used in production to parse millions of pages per day, and it is very stable and fast.
Installation
Add this to your application's shard.yml:
dependencies:
  myhtml:
    github: kostya/myhtml
Then run shards install.
Usage example
require "myhtml"

html = <<-HTML
  <html>
    <body>
      <div id="t1" class="red">
        <a href="/#">O_o</a>
      </div>
      <div id="t2"></div>
    </body>
  </html>
HTML

# Parse the document once, then iterate over every <div> node.
myhtml = Myhtml::Parser.new(html)
myhtml.nodes(:div).each do |node|
  id = node.attribute_by("id")

  # Take the first <a> found inside this div, if there is one.
  if first_link = node.scope.nodes(:a).first?
    href = first_link.attribute_by("href")
    link_text = first_link.inner_text
    puts "div with id #{id} have link [#{link_text}](#{href})"
  else
    puts "div with id #{id} have no links"
  end
end

# Output:
#   div with id t1 have link [O_o](/#)
#   div with id t2 have no links
CSS selectors example
require "myhtml"

html = <<-HTML
  <html>
    <body>
      <table id="t1">
        <tr><td>Hello</td></tr>
      </table>
      <table id="t2">
        <tr><td>123</td><td>other</td></tr>
        <tr><td>foo</td><td>columns</td></tr>
        <tr><td>bar</td><td>are</td></tr>
        <tr><td>xyz</td><td>ignored</td></tr>
      </table>
    </body>
  </html>
HTML

myhtml = Myhtml::Parser.new(html)

p myhtml.css("#t2 tr td:first-child").map(&.inner_text).to_a
# => ["123", "foo", "bar", "xyz"]

p myhtml.css("#t2 tr td:first-child").map(&.to_html).to_a
# => ["<td>123</td>", "<td>foo</td>", "<td>bar</td>", "<td>xyz</td>"]
More Examples
Development Setup:
git clone https://github.com/kostya/myhtml.git
cd myhtml
make
crystal spec
Benchmark
Parse the Google page 1000 times and run a CSS select 1000 times. Programs used: myhtml-program, crystagiri-program, nokogiri-program.
| Lang | Shard | Lib | Parse time, s | CSS time, s | Memory, MiB |
|---|---|---|---|---|---|
| Crystal | lexbor | lexbor | 2.39 | - | 7.7 |
| Crystal | myhtml | myhtml(+modest) | 2.70 | 0.22 | 8.3 |
| Crystal | Crystagiri | libxml2 | 8.02 | 8.59 | 75.4 |
| Crystal | Gumbo | Gumbo | 18.18 | - | 2140.7 |
| Ruby 2.7 | Nokogiri | libxml2 | 20.15 | 23.02 | 132.8 |