
kostya / Myhtml

License: MIT
Fast HTML5 Parser with css selectors for Crystal language

Programming language: Crystal

Projects that are alternatives to, or similar to, Myhtml

Tolerant Php Parser
An early-stage PHP parser designed for IDE usage scenarios.
Stars: ✭ 717 (+391.1%)
Mutual labels:  parser, fast
Ojg
Optimized JSON for Go
Stars: ✭ 281 (+92.47%)
Mutual labels:  parser, fast
Fast Xml Parser
Validate XML, parse XML to JS/JSON and vice versa, or parse XML to Nimn rapidly without C/C++ based libraries and no callback
Stars: ✭ 1,021 (+599.32%)
Mutual labels:  parser, fast
Hquery.php
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Stars: ✭ 295 (+102.05%)
Mutual labels:  parser, fast
Csstree
A tool set for CSS including fast detailed parser, walker, generator and lexer based on W3C specs and browser implementations
Stars: ✭ 1,121 (+667.81%)
Mutual labels:  parser, fast
Harser
Easy way for HTML parsing and building XPath
Stars: ✭ 135 (-7.53%)
Mutual labels:  parser
Dtp Stat
Road traffic accident map
Stars: ✭ 141 (-3.42%)
Mutual labels:  parser
Antlr4 C3
A grammar agnostic code completion engine for ANTLR4 based parsers
Stars: ✭ 135 (-7.53%)
Mutual labels:  parser
Nimscan
🚀 Fast Port Scanner 🚀
Stars: ✭ 134 (-8.22%)
Mutual labels:  fast
Swc
swc is a super-fast compiler written in rust; producing widely-supported javascript from modern standards and typescript.
Stars: ✭ 18,627 (+12658.22%)
Mutual labels:  parser
Jaxon
Streaming JSON parser for Elixir
Stars: ✭ 145 (-0.68%)
Mutual labels:  parser
Pygdbmi
A library to parse gdb mi output and interact with gdb subprocesses
Stars: ✭ 139 (-4.79%)
Mutual labels:  parser
Guide To Swift Strings Sample Code
Xcode Playground Sample Code for the Flight School Guide to Swift Strings
Stars: ✭ 136 (-6.85%)
Mutual labels:  parser
Gumnut
JS parser in Web Assembly / C
Stars: ✭ 140 (-4.11%)
Mutual labels:  parser
Entrypoint
Composable CLI Argument Parser for all modern .Net platforms.
Stars: ✭ 136 (-6.85%)
Mutual labels:  parser
Glsl
GLSL parser for Rust
Stars: ✭ 145 (-0.68%)
Mutual labels:  parser
Cachewebview
Custom Android WebView cache implementation for offline websites, making cache configuration simpler and more flexible
Stars: ✭ 1,767 (+1110.27%)
Mutual labels:  fast
Foxify
The fast, easy to use & typescript ready web framework for Node.js
Stars: ✭ 138 (-5.48%)
Mutual labels:  fast
Nanoexpress
Professional backend framework for Node.js
Stars: ✭ 140 (-4.11%)
Mutual labels:  fast
Marian Dev
Fast Neural Machine Translation in C++ - development repository
Stars: ✭ 136 (-6.85%)
Mutual labels:  fast

MyHTML


Fast HTML5 parser for Crystal (a binding to lexborisov's excellent myhtml and Modest libraries). This shard is used in production to parse millions of pages per day and is very stable and fast.

Installation

Add this to your application's shard.yml:

dependencies:
  myhtml:
    github: kostya/myhtml

Then run shards install.

Usage example

require "myhtml"

html = <<-HTML
  <html>
    <body>
      <div id="t1" class="red">
        <a href="/#">O_o</a>
      </div>
      <div id="t2"></div>
    </body>
  </html>
HTML

myhtml = Myhtml::Parser.new(html)

myhtml.nodes(:div).each do |node|
  id = node.attribute_by("id")

  if first_link = node.scope.nodes(:a).first?
    href = first_link.attribute_by("href")
    link_text = first_link.inner_text

    puts "div with id #{id} has a link [#{link_text}](#{href})"
  else
    puts "div with id #{id} has no links"
  end
end

# Output:
#   div with id t1 has a link [O_o](/#)
#   div with id t2 has no links

CSS selectors example

require "myhtml"

html = <<-HTML
  <html>
    <body>
      <table id="t1">
        <tr><td>Hello</td></tr>
      </table>
      <table id="t2">
        <tr><td>123</td><td>other</td></tr>
        <tr><td>foo</td><td>columns</td></tr>
        <tr><td>bar</td><td>are</td></tr>
        <tr><td>xyz</td><td>ignored</td></tr>
      </table>
    </body>
  </html>
HTML

myhtml = Myhtml::Parser.new(html)

p myhtml.css("#t2 tr td:first-child").map(&.inner_text).to_a
# => ["123", "foo", "bar", "xyz"]

p myhtml.css("#t2 tr td:first-child").map(&.to_html).to_a
# => ["<td>123</td>", "<td>foo</td>", "<td>bar</td>", "<td>xyz</td>"]
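Because the css result above is chained with map and to_a, it behaves like a lazy iterator of nodes, so it can be combined with the scope and nodes calls from the usage example. The sketch below is an illustration built only from those calls, not an official recipe from the shard: it assumes the same #t2 table markup (trimmed to two rows) and collects every cell of each row instead of only the first column.

```crystal
require "myhtml"

html = <<-HTML
  <table id="t2">
    <tr><td>123</td><td>other</td></tr>
    <tr><td>foo</td><td>columns</td></tr>
  </table>
HTML

myhtml = Myhtml::Parser.new(html)

# Select each <tr> with CSS, then walk that row's subtree (scope)
# and keep the text of every <td> it contains.
rows = myhtml.css("#t2 tr").map { |tr| tr.scope.nodes(:td).map(&.inner_text).to_a }.to_a

p rows
# => [["123", "other"], ["foo", "columns"]]
```

Selecting rows first and then scoping into each one keeps cells grouped per row, which a single flat selector such as "#t2 td" would not.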

More Examples

See the examples directory in the repository.

Development Setup

git clone https://github.com/kostya/myhtml.git
cd myhtml
make
crystal spec

Benchmark

Parse a Google page 1000 times, and run a CSS selection 1000 times. Benchmark programs: myhtml-program, crystagiri-program, nokogiri-program.

Lang      Shard       Lib              Parse time, s   CSS time, s   Memory, MiB
Crystal   lexbor      lexbor           2.39            -             7.7
Crystal   myhtml      myhtml(+modest)  2.70            0.22          8.3
Crystal   Crystagiri  libxml2          8.02            8.59          75.4
Crystal   Gumbo       Gumbo            18.18           -             2140.7
Ruby 2.7  Nokogiri    libxml2          20.15           23.02         132.8
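The benchmark programs themselves are not reproduced here. As a rough sketch of the same two measurements with this shard, assuming a small inline page in place of the saved Google page and a hypothetical "div.g a" selector (neither comes from the original benchmark), one could time N parses and N selections with Crystal's Time.measure:

```crystal
require "myhtml"

# Hypothetical stand-in for the saved Google page used in the table.
html = <<-HTML
  <html><body>
    <div class="g"><a href="/one">One</a></div>
    <div class="g"><a href="/two">Two</a></div>
  </body></html>
HTML

N = 1000

# Time N full parses: each iteration builds a fresh parser from the same string.
parse_time = Time.measure do
  N.times { Myhtml::Parser.new(html) }
end

# Parse once, then time N CSS selections against the same tree.
# The "div.g a" selector is a hypothetical example query.
parser = Myhtml::Parser.new(html)
links = 0
css_time = Time.measure do
  N.times { links = parser.css("div.g a").to_a.size }
end

puts "parse: #{parse_time.total_seconds.round(3)} s"
puts "css:   #{css_time.total_seconds.round(3)} s (#{links} links per query)"
```

Compile with crystal build --release before timing; a debug build will be much slower, and the absolute numbers will depend on the page and the machine.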