
kostya / Modest

Licence: MIT
CSS selectors for HTML5 Parser myhtml

Programming language: Crystal


WARNING: this shard is obsolete. Its functionality has moved into myhtml directly; use myhtml >= 1.0.0.

modest

CSS selectors for HTML5 Parser myhtml (Crystal wrapper for https://github.com/lexborisov/Modest).

Installation

Add this to your application's shard.yml:

dependencies:
  modest:
    github: kostya/modest
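
Given the warning above that this shard is obsolete, a new project would presumably depend on myhtml directly instead. A minimal sketch of what that shard.yml entry might look like, assuming the shard is published at kostya/myhtml (the repository path is an assumption; only the version constraint comes from the warning):

```yaml
dependencies:
  myhtml:
    github: kostya/myhtml   # assumed repository path, not stated in this README
    version: ">= 1.0.0"     # constraint taken from the obsolescence warning
```

After editing shard.yml, running `shards install` fetches the dependency.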

Usage of CSS Selectors with myhtml parser

require "modest"

page = <<-PAGE
  <html>
    <div class=aaa><p id=bbb><a href="http://..." class=ccc>bla</a></div>
  </html>
PAGE

myhtml = Myhtml::Parser.new(page)

# CSS select from the root scope (equivalent to myhtml.root!.css("..."))
iterator = myhtml.css("div.aaa p#bbb a.ccc") # => Iterator(Myhtml::Node), methods: .each, .to_a, ...

iterator.each do |node|
  p node.tag_id              # MyHTML_TAG_A
  p node.tag_name            # "a"
  p node.tag_sym             # :a
  p node.attributes["href"]? # "http://..."
  p node.inner_text          # "bla"
  puts node.to_html          # <a href="http://..." class="ccc">bla</a>
end

# CSS select from the scope of a single node
if p_node = myhtml.css("div.aaa p#bbb").first?
  p_node.css("a.ccc").each do |node|
    p node.tag_sym # :a
  end
end

Example 2

require "modest"

html = <<-PAGE
  <div>
    <p id=p1>
    <p id=p2 class=jo>
    <p id=p3>
      <a href="some.html" id=a1>link1</a>
      <a href="some.png" id=a2>link2</a>
    <div id=bla>
      <p id=p4 class=jo>
      <p id=p5 class=bu>
      <p id=p6 class=jo>
    </div>
  </div>
PAGE

parser = Myhtml::Parser.new(html)

# select all p nodes whose id contains `p`
p parser.css("p[id*=p]").map(&.attribute_by("id")).to_a # => ["p1", "p2", "p3", "p4", "p5", "p6"]

# select all nodes with class "jo"
p parser.css("p.jo").map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]
p parser.css(".jo").map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]

# select odd-numbered children of a div that do not contain an a tag
p parser.css("div > :nth-child(2n+1):not(:has(a))").map(&.attribute_by("id")).to_a # => ["p1", "p4", "p6"]

# all elements with class=jo inside last div tag
p parser.css("div").to_a.last.css(".jo").map(&.attribute_by("id")).to_a # => ["p4", "p6"]

# a elements whose href ends with `.png`
p parser.css(%q{a[href$=".png"]}).map(&.attribute_by("id")).to_a # => ["a2"]

# find all a tags directly inside <p id=p3> whose href contains `html`
p parser.css(%q{p[id=p3] > a[href*="html"]}).map(&.attribute_by("id")).to_a # => ["a1"]

# find all a tags directly inside <p id=p3> whose href contains `html` or ends with `.png`
p parser.css(%q{p[id=p3] > a:matches([href *= "html"], [href $= ".png"])}).map(&.attribute_by("id")).to_a # => ["a1", "a2"]

# create a finder once and reuse it in many places; this is faster than creating it each time
finder = Modest::Finder.new(".jo")
p parser.css(finder).map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]

Example 3

require "modest"

html = <<-PAGE
  <html><body>
  <table id="t1"><tbody>
  <tr><td>Hello</td></tr>
  </tbody></table>
  <table id="t2"><tbody>
  <tr><td>123</td><td>other</td></tr>
  <tr><td>foo</td><td>columns</td></tr>
  <tr><td>bar</td><td>are</td></tr>
  <tr><td>xyz</td><td>ignored</td></tr>
  </tbody></table>
  </body></html>
PAGE

parser = Myhtml::Parser.new(html)

p parser.css("#t2 tr td:first-child").map(&.inner_text).to_a # => ["123", "foo", "bar", "xyz"]
p parser.css("#t2 tr td:first-child").map(&.to_html).to_a # => ["<td>123</td>", "<td>foo</td>", "<td>bar</td>", "<td>xyz</td>"]

Benchmark

Comparison with Nokogiri (LibXML) and Crystagiri (LibXML): each parses a Google page 1000 times. Code: https://github.com/kostya/modest/tree/master/bench

require "modest"
page = File.read("./google.html")
s = 0
links = [] of String
1000.times do
  myhtml = Myhtml::Parser.new(page)
  links = myhtml.css("div.g h3.r a").map(&.attribute_by("href")).to_a
  s += links.size
  myhtml.free
end
p links.last
p s

Parse + Selectors

| Lang     | Package            | Time, s | Memory, MiB |
| -------- | ------------------ | ------- | ----------- |
| Crystal  | modest(myhtml)     | 2.52    | 7.7         |
| Crystal  | Crystagiri(LibXML) | 19.89   | 14.3        |
| Ruby 2.2 | Nokogiri(LibXML)   | 45.05   | 136.2       |

Selectors Only (files with suffix 2)

| Lang     | Package            | Time, s | Memory, MiB |
| -------- | ------------------ | ------- | ----------- |
| Crystal  | modest(myhtml)     | 0.18    | 4.6         |
| Crystal  | Crystagiri(LibXML) | 12.30   | 6.6         |
| Ruby 2.2 | Nokogiri(LibXML)   | 28.06   | 68.8        |

CSS Selectors rules

https://drafts.csswg.org/selectors-4/
