
GULPF / Nimquery

Licence: MIT
Nim library for querying HTML using CSS selectors (like JavaScript's document.querySelector)



Nimquery

A library for querying HTML using CSS selectors, like JavaScript's document.querySelector/document.querySelectorAll.

Installation

Nimquery is available on Nimble:

nimble install nimquery

Usage

from xmltree import `$`
from htmlparser import parseHtml
from streams import newStringStream
import nimquery

let html = """
<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body>
    <p>1</p>
    <p>2</p>
    <p>3</p>
    <p>4</p>
  </body>
</html>
"""
let xml = parseHtml(newStringStream(html))
let elements = xml.querySelectorAll("p:nth-child(odd)")
echo elements
# => @[<p>1</p>, <p>3</p>]

API

proc querySelectorAll*(root: XmlNode,
                       queryString: string,
                       options: set[QueryOption] = DefaultQueryOptions): seq[XmlNode]

Get all elements matching queryString.
Raises ParseError if parsing of queryString fails.
See Options for information about the options parameter.
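
Since querySelectorAll raises ParseError for a malformed query string, callers that accept selectors from untrusted input may want to catch it. The following is a minimal sketch, not part of Nimquery: safeQueryAll is a hypothetical helper, and the selector "p[" is assumed to be rejected because its attribute selector is never closed.

```nim
import xmltree
import htmlparser
import streams
import nimquery

# Hypothetical helper (not part of Nimquery): returns an empty seq
# instead of propagating ParseError for a malformed selector.
proc safeQueryAll(root: XmlNode, selector: string): seq[XmlNode] =
  try:
    result = root.querySelectorAll(selector)
  except ParseError:
    echo "Invalid selector: ", selector
    result = @[]

let xml = parseHtml(newStringStream("<html><body><p>1</p><p>2</p></body></html>"))

# A valid selector is passed through unchanged.
echo safeQueryAll(xml, "p").len

# Assumption: "p[" fails to parse, so the helper falls back to an empty seq.
echo safeQueryAll(xml, "p[").len
```

Whether to swallow the error like this or let it propagate depends on the application; the sketch only shows where ParseError surfaces.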


proc querySelector*(root: XmlNode,
                    queryString: string,
                    options: set[QueryOption] = DefaultQueryOptions): XmlNode

Get the first element matching queryString, or nil if no such element exists.
Raises ParseError if parsing of queryString fails.
See Options for information about the options parameter.
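
Because querySelector returns nil when nothing matches, the result should be checked before it is used. A small sketch of that pattern, assuming a made-up document and using innerText from xmltree to read the matched element:

```nim
import xmltree
import htmlparser
import streams
import nimquery

let html = """
<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body>
    <p>1</p>
    <p>2</p>
  </body>
</html>
"""
let xml = parseHtml(newStringStream(html))

# A match exists: querySelector returns the first <p> in document order.
let first = xml.querySelector("p")
if first != nil:
  echo first.innerText

# No <table> exists in this document, so the result is nil.
let missing = xml.querySelector("table")
echo missing.isNil
```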


proc parseHtmlQuery*(queryString: string,
                     options: set[QueryOption] = DefaultQueryOptions): Query

Parses a query for later use.
Raises ParseError if parsing of queryString fails.
See Options for information about the options parameter.


proc exec*(query: Query,
           root: XmlNode,
           single: bool): seq[XmlNode]

Executes an already parsed query. If single is true, at most one element is returned.
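
Parsing a query once with parseHtmlQuery and running it with exec can be useful when the same selector is applied to many documents. The sketch below assumes two made-up pages; the pages and counts names are illustrative only.

```nim
import xmltree
import htmlparser
import streams
import nimquery

# Parse the selector a single time.
let query = parseHtmlQuery("li.item")

# Hypothetical input documents.
let pages = [
  "<html><body><ul><li class=\"item\">a</li><li>b</li></ul></body></html>",
  "<html><body><ul><li class=\"item\">c</li><li class=\"item\">d</li></ul></body></html>"
]

var counts: seq[int]
for page in pages:
  let root = parseHtml(newStringStream(page))
  # single = false collects every match in the document.
  counts.add query.exec(root, single = false).len
echo counts

# single = true stops after the first match, so at most one element comes back.
let firstOnly = query.exec(parseHtml(newStringStream(pages[1])), single = true)
echo firstOnly.len
```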

Options

The QueryOption enum contains flags for configuring the behavior when parsing/searching:

  • optUniqueIds: Indicates if id attributes should be assumed to be unique.
  • optSimpleNot: Indicates if only simple selectors are allowed as an argument to the :not(...) pseudo-class. Note that combinators are not allowed in the argument even if this flag is excluded.
  • optUnicodeIdentifiers: Indicates if Unicode characters are allowed inside identifiers. This does not affect strings, where Unicode is always allowed.

The default options are defined as const DefaultQueryOptions* = { optUniqueIds, optUnicodeIdentifiers, optSimpleNot }.

Below is an example of using the options parameter to allow a complex :not(...) selector.

import xmltree
import htmlparser
import streams
import nimquery

let html = """
<!DOCTYPE html>
  <html>
    <head><title>Example</title></head>
    <body>
      <p>1</p>
      <p class="maybe-skip">2</p>
      <p class="maybe-skip">3</p>
      <p>4</p>
    </body>
  </html>
"""
let xml = parseHtml(newStringStream(html))
let options = DefaultQueryOptions - { optSimpleNot }
let elements = xml.querySelectorAll("p:not(.maybe-skip:nth-child(even))", options)
echo elements
# => @[<p>1</p>, <p class="maybe-skip">3</p>, <p>4</p>]
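
The other flags can be removed from the option set in the same way. As a hedged sketch, the following drops optUniqueIds for a made-up document that reuses an id, so that the search does not assume ids are unique. The document and the allDups name are assumptions for illustration.

```nim
import xmltree
import htmlparser
import streams
import nimquery

# Hypothetical document that (invalidly) uses the same id twice.
let html = """
<html>
  <body>
    <p id="dup">1</p>
    <p id="dup">2</p>
  </body>
</html>
"""
let xml = parseHtml(newStringStream(html))

# Remove optUniqueIds so that ids are not assumed to be unique.
let options = DefaultQueryOptions - { optUniqueIds }
let allDups = xml.querySelectorAll("#dup", options)
echo allDups.len
```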

Unsupported selectors

Nimquery supports all CSS3 selectors except the following: :root, :link, :visited, :active, :hover, :focus, :target, :lang(...), :enabled, :disabled, :checked, ::first-line, ::first-letter, ::before, ::after. These selectors will not be implemented because they don't make much sense in the situations where Nimquery is useful.
