GULPF / Nimquery

License: MIT

Nim library for querying HTML using CSS selectors (like JavaScript's document.querySelector)

Programming Languages

nim

Labels

html web scraping

Projects that are alternatives of or similar to Nimquery

Tabula

Tabula is a tool for liberating data tables trapped inside PDF files

Stars: ✭ 5,420 (+7126.67%)

Mutual labels: scraping

Scrapy Cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

Stars: ✭ 921 (+1128%)

Mutual labels: scraping

Mtnt

Code for the collection and analysis of the MTNT dataset

Stars: ✭ 48 (-36%)

Mutual labels: scraping

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (+685.33%)

Mutual labels: scraping

Webhere

HTML scraping for Objective-C.

Stars: ✭ 16 (-78.67%)

Mutual labels: scraping

Configs

Public, free to use, repository with diggers configs for scraping / extracting data from various e-commerce websites and online stores

Stars: ✭ 37 (-50.67%)

Mutual labels: scraping

Gazpacho

🥫 The simple, fast, and modern web scraping library

Stars: ✭ 525 (+600%)

Mutual labels: scraping

Torrengo

Torrengo is a CLI (command line) program written in Go which concurrently searches torrents from various sources.

Stars: ✭ 67 (-10.67%)

Mutual labels: scraping

Instagram Scraper

Scrape the Instagram frontend. Inspired from twitter-scraper by @kennethreitz.

Stars: ✭ 903 (+1104%)

Mutual labels: scraping

Artoo

artoo.js - the client-side scraping companion.

Stars: ✭ 1,029 (+1272%)

Mutual labels: scraping

Parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Stars: ✭ 628 (+737.33%)

Mutual labels: scraping

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (+952%)

Mutual labels: scraping

Pge Outages

Tracking PG&E outages

Stars: ✭ 43 (-42.67%)

Mutual labels: scraping

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (+677.33%)

Mutual labels: scraping

Awesome Python Primer

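An index of high-quality Chinese-language resources for learning Python on your own, including books / documentation / videos, geared toward web crawling / Web / data analysis / machine learning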

Stars: ✭ 57 (-24%)

Mutual labels: scraping

Headless Chrome Crawler

Distributed crawler powered by Headless Chrome

Stars: ✭ 5,129 (+6738.67%)

Mutual labels: scraping

Pypatent

Search for and retrieve US Patent and Trademark Office Patent Data

Stars: ✭ 31 (-58.67%)

Mutual labels: scraping

Api Store

Contains all the public APIs listed in Phantombuster's API store. Pull requests welcome!

Stars: ✭ 69 (-8%)

Mutual labels: scraping

Mechaml

OCaml functional web scraping library

Stars: ✭ 60 (-20%)

Mutual labels: scraping

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+1265.33%)

Mutual labels: scraping

Nimquery

A library for querying HTML using CSS selectors, like JavaScript's document.querySelector/document.querySelectorAll.

Installation

Nimquery is available on Nimble:

nimble install nimquery
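To depend on Nimquery from a project instead of installing it globally, the usual approach is a requires line in the project's .nimble manifest. The sketch below is a minimal example; the file name myapp.nimble and all metadata values are hypothetical placeholders. The commented-out htmlparser line is an assumption for newer toolchains: Nim 2.x moved htmlparser out of the standard library, so the usage examples may need that separate package there.

```nim
# myapp.nimble: a hypothetical project manifest (placeholder metadata)
version     = "0.1.0"
author      = "Your Name"
description = "Scrapes pages with Nimquery"
license     = "MIT"

requires "nimquery"

# Assumption: on Nim 2.x, htmlparser is no longer part of the standard
# library. If `import htmlparser` fails there, add the Nimble package:
# requires "htmlparser"
```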

Usage

from xmltree import `$`
from htmlparser import parseHtml
from streams import newStringStream
import nimquery

let html = """
<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body>
    <p>1</p>
    <p>2</p>
    <p>3</p>
    <p>4</p>
  </body>
</html>
"""
let xml = parseHtml(newStringStream(html))
let elements = xml.querySelectorAll("p:nth-child(odd)")
echo elements
# => @[<p>1</p>, <p>3</p>]

API

proc querySelectorAll*(root: XmlNode,
                       queryString: string,
                       options: set[QueryOption] = DefaultQueryOptions): seq[XmlNode]

Get all elements matching queryString.
Raises ParseError if parsing of queryString fails.
See Options for information about the options parameter.

proc querySelector*(root: XmlNode,
                    queryString: string,
                    options: set[QueryOption] = DefaultQueryOptions): XmlNode

Get the first element matching queryString, or nil if no such element exists.
Raises ParseError if parsing of queryString fails.
See Options for information about the options parameter.
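A short sketch of querySelector, assuming the same parseHtml/newStringStream setup as the usage example above. It shows the two outcomes described: the first match in document order, and nil when nothing matches. The HTML snippet, the #main id, and the selectors are illustrative only; attr and innerText are standard xmltree procs.

```nim
import std/[xmltree, streams]
import htmlparser
import nimquery

let html = """
<html>
  <body>
    <div id="main">
      <a href="/first">First</a>
      <a href="/second">Second</a>
    </div>
  </body>
</html>
"""
let root = parseHtml(newStringStream(html))

# querySelector returns only the first matching element.
let firstLink = root.querySelector("#main > a")
echo firstLink.attr("href")   # /first
echo firstLink.innerText      # First

# When nothing matches, the result is nil, so check before using it.
let missing = root.querySelector("table")
if missing.isNil:
  echo "no table found"
```

Checking isNil matters because calling xmltree procs such as attr on a nil XmlNode would fail at runtime.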

proc parseHtmlQuery*(queryString: string,
                     options: set[QueryOption] = DefaultQueryOptions): Query

Parses a query for later use.
Raises ParseError if parsing of queryString fails.
See Options for information about the options parameter.

proc exec*(query: Query,
           root: XmlNode,
           single: bool): seq[XmlNode]

Execute an already parsed query. If single = true, it will never return more than one element.
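parseHtmlQuery and exec are useful when the same selector is applied to many documents, since the query string is parsed only once. The sketch below assumes two small in-memory documents (the pages array is illustrative) and uses only the signatures listed above, including the single flag.

```nim
import std/[xmltree, streams]
import htmlparser
import nimquery

# Two small documents that should be searched with the same selector.
let pages = [
  "<html><body><ul><li>a</li><li>b</li></ul></body></html>",
  "<html><body><ul><li>c</li></ul></body></html>"
]

# Parse the selector once; an invalid query would raise ParseError here.
let query = parseHtmlQuery("ul > li")

var texts: seq[string]
for page in pages:
  let root = parseHtml(newStringStream(page))
  # single = false: collect every match in this document.
  for li in query.exec(root, single = false):
    texts.add li.innerText

echo texts   # @["a", "b", "c"]

# single = true: at most one element is returned.
let firstOnly = query.exec(parseHtml(newStringStream(pages[0])), single = true)
echo firstOnly.len   # 1
```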

Options

The QueryOption enum contains flags for configuring the behavior when parsing/searching:

optUniqueIds: Indicates if id attributes should be assumed to be unique.
optSimpleNot: Indicates if only simple selectors are allowed as an argument to the :not(...) pseudo-class. Note that combinators are not allowed in the argument even if this flag is excluded.
optUnicodeIdentifiers: Indicates if Unicode characters are allowed inside identifiers. Does not affect quoted strings, where Unicode is always allowed.

The default options are defined as const DefaultQueryOptions* = { optUniqueIds, optUnicodeIdentifiers, optSimpleNot }.
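Because the options are an ordinary Nim set, flags can be added or removed with the usual set operators. A sketch, assuming HTML that (invalidly) reuses an id, as scraped pages sometimes do: removing optUniqueIds makes the id selector behave like a plain attribute match, so every element carrying the id is returned. The markup and the tolerant name are illustrative.

```nim
import std/streams
import htmlparser
import nimquery

# Invalid but common in the wild: the same id appears twice.
let html = """
<html><body>
  <span id="price">10</span>
  <span id="price">20</span>
</body></html>
"""
let root = parseHtml(newStringStream(html))

echo optUniqueIds in DefaultQueryOptions   # true

# Drop the uniqueness assumption for this messy document.
let tolerant = DefaultQueryOptions - {optUniqueIds}

let prices = root.querySelectorAll("#price", tolerant)
echo prices.len   # 2
```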

Below is an example of using the options parameter to allow a complex :not(...) selector.

import xmltree
import htmlparser
import streams
import nimquery

let html = """
<!DOCTYPE html>
  <html>
    <head><title>Example</title></head>
    <body>
      <p>1</p>
      <p class="maybe-skip">2</p>
      <p class="maybe-skip">3</p>
      <p>4</p>
    </body>
  </html>
"""
let xml = parseHtml(newStringStream(html))
let options = DefaultQueryOptions - { optSimpleNot }
let elements = xml.querySelectorAll("p:not(.maybe-skip:nth-child(even))", options)
echo elements
# => @[<p>1</p>, <p class="maybe-skip">3</p>, <p>4</p>]
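For contrast, the same selector run with DefaultQueryOptions should be rejected, because optSimpleNot only allows a simple selector inside :not(...) and .maybe-skip:nth-child(even) is a compound one. The sketch below assumes ParseError is the exception name exported by nimquery, as the API section states; the shortened HTML is illustrative.

```nim
import std/streams
import htmlparser
import nimquery

let html = """
<html><body>
  <p>1</p>
  <p class="maybe-skip">2</p>
</body></html>
"""
let xml = parseHtml(newStringStream(html))

# With DefaultQueryOptions, optSimpleNot is set, so the compound selector
# inside :not(...) fails while the query string is being parsed.
var rejected = false
try:
  discard xml.querySelectorAll("p:not(.maybe-skip:nth-child(even))")
except ParseError as e:
  rejected = true
  echo "ParseError: ", e.msg

echo rejected   # true
```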

Unsupported selectors

Nimquery supports all CSS3 selectors except the following: :root, :link, :visited, :active, :hover, :focus, :target, :lang(...), :enabled, :disabled, :checked, ::first-line, ::first-letter, ::before, ::after. These selectors will not be implemented because they don't make much sense in the situations where Nimquery is useful.

Stars: ✭ 75

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (2) 🔗