
itod / panthro

License: MIT
An implementation of XPath 3.0 in Objective-C/Cocoa

Programming Languages

C
Objective-C

Projects that are alternatives of or similar to panthro

Docs
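Source code for the book "Data Collection: From Getting Started to Giving Up". Contents: introduction to web crawlers, the job market, and interview questions for crawler engineers; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK.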
Stars: ✭ 118 (+162.22%)
Mutual labels:  xpath
Astpath
A command-line search utility for Python ASTs using XPath syntax.
Stars: ✭ 167 (+271.11%)
Mutual labels:  xpath
Pugixml
Light-weight, simple and fast XML parser for C++ with XPath support
Stars: ✭ 2,809 (+6142.22%)
Mutual labels:  xpath
Harser
Easy way for HTML parsing and building XPath
Stars: ✭ 135 (+200%)
Mutual labels:  xpath
Html Agility Pack
Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
Stars: ✭ 2,014 (+4375.56%)
Mutual labels:  xpath
Zson
A JSON parser built specifically for testers
Stars: ✭ 181 (+302.22%)
Mutual labels:  xpath
Pythonstudy
Python related technologies used in work: crawler, data analysis, timing tasks, RPC, page parsing, decorator, built-in functions, Python objects, multi-threading, multi-process, asynchronous, redis, mongodb, mysql, openstack, etc.
Stars: ✭ 103 (+128.89%)
Mutual labels:  xpath
Meeseeks
An Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.
Stars: ✭ 252 (+460%)
Mutual labels:  xpath
Xquery
Extract data or evaluate value from HTML/XML documents using XPath
Stars: ✭ 155 (+244.44%)
Mutual labels:  xpath
Nokogiri
HTML parser for PHP
Stars: ✭ 214 (+375.56%)
Mutual labels:  xpath
Cssplus
CSSplus is a collection of CSS Reprocessor plugins that dynamically update CSS variables
Stars: ✭ 141 (+213.33%)
Mutual labels:  xpath
Goxpath
An XPath 1.0 implementation written in the Go programming language.
Stars: ✭ 148 (+228.89%)
Mutual labels:  xpath
Xmlquery
xmlquery is Golang XPath package for XML query.
Stars: ✭ 209 (+364.44%)
Mutual labels:  xpath
Jsonquery
jsonq package for Go. Golang XPath query for JSON query.
Stars: ✭ 134 (+197.78%)
Mutual labels:  xpath
Ftr Site Config
Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
Stars: ✭ 231 (+413.33%)
Mutual labels:  xpath
Graphquery
GraphQuery is a query language and execution engine tied to any backend service.
Stars: ✭ 112 (+148.89%)
Mutual labels:  xpath
Jquery Xpath
jQuery XPath plugin (with full XPath 2.0 language support)
Stars: ✭ 173 (+284.44%)
Mutual labels:  xpath
react-native-macos
Fork of https://github.com/ptmt/react-native-macos with more features
Stars: ✭ 22 (-51.11%)
Mutual labels:  cocoa
Ono
A sensible way to deal with XML & HTML for iOS & macOS
Stars: ✭ 2,599 (+5675.56%)
Mutual labels:  xpath
Xembly
Assembly for XML: imperative language to modify XML documents
Stars: ✭ 212 (+371.11%)
Mutual labels:  xpath

Panthro - XPath/XQuery 3.0-ish written in Cocoa, for use in Cocoa

Panthro is an implementation of XPath in Objective-C with decent unit test coverage. It is intended for use on Apple's iOS and OS X platforms and includes bindings for libxml and NSXML.

Panthro is mostly a port of the XPath 1.0 portions of the excellent Saxon 6.5 Java library by Michael Kay with my own additions.

Panthro supports all of XPath 1.0 and many of the most interesting features of 2.0 and even some of XPath 3.0 and XQuery. Here are some of the features supported by Panthro:

From XPath 1.0:
  • Everything (I think)
From XPath 2.0:
  • Sequences (('a', 'b', 'c') or ())
  • Steps in Path expressions may be arbitrary sub-expressions (book/(chapter|appendix)/*)
  • for looping expressions
  • if conditional expressions
  • some and every quantified expressions
  • Range expressions (for $i in 1 to 10)
  • The intersect, except, and union operators
  • NameTest wildcard prefixes such as *:div
  • Many of the XPath 2.0 functions are supported including regex support in matches(), replace(), and tokenize()
  • Scientific notation (exponents) is allowed in number literals
From XQuery 1.0:
  • FLWOR (For, Let, Where, Order by, Return) expressions
  • Function declarations
  • Variable declarations
From XPath 3.0:
  • First-class inline functions (let $func := function() { … })
  • Anonymous functions ($map((1,2,3), function($n) { $n*$n }))
  • String concatenation operator ('foo' || 'bar' produces 'foobar')
  • Simple mapping operator (/book/section ! count(chapter))
From XQuery 3.0:
  • Switch expressions (switch (1) case 1 return 'one' case 2 return 'two' default return 'unknown')

I think most people familiar with XPath and XQuery will agree these are the most useful and interesting features beyond XPath 1.0, and Panthro has them all. Most of what is "missing" from XPath 2.0 in Panthro is related to the overly complex and unpopular XML Schema-inspired static type system. Implementing that portion of XPath 2.0 is not currently planned, and is probably a non-goal in the long term.

Data Model

Panthro's current data model lies in a fuzzy area somewhere between XPath 1.0 and 3.0. This is intentional. Panthro's type system has the simple, dynamic flavor and small number of types from XPath 1.0, plus the pervasive addition of Sequences from XPath 2.0, plus first-class functions from XPath 3.0.

The types are basically item (think base class), string, number, boolean, node, and sequence. As in XPath 2.0, every item is also a sequence of length 1. Any XPath or XQuery features related to explicit static types (e.g. as xs:integer) and casts (e.g. cast as xs:string, treat as xs:double, instance of xs:dateTime) are not currently supported, and will cause a syntax error.

The XPath parser is based on PEGKit. The PEGKit dependency is managed via git externals.

If you find any missing features you would like, please let me know via an Issue on this GitHub project.

Applications

Panthro currently powers two of my applications:

  1. Pathology - XPath Debugger and Visualizer for OS X
  2. Pathological - Search the OS X Finder with extreme precision using XPath

Examples

Some example expressions that currently work (i.e. they are parsed, execute, and return a correct result):

boolean(false() != true())

not(string-length('foo') = 1)

substring('12345', 2, 3)

substring-before('1999/04/01', '/')

/

.

.. 

chapter

chapter/title

*[@id]

//para

chapter[@id='c1' or @id='c3']

.|/|(//para)[2]

(//para)[1]|//chapter/@id[string(.)='c1']

ancestor-or-self::node()

chapter/@id != chapter[2]/@id

chapter[3]/preceding-sibling::*[2]/title

//chapter[1]/@*[namespace-uri(.)='bar']/..

id('c2 c1')[2]/title

book/(chapter|appendix)/*

book/(chapter[position()=last()]|appendix[1])/text()

let $map := function ($f, $seq) {
    for $item in $seq
        return $f($item)
}
return $map(function($arg) {$arg * $arg}, (1,2,3,4))

declare function mysum($v) {
    let $head := $v[1],
        $tail := subsequence($v, 2)
    return
        if (count($v) = 1) then
            $head
        else
            $head + mysum($tail)
};
mysum((1,2,3))

Non-standard Additions

  1. Other functions of my own design are included in the default function namespace: head(), tail(), title-case(), and trim-space().

XML Tree Model Bindings

XPath works on a tree-like representation of an XML document, so Panthro needs a tree-based XML API (in C, C++, or ObjC) to be available on Apple platforms. Commonly used XML tree APIs on these platforms include libxml and NSXML.

Panthro is designed to work with any XML tree API, but requires a small adapter layer for each (an implementation of the XPNodeInfo and XPDocumentInfo protocols). Panthro currently includes an adapter layer for libxml (for iOS and OS X) and NSXML (for OS X only).

Objective-C API

To use Panthro with NSXML on OS X:

// Build XML doc with NSXML
NSString *str = …
NSXMLDocument *doc = [[[NSXMLDocument alloc] initWithXMLString:str options:0 error:nil] autorelease];

// Wrap NSXML doc in Panthro adapter (id <XPNodeInfo>)
id <XPNodeInfo> ctxNode = [[[XPNSXMLDocumentImpl alloc] initWithNode:doc] autorelease];

// Create a Panthro stand-alone XPath context
XPStandaloneContext *env = [XPStandaloneContext standaloneContext];
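
The NSXML step above passes error:nil, so a malformed XML string yields a nil document with no explanation. A minimal sketch of that one step with the error surfaced might look like the following. It uses only Foundation; PTParseXMLString is a hypothetical helper name and not part of Panthro, and the code assumes ARC (under manual reference counting, as in the snippet above, the returned document would need an autorelease).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Panthro): parse an XML string with NSXML
// and hand any parse error back to the caller instead of discarding it.
// Assumes ARC.
static NSXMLDocument *PTParseXMLString(NSString *str, NSError **outErr) {
    // Same initializer and options value as the README snippet, but with a
    // real error pointer rather than nil.
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:str
                                                          options:0
                                                            error:outErr];
    // doc is nil when the string is not well-formed XML.
    return doc;
}
```

A caller would check the return value for nil before wrapping the document in XPNSXMLDocumentImpl, and log or display the NSError otherwise.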

The Panthro API allows you to first compile your XPath string into an intermediate tree representation (Abstract Syntax Tree, or AST), which can then be evaluated multiple times. The type of the AST is XPExpression. The API keywords for this are compile and evaluate:

// compile first…
NSError *err = nil;
XPExpression *expr = [env compile:@"book/chapter[@id='ch1']/title" error:&err];

// …then evaluate (possibly multiple times) later
NSString *ch1Title = [env evaluate:expr withContextNode:ctxNode error:&err];
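
Both calls take an NSError out-parameter, so the two-step flow can be written with explicit checks. The fragment below is a sketch, not a complete program: it assumes Panthro's headers are already imported (the header name is not given in this README) and that env and ctxNode were created as in the setup snippet above. Treating a nil return value as failure follows the usual Cocoa convention and is an assumption about Panthro, not something this README states.

```objective-c
// Sketch only: `env` and `ctxNode` come from the setup snippet above.
NSError *err = nil;
XPExpression *expr = [env compile:@"book/chapter[@id='ch1']/title" error:&err];

if (!expr) {
    // Assumption: a failed compile returns nil and fills in `err`.
    NSLog(@"compile failed: %@", err);
} else {
    // The compiled expression can be reused for later evaluations.
    id result = [env evaluate:expr withContextNode:ctxNode error:&err];
    if (!result) {
        // Assumption: a failed evaluation returns nil and fills in `err`.
        NSLog(@"evaluate failed: %@", err);
    } else {
        NSLog(@"result: %@", result);
    }
}
```

The result is held as id here because the concrete return type may depend on the expression being evaluated.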

Alternatively, the Panthro API allows you to compile and evaluate an XPath string all in one go. The API keyword for this combined action is execute:

// compile and evaluate together (AKA `execute`)
NSError *err = nil;
NSString *ch1Title = [env execute:@"book/chapter[@id='ch1']/title" withContextNode:ctxNode error:&err];