All Projects → cezheng → Fuzi

cezheng / Fuzi

Licence: mit
A fast & lightweight XML & HTML parser in Swift with XPath & CSS support

Programming Languages

swift
15916 projects

Projects that are alternatives of or similar to Fuzi

Sax Wasm
The first streamable, fixed memory XML, HTML, and JSX parser for WebAssembly.
Stars: ✭ 89 (-90.04%)
Mutual labels:  xml, parser, html-parser, xml-parser
Oga
Read-only mirror of https://gitlab.com/yorickpeterse/oga
Stars: ✭ 1,147 (+28.3%)
Mutual labels:  xml, parser, html-parser, xml-parser
Didom
Simple and fast HTML and XML parser
Stars: ✭ 1,939 (+116.89%)
Mutual labels:  html-parser, xml-parser, xml, xpath
Hquery.php
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
Stars: ✭ 295 (-67%)
Mutual labels:  xml, parser, html-parser, xml-parser
Posthtml
PostHTML is a tool to transform HTML/XML with JS plugins
Stars: ✭ 2,737 (+206.15%)
Mutual labels:  xml, parser, html-parser, xml-parser
Parse Xml
A fast, safe, compliant XML parser for Node.js and browsers.
Stars: ✭ 184 (-79.42%)
Mutual labels:  xml, parser, parsing, xml-parser
Harser
Easy way for HTML parsing and building XPath
Stars: ✭ 135 (-84.9%)
Mutual labels:  parser, xpath, html-parser
Node Xml2js
XML to JavaScript object converter.
Stars: ✭ 4,402 (+392.39%)
Mutual labels:  xml, parsing, xml-parser
Camaro
camaro is an utility to transform XML to JSON, using Node.js binding to native XML parser pugixml, one of the fastest XML parser around.
Stars: ✭ 438 (-51.01%)
Mutual labels:  xml, xpath, xml-parser
Xml Js
Converter utility between XML text and Javascript object / JSON text.
Stars: ✭ 874 (-2.24%)
Mutual labels:  xml, parser, xml-parser
Internettools
XPath/XQuery 3.1 interpreter for Pascal with compatibility modes for XPath 2.0/XQuery 1.0/3.0, custom and JSONiq extensions, XML/HTML parsers and classes for HTTP/S requests
Stars: ✭ 82 (-90.83%)
Mutual labels:  xml, parser, xpath
Xml2lua
XML Parser written entirely in Lua that works for Lua 5.1+. Convert XML to and from Lua Tables 🌖💱
Stars: ✭ 150 (-83.22%)
Mutual labels:  xml, parser, xml-parser
Xmlquery
xmlquery is Golang XPath package for XML query.
Stars: ✭ 209 (-76.62%)
Mutual labels:  xml, xpath, xml-parser
Pugixml
Light-weight, simple and fast XML parser for C++ with XPath support
Stars: ✭ 2,809 (+214.21%)
Mutual labels:  xml, xpath, xml-parser
Meeseeks
An Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.
Stars: ✭ 252 (-71.81%)
Mutual labels:  xml, parser, xpath
Dasel
Query, update and convert data structures from the command line. Comparable to jq/yq but supports JSON, TOML, YAML, XML and CSV with zero runtime dependencies.
Stars: ✭ 759 (-15.1%)
Mutual labels:  xml, parser, xml-parser
Seafox
A blazing fast 100% spec compliant, self-hosted javascript parser written in Typescript
Stars: ✭ 425 (-52.46%)
Mutual labels:  parser, parsing
Stream Parser
⚡ PHP7 / Laravel Multi-format Streaming Parser
Stars: ✭ 391 (-56.26%)
Mutual labels:  xml, parser
Easyxml
Simplifies parsing and modifying of (huge) XML streams (files) based on the StAX parser with combination of JAXB or JDom2
Stars: ✭ 6 (-99.33%)
Mutual labels:  xml, xml-parser
Xpath
XPath package for Golang, supports HTML, XML, JSON document query.
Stars: ✭ 376 (-57.94%)
Mutual labels:  xml, xpath

Fuzi (斧子)

Build Status CocoaPods Compatible License Carthage Compatible Platform Twitter

A fast & lightweight XML/HTML parser in Swift that makes your life easier. [Documentation]

Fuzi is based on a Swift port of Mattt Thompson's Ono(斧), using most of its low level implementaions with moderate class & interface redesign following standard Swift conventions, along with several bug fixes.

Fuzi(斧子) means "axe", in homage to Ono(斧), which in turn is inspired by Nokogiri (鋸), which means "saw".

简体中文 日本語

A Quick Look

let xml = "..."
// or
// let xmlData = <some NSData or Data>
do {
  let document = try XMLDocument(string: xml)
  // or
  // let document = try XMLDocument(data: xmlData)
  
  if let root = document.root {
    // Accessing all child nodes of root element
    for element in root.children {
      print("\(element.tag): \(element.attributes)")
    }
    
    // Getting child element by tag & accessing attributes
    if let length = root.firstChild(tag:"Length", inNamespace: "dc") {
      print(length["unit"])     // `unit` attribute
      print(length.attributes)  // all attributes
    }
  }
  
  // XPath & CSS queries
  for element in document.xpath("//element") {
    print("\(element.tag): \(element.attributes)")
  }
  
  if let firstLink = document.firstChild(css: "a, link") {
    print(firstLink["href"])
  }
} catch let error {
  print(error)
}

Features

Inherited from Ono

  • Extremely performant document parsing and traversal, powered by libxml2
  • Support for both XPath and CSS queries
  • Automatic conversion of date and number values
  • Correct, common-sense handling of XML namespaces for elements and attributes
  • Ability to load HTML and XML documents from either String or NSData or [CChar]
  • Comprehensive test suite
  • Full documentation

Improved in Fuzi

  • Simple, modern API following standard Swift conventions, no more return types like AnyObject! that cause unnecessary type casts
  • Customizable date and number formatters
  • Some bugs fixes
  • More convenience methods for HTML Documents
  • Access XML nodes of all types (Including text, comment, etc.)
  • Support for more CSS selectors (yet to come)

Requirements

  • iOS 8.0+ / Mac OS X 10.9+
  • Xcode 8.0+

Use version 0.4.0 for Swift 2.3.

Installation

There are 4 ways you can install Fuzi to your project.

Using CocoaPods

You can use CocoaPods to install Fuzi by adding it to your to your Podfile:

platform :ios, '8.0'
use_frameworks!

target 'MyApp' do
	pod 'Fuzi', '~> 1.0.0'
end

Then, run the following command:

$ pod install

Using Swift Package Manager

The Swift Package Manager is now built-in with Xcode 11 (currently in beta). You can easily add Fuzi as a dependency by choosing File > Swift Packages > Add Package Dependency... or in the Swift Packages tab of your project file and clicking on +. Simply use https://github.com/cezheng/Fuzi as repository and Xcode should automatically resolve the current version.

Manually

  1. Add all *.swift files in Fuzi directory into your project.
  2. In your Xcode project Build Settings:
    1. Find Search Paths, add $(SDKROOT)/usr/include/libxml2 to Header Search Paths.
    2. Find Linking, add -lxml2 to Other Linker Flags.

Using Carthage

Create a Cartfile or Cartfile.private in the root directory of your project, and add the following line:

github "cezheng/Fuzi" ~> 1.0.0

Run the following command:

$ carthage update

Then do the followings in Xcode:

  1. Drag the Fuzi.framework built by Carthage into your target's General -> Embedded Binaries.
  2. In Build Settings, find Search Paths, add $(SDKROOT)/usr/include/libxml2 to Header Search Paths.

Usage

XML

import Fuzi

let xml = "..."
do {
  // if encoding is omitted, it defaults to NSUTF8StringEncoding
  let document = try XMLDocument(string: html, encoding: String.Encoding.utf8)
  if let root = document.root {
    print(root.tag)
    
    // define a prefix for a namespace
    document.definePrefix("atom", defaultNamespace: "http://www.w3.org/2005/Atom")
    
    // get first child element with given tag in namespace(optional)
    print(root.firstChild(tag: "title", inNamespace: "atom"))

    // iterate through all children
    for element in root.children {
      print("\(index) \(element.tag): \(element.attributes)")
    }
  }
  // you can also use CSS selector against XMLDocument when you feels it makes sense
} catch let error as XMLError {
  switch error {
  case .noError: print("wth this should not appear")
  case .parserFailure, .invalidData: print(error)
  case .libXMLError(let code, let message):
    print("libxml error code: \(code), message: \(message)")
  }
}

HTML

HTMLDocument is a subclass of XMLDocument.

import Fuzi

let html = "<html>...</html>"
do {
  // if encoding is omitted, it defaults to NSUTF8StringEncoding
  let doc = try HTMLDocument(string: html, encoding: String.Encoding.utf8)
  
  // CSS queries
  if let elementById = doc.firstChild(css: "#id") {
    print(elementById.stringValue)
  }
  for link in doc.css("a, link") {
      print(link.rawXML)
      print(link["href"])
  }
  
  // XPath queries
  if let firstAnchor = doc.firstChild(xpath: "//body/a") {
    print(firstAnchor["href"])
  }
  for script in doc.xpath("//head/script") {
    print(script["src"])
  }
  
  // Evaluate XPath functions
  if let result = doc.eval(xpath: "count(/*/a)") {
    print("anchor count : \(result.doubleValue)")
  }
  
  // Convenient HTML methods
  print(doc.title) // gets <title>'s innerHTML in <head>
  print(doc.head)  // gets <head> element
  print(doc.body)  // gets <body> element
  
} catch let error {
  print(error)
}

I don't care about error handling

import Fuzi

let xml = "..."

// Don't show me the errors, just don't crash
if let doc1 = try? XMLDocument(string: xml) {
  //...
}

let html = "<html>...</html>"

// I'm sure this won't crash
let doc2 = try! HTMLDocument(string: html)
//...

I want to access Text Nodes

Not only text nodes, you can specify what types of nodes you would like to access.

let document = ...
// Get all child nodes that are Element nodes, Text nodes, or Comment nodes
document.root?.childNodes(ofTypes: [.Element, .Text, .Comment])

Migrating From Ono?

Looking at example programs is the swiftest way to know the difference. The following 2 examples do exactly the same thing.

Ono Example

Fuzi Example

Accessing children

Ono

[doc firstChildWithTag:tag inNamespace:namespace];
[doc firstChildWithXPath:xpath];
[doc firstChildWithXPath:css];
for (ONOXMLElement *element in parent.children) {
  //...
}
[doc childrenWithTag:tag inNamespace:namespace];

Fuzi

doc.firstChild(tag: tag, inNamespace: namespace)
doc.firstChild(xpath: xpath)
doc.firstChild(css: css)
for element in parent.children {
  //...
}
doc.children(tag: tag, inNamespace:namespace)

Iterate through query results

Ono

Conforms to NSFastEnumeration.

// simply iterating through the results
// mark `__unused` to unused params `idx` and `stop`
[doc enumerateElementsWithXPath:xpath usingBlock:^(ONOXMLElement *element, __unused NSUInteger idx, __unused BOOL *stop) {
  NSLog(@"%@", element);
}];

// stop the iteration at second element
[doc enumerateElementsWithXPath:XPath usingBlock:^(ONOXMLElement *element, NSUInteger idx, BOOL *stop) {
  *stop = (idx == 1);
}];

// getting element by index 
ONOXMLDocument *nthElement = [(NSEnumerator*)[doc CSS:css] allObjects][n];

// total element count
NSUInteger count = [(NSEnumerator*)[document XPath:xpath] allObjects].count;

Fuzi

Conforms to Swift's SequenceType and Indexable.

// simply iterating through the results
// no need to write the unused `idx` or `stop` params
for element in doc.xpath(xpath) {
  print(element)
}

// stop the iteration at second element
for (index, element) in doc.xpath(xpath).enumerate() {
  if idx == 1 {
    break
  }
}

// getting element by index 
if let nthElement = doc.css(css)[n] {
  //...
}

// total element count
let count = doc.xpath(xpath).count

Evaluating XPath Functions

Ono

ONOXPathFunctionResult *result = [doc functionResultByEvaluatingXPath:xpath];
result.boolValue;    //BOOL
result.numericValue; //double
result.stringValue;  //NSString

Fuzi

if let result = doc.eval(xpath: xpath) {
  result.boolValue   //Bool
  result.doubleValue //Double
  result.stringValue //String
}

License

Fuzi is released under the MIT license. See LICENSE for details.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].