sparklemotion / Nokogiri

Nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant because it relies on native parsers like libxml2 (CRuby) and Xerces (JRuby).

Guiding Principles

Some guiding principles Nokogiri tries to follow:

  • be secure-by-default by treating all documents as untrusted by default
  • be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers

Features Overview

  • DOM Parser for XML, HTML4, and HTML5
  • SAX Parser for XML and HTML4
  • Push Parser for XML and HTML4
  • Document search via XPath 1.0
  • Document search via CSS3 selectors, with some jQuery-like extensions
  • XSD Schema validation
  • XSLT transformation
  • "Builder" DSL for XML and HTML documents

Support, Getting Help, and Reporting Issues

All official documentation is posted at https://nokogiri.org (the source for which is at https://github.com/sparklemotion/nokogiri.org/, and we welcome contributions).

Consider subscribing to Tidelift which provides license assurances and timely security notifications for your open source dependencies, including Nokogiri. Tidelift subscriptions also help the Nokogiri maintainers fund our automated testing which in turn allows us to ship releases, bugfixes, and security updates more often.

Reading

Your first stops for learning more about Nokogiri should be:

Ask For Help

There are a few ways to ask exploratory questions:

Please do not mail the maintainers at their personal addresses.

Report A Bug

The Nokogiri bug tracker is at https://github.com/sparklemotion/nokogiri/issues

Please use the "Bug Report" or "Installation Difficulties" templates.

Security and Vulnerability Reporting

Please report vulnerabilities at https://hackerone.com/nokogiri

Full information and description of our security policy is in SECURITY.md

Semantic Versioning Policy

Nokogiri follows Semantic Versioning (since 2017 or so).

We bump Major.Minor.Patch versions following this guidance:

Major: (we've never done this)

  • Significant backwards-incompatible changes to the public API that would require rewriting existing application code.
  • Some examples of backwards-incompatible changes we might someday consider for a Major release are at ROADMAP.md.

Minor:

Patch:

  • Bugfixes.
  • Security updates.
  • Updating packaged libraries for security-related reasons.
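
One way to act on this policy is to pin Nokogiri with a pessimistic version constraint. The sketch below uses RubyGems' own Gem::Requirement to show which versions two common constraints would accept. The constraints are only examples, and 2.0.0 is a hypothetical future Major release.

```ruby
require 'rubygems'

# "~> 1.11.0" accepts Patch releases only (>= 1.11.0, < 1.12.0).
patch_only = Gem::Requirement.new('~> 1.11.0')
# "~> 1.11" also accepts Minor releases (>= 1.11, < 2.0).
minor_ok = Gem::Requirement.new('~> 1.11')

%w[1.11.0 1.11.7 1.12.0 2.0.0].each do |number|
  version = Gem::Version.new(number)
  puts format('%-7s patch_only=%-5s minor_ok=%s',
              number,
              patch_only.satisfied_by?(version),
              minor_ok.satisfied_by?(version))
end
```

The same constraint strings can be used in a Gemfile, for example gem 'nokogiri', '~> 1.11'.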

Installation

Requirements:

  • Ruby >= 2.5
  • JRuby >= 9.2.0.0

Native Gems: Faster, more reliable installation

"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need to compile the C extension and the packaged libraries, and the need for system dependencies to be present. The result is a much faster and more reliable installation; slow or failed installations are, as you probably know, the biggest headaches for Nokogiri users.

Supported Platforms

As of v1.11.0, Nokogiri ships pre-compiled, "native" gems for the following platforms:

  • Linux: x86-linux and x86_64-linux (req: glibc >= 2.17), including musl platforms like Alpine
  • Darwin/macOS: x86_64-darwin and arm64-darwin
  • Windows: x86-mingw32 and x64-mingw32
  • Java: any platform running JRuby 9.2 or higher

To determine whether your system supports one of these gems, look at the output of bundle platform or ruby -e 'puts Gem::Platform.local.to_s'.
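
The same check can be scripted. The sketch below compares the local platform against the list above using RubyGems' Gem::Platform matching. The constant and method names are hypothetical helpers written for this example, not part of Nokogiri, and the check does not cover the glibc version requirement.

```ruby
require 'rubygems'

# Platform names copied from the list above (as of v1.11.0); "java" covers JRuby.
NATIVE_PLATFORMS = %w[
  x86-linux x86_64-linux
  x86_64-darwin arm64-darwin
  x86-mingw32 x64-mingw32
  java
].freeze

# True if the given platform matches one of the listed native gem platforms.
def native_gem_candidate?(platform = Gem::Platform.local)
  NATIVE_PLATFORMS.any? { |name| Gem::Platform.new(name) === platform }
end

puts "Local platform: #{Gem::Platform.local}"
puts "Native gem likely available: #{native_gem_candidate?}"
```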

If you're on a supported platform, either gem install or bundle install should install a native gem without any additional action on your part. This installation should only take a few seconds, and your output should look something like:

$ gem install nokogiri
Fetching nokogiri-1.11.0-x86_64-linux.gem
Successfully installed nokogiri-1.11.0-x86_64-linux
1 gem installed

Other Installation Options

Because Nokogiri is a C extension, it requires that you have a C compiler toolchain, Ruby development header files, and some system dependencies installed.

The following may work for you if you have an appropriately-configured system:

gem install nokogiri

If you have any issues, please visit Installing Nokogiri for more complete instructions and troubleshooting.

How To Use Nokogiri

Nokogiri is a large library, and so it's challenging to briefly summarize it. We've tried to provide long, real-world examples at Tutorials.

Parsing and Querying

Here is example usage for parsing and querying a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(URI.open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

# Search for nodes by css
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

# Or mix and match
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end

Encoding

Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return a string containing markup (like to_xml, to_html and inner_html) will return a string encoded like the source document.

WARNING

Some documents declare one encoding, but actually use a different one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any particular set of bytes could be valid characters in multiple encodings, so detecting encoding with 100% accuracy is not possible. libxml2 does its best, but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your best bet is to explicitly set the encoding. Here is an example of explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Technical Overview

Guiding Principles

As noted above, two guiding principles of the software are:

  • be secure-by-default by treating all documents as untrusted by default
  • be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers

Notably, despite all parsers being standards-compliant, there are behavioral inconsistencies between the parsers used in the CRuby and JRuby implementations, and Nokogiri does not and should not attempt to remove these inconsistencies. Instead, we surface these differences in the test suite when they are important/semantic; or we intentionally write tests to depend only on the important/semantic bits (omitting whitespace from regex matchers on results, for example).

CRuby

The Ruby (a.k.a., CRuby, MRI, YARV) implementation is a C extension that depends on libxml2 and libxslt (which in turn depend on zlib and possibly libiconv).

These dependencies are met by default by Nokogiri's packaged versions of the libxml2 and libxslt source code, but a configuration option --use-system-libraries is provided to allow specification of alternative library locations. See Installing Nokogiri for full documentation.

We provide native gems by pre-compiling libxml2 and libxslt (and potentially zlib and libiconv) and packaging them into the gem file. In this case, no compilation is necessary at installation time, which leads to faster and more reliable installation.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.

JRuby

The Java (a.k.a. JRuby) implementation is a Java extension that depends primarily on Xerces and NekoHTML for parsing, with additional dependencies on isorelax, nekodtd, jing, serializer, xalan-j, and xml-apis.

These dependencies are provided by pre-compiled jar files packaged in the java platform gem.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.

Contributing

See CONTRIBUTING.md for an intro guide to developing Nokogiri.

Code of Conduct

We've adopted the Contributor Covenant code of conduct, which you can read in full in CODE_OF_CONDUCT.md.

License

This project is licensed under the terms of the MIT license.

See this license at LICENSE.md.

Dependencies

Some additional libraries may be distributed with your version of Nokogiri. Please see LICENSE-DEPENDENCIES.md for a discussion of the variations as well as the licenses thereof.

Authors

  • Mike Dalessio
  • Aaron Patterson
  • Yoko Harada
  • Akinori MUSHA
  • John Shahid
  • Karol Bucek
  • Sam Ruby
  • Craig Barnes
  • Stephen Checkoway
  • Lars Kanis
  • Sergio Arbeo
  • Timothy Elliott
  • Nobuyoshi Nakada