

soulcutter / Saxerator

Licence: MIT

A SAX-based XML parser for parsing large files into manageable chunks

Programming Languages

ruby

Labels

xml

Projects that are alternatives of or similar to Saxerator

I7j Pdfhtml

pdfHTML is an iText 7 add-on for Java that allows you to easily convert HTML and CSS into standards compliant PDFs that are accessible, searchable and usable for indexing.

Stars: ✭ 104 (-12.61%)

Mutual labels: xml

Kripton

A Java/Kotlin library for Android platform, to manage bean's persistence in SQLite, SharedPreferences, JSON, XML, Properties, Yaml, CBOR.

Stars: ✭ 110 (-7.56%)

Mutual labels: xml

Fetch Plus

🐕 Fetch+ is a convenient Fetch API replacement with first-class middleware support.

Stars: ✭ 116 (-2.52%)

Mutual labels: xml

Qxmledit

QXmlEdit XML editor. Downloads: https://sourceforge.net/projects/qxmledit/files

Stars: ✭ 106 (-10.92%)

Mutual labels: xml

Webapiclient

An open source project based on the HttpClient. You only need to define the c# interface and modify the related features to invoke the client library of the remote http interface asynchronously.

Stars: ✭ 1,618 (+1259.66%)

Mutual labels: xml

Graphquery

GraphQuery is a query language and execution engine tied to any backend service.

Stars: ✭ 112 (-5.88%)

Mutual labels: xml

Iso 3166 Countries With Regional Codes

ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

Stars: ✭ 1,372 (+1052.94%)

Mutual labels: xml

Flexlib

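FlexLib is a framework based on the flexbox model that uses XML files for UI layout. It brings the rapid layout capabilities of the web to iOS, making iOS UI development as simple and fast as writing a web page.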

Stars: ✭ 1,569 (+1218.49%)

Mutual labels: xml

Dotnet Transform Xdt

Modern .NET tools and library for XDT (Xml Document Transformation)

Stars: ✭ 110 (-7.56%)

Mutual labels: xml

Marklogic Data Hub

The MarkLogic Data Hub: documentation ==>

Stars: ✭ 113 (-5.04%)

Mutual labels: xml

Render

Go package for easily rendering JSON, XML, binary data, and HTML templates responses.

Stars: ✭ 1,562 (+1212.61%)

Mutual labels: xml

Pdfalto

PDF to XML ALTO file converter

Stars: ✭ 109 (-8.4%)

Mutual labels: xml

Repurrrsive

Recursive lists to use in teaching and examples, because there is no iris data for lists.

Stars: ✭ 112 (-5.88%)

Mutual labels: xml

Plot

A DSL for writing type-safe HTML, XML and RSS in Swift.

Stars: ✭ 1,722 (+1347.06%)

Mutual labels: xml

Twital

Twital is a "plugin" for Twig that adds some sugar syntax, which makes its templates similar to PHPTal or VueJS.

Stars: ✭ 116 (-2.52%)

Mutual labels: xml

Material Bottomnavigation

Bottom Navigation widget component inspired by the Google Material Design Guidelines at https://www.google.com/design/spec/components/bottom-navigation.html

Stars: ✭ 1,375 (+1055.46%)

Mutual labels: xml

Bible Database

Bible databases as XML, JSON, SQL & SQLITE3 Database format for various languages. Developers can download it freely for their development works. Freely received, freely give.

Stars: ✭ 111 (-6.72%)

Mutual labels: xml

Binding.scala

Reactive data-binding for Scala

Stars: ✭ 1,539 (+1193.28%)

Mutual labels: xml

Lemminx

XML Language Server

Stars: ✭ 117 (-1.68%)

Mutual labels: xml

Dino

Modern XMPP ("Jabber") Chat Client using GTK+/Vala

Stars: ✭ 1,637 (+1275.63%)

Mutual labels: xml


Saxerator

Saxerator is a streaming xml-to-hash parser designed for working with very large xml files by giving you Enumerable access to manageable chunks of the document.

Each xml chunk is parsed into a JSON-like Ruby Hash structure for consumption.

You can parse any valid xml in 3 simple steps.

1. Initialize the parser
2. Specify which tag you care about using a simple DSL
3. Perform your work in an each block, or using any Enumerable method

Installation

gem install saxerator
Choose an xml parser
- (default) Use ruby's built-in REXML parser - no other dependencies necessary
- gem install nokogiri
- gem install ox
If not using the default, specify your adapter in the Saxerator configuration

The DSL

The DSL consists of predicates that may be combined to describe which elements the parser should enumerate over. Saxerator will only enumerate over chunks of xml that match all of the combined predicates (see Examples section for added clarity).

Predicate	Explanation
`all`	Returns the entire document parsed into a hash. Cannot combine with other predicates
`for_tag(name)`	Elements whose name matches the given `name`
`for_tags(names)`	Elements whose name is in the `names` Array
`at_depth(n)`	Elements `n` levels deep inside the root of an xml document. The root element itself is `n = 0`
`within(name)`	Elements nested anywhere within an element with the given `name`
`child_of(name)`	Elements that are direct children of an element with the given `name`
`with_attribute(name, value)`	Elements that have an attribute with a given `name` and `value`. If no `value` is given, matches any element with the specified attribute name present
`with_attributes(attrs)`	Similar to `with_attribute` except takes an Array or Hash indicating the attributes to match

On any parsing error Saxerator raises a Saxerator::ParseException whose message describes what is wrong with the XML document. Warning: REXML won't raise an error if the root element was not closed (this will be fixed in Ruby 2.5).

Examples

require 'saxerator'

parser = Saxerator.parser(File.new("rss.xml"))

parser.for_tag(:item).each do |item|
  # where the xml contains <item><title>...</title><author>...</author></item>
  # item will look like {'title' => '...', 'author' => '...'}
  puts "#{item['title']}: #{item['author']}"
end

# a String is returned here since the given element contains only character data
puts "First title: #{parser.for_tag(:title).first}"

Attributes are stored as a part of the Hash or String object they relate to

# author is a String here, but also responds to .attributes
primary_authors = parser.for_tag(:author).select { |author| author.attributes['type'] == 'primary' }

You can combine predicates to isolate just the tags you want.

require 'saxerator'

parser = Saxerator.parser(bookshelf_xml)

# You can chain predicates
parser.for_tag(:name).within(:book).each { |book_name| puts book_name }

# You can re-use intermediary predicates
bookshelf_contents = parser.within(:bookshelf)

books = bookshelf_contents.for_tag(:book)
magazines = bookshelf_contents.for_tag(:magazine)

books.each do |book|
  # ...
end

magazines.each do |magazine|
  # ...
end

Configuration

Certain options are available via a configuration block at parser initialization.

Saxerator.parser(xml) do |config|
  config.output_type = :xml
end

Setting	Default	Values	Description
`adapter`	`:rexml`	`:rexml`, `:nokogiri`, `:oga`, `:ox`	The XML parser used by Saxerator
`output_type`	`:hash`	`:hash`, `:xml`	The type of object generated by Saxerator's parsing. `:hash` generates a Ruby Hash, `:xml` generates a `REXML::Document`
`symbolize_keys!`	n/a	n/a	Call this method if you want the hash keys to be symbols rather than strings
`ignore_namespaces!`	n/a	n/a	Call this method if you want to treat the XML document as if it has no namespace information. It differs slightly from `strip_namespaces!` since it deals with how the XML is processed rather than how it is output
`strip_namespaces!`	n/a	user-specified	Called with no arguments this strips all namespaces, or you may specify an arbitrary number of namespaces to strip, i.e. `config.strip_namespaces! :rss, :soapenv`
`put_attributes_in_hash!`	n/a	n/a	Call this method if you want xml attributes included as elements of the output hash - only valid with `output_type = :hash`

Known Issues

JRuby closes the file stream at the end of parsing. Therefore, to perform multiple operations that each parse a file, you need to instantiate a new parser with a new File object for each one.

FAQ

Why the name 'Saxerator'?

It's a combination of SAX + Enumerator.

Why use Saxerator over regular SAX parsing?

Much of the SAX parsing code I've written over the years has fallen into a pattern that Saxerator encapsulates: marshall a chunk of an XML document into an object, operate on that object, then move on to the next chunk. Saxerator alleviates the pain of marshalling and allows you to focus solely on operating on the document chunk.

Why not DOM parsing?

DOM parsers load the entire document into memory. Saxerator only holds a single chunk in memory at a time. If your document is very large, this can be an important consideration.

When I fetch a tag that has one or more elements, sometimes I get an Array, and other times I get a Hash or String. Is there a way I can treat these consistently?

You can treat objects consistently as arrays using Ruby's built-in array conversion method, in the form Array(element_or_array).

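Why does Active Record fail when I pass a String value to a query?

Saxerator does not return plain Array, Hash or String objects. The objects it returns behave like them (a String result also responds to .attributes, for example), but you can convert a result to the type you need by calling the usual .to_<type> method.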

Contribution

To run the tests against all parsers, run rake spec:adapters

Acknowledgements

Saxerator was inspired by - but not affiliated with - nori and Gregory Brown's Practicing Ruby


Stars: ✭ 119

