xml-in

your friendly XML navigator

What

XML is this new hot markup language everyone is raving about. Attributes, namespaces, schemas, security, XSL.. what's there not to love.

xml-in is not about parsing XML, but rather working with already parsed XML.

It takes heavily nested {:tag .. :attrs .. :content [...]} structures that Clojure XML parsers produce and helps to navigate these structures in a Clojure "get-in style" using internal and custom transducers.

  • clojure/data.xml is an example of a good and lazy Clojure/ClojureScript XML parser
  • funcool/tubax is another example of a ClojureScript XML parser

Why

XML navigation in Clojure is usually done with the help of zippers. clojure/data.zip is commonly used, and a typical navigation looks like this:

(data.zip/xml1-> (clojure.zip/xml-zip parsed-xml)
                 :universe
                 :system
                 :delta-orionis
                 :δ-ori-aa1
                 :radius
                 data.zip/text)

There is a great article "XML for fun and profit" that shows how zippers are used to navigate XML DOM trees.

But we can do better: faster, cleaner, composable and "no zippers".

How much faster? Let's see:

zippers:

=> (time (dotimes [_ 250000] 
           (data.zip/xml1-> (clojure.zip/xml-zip parsed-xml)
                            :universe
                            :system
                            :delta-orionis
                            :δ-ori-aa1
                            :radius
                            data.zip/text)))
"Elapsed time: 13385.563442 msecs"

xml-in:

=> (time (dotimes [_ 250000]
     (xml/find-first parsed-xml [:universe
                                 :system
                                 :delta-orionis
                                 :δ-ori-aa1
                                 :radius])))
"Elapsed time: 765.884111 msecs"

Property based navigation

Here is an XML document all the examples in this documentation are based on:

<?xml version="1.0" encoding="UTF-8"?>
<universe>
  <system>
    <solar>
      <planet age="4.543" inhabitable="true">Earth</planet>
      <planet age="4.503">Mars</planet>
    </solar>
    <delta-orionis>
      <constellation>Orion</constellation>
      <δ-ori-aa1>
        <mass>24</mass>
        <radius>16.5</radius>
        <luminosity>190000</luminosity>
        <surface-gravity>3.37</surface-gravity>
        <temperature>29500</temperature>
        <rotational-velocity>130</rotational-velocity>
      </δ-ori-aa1>
    </delta-orionis>
  </system>
</universe>

it lives in dev-resources/universe.xml

Since xml-in works with a parsed XML (e.g. a DOM tree), let's parse it once and call it the "universe":

=> (require '[clojure.data.xml :as dx])
=> (def universe (dx/parse-str (slurp "dev-resources/universe.xml")))
#'boot.user/universe

it gets parsed into a common nested {:tag :attrs :content} structure that looks like this:

=> (pprint universe)
{:tag :universe,
 :attrs {},
 :content
 ("\n  "
  {:tag :system,
   :attrs {},
   :content
   ("\n    "
    {:tag :solar,
     :attrs {},
     :content
     ("\n      "
      {:tag :planet,
      ;; ...
      ;; ...

One way to access child nodes in this XML document is to use "a vector of nested properties".

For example, let's check out "those two" planets in a solar system.

Bringing xml-in in:

=> (require '[xml-in.core :as xml])

and

=> (xml/find-all universe [:universe :system :solar :planet])
("Earth" "Mars")

All the planets are returned. In case we need "a" planet we can match the first one and stop searching:

=> (xml/find-first universe [:universe :system :solar :planet])
("Earth")

notice find-all vs. find-first

All matching vs. The first matching

Even if only one element matches a search criterion, it is best not to look for it using find-all, since there is a cost to looking at all the child nodes that are on the same level as the matched element.

Let's look at an example. From the XML above, let's find the radius of the δ-ori-aa1 component of the delta-orionis star system:

=> (xml/find-all universe [:universe :system :delta-orionis :δ-ori-aa1 :radius])
("16.5")
=> (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :radius])
("16.5")

Both find-all and find-first return the same exact value, but we know for a fact that the δ-ori-aa1 component has only one radius. Which means it is best found with find-first rather than find-all.

Let's see the performance difference:

=> (time (dotimes [_ 250000]
           (xml/find-all universe [:universe :system :delta-orionis :δ-ori-aa1 :radius])))
"Elapsed time: 1216.927309 msecs"
=> (time (dotimes [_ 250000]
           (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :radius])))
"Elapsed time: 792.958283 msecs"

Quite a difference: roughly 1.2 seconds vs. 0.8 seconds for 250,000 lookups. The secret is simple: find-first stops searching once it finds a matching element, and that saving adds up, especially over a large number of XML documents.

NOTE: find-first returns a "seq", and not just a "single" value, so it can be composed as described in Creating sub documents.
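To make the early stop concrete, here is a minimal sketch that uses only clojure.core. It is not xml-in's source: counting-tag=, visits, star and the visited counter are hypothetical names invented for this illustration, and the document is a hand-written {:tag :attrs :content} map rather than real parser output. It counts how many nodes are inspected with and without a (take 1) step after the tag filter.

```clojure
;; Hypothetical illustration only, not xml-in's implementation.
(def star
  {:tag :δ-ori-aa1
   :attrs {}
   :content [{:tag :mass        :attrs {} :content ["24"]}
             {:tag :radius      :attrs {} :content ["16.5"]}
             {:tag :luminosity  :attrs {} :content ["190000"]}
             {:tag :temperature :attrs {} :content ["29500"]}]})

;; Counts every node that reaches a tag filter.
(def visited (atom 0))

(defn counting-tag=
  "Transducer: counts each incoming node, keeps nodes with the given tag
   (only the first one when first-only? is true) and emits their children."
  [tag first-only?]
  (comp (map (fn [node] (swap! visited inc) node))
        (filter #(= tag (:tag %)))
        (if first-only? (take 1) identity)
        (mapcat :content)))

(defn visits
  "Looks up the radius and reports what was found and how many nodes were inspected."
  [first-only?]
  (reset! visited 0)
  (let [xf    (comp (counting-tag= :δ-ori-aa1 first-only?)
                    (counting-tag= :radius first-only?))
        found (into [] xf [star])]
    {:found found :visited @visited}))

(visits false) ;; {:found ["16.5"], :visited 5}
(visits true)  ;; {:found ["16.5"], :visited 3}
```

In this sketch the "first only" variant never looks at the luminosity and temperature nodes, because (take 1) ends the reduction as soon as the first match has been processed.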

Functional navigation

Navigation using functions, or rather transducers, adds custom "predicate batteries" to the process.

A few internal batteries are included in xml-in:

=> (require '[xml-in.core :as xml :refer [tag= some-tag= attr=]])
  • tag= finds child nodes under all matched tags

  • some-tag= finds child nodes under the first matching tag

  • attr= finds child nodes under all tags with attribute's key and value

Let's find all inhabitable planets of the solar system to the best of our knowledge (i.e. based on the XML above):

=> (xml/find-in universe [(tag= :universe)
                          (tag= :system)
                          (some-tag= :solar)
                          (attr= :inhabitable "true")])
("Earth")

The find-in function takes a parsed XML document and a sequence of transducers, composes them, and returns the sequence produced by applying the composed transducer.

Since find-in does not need to create transducers, as find-all and find-first do, it is a bit more performant:
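The idea of composing a sequence of transducers over a parsed document can be sketched with clojure.core alone. This is an assumption about the general technique, not xml-in's actual code: child-tag=, child-attr= and find-in* are hypothetical stand-ins for tag=, attr= and find-in, and doc is a hand-written structure in the shape a parser would produce.

```clojure
;; Hypothetical stand-ins, not the library's implementation.
(def doc
  {:tag :universe
   :attrs {}
   :content [{:tag :system
              :attrs {}
              :content [{:tag :solar
                         :attrs {}
                         :content ["\n      "
                                   {:tag :planet
                                    :attrs {:age "4.543" :inhabitable "true"}
                                    :content ["Earth"]}
                                   {:tag :planet
                                    :attrs {:age "4.503"}
                                    :content ["Mars"]}]}]}]})

(defn child-tag=
  "Transducer: keeps nodes whose :tag equals tag and emits their children.
   Whitespace strings have no :tag, so they are filtered out."
  [tag]
  (comp (filter #(= tag (:tag %)))
        (mapcat :content)))

(defn child-attr=
  "Transducer: keeps nodes whose attribute k equals v and emits their children."
  [k v]
  (comp (filter #(= v (get-in % [:attrs k])))
        (mapcat :content)))

(defn find-in*
  "Composes the transducers and applies them to a one-element
   collection that holds the root node."
  [root xfs]
  (sequence (apply comp xfs) [root]))

(find-in* doc [(child-tag= :universe)
               (child-tag= :system)
               (child-tag= :solar)
               (child-tag= :planet)])
;; ("Earth" "Mars")

(find-in* doc [(child-tag= :universe)
               (child-tag= :system)
               (child-tag= :solar)
               (child-attr= :inhabitable "true")])
;; ("Earth")
```

Because each step is an ordinary transducer, a custom predicate can be dropped into the vector next to the built-in ones without any extra machinery.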

=> (time (dotimes [_ 250000] (xml/find-first universe [:universe :system :solar :planet])))
"Elapsed time: 507.325005 msecs"

vs.

=> (time (dotimes [_ 250000] (xml/find-in universe [(some-tag= :universe)
                                                    (some-tag= :system)
                                                    (some-tag= :solar)
                                                    (some-tag= :planet)])))
"Elapsed time: 467.535705 msecs"

Creating sub documents

Let's say we need to get several properties out of the δ-ori-aa1 component. We can do it as:

=> (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :mass])
("24")
=> (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :radius])
("16.5")
=> (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :surface-gravity])
("3.37")

We can of course group [:mass :radius :surface-gravity] together and map over them to call xml/find-first universe with a prefix, but it would not change the fact that we would need to "get-into" ":universe :system :delta-orionis :δ-ori-aa1" on every property lookup.

We can do better: navigate to :universe :system :delta-orionis :δ-ori-aa1 once and treat it as a document instead:

=> (def aa1 (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1]))
#'boot.user/aa1
=> (xml/find-first aa1 [:mass])
("24")
=> (xml/find-first aa1 [:radius])
("16.5")
=> (xml/find-first aa1 [:surface-gravity])
("3.37")

To create a sub document no special syntax is needed: just search "up to" the new root element.
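The sub document pattern combines naturally with mapping over a group of properties. The sketch below is self-contained and uses a hypothetical find-first* written with clojure.core as a stand-in for xml/find-first (it is not the library's code), plus a trimmed, hand-written copy of the universe document. It assumes, as in the examples above, that a lookup accepts either a root node or the seq returned by an earlier lookup.

```clojure
;; find-first* is a hypothetical stand-in for xml/find-first.
(defn find-first*
  "Follows path one tag at a time, descending only into the first match
   at each level. Accepts a single node or a seq of nodes."
  [doc path]
  (let [nodes (if (map? doc) [doc] doc)
        step  (fn [tag]
                (comp (filter #(= tag (:tag %)))
                      (take 1)
                      (mapcat :content)))]
    (into [] (apply comp (map step path)) nodes)))

(def universe
  {:tag :universe :attrs {}
   :content [{:tag :system :attrs {}
              :content [{:tag :delta-orionis :attrs {}
                         :content [{:tag :constellation :attrs {} :content ["Orion"]}
                                   {:tag :δ-ori-aa1 :attrs {}
                                    :content [{:tag :mass :attrs {} :content ["24"]}
                                              {:tag :radius :attrs {} :content ["16.5"]}
                                              {:tag :surface-gravity :attrs {} :content ["3.37"]}]}]}]}]})

;; Navigate to the component once...
(def aa1 (find-first* universe [:universe :system :delta-orionis :δ-ori-aa1]))

;; ...then look up each property relative to the sub document.
(def props [:mass :radius :surface-gravity])

(zipmap props (map #(first (find-first* aa1 [%])) props))
;; {:mass "24", :radius "16.5", :surface-gravity "3.37"}
```

The same shape should carry over to the real API by swapping find-first* for xml/find-first and using a parsed document in place of the literal map.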

In cases where it is applicable, using a sub document is also a bit faster:

=> (time (dotimes [_ 100000]
           [(xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :mass])
            (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :radius])
            (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1 :surface-gravity])]))

"Elapsed time: 973.376399 msecs"

vs.

=> (time (dotimes [_ 100000]
           (let [aa1 (xml/find-first universe [:universe :system :delta-orionis :δ-ori-aa1])]
             [(xml/find-first aa1 [:mass])
              (xml/find-first aa1 [:radius])
              (xml/find-first aa1 [:surface-gravity])])))

"Elapsed time: 760.332762 msecs"

License

Copyright © 2019 tolitius

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
