syntax-tree / xast

Licence: other
Extensible Abstract Syntax Tree

xast

Extensible Abstract Syntax Tree format.


xast is a specification for representing XML as an abstract syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents. The latest released version is 1.0.0.

Introduction

This document defines a format for representing XML as an abstract syntax tree. This specification is written in a Web IDL-like grammar. Development started in January 2020.

Where this specification fits

xast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

xast relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, xast is not limited to JavaScript and can be used in other programming languages.

xast relates to the unified project in that xast syntax trees are used throughout its ecosystem.

Scope

xast represents XML syntax, not semantics: there are no namespaces or local names; only qualified names.

xast supports a sensible subset of XML by omitting the ostensibly bad DTD. XML processors are not guaranteed to process DTDs, making them unsafe.

xast represents expanded entities and therefore does not deal with entities or character references. It is suggested that utilities around xast, that parse or serialize, do not support parameter-entity references or entity references other than the predefined entities (&lt; for < U+003C LESS THAN; &gt; for > U+003E GREATER THAN; &amp; for & U+0026 AMPERSAND; &apos; for ' U+0027 APOSTROPHE; &quot; for " U+0022 QUOTATION MARK). This prevents billion laughs attacks.
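As a minimal sketch of the suggestion above, a parsing utility could expand only the five predefined entities and reject every other entity reference. The helper name decodeReferences is hypothetical and not part of xast or any published utility. Expanding numeric character references in the same pass is an assumption of this sketch.

```javascript
// Hypothetical helper: expand predefined entities and numeric character
// references, and refuse any other entity reference.
const predefined = new Map([
  ['lt', '<'],
  ['gt', '>'],
  ['amp', '&'],
  ['apos', "'"],
  ['quot', '"']
])

function decodeReferences(value) {
  return value.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z_:][^;&\s]*);/g,
    (match, name) => {
      // Hexadecimal character reference, such as &#x41;
      if (name.startsWith('#x')) {
        return String.fromCodePoint(parseInt(name.slice(2), 16))
      }
      // Decimal character reference, such as &#65;
      if (name.startsWith('#')) {
        return String.fromCodePoint(parseInt(name.slice(1), 10))
      }
      if (predefined.has(name)) return predefined.get(name)
      // Anything else would need a DTD to resolve, so it is rejected.
      throw new Error('Unsupported entity reference: ' + match)
    }
  )
}
```

Because user-defined entities are never expanded, nested entity definitions cannot blow up the size of the output.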

Declarations

Declarations ([XML]) other than doctype have no representation in xast:

<!ELEMENT %name.para; %content.para;>
<!ATTLIST poem xml:space (default|preserve) 'preserve'>
<!ENTITY % ISOLat2 SYSTEM "http://www.xml.com/iso/isolat2-xml.entities">
<!ENTITY Pub-Status "This is a pre-release of the specification.">
<![%draft;[<!ELEMENT book (comments*, title, body, supplements?)>]]>
<![%final;[<!ELEMENT book (title, body, supplements?)>]]>

Internal subset

Internal document type declarations have no representation in xast:

<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>

Nodes

Parent

interface Parent <: UnistParent {
  children: [Element | Text | Comment | Doctype | Instruction | Cdata]
}

Parent (UnistParent) represents a node in xast containing other nodes (said to be children).

Its content is limited to only other xast content.
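To illustrate how parents nest other nodes, here is a small sketch of a depth-first walk over a tree. The function name walk is hypothetical. The unist ecosystem provides utilities for traversal, and this sketch does not reproduce any of their APIs.

```javascript
// Hypothetical helper: call `visitor` for a node and all of its descendants.
function walk(node, visitor) {
  visitor(node)
  // Only parents (root, element) have `children`; literals and doctypes do not.
  if (Array.isArray(node.children)) {
    for (const child of node.children) {
      walk(child, visitor)
    }
  }
}

const tree = {
  type: 'root',
  children: [
    {type: 'instruction', name: 'xml', value: 'version="1.0"'},
    {
      type: 'element',
      name: 'greeting',
      attributes: {},
      children: [{type: 'text', value: 'Hello, world!'}]
    }
  ]
}

// Collect the node types in document order.
const types = []
walk(tree, (node) => {
  types.push(node.type)
})
console.log(types.join(', ')) // root, instruction, element, text
```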

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in xast containing a value.

Root

interface Root <: Parent {
  type: "root"
}

Root (Parent) represents a document fragment or a whole document.

Root should be used as the root of a tree and must not be used as a child.

XML specifies that documents should have exactly one element child. Therefore, a root should have exactly one element child when representing a whole document.
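The constraint on whole documents can be checked with a few lines of code. The sketch below uses a hypothetical function name, couldBeDocument. A fragment may also happen to contain exactly one element, so the check is necessary but not sufficient.

```javascript
// Hypothetical helper: true when `node` is a root with exactly one element
// child, which is what a whole document should look like.
function couldBeDocument(node) {
  if (node.type !== 'root' || !Array.isArray(node.children)) return false
  const elements = node.children.filter((child) => child.type === 'element')
  return elements.length === 1
}

const documentTree = {
  type: 'root',
  children: [
    {type: 'instruction', name: 'xml', value: 'version="1.0"'},
    {type: 'element', name: 'greeting', attributes: {}, children: []}
  ]
}

const fragmentTree = {
  type: 'root',
  children: [
    {type: 'element', name: 'a', attributes: {}, children: []},
    {type: 'element', name: 'b', attributes: {}, children: []}
  ]
}

console.log(couldBeDocument(documentTree)) // true
console.log(couldBeDocument(fragmentTree)) // false
```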

Element

interface Element <: Parent {
  type: "element"
  name: string
  attributes: Attributes?
  children: [Element | Text | Comment | Instruction | Cdata]
}

Element (Parent) represents an element ([XML]).

The name field must be present. It represents the element’s name ([XML]), specifically its qualified name ([XML-NAMES]).

The children field should be present.

The attributes field should be present. It represents information associated with the element. The value of the attributes field implements the Attributes interface.

For example, the following XML:

<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="id" />

Yields:

{
  type: 'element',
  name: 'package',
  attributes: {
    xmlns: 'http://www.idpf.org/2007/opf',
    'unique-identifier': 'id'
  },
  children: []
}
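
Such nodes are plain objects, so they can be created by hand or with a small factory. The sketch below uses a hypothetical helper named element. It is not the API of any published utility, and it defaults attributes and children to empty values because both fields should be present.

```javascript
// Hypothetical factory for xast element nodes.
function element(name, attributes = {}, children = []) {
  // The `name` field must be present; it holds the qualified name.
  if (typeof name !== 'string' || name === '') {
    throw new TypeError('Expected a non-empty element name')
  }
  return {type: 'element', name, attributes, children}
}

const node = element('package', {
  xmlns: 'http://www.idpf.org/2007/opf',
  'unique-identifier': 'id'
})

console.log(node.name) // package
console.log(node.children.length) // 0
```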

Attributes

interface Attributes {}

Attributes represents information associated with an element.

Every field must be an AttributeName and every value an AttributeValue.

AttributeName

typedef string AttributeName

Attribute names are keys on Attributes objects and must reflect XML attribute names exactly.

AttributeValue

typedef string AttributeValue

Attribute values are values on Attributes objects and must reflect XML attribute values exactly as a string.

In JSON, the value null must be treated as if the attribute was not included. In JavaScript, both null and undefined must be similarly ignored.
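
A serializer can honor this rule by skipping such values. The following is a minimal sketch; the names serializeAttributes and escapeAttribute are hypothetical, and the choice to always use double quotes and escape &, <, and " is an assumption of this example.

```javascript
// Hypothetical helper: escape characters that are unsafe in a
// double-quoted attribute value.
function escapeAttribute(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;')
}

// Hypothetical helper: serialize an Attributes object, ignoring fields
// whose value is null or undefined.
function serializeAttributes(attributes) {
  const parts = []
  for (const [name, value] of Object.entries(attributes || {})) {
    if (value === null || value === undefined) continue
    parts.push(name + '="' + escapeAttribute(String(value)) + '"')
  }
  return parts.join(' ')
}

console.log(
  serializeAttributes({
    xmlns: 'http://www.idpf.org/2007/opf',
    'unique-identifier': 'id',
    lang: null,
    dir: undefined
  })
)
// xmlns="http://www.idpf.org/2007/opf" unique-identifier="id"
```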

Text

interface Text <: Literal {
  type: "text"
}

Text (Literal) represents character data ([XML]).

For example, the following XML:

<dc:language>en</dc:language>

Yields:

{
  type: 'element',
  name: 'dc:language',
  attributes: {},
  children: [{type: 'text', value: 'en'}]
}
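
Going the other way, the character data of an element can be recovered by concatenating the values of its text descendants. The sketch below uses a hypothetical function name, textContent. Treating cdata nodes like text is an assumption of this example.

```javascript
// Hypothetical helper: concatenate the character data inside a node.
function textContent(node) {
  // Text and cdata nodes are literals that carry character data.
  if (node.type === 'text' || node.type === 'cdata') return node.value
  // Parents contribute the content of their children, in order.
  if (Array.isArray(node.children)) {
    return node.children.map((child) => textContent(child)).join('')
  }
  // Comments, instructions, and doctypes contribute nothing.
  return ''
}

const language = {
  type: 'element',
  name: 'dc:language',
  attributes: {},
  children: [{type: 'text', value: 'en'}]
}

console.log(textContent(language)) // en
```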

Comment

interface Comment <: Literal {
  type: "comment"
}

Comment (Literal) represents a comment ([XML]).

For example, the following XML:

<!--Charlie-->

Yields:

{type: 'comment', value: 'Charlie'}
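
When turning such a node back into XML, a serializer has to consider that XML does not allow -- inside a comment, nor a comment that ends in a hyphen. The sketch below, with the hypothetical name serializeComment, simply refuses such values; other strategies are possible.

```javascript
// Hypothetical helper: serialize a comment node, rejecting values that
// cannot be represented in an XML comment.
function serializeComment(node) {
  if (node.value.includes('--') || node.value.endsWith('-')) {
    throw new Error('Comment value cannot contain `--` or end in `-`')
  }
  return '<!--' + node.value + '-->'
}

console.log(serializeComment({type: 'comment', value: 'Charlie'}))
// <!--Charlie-->
```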

Doctype

interface Doctype <: Node {
  type: "doctype"
  name: string
  public: string?
  system: string?
}

Doctype (Node) represents a doctype ([XML]).

A name field must be present.

A public field should be present. If present, it must be set to a string, and represents the document’s public identifier.

A system field should be present. If present, it must be set to a string, and represents the document’s system identifier.

For example, the following XML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

Yields:

{
  type: 'doctype',
  name: 'HTML',
  public: '-//W3C//DTD HTML 4.0 Transitional//EN',
  system: 'http://www.w3.org/TR/REC-html40/loose.dtd'
}
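
The reverse direction can be sketched as follows. The names serializeDoctype and quote are hypothetical. XML requires a system identifier after a public identifier, so emitting an empty system literal when only public is set is an assumption of this example.

```javascript
// Hypothetical helper: wrap a literal in whichever quote it does not contain.
function quote(value) {
  return value.includes('"') ? "'" + value + "'" : '"' + value + '"'
}

// Hypothetical helper: serialize a doctype node.
function serializeDoctype(node) {
  let result = '<!DOCTYPE ' + node.name
  const hasPublic = node.public !== null && node.public !== undefined
  const hasSystem = node.system !== null && node.system !== undefined

  if (hasPublic) {
    // PUBLIC is followed by the public identifier and a system identifier.
    result += ' PUBLIC ' + quote(node.public) + ' ' + quote(hasSystem ? node.system : '')
  } else if (hasSystem) {
    result += ' SYSTEM ' + quote(node.system)
  }

  return result + '>'
}

console.log(
  serializeDoctype({
    type: 'doctype',
    name: 'HTML',
    public: '-//W3C//DTD HTML 4.0 Transitional//EN',
    system: 'http://www.w3.org/TR/REC-html40/loose.dtd'
  })
)
// <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
```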

Instruction

interface Instruction <: Literal {
  type: "instruction"
  name: string
}

Instruction (Literal) represents a processing instruction ([XML]).

A name field must be present.

For example, the following XML:

<?xml version="1.0" encoding="UTF-8"?>

Yields:

{
  type: 'instruction',
  name: 'xml',
  value: 'version="1.0" encoding="UTF-8"'
}
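
Note that value is kept as one opaque string: xast does not split it into fields. A consumer that needs the individual pseudo-attributes could extract them as in the sketch below. The function name parsePseudoAttributes is hypothetical, and the simplified name pattern is an assumption that does not cover every name XML allows.

```javascript
// Hypothetical helper: read name="value" pairs from an instruction value.
function parsePseudoAttributes(value) {
  const result = {}
  // A name, an equals sign with optional whitespace, and a quoted value.
  const expression = /([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g
  let match
  while ((match = expression.exec(value)) !== null) {
    // Group 2 holds double-quoted values, group 3 single-quoted ones.
    result[match[1]] = match[2] !== undefined ? match[2] : match[3]
  }
  return result
}

const declaration = {
  type: 'instruction',
  name: 'xml',
  value: 'version="1.0" encoding="UTF-8"'
}

const fields = parsePseudoAttributes(declaration.value)
console.log(fields.version) // 1.0
console.log(fields.encoding) // UTF-8
```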

Cdata

interface Cdata <: Literal {
  type: "cdata"
}

Cdata (Literal) represents a CDATA section ([XML]).

For example, the following XML:

<![CDATA[<greeting>Hello, world!</greeting>]]>

Yields:

{
  type: 'cdata',
  value: '<greeting>Hello, world!</greeting>'
}
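
A CDATA section ends at the first ]]>, so a serializer cannot emit a value containing that sequence verbatim. A common workaround is to split the value across two adjacent sections, as in this sketch. The function name serializeCdata is hypothetical.

```javascript
// Hypothetical helper: serialize a cdata node. A `]]>` in the value is
// split so that `]]` closes one section and `>` starts the next.
function serializeCdata(node) {
  const safe = node.value.replace(/\]\]>/g, ']]]]><![CDATA[>')
  return '<![CDATA[' + safe + ']]>'
}

console.log(
  serializeCdata({type: 'cdata', value: '<greeting>Hello, world!</greeting>'})
)
// <![CDATA[<greeting>Hello, world!</greeting>]]>

console.log(serializeCdata({type: 'cdata', value: 'a]]>b'}))
// <![CDATA[a]]]]><![CDATA[>b]]>
```

Parsing the second output again would produce two adjacent cdata nodes rather than one, so this approach preserves the character data but not the exact tree shape.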

Glossary

See the unist glossary.

List of utilities

See the unist list of utilities for more utilities.

References

Related

  • hast — Hypertext Abstract Syntax Tree format
  • mdast — Markdown Abstract Syntax Tree format
  • nlcst — Natural Language Concrete Syntax Tree format

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.

A curated list of awesome syntax-tree, unist, hast, mdast, nlcst, and xast resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

License

CC-BY-4.0 © Titus Wormer
