hast

Hypertext Abstract Syntax Tree format.


hast is a specification for representing HTML (and embedded SVG or MathML) as an abstract syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents. The latest released version is 2.3.0.

Introduction

This document defines a format for representing hypertext as an abstract syntax tree. Development of hast started in April 2016 for rehype. This specification is written in a Web IDL-like grammar.

Where this specification fits

hast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

hast relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, hast is not limited to JavaScript and can be used in other programming languages.

hast relates to the unified and rehype projects in that hast syntax trees are used throughout their ecosystems.

Virtual DOM

The reasons for introducing a new “virtual” DOM are primarily:

  • The DOM is very heavy to implement outside of the browser; a lean, stripped-down virtual DOM can be used everywhere
  • Most virtual DOMs do not focus on ease of use in transformations
  • Other virtual DOMs cannot represent the syntax of HTML in its entirety (think comments and document types)
  • Neither the DOM nor virtual DOMs focus on positional information

Nodes

Parent

interface Parent <: UnistParent {
  children: [Element | Doctype | Comment | Text]
}

Parent (UnistParent) represents a node in hast containing other nodes (said to be children).

Its content is limited to only other hast content.

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in hast containing a value.

Root

interface Root <: Parent {
  type: "root"
}

Root (Parent) represents a document.

Root can be used as the root of a tree, or as a value of the content field on a 'template' Element, never as a child.

Element

interface Element <: Parent {
  type: "element"
  tagName: string
  properties: Properties?
  content: Root?
  children: [Element | Comment | Text]
}

Element (Parent) represents an Element ([DOM]).

A tagName field must be present. It represents the element’s local name ([DOM]).

The properties field represents information associated with the element. The value of the properties field implements the Properties interface.

If the tagName field is 'template', a content field can be present. The value of the content field implements the Root interface.

If the tagName field is 'template', the element must be a leaf.

If the tagName field is 'noscript', its children should be represented as if scripting is disabled ([HTML]).

For example, the following HTML:

<a href="https://alpha.com" class="bravo" download></a>

Yields:

{
  type: 'element',
  tagName: 'a',
  properties: {
    href: 'https://alpha.com',
    className: ['bravo'],
    download: true
  },
  children: []
}
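
The 'template' rules above can be illustrated with a small sketch. The markup `<template><b>Delta</b></template>` and the helper `isValidTemplate` are made up for this illustration and are not part of hast; the node shape follows the Element interface as described in this document.

```javascript
// Sketch: a hast node for `<template><b>Delta</b></template>`.
const template = {
  type: 'element',
  tagName: 'template',
  properties: {},
  // A 'template' element must be a leaf, so `children` stays empty.
  children: [],
  // The template's contents live in a root on the `content` field.
  content: {
    type: 'root',
    children: [
      {
        type: 'element',
        tagName: 'b',
        properties: {},
        children: [{type: 'text', value: 'Delta'}]
      }
    ]
  }
}

// Hypothetical helper (not part of hast): checks the two 'template' rules.
function isValidTemplate(node) {
  return (
    node.type === 'element' &&
    node.tagName === 'template' &&
    node.children.length === 0 &&
    (node.content == null || node.content.type === 'root')
  )
}

console.log(isValidTemplate(template)) // true
```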

Properties

interface Properties {}

Properties represents information associated with an element.

Every field must be a PropertyName and every value a PropertyValue.

PropertyName

typedef string PropertyName

Property names are keys on Properties objects and reflect HTML, SVG, ARIA, XML, XMLNS, or XLink attribute names. Often, they have the same value as the corresponding attribute (for example, id is a property name reflecting the id attribute name), but there are some notable differences.

These rules aren’t simple. Use hastscript (or property-information directly) to help.

The following rules are used to transform HTML attribute names to property names. These rules are based on how ARIA is reflected in the DOM ([ARIA]), and differ from how some (older) HTML attributes are reflected in the DOM.

  1. Any name referencing a combination of multiple words (such as “stroke miter limit”) becomes a camelcased property name capitalizing each word boundary. This includes combinations that are sometimes written as several words. For example, stroke-miterlimit becomes strokeMiterLimit, autocorrect becomes autoCorrect, and allowfullscreen becomes allowFullScreen.
  2. Any name that can be hyphenated becomes a camelcased property name capitalizing each boundary. For example, “read-only” becomes readOnly.
  3. Compound words that are not used with spaces or hyphens are treated as a normal word and the previous rules apply. For example, “placeholder”, “strikethrough”, and “playback” stay the same.
  4. Acronyms in names are treated as a normal word and the previous rules apply. For example, itemid becomes itemId and bgcolor becomes bgColor.

Exceptions

Some jargon is seen as one word even though it may not be seen as such by dictionaries. For example, nohref becomes noHref, playsinline becomes playsInline, and accept-charset becomes acceptCharset.

The HTML attributes class and for respectively become className and htmlFor in alignment with the DOM. No other attributes gain different names as properties, other than a change in casing.
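
Because word boundaries cannot be derived mechanically from an attribute name, a converter needs a lookup table. The following is a minimal sketch for HTML attribute names only; the table `knownProperties` and the function `toPropertyName` are hypothetical names, the table holds just the examples from this section, and the hyphen fallback is an assumption of the sketch. Real code would rely on hastscript or property-information for the complete list.

```javascript
// Hypothetical, hand-written subset of attribute name -> property name pairs,
// copied from the examples in this section.
const knownProperties = new Map([
  ['class', 'className'],
  ['for', 'htmlFor'],
  ['stroke-miterlimit', 'strokeMiterLimit'],
  ['autocorrect', 'autoCorrect'],
  ['allowfullscreen', 'allowFullScreen'],
  ['itemid', 'itemId'],
  ['bgcolor', 'bgColor'],
  ['nohref', 'noHref'],
  ['playsinline', 'playsInline'],
  ['accept-charset', 'acceptCharset']
])

function toPropertyName(attribute) {
  // HTML attribute names are case-insensitive, so normalize first.
  const name = attribute.toLowerCase()
  if (knownProperties.has(name)) return knownProperties.get(name)
  // Fallback (an assumption of this sketch): capitalize after each hyphen.
  return name.replace(/-([a-z])/g, (match, letter) => letter.toUpperCase())
}

console.log(toPropertyName('class')) // 'className'
console.log(toPropertyName('placeholder')) // 'placeholder'
console.log(toPropertyName('ACCEPT-CHARSET')) // 'acceptCharset'
```

A name that is missing from the table and has no hyphen, such as a newly introduced multi-word attribute, passes through unchanged, which is why a complete table matters.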

Notes

property-information lists all property names.

The property name rules differ from how HTML is reflected in the DOM for the following attributes:

  • charoff becomes charOff (not chOff)
  • char stays char (does not become ch)
  • rel stays rel (does not become relList)
  • checked stays checked (does not become defaultChecked)
  • muted stays muted (does not become defaultMuted)
  • value stays value (does not become defaultValue)
  • selected stays selected (does not become defaultSelected)
  • allowfullscreen becomes allowFullScreen (not allowFullscreen)
  • hreflang becomes hrefLang (not hreflang)
  • autoplay becomes autoPlay (not autoplay)
  • autocomplete becomes autoComplete (not autocomplete)
  • autofocus becomes autoFocus (not autofocus)
  • enctype becomes encType (not enctype)
  • formenctype becomes formEncType (not formEnctype)
  • vspace becomes vSpace (not vspace)
  • hspace becomes hSpace (not hspace)
  • lowsrc becomes lowSrc (not lowsrc)

PropertyValue

typedef any PropertyValue

Property values should reflect the data type determined by their property name. For example, the HTML <div hidden></div> has a hidden attribute, which is reflected as a hidden property name set to the property value true, and <input minlength="5">, which has a minlength attribute, is reflected as a minLength property name set to the property value 5.

In JSON, the value null must be treated as if the property was not included. In JavaScript, both null and undefined must be similarly ignored.
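
One way to honor this rule in JavaScript is to filter properties before using them. The helper name `definedProperties` is hypothetical; the sketch only drops null and undefined and leaves every other value, including false, untouched.

```javascript
// Hypothetical helper: returns a copy holding only the properties that count
// as present, that is, those whose value is neither null nor undefined.
function definedProperties(properties) {
  const result = {}
  for (const [name, value] of Object.entries(properties || {})) {
    if (value !== null && value !== undefined) result[name] = value
  }
  return result
}

const present = definedProperties({
  id: 'a',
  title: null,
  hidden: undefined,
  download: false
})

console.log(Object.keys(present)) // [ 'id', 'download' ]
```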

The DOM has strict rules on how it coerces HTML to expected values, whereas hast is more lenient in how it reflects the source. Where the DOM treats <div hidden="no"></div> as having a value of true and <img width="yes"> as having a value of 0, these should be reflected as 'no' and 'yes', respectively, in hast.

The reason for this is to allow plugins and utilities to inspect these non-standard values.
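
The lenient behavior described above might be sketched as follows. The type table `propertyTypes` and the function `toPropertyValue` are hypothetical and cover only the property names used in this section; real tools derive the types from property-information. The convention that an empty attribute value stands for a bare attribute such as `<div hidden>` is also an assumption of the sketch.

```javascript
// Hypothetical type table for a handful of property names.
const propertyTypes = {
  hidden: 'boolean',
  download: 'boolean',
  minLength: 'number',
  width: 'number'
}

// `raw` is the attribute value as a string ('' for a bare attribute).
function toPropertyValue(name, raw) {
  const type = propertyTypes[name]

  if (type === 'boolean') {
    // A bare attribute, or one that repeats its own name, becomes true.
    if (raw === '' || raw.toLowerCase() === name.toLowerCase()) return true
    return raw // non-standard value such as 'no': keep the source text
  }

  if (type === 'number') {
    const trimmed = raw.trim()
    if (trimmed !== '' && !Number.isNaN(Number(trimmed))) return Number(trimmed)
    return raw // non-numeric value such as 'yes': keep the source text
  }

  return raw
}

console.log(toPropertyValue('hidden', '')) // true
console.log(toPropertyValue('hidden', 'no')) // 'no'
console.log(toPropertyValue('minLength', '5')) // 5
console.log(toPropertyValue('width', 'yes')) // 'yes'
```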

The DOM also specifies attribute values that are comma-separated or space-separated lists. In hast, these should be treated as ordered lists. For example, <div class="alpha bravo"></div> is represented as ['alpha', 'bravo'].
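
As a rough sketch, the two list forms can be split like this. The function names are hypothetical, and dropping empty tokens is a simplifying assumption of the sketch rather than something this specification requires.

```javascript
// Splits on runs of ASCII whitespace and drops empty tokens.
function parseSpaceSeparated(value) {
  return value.split(/[ \t\n\f\r]+/).filter((token) => token !== '')
}

// Splits on commas, trims each token, and drops empty tokens.
function parseCommaSeparated(value) {
  return value
    .split(',')
    .map((token) => token.trim())
    .filter((token) => token !== '')
}

console.log(parseSpaceSeparated('alpha bravo')) // [ 'alpha', 'bravo' ]
console.log(parseCommaSeparated('.jpg, .png')) // [ '.jpg', '.png' ]
```

Both functions keep the tokens in source order, matching the "ordered lists" requirement.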

There’s no special format for the property value of the style property name.

Doctype

interface Doctype <: Node {
  type: "doctype"
  name: string
  public: string?
  system: string?
}

Doctype (Node) represents a DocumentType ([DOM]).

A name field must be present.

A public field can be present. If present, it must be set to a string, and represents the document’s public identifier.

A system field can be present. If present, it must be set to a string, and represents the document’s system identifier.

For example, the following HTML:

<!doctype html>

Yields:

{
  type: 'doctype',
  name: 'html',
  public: null,
  system: null
}
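
For comparison, a doctype that carries both identifiers could look like the sketch below, which uses the HTML 4.01 Strict doctype as input. The helper `doctypeToHtml` is hypothetical and assumes the identifiers contain no double quotes.

```javascript
// Sketch: node for
// <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
const html4Strict = {
  type: 'doctype',
  name: 'html',
  public: '-//W3C//DTD HTML 4.01//EN',
  system: 'http://www.w3.org/TR/html4/strict.dtd'
}

// Hypothetical helper: serializes a doctype node back to HTML.
// Assumes identifiers contain no double quotes.
function doctypeToHtml(node) {
  let result = '<!doctype ' + node.name
  if (node.public != null) {
    result += ' PUBLIC "' + node.public + '"'
  } else if (node.system != null) {
    // Without a public identifier, the SYSTEM keyword introduces the system one.
    result += ' SYSTEM'
  }
  if (node.system != null) result += ' "' + node.system + '"'
  return result + '>'
}

console.log(doctypeToHtml({type: 'doctype', name: 'html', public: null, system: null}))
// <!doctype html>
```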

Comment

interface Comment <: Literal {
  type: "comment"
}

Comment (Literal) represents a Comment ([DOM]).

For example, the following HTML:

<!--Charlie-->

Yields:

{type: 'comment', value: 'Charlie'}

Text

interface Text <: Literal {
  type: "text"
}

Text (Literal) represents a Text ([DOM]).

For example, the following HTML:

<span>Foxtrot</span>

Yields:

{
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{type: 'text', value: 'Foxtrot'}]
}
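
Since text lives in Text nodes nested under parents, reading the text of an element means walking its children. The following is a minimal sketch with a hypothetical `toText` function; it ignores comments and doctypes and does no whitespace handling, so it is not a replacement for a dedicated utility.

```javascript
// Hypothetical helper: concatenates the values of all Text descendants.
function toText(node) {
  if (node.type === 'text') return node.value
  // Roots and elements are parents: recurse into their children.
  if (Array.isArray(node.children)) return node.children.map(toText).join('')
  // Comments, doctypes, and anything else contribute nothing.
  return ''
}

const span = {
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{type: 'text', value: 'Foxtrot'}]
}

console.log(toText(span)) // 'Foxtrot'
```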

Glossary

See the unist glossary.

List of utilities

See the unist list of utilities for more utilities.

Security

As hast represents HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, improper use of hast is also unsafe. Always be careful with user input and use hast-util-sanitize to make the hast tree safe.
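
To give an idea of what making a tree safe involves, here is a deliberately small allowlist sketch. All names and the allowlists themselves are hypothetical, relative URLs are dropped for simplicity, and the code is for illustration only: it is not a substitute for hast-util-sanitize.

```javascript
// Hypothetical allowlists: everything not listed here is dropped.
const allowedTags = new Set(['a', 'b', 'em', 'p', 'span', 'strong'])
const allowedProperties = new Set(['className', 'href', 'title'])
const safeUrl = /^(?:https?:\/\/|mailto:)/i

function sanitizeProperties(properties) {
  const result = {}
  for (const [name, value] of Object.entries(properties || {})) {
    if (!allowedProperties.has(name)) continue // drops onClick, style, ...
    if (name === 'className') {
      if (Array.isArray(value)) result[name] = value.map(String)
    } else if (typeof value === 'string') {
      // Keep `href` only when it starts with an allowed protocol.
      if (name !== 'href' || safeUrl.test(value)) result[name] = value
    }
  }
  return result
}

function sanitizeChildren(children) {
  const result = []
  for (const child of children || []) {
    if (child.type === 'text') {
      result.push({type: 'text', value: String(child.value)})
    } else if (child.type === 'element' && allowedTags.has(child.tagName)) {
      result.push({
        type: 'element',
        tagName: child.tagName,
        properties: sanitizeProperties(child.properties),
        children: sanitizeChildren(child.children)
      })
    }
    // Everything else (script, comments, doctypes, unknown nodes) is dropped.
  }
  return result
}

function sanitizeRoot(tree) {
  return {type: 'root', children: sanitizeChildren(tree.children)}
}
```

The sketch rebuilds the tree from scratch instead of deleting unsafe parts in place, so unknown fields on the input nodes never reach the output.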

Related

  • mdast — Markdown Abstract Syntax Tree format
  • nlcst — Natural Language Concrete Syntax Tree format
  • xast — Extensible Abstract Syntax Tree

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.

A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!

Thanks to @andrewburgess, @arobase-che, @arystan-sw, @BarryThePenguin, @brechtcs, @ChristianMurphy, @ChristopherBiscardi, @craftzdog, @cupojoe, @davidtheclark, @derhuerst, @detj, @DxCx, @erquhart, @flurmbo, @Hamms, @Hypercubed, @inklesspen, @jeffal, @jlevy, @Justineo, @lfittl, @kgryte, @kmck, @kthjm, @KyleAMathews, @macklinu, @medfreeman, @Murderlon, @nevik, @nokome, @phiresky, @revolunet, @rhysd, @Rokt33r, @rubys, @s1n, @Sarah-Seo, @sethvincent, @simov, @StarpTech, @stefanprobst, @stuff, @subhero24, @tripodsan, @tunnckoCore, @vhf, @voischev, and @zjaml, for contributing to hast and related projects!

License

CC-BY-4.0 © Titus Wormer
