syntax-tree / Nlcst

Natural Language Concrete Syntax Tree format

Projects that are alternatives of or similar to Nlcst

abstract-syntax-tree
A library for working with abstract syntax trees.
Stars: ✭ 77 (-33.62%)
Mutual labels:  ast, syntax-tree
Reshape
💠 transform html with javascript plugins
Stars: ✭ 314 (+170.69%)
Mutual labels:  ast, syntax-tree
astutils
Bare essentials for building abstract syntax trees, and skeleton classes for PLY lexers and parsers.
Stars: ✭ 13 (-88.79%)
Mutual labels:  ast, syntax-tree
MarkdownSyntax
☄️ A Type-safe Markdown parser in Swift.
Stars: ✭ 65 (-43.97%)
Mutual labels:  ast, syntax-tree
Mdast
Markdown Abstract Syntax Tree format
Stars: ✭ 493 (+325%)
Mutual labels:  ast, syntax-tree
c-compiler
A compiler that accepts any valid program written in C. It is made using Lex and Yacc. Returns a symbol table, parse tree, annotated syntax tree and intermediate code.
Stars: ✭ 37 (-68.1%)
Mutual labels:  ast, syntax-tree
bright
Blazing fast parser for BrightScript that gives you ESTree like AST
Stars: ✭ 28 (-75.86%)
Mutual labels:  ast, syntax-tree
Retext
Natural language processor powered by plugins, part of the @unifiedjs collective
Stars: ✭ 2,119 (+1726.72%)
Mutual labels:  ast, natural-language
Unist
Universal Syntax Tree used by @unifiedjs
Stars: ✭ 438 (+277.59%)
Mutual labels:  ast, syntax-tree
Javaparser
Java 1-15 Parser and Abstract Syntax Tree for Java, including preview features to Java 13
Stars: ✭ 3,972 (+3324.14%)
Mutual labels:  ast, syntax-tree
sast
Parse CSS, Sass, SCSS, and Less into a unist syntax tree
Stars: ✭ 51 (-56.03%)
Mutual labels:  ast, syntax-tree
Astexplorer.app
https://astexplorer.net with ES Modules support and Hot Reloading
Stars: ✭ 65 (-43.97%)
Mutual labels:  ast, syntax-tree
Unified
☔️ interface for parsing, inspecting, transforming, and serializing content through syntax trees
Stars: ✭ 3,036 (+2517.24%)
Mutual labels:  ast, syntax-tree
Astviewer
Python Abstract Syntax Tree viewer in Qt
Stars: ✭ 101 (-12.93%)
Mutual labels:  ast, syntax-tree
Escaya
A blazing fast, 100% spec compliant, incremental JavaScript parser written in TypeScript
Stars: ✭ 217 (+87.07%)
Mutual labels:  ast, syntax-tree
xast
Extensible Abstract Syntax Tree
Stars: ✭ 32 (-72.41%)
Mutual labels:  ast, syntax-tree
Hast
Hypertext Abstract Syntax Tree format
Stars: ✭ 344 (+196.55%)
Mutual labels:  ast, syntax-tree
Metric Parser
📜 AST-based advanced mathematical parser written in TypeScript.
Stars: ✭ 26 (-77.59%)
Mutual labels:  ast, syntax-tree
Libdparse
Library for lexing and parsing D source code
Stars: ✭ 91 (-21.55%)
Mutual labels:  ast, syntax-tree
Cppinsights
C++ Insights - See your source code with the eyes of a compiler
Stars: ✭ 1,382 (+1091.38%)
Mutual labels:  ast

nlcst

Natural Language Concrete Syntax Tree format.


nlcst is a specification for representing natural language in a syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents. The latest released version is 1.0.2.

Introduction

This document defines a format for representing natural language as a concrete syntax tree. Development of nlcst started in May 2014, in the now-deprecated textom project for retext, before unist existed. This specification is written in a Web IDL-like grammar.

Where this specification fits

nlcst extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

nlcst relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, nlcst is not limited to JavaScript and can be used in other programming languages.
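As a rough illustration of what such a utility does, the sketch below walks a plain-object tree depth-first. The `NlcstNode` type and the `visit` helper are hypothetical names written for this example; published packages such as `unist-util-visit` offer a fuller version of the same idea.

```typescript
// Minimal node shape assumed for this sketch: every node has a `type`;
// parents carry `children`, literals carry `value`.
type NlcstNode = {
  type: string
  value?: string
  children?: NlcstNode[]
}

// Hypothetical helper: call `visitor` on each node, depth-first, in document order.
function visit(node: NlcstNode, visitor: (node: NlcstNode) => void): void {
  visitor(node)
  for (const child of node.children ?? []) {
    visit(child, visitor)
  }
}

// "Hello!" as a sentence containing one word and one punctuation mark.
const greeting: NlcstNode = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Hello'}]},
    {type: 'PunctuationNode', value: '!'}
  ]
}

// Collect the type of every node in visiting order.
const visitedTypes: string[] = []
visit(greeting, (node) => {
  visitedTypes.push(node.type)
})
```

Because the tree is plain data, the same traversal can be written in any language that can read JSON.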

nlcst relates to the unified and retext projects in that nlcst syntax trees are used throughout their ecosystems.

Nodes

Parent

interface Parent <: UnistParent {
  children: [Paragraph | Sentence | Word | Symbol | Punctuation | WhiteSpace | Source]
}

Parent (UnistParent) represents a node in nlcst containing other nodes (said to be children).

Its content is limited to only other nlcst content.

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in nlcst containing a value.

Its value field is a string.
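The two abstract interfaces above can be approximated in TypeScript as follows. This is a minimal sketch, not the official typings: the names `NlcstBase`, `NlcstLiteral`, `NlcstParent`, `isLiteral`, and `isParent` are assumptions made for illustration, and the child type is left loose instead of enumerating every node.

```typescript
// Assumed minimal shape shared by all nodes: a `type` discriminator.
interface NlcstBase {
  type: string
}

// A literal carries a string `value`.
interface NlcstLiteral extends NlcstBase {
  value: string
}

// A parent carries an ordered list of child nodes.
interface NlcstParent extends NlcstBase {
  children: NlcstBase[]
}

// Type guards that tell the two kinds apart at runtime.
function isLiteral(node: NlcstBase): node is NlcstLiteral {
  return typeof (node as Partial<NlcstLiteral>).value === 'string'
}

function isParent(node: NlcstBase): node is NlcstParent {
  return Array.isArray((node as Partial<NlcstParent>).children)
}

const textNode: NlcstLiteral = {type: 'TextNode', value: 'nlcst'}
const wordNode: NlcstParent = {type: 'WordNode', children: [textNode]}

// [literal?, parent?] for the text node, then for the word node.
const checks = [
  isLiteral(textNode),
  isParent(textNode),
  isLiteral(wordNode),
  isParent(wordNode)
]
```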

Root

interface Root <: Parent {
  type: "RootNode"
}

Root (Parent) represents a document.

Root can be used as the root of a tree, never as a child. Its content model is not limited: it can contain any nlcst content, with the restriction that all content must be of the same category.

Paragraph

interface Paragraph <: Parent {
  type: "ParagraphNode"
  children: [Sentence | WhiteSpace | Source]
}

Paragraph (Parent) represents a unit of discourse dealing with a particular point or idea.

Paragraph can be used in a root node. It can contain sentence, whitespace, and source nodes.

Sentence

interface Sentence <: Parent {
  type: "SentenceNode"
  children: [Word | Symbol | Punctuation | WhiteSpace | Source]
}

Sentence (Parent) represents a grouping of grammatically linked words that, in principle, expresses a complete thought, although it may make little sense when taken in isolation out of context.

Sentence can be used in a paragraph node. It can contain word, symbol, punctuation, whitespace, and source nodes.
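For illustration, the sentence "See you." could be represented as shown below. The type aliases and the `countWords` helper are hypothetical names for this sketch; only the `type` strings and the allowed children come from the interface definition above.

```typescript
// Literal node with one of the given `type` strings.
type Literal<T extends string> = {type: T; value: string}

// Children allowed in a word and in a sentence, per the definitions in this document.
type WordChild = Literal<'TextNode' | 'SymbolNode' | 'PunctuationNode' | 'SourceNode'>
type Word = {type: 'WordNode'; children: WordChild[]}
type SentenceChild =
  | Word
  | Literal<'SymbolNode' | 'PunctuationNode' | 'WhiteSpaceNode' | 'SourceNode'>
type Sentence = {type: 'SentenceNode'; children: SentenceChild[]}

// "See you." as two words separated by white space and closed by punctuation.
const sentenceExample: Sentence = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'See'}]},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'you'}]},
    {type: 'PunctuationNode', value: '.'}
  ]
}

// Hypothetical helper: count the direct word children of a sentence.
function countWords(sentence: Sentence): number {
  return sentence.children.filter((child) => child.type === 'WordNode').length
}

const wordCount = countWords(sentenceExample) // 2
```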

Word

interface Word <: Parent {
  type: "WordNode"
  children: [Text | Symbol | Punctuation | Source]
}

Word (Parent) represents the smallest element that may be uttered in isolation with semantic or pragmatic content.

Word can be used in a sentence node. It can contain text, symbol, punctuation, and source nodes.
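A word is a parent and not a literal because it can mix text with symbols or punctuation. As a sketch, "AT&T" could be modelled as one word with three children; the type aliases and the `wordToString` helper are hypothetical names for this example (the `nlcst-to-string` package provides a published equivalent).

```typescript
// Children allowed in a word, per the interface above.
type WordChild = {
  type: 'TextNode' | 'SymbolNode' | 'PunctuationNode' | 'SourceNode'
  value: string
}
type Word = {type: 'WordNode'; children: WordChild[]}

// "AT&T": two text runs joined by a symbol.
const att: Word = {
  type: 'WordNode',
  children: [
    {type: 'TextNode', value: 'AT'},
    {type: 'SymbolNode', value: '&'},
    {type: 'TextNode', value: 'T'}
  ]
}

// Hypothetical helper: rebuild the surface form by joining the child values in order.
function wordToString(word: Word): string {
  return word.children.map((child) => child.value).join('')
}

const serializedWord = wordToString(att) // 'AT&T'
```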

Symbol

interface Symbol <: Literal {
  type: "SymbolNode"
}

Symbol (Literal) represents typographical devices other than characters that represent sounds (such as letters and numerals), white space, or punctuation.

Symbol can be used in sentence or word nodes.

Punctuation

interface Punctuation <: Literal {
  type: "PunctuationNode"
}

Punctuation (Literal) represents typographical devices which aid understanding and correct reading of other grammatical units.

Punctuation can be used in sentence or word nodes.

WhiteSpace

interface WhiteSpace <: Literal {
  type: "WhiteSpaceNode"
}

WhiteSpace (Literal) represents typographical devices devoid of content, separating other units.

WhiteSpace can be used in root, paragraph, or sentence nodes.

Source

interface Source <: Literal {
  type: "SourceNode"
}

Source (Literal) represents an external (ungrammatical) value embedded into a grammatical unit: a hyperlink, code, and such.

Source can be used in root, paragraph, sentence, or word nodes.
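One reason to keep such values in a dedicated node is that tools can skip them when analysing prose. The sketch below assumes a loose node shape and a hypothetical `serialize` helper; it models inline code as a single source node and serializes the sentence with and without it.

```typescript
// Loose node shape assumed for this sketch.
type LooseNode = {type: string; value?: string; children?: LooseNode[]}

// "Run `npm test` first." with the inline code kept as one SourceNode.
const withSource: LooseNode = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Run'}]},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'SourceNode', value: '`npm test`'},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'first'}]},
    {type: 'PunctuationNode', value: '.'}
  ]
}

// Hypothetical helper: concatenate all values, optionally leaving out source nodes.
function serialize(node: LooseNode, includeSource: boolean): string {
  if (node.type === 'SourceNode' && !includeSource) return ''
  if (typeof node.value === 'string') return node.value
  return (node.children ?? [])
    .map((child) => serialize(child, includeSource))
    .join('')
}

const full = serialize(withSource, true)
const proseOnly = serialize(withSource, false)
```

Here `full` reproduces the original sentence, while `proseOnly` drops the code span but keeps the white space on both sides of it.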

Text

interface Text <: Literal {
  type: "TextNode"
}

Text (Literal) represents actual content in nlcst documents: one or more characters.

Text can be used in word nodes.
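Taken together, the content models of the node definitions can be collected into a lookup table and checked mechanically. The following is a minimal sketch under that reading of the definitions; `allowedChildren` and `findViolations` are hypothetical names, and root is left out of the table because its content model is not limited.

```typescript
// Loose node shape assumed for this sketch.
type AnyNode = {type: string; value?: string; children?: AnyNode[]}

// Allowed child types per parent type, transcribed from the interface definitions.
const allowedChildren: Record<string, string[]> = {
  ParagraphNode: ['SentenceNode', 'WhiteSpaceNode', 'SourceNode'],
  SentenceNode: ['WordNode', 'SymbolNode', 'PunctuationNode', 'WhiteSpaceNode', 'SourceNode'],
  WordNode: ['TextNode', 'SymbolNode', 'PunctuationNode', 'SourceNode']
}

// Hypothetical checker: one message per child that its parent may not contain.
function findViolations(node: AnyNode): string[] {
  const allowed = allowedChildren[node.type]
  const problems: string[] = []
  for (const child of node.children ?? []) {
    if (allowed && allowed.indexOf(child.type) === -1) {
      problems.push(`${child.type} is not allowed in ${node.type}`)
    }
    // Recurse so nested parents are checked too.
    problems.push(...findViolations(child))
  }
  return problems
}

// A tree with one deliberate mistake: white space inside a word.
const tree: AnyNode = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [{
      type: 'SentenceNode',
      children: [
        {
          type: 'WordNode',
          children: [
            {type: 'TextNode', value: 'Hi'},
            {type: 'WhiteSpaceNode', value: ' '}
          ]
        },
        {type: 'PunctuationNode', value: '.'}
      ]
    }]
  }]
}

const violations = findViolations(tree)
```

For this tree, `violations` holds a single message reporting the white space node inside the word.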

Glossary

See the unist glossary.

List of utilities

See the unist list of utilities for more utilities.

Related

  • mdast — Markdown Abstract Syntax Tree format
  • hast — Hypertext Abstract Syntax Tree format
  • xast — Extensible Abstract Syntax Tree

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.

A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Thanks to @nwtn, @tmcw, @muraken720, and @dozoisch for contributing to nlcst and related projects!

License

CC-BY-4.0 © Titus Wormer
