utybo / Lixy

License: Apache-2.0
A Kotlin lexer framework with an easy-to-use DSL

Lixy, the lexer with a beautiful Kotlin DSL

CHANGELOG | DOCUMENTATION

You can get the latest release or commit on JitPack.

What is Lixy?

Lixy is a lexer framework, built as a Kotlin Multiplatform project. It is a library that turns a string into a sequence of tokens, using rules that you define with a Kotlin DSL.

Lexical analysis is typically the first step when making a compiler of any kind.

A lexer will only get you so far. The next step in the compilation is parsing, which Pangoro can help you with if you are using Lixy!

A Kotlin DSL?

You will notice when looking at the examples that Lixy uses a specific syntax that might not look like real code at first. It is entirely valid Kotlin code! Lixy uses a kind of "domain-specific language" (DSL): in this case, a small language embedded in Kotlin and made specifically for creating lexers.
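To see why this kind of syntax is valid Kotlin, here is a minimal, self-contained sketch of the two language features such a DSL typically relies on: infix functions and lambdas with receivers. This is not Lixy's implementation. Every name below (TokenType, Rule, StateBuilder, and this toy state function) is hypothetical and exists only for illustration.

```kotlin
// Hypothetical names throughout: this is an illustration, not Lixy's code.
enum class TokenType { DOT, WORD }

// A rule simply pairs a literal string with the token type it produces.
data class Rule(val literal: String, val type: TokenType)

class StateBuilder {
    val rules = mutableListOf<Rule>()

    // An infix extension function declared inside the builder.
    // It is what makes `"." isToken DOT` a valid expression.
    infix fun String.isToken(type: TokenType) {
        rules += Rule(this, type)
    }
}

// A lambda with receiver: inside the braces, the implicit receiver is a
// StateBuilder, so isToken is in scope without any qualifier.
fun state(block: StateBuilder.() -> Unit): List<Rule> =
    StateBuilder().apply(block).rules

fun main() {
    val rules = state {
        "." isToken TokenType.DOT
        "hello" isToken TokenType.WORD
    }
    println(rules)
    // prints [Rule(literal=., type=DOT), Rule(literal=hello, type=WORD)]
}
```

The builder only records rules here. A real lexer would then use those rules to scan the input, which this sketch deliberately leaves out.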

Experimental!

Be careful, Lixy is still in an experimental stage! The API may (and will) break constantly until a 0.1 version comes out. Lixy is already fairly usable, and most functions and classes are already documented using KDoc, but there are no user guides currently available. The best way to learn is by looking at the tests (src/main/test/kotlin/guru/zoroark/lixy).

Example

This simple example shows you what can be done using a single state.

// We need this so we can use e.g. DOT instead of MyTokenTypes.DOT
import MyTokenTypes.* 

enum class MyTokenTypes : LixyTokenType {
    DOT, WORD, WHITESPACE
}

val lexer = lixy {
    state {
        "." isToken DOT
        anyOf(" ", "\n", "\t") isToken WHITESPACE
        matches("[A-Za-z]+") isToken WORD
    }
}

val tokens = lexer.tokenize("Hello Kotlin.\n")
/* 
 * tokens = [
 *      ("Hello", 0, 5, WORD), 
 *      (" ", 5, 6, WHITESPACE), 
 *      ("Kotlin", 6, 12, WORD),
 *      (".", 12, 13, DOT),
 *      ("\n", 13, 14, WHITESPACE)
 *  ]
 */

This is fine, but we can do much more using multiple states. For example, here is a string detector that distinguishes the content inside a string from everything outside it.

import TokenTypes.*
import Labels.*

enum class TokenTypes : LixyTokenType {
    WORD, STRING_CONTENT, QUOTES, WHITESPACE
}

enum class Labels : LixyStateLabel {
    IN_STRING
}

val lexer = lixy {
    default state {
        " " isToken WHITESPACE
        matches("[a-zA-Z]+") isToken WORD
        "\"" isToken QUOTES thenState IN_STRING
    }
    IN_STRING state {
        // triple quotes to make it a raw string, so that we don't need to
        // escape everything
        matches("""(\\"|[^"])+""") isToken STRING_CONTENT
        "\"" isToken QUOTES thenState default
    }
}

val tokens = lexer.tokenize("""Hello "Kotlin \"fans\"!" Hi""")
/* 
 * tokens = [
 *      (Hello, 0, 5, WORD), 
 *      ( , 5, 6, WHITESPACE), 
 *      (", 6, 7, QUOTES),
 *      (Kotlin \"fans\"!, 7, 23, STRING_CONTENT),
 *      (", 23, 24, QUOTES),
 *      ( , 24, 25, WHITESPACE),
 *      (Hi, 25, 27, WORD)
 *  ]
 */

There are a lot of possibilities!

Getting Lixy

You can get the following artifacts from JitPack:

  • Kotlin/JVM: guru.zoroark.lixy:lixy-jvm:version
  • Kotlin/JS: guru.zoroark.lixy:lixy-js:version
  • Kotlin MPP: guru.zoroark.lixy:lixy:version
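As a rough sketch, a Kotlin/JVM project using the Gradle Kotlin DSL could declare the dependency as follows. This assumes a build.gradle.kts file and the standard JitPack repository URL. The word version is kept as a placeholder, exactly as in the list above, and must be replaced with an actual release tag or commit listed on JitPack.

```kotlin
// build.gradle.kts (sketch, assuming a Kotlin/JVM project)
repositories {
    mavenCentral()
    // JitPack serves the Lixy artifacts
    maven("https://jitpack.io")
}

dependencies {
    // "version" is a placeholder: substitute a release tag or commit from JitPack
    implementation("guru.zoroark.lixy:lixy-jvm:version")
}
```

For Kotlin/JS or multiplatform builds, the same pattern should apply with the lixy-js or lixy artifact instead.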
