vshaxe / hxparser

Licence: MIT license
OCaml/menhir implementation of a new Haxe parser.

Dependencies

  • oasis (for compilation)
  • menhir
  • sedlex
  • js_of_ocaml (only for generating .js)

Preparation

opam install oasis
opam install sedlex
opam install menhir

Compilation

oasis setup
make build

JS compilation

Run sh configure --enable-js, then compile as usual.

Usage

The most common command is

hxparser --json --print-reject YourFile.hx

You can also specify a directory, in which case all .hx files in it are parsed recursively.

Questions

Why?

We are slowly heading towards Haxe 4, so the idea came up to rewrite the parser in a more maintainable way with a strong focus on IDE support. This project is the beginning of said undertaking and, if successful, will be integrated as the official Haxe parser.

Performance?

Performance is not great at the moment: the parser is roughly 50% slower than the current Haxe parser. We are going to investigate and improve on this.

I'm a Windows user, how do I opam?

Check out our Building on Windows (mingw) instructions for Haxe. You might have to run the opam install commands in a cygwin window, but everything else should work fine.

hxparse, haxeparser, hxparser... Are you serious?

Sorry!

How does it differ from the current Haxe parser?

The current parser is a recursive-descent parser which is implemented using the camlp4o extension to OCaml. It has display support built-in, which is convenient for simple cases but not very flexible overall.

The new parser uses a yacc-like grammar definition file, which is much more concise. Parser resuming is to be built into the parser loop instead, which allows for a better separation and more flexibility.

Furthermore, it uses sedlex which supports Unicode lexers.

Any problems?

While developing this parser, the author realized that the Haxe grammar falls into the LL(2) category due to a specific construct:

{
	if (cond)
		e1;
	else
		e3;
}

After parsing e1, the current parser looks ahead two tokens to see whether ; else follows. Menhir supports LR(1) grammars, which have only a single token of lookahead, so this construct required a very ugly workaround in the parser loop.
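To make the lookahead problem concrete, here is a minimal sketch in plain OCaml. It is not the workaround hxparser uses. The token type and both function names are hypothetical and exist only for illustration. The first function is the two-token check described above; the second shows one possible way to hide the problem from an LR(1) parser, by dropping a ; that directly precedes else before the tokens reach the grammar.

```ocaml
(* Hypothetical token type for illustration; hxparser's real tokens differ. *)
type token =
  | Semicolon
  | Else
  | Ident of string

(* The two-token check: true when the next two tokens are `;` then `else`. *)
let semicolon_else_follows (tokens : token list) : bool =
  match tokens with
  | Semicolon :: Else :: _ -> true
  | _ -> false

(* One possible pre-pass (an assumption, not the project's approach):
   drop a `;` that directly precedes `else`, so a parser with one token
   of lookahead never has to look past the semicolon. *)
let rec drop_semicolon_before_else (tokens : token list) : token list =
  match tokens with
  | Semicolon :: (Else :: _ as rest) -> drop_semicolon_before_else rest
  | tok :: rest -> tok :: drop_semicolon_before_else rest
  | [] -> []
```

A pre-pass like this keeps the grammar itself LR(1), at the cost of moving part of the syntax into code outside the grammar file.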

Another problem is that Haxe makes some ; optional when they are preceded by a } token:

{
	var a = { }; // optional ;
	b;
}

This seems to be tricky to express in the grammar because, if the ; is omitted, the } serves a double purpose: it terminates the object declaration (or whatever else is being closed) and it also terminates the block element.
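The double role of } can be sketched outside the grammar as a small classification step. This is only an illustration under stated assumptions: the tok and termination types and the classify_termination function are hypothetical, and the sketch treats every } as allowing the ; to be omitted, whereas Haxe only allows it for some constructs.

```ocaml
(* Hypothetical token type for illustration only. *)
type tok =
  | Semi
  | BraceClose
  | Other of string

(* How a block element ends. *)
type termination =
  | Explicit            (* a `;` follows the element *)
  | ImplicitAfterBrace  (* no `;`, but the element ended with `}` *)
  | Missing             (* no `;` and no closing `}`: an error *)

(* [last] is the final token of the block element, [next] is the token
   that follows it. Simplifying assumption: any `}` makes `;` optional. *)
let classify_termination ~(last : tok) ~(next : tok) : termination =
  match last, next with
  | _, Semi -> Explicit
  | BraceClose, _ -> ImplicitAfterBrace
  | _, _ -> Missing
```

For the example above, var a = { } followed by b is classified as ImplicitAfterBrace when the ; is omitted and as Explicit when it is present.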

It remains to be seen if these are actual issues or if the author is just incompetent.

Any good news?

It successfully parses my GitHub directory recursively and doesn't use much memory.
