

artagnon / Rhine

Licence: MIT

🔬 a C++ compiler middle-end, using an LLVM backend

Labels

compiler llvm

Projects that are alternatives of or similar to Rhine

Kai

An expressive low level programming language

Stars: ✭ 68 (-56.69%)

Mutual labels: compiler, llvm

Enzyme.jl

Julia bindings for the Enzyme automatic differentiator

Stars: ✭ 90 (-42.68%)

Mutual labels: compiler, llvm

Seeless

C IDE for iOS

Stars: ✭ 71 (-54.78%)

Mutual labels: compiler, llvm

Unlisp Llvm

Compiler for a toy Lisp language

Stars: ✭ 33 (-78.98%)

Mutual labels: compiler, llvm

Brain

An esoteric programming language compiler on top of LLVM based on Brainfuck

Stars: ✭ 112 (-28.66%)

Mutual labels: compiler, llvm

Llvm Tutorial Standalone

DEPRECATED (Use: https://github.com/llvm-hs/llvm-hs-kaleidoscope )

Stars: ✭ 38 (-75.8%)

Mutual labels: compiler, llvm

Ghdl

VHDL 2008/93/87 simulator

Stars: ✭ 1,285 (+718.47%)

Mutual labels: compiler, llvm

Ldc

The LLVM-based D Compiler.

Stars: ✭ 937 (+496.82%)

Mutual labels: compiler, llvm

Fanx

A portable programming language

Stars: ✭ 101 (-35.67%)

Mutual labels: compiler, llvm

Faust

Functional programming language for signal processing and sound synthesis

Stars: ✭ 1,360 (+766.24%)

Mutual labels: compiler, llvm

Zion

A statically-typed strictly-evaluated garbage-collected readable programming language.

Stars: ✭ 33 (-78.98%)

Mutual labels: compiler, llvm

Hikari

LLVM Obfuscator

Stars: ✭ 1,585 (+909.55%)

Mutual labels: compiler, llvm

Lyca

programming language compiler w/ llvm

Stars: ✭ 9 (-94.27%)

Mutual labels: compiler, llvm

Leekscript V2

A dynamically typed, compiled just-in-time programming language used in Leek Wars' AIs

Stars: ✭ 46 (-70.7%)

Mutual labels: compiler, llvm

Cfl

a Compileable statically typed Functional programming Language

Stars: ✭ 7 (-95.54%)

Mutual labels: compiler, llvm

Akilang

A compiler for a simple language, built with Python and LLVM

Stars: ✭ 71 (-54.78%)

Mutual labels: compiler, llvm

Numba

NumPy aware dynamic Python compiler using LLVM

Stars: ✭ 7,090 (+4415.92%)

Mutual labels: compiler, llvm

Grin

GRIN is a compiler back-end for lazy and strict functional languages with whole program optimization support.

Stars: ✭ 834 (+431.21%)

Mutual labels: compiler, llvm

Numba Scipy

numba_scipy extends Numba to make it aware of SciPy

Stars: ✭ 98 (-37.58%)

Mutual labels: compiler, llvm

Flax

general purpose programming language, in the vein of C++

Stars: ✭ 111 (-29.3%)

Mutual labels: compiler, llvm


rhine: a C++ compiler middle-end for a typed Ruby

rhine is designed to be a fast language that uses the LLVM JIT and features N-d tensors, first-class functions, and type inference: annotating argument types is enough. It has a full-blown AST with an embedded UseDef graph.

rhine started off as rhine-ml; rhine-ml was itself originally called rhine.

Effort put into rhine-ml: 2 months
Effort put into rhine: 1 year, 1 month

Language Features

def bar(arithFn Function(Int -> Int -> Int)) do
  println $ arithFn 2 4
end
def addCandidate(alpha Int, beta Int) do
  ret $ alpha + beta
end
def subCandidate(gamma Int, delta Int) do
  ret $ gamma - delta
end
def main() do
  if false do
    bar addCandidate
  else
    bar subCandidate
  end
  mu = {{2}, {3}}
  println mu[1][0]
end

Int is a type annotation. Only argument types need to be annotated; the return type is inferred. Function(Int -> Int -> Int) is the type of a function that takes two integers and returns an integer, borrowing Haskell's arrow syntax. $ also comes from Haskell: it works like wrapping everything to its right in parentheses.
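For readers who know C++, the example above corresponds roughly to the sketch below. It only mirrors the semantics and is not what rhine emits. `rhineMain` is a hypothetical stand-in for rhine's `main`, `std::function<int(int, int)>` plays the role of `Function(Int -> Int -> Int)`, and each `println` is modelled by returning text instead of printing it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical C++ rendering of the rhine example (not rhine output).
std::string bar(const std::function<int(int, int)> &arithFn) {
  // println $ arithFn 2 4  is simply  println(arithFn(2, 4))
  return std::to_string(arithFn(2, 4)) + "\n";
}

int addCandidate(int alpha, int beta) { return alpha + beta; }
int subCandidate(int gamma, int delta) { return gamma - delta; }

// Stand-in for rhine's main(); returns everything it would print.
std::string rhineMain() {
  std::string Out;
  if (false)
    Out += bar(addCandidate);
  else
    Out += bar(subCandidate);
  // mu = {{2}, {3}} is a 2x1 tensor, so mu[1][0] selects the 3
  std::vector<std::vector<int>> mu = {{2}, {3}};
  Out += std::to_string(mu[1][0]) + "\n";
  return Out;
}
```

Because the `if false` branch is never taken, `rhineMain()` returns `"-2\n3\n"`: `subCandidate` computes 2 - 4, and `mu[1][0]` picks the 3.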

rhine-ml, by contrast, has arrays, first-class functions, closures, variadic arguments, and macros. It is also much less buggy.

The recursive-descent parser

rhine uses a handwritten recursive-descent parser, which is faster and reports better errors than the former Bison-based one. If you want to keep the code simple, you will need at least one token of lookahead. This gives you one level of the following:

parseSymbol(); // Oops, the lexed token indicates that we're not in the right
               // function

parseInstruction(); // Ask it to use an existing token, not lex a new one
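The lookahead pattern can be sketched as follows. The token kinds, `MiniParser`, and `parseInteger` are hypothetical. The point is that `parseSymbol` inspects the current token and returns without lexing when it finds itself in the wrong function, so `parseInteger` can reuse the very same token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds and parser, for illustration only.
enum TokKind { IDENT, INTEGER, END };

struct Token {
  TokKind Kind;
  std::string Text;
};

class MiniParser {
public:
  explicit MiniParser(std::vector<Token> Input) : Toks(std::move(Input)) {
    getTok(); // prime the single token of lookahead
  }

  // Parses one atom and describes it, e.g. "sym:x" or "int:42".
  std::string parseAtom() {
    std::string Text;
    if (parseSymbol(Text))
      return "sym:" + Text;
    // parseSymbol bailed out without lexing, so CurTok is still available
    if (parseInteger(Text))
      return "int:" + Text;
    return "error";
  }

private:
  std::vector<Token> Toks;
  std::size_t Pos = 0;
  Token CurTok{END, ""};

  void getTok() { CurTok = Pos < Toks.size() ? Toks[Pos++] : Token{END, ""}; }

  // Each parse function inspects CurTok and consumes it only on a match.
  bool parseSymbol(std::string &Out) {
    if (CurTok.Kind != IDENT)
      return false; // wrong function: leave the token for the caller
    Out = CurTok.Text;
    getTok();
    return true;
  }

  bool parseInteger(std::string &Out) {
    if (CurTok.Kind != INTEGER)
      return false;
    Out = CurTok.Text;
    getTok();
    return true;
  }
};
```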

Another minor consideration: if you want a newline to be able to stand in for ; in the language, the parser must handle newlines explicitly.

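void Parser::getTok() {
  LastTok = CurTok;
  CurTok = Driver->Lexx->lex(&CurSema, &CurLoc);
  // Skip any run of NEWLINE tokens, but record that the previous token was
  // followed by a newline so it can act as a statement terminator.
  LastTokWasNewlineTerminated = false;
  while (CurTok == NEWLINE) {
    LastTokWasNewlineTerminated = true;
    CurTok = Driver->Lexx->lex(&CurSema, &CurLoc);
  }
}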
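Here is a minimal sketch of how a parser might consume the `LastTokWasNewlineTerminated` flag. It assumes a hypothetical token set in which the lexer emits NEWLINE as a token and a statement is a single identifier. A statement is accepted if it is followed by `;`, by at least one newline, or by end of input.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds: the lexer reports NEWLINE as an ordinary token.
enum Tok { IDENT, SEMI, NEWLINE, END };

class StmtParser {
public:
  explicit StmtParser(std::vector<Tok> Input) : Toks(std::move(Input)) { getTok(); }

  // Each statement is one IDENT terminated by ';', a newline, or end of
  // input. Returns the number of statements, or -1 if a terminator is missing.
  int parseStatements() {
    int Count = 0;
    while (CurTok == IDENT) {
      getTok(); // consume the statement
      if (CurTok == SEMI)
        getTok(); // explicit ';' terminator
      else if (!LastTokWasNewlineTerminated && CurTok != END)
        return -1; // e.g. "a b" on a single line
      ++Count;
    }
    return Count;
  }

private:
  std::vector<Tok> Toks;
  std::size_t Pos = 0;
  Tok CurTok = END;
  bool LastTokWasNewlineTerminated = false;

  Tok lex() { return Pos < Toks.size() ? Toks[Pos++] : END; }

  // Same shape as Parser::getTok: skip runs of NEWLINE, but remember them.
  void getTok() {
    CurTok = lex();
    LastTokWasNewlineTerminated = false;
    while (CurTok == NEWLINE) {
      LastTokWasNewlineTerminated = true;
      CurTok = lex();
    }
  }
};
```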

The AST

The AST is heavily inspired by LLVM IR, although it has some higher-level concepts such as Tensor. It is in SSA form and has a UseDef graph embedded in it, which makes analysis and transformation easy.

The main classes are Type and Value. All types, such as IntType and FloatType, inherit from Type; most other classes inherit from Value. A BasicBlock is a Value, and so is a ConstantInt.
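A trimmed-down sketch of the Value side of such a hierarchy, using the kind-tag plus `classof()` idiom that LLVM-style `dyn_cast` relies on. The `ValueKind` enum, `getValID()`, and the local `dyn_cast` template are hypothetical simplifications, not rhine's or LLVM's actual definitions.

```cpp
#include <cassert>

// Hypothetical, trimmed-down Value hierarchy. Each subclass records a kind
// tag and exposes classof(), which the checked downcast below relies on.
class Value {
public:
  enum ValueKind { VK_ConstantInt, VK_BasicBlock };
  ValueKind getValID() const { return Kind; }
  virtual ~Value() = default;

protected:
  explicit Value(ValueKind K) : Kind(K) {}

private:
  ValueKind Kind;
};

class ConstantInt : public Value {
public:
  explicit ConstantInt(int V) : Value(VK_ConstantInt), Val(V) {}
  int getVal() const { return Val; }
  static bool classof(const Value *V) { return V->getValID() == VK_ConstantInt; }

private:
  int Val;
};

// A real BasicBlock would also hold its vector of Instruction; omitted here.
class BasicBlock : public Value {
public:
  BasicBlock() : Value(VK_BasicBlock) {}
  static bool classof(const Value *V) { return V->getValID() == VK_BasicBlock; }
};

// Simplified stand-in for llvm::dyn_cast: a checked downcast via classof().
template <typename To, typename From> To *dyn_cast(From *V) {
  return To::classof(V) ? static_cast<To *>(V) : nullptr;
}
```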

A BasicBlock is a vector of Instruction, and this is what keeps the AST in SSA form: an assignment is represented as a StoreInst, so there is no real LHS, just references to RHS values.

StoreInst::StoreInst(Value *MallocedValue, Value *NewValue);
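A minimal sketch of how assignments can keep a block in SSA form, assuming hypothetical Const, Malloc, Load and Add opcodes next to Store (the Load in particular is an assumption, not something described above). The program `x = 1; x = x + 2` never redefines anything: the second assignment is a Store into the slot created by the first.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instruction set, not rhine's classes. Every Inst is defined
// exactly once (SSA); a variable is a slot created by Malloc, and later
// assignments become Stores into that slot instead of new definitions.
struct Inst {
  enum Opcode { Const, Malloc, Store, Load, Add } Op;
  int Imm = 0;         // Const: the constant value
  std::string Name;    // Malloc: the variable name
  Inst *Op0 = nullptr; // Malloc: initial value; Store/Load: the slot; Add: lhs
  Inst *Op1 = nullptr; // Store: the new value; Add: rhs
};

using BasicBlock = std::vector<std::unique_ptr<Inst>>;

Inst *append(BasicBlock &BB, Inst I) {
  BB.push_back(std::make_unique<Inst>(std::move(I)));
  return BB.back().get();
}

// Executes the block in order and returns the final value of each variable.
std::map<std::string, int> run(const BasicBlock &BB) {
  std::map<const Inst *, int> Vals;  // result of each value-producing Inst
  std::map<const Inst *, int> Slots; // memory behind each Malloc
  for (const auto &I : BB) {
    switch (I->Op) {
    case Inst::Const:  Vals[I.get()] = I->Imm; break;
    case Inst::Malloc: Slots[I.get()] = Vals.at(I->Op0); break;
    case Inst::Store:  Slots.at(I->Op0) = Vals.at(I->Op1); break;
    case Inst::Load:   Vals[I.get()] = Slots.at(I->Op0); break;
    case Inst::Add:    Vals[I.get()] = Vals.at(I->Op0) + Vals.at(I->Op1); break;
    }
  }
  std::map<std::string, int> Vars;
  for (const auto &I : BB)
    if (I->Op == Inst::Malloc)
      Vars[I->Name] = Slots.at(I.get());
  return Vars;
}
```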

UseDef in AST

Values are uniquified using LLVM's FoldingSet, and a Use wraps each Value, so that one Value can be replaced with another.
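Uniquing and Use go together: a uniqued Value is shared by every instruction that mentions it, so replacing an operand has to retarget that one Use instead of editing the shared Value. A minimal sketch, assuming a `std::map` in place of LLVM's `FoldingSet` and a hypothetical `Context::getConstantInt`:

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins: a std::map keyed on the constant plays the role of
// LLVM's FoldingSet, and a Use is reduced to a single Value pointer.
struct Value {
  virtual ~Value() = default;
};

struct ConstantInt : Value {
  int Val;
  explicit ConstantInt(int V) : Val(V) {}
};

class Context {
public:
  // Uniquing: equal constants always come back as the same object.
  ConstantInt *getConstantInt(int V) {
    auto &Slot = Ints[V];
    if (!Slot)
      Slot = std::make_unique<ConstantInt>(V);
    return Slot.get();
  }

private:
  std::map<int, std::unique_ptr<ConstantInt>> Ints;
};

class Use {
public:
  explicit Use(Value *V) : Val(V) {}
  Value *get() const { return Val; }
  // Retargets this one operand; the shared, uniqued Value is never mutated.
  void set(Value *V) { Val = V; }

private:
  Value *Val;
};
```

Editing the `ConstantInt` for 2 in place would silently change every instruction that uses 2; going through `Use::set` changes only the operand being replaced.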

/// A Use is basically a linked list of Value wrappers
class Use {
  Value *Val;
  Use *Prev;
  Use *Next;
  // Laid out in memory as [Use1] - [Use2] - [User]. Use1 has DistToUser 2
  unsigned DistToUser;
};

An Instruction is a User. User and its Use values are laid out sequentially in memory, so it's possible to reach all the Use values from the User. It's also possible to reach the User from any Use, using DistToUser.

class User : public Value {
protected:
  unsigned NumOperands;
};
class Instruction : public User { /* ... */ };

User has a custom operator new that also allocates memory for its Use instances.

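// Allocate the User together with its Use operands in a single block. The
// Uses come first, so the layout is [Use1] - [Use2] - ... - [User].
void *User::operator new(size_t Size, unsigned Us) {
  void *Storage = ::operator new(Us * sizeof(Use) + Size);
  auto Start = static_cast<Use *>(Storage);
  auto End = Start + Us;
  // Construct each Use in place, recording its distance (in Uses) to the User.
  for (unsigned Iter = 0; Iter < Us; Iter++) {
    new (Start + Iter) Use(Us - Iter);
  }
  // The User object itself begins right after the last Use.
  auto Obj = reinterpret_cast<User *>(End);
  return Obj;
}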
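The same co-allocation scheme as a self-contained sketch that can be compiled and exercised. `Value`, `getOperandList`, `getOperand`, and `destroy` are hypothetical additions, and the `Prev`/`Next` links are omitted; the layout and the `DistToUser` arithmetic follow `User::operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Value {
  int Id;
};

class User;

// Simplified Use (Prev/Next omitted): the wrapped Value plus the number of
// Use-sized slots between this Use and its owning User.
class Use {
public:
  explicit Use(unsigned Dist) : Val(nullptr), DistToUser(Dist) {}
  Value *get() const { return Val; }
  void set(Value *V) { Val = V; }
  // The User starts DistToUser slots after this Use.
  User *getUser() { return reinterpret_cast<User *>(this + DistToUser); }

private:
  Value *Val;
  unsigned DistToUser;
};

class User {
public:
  explicit User(unsigned N) : NumOperands(N) {}

  // Same scheme as User::operator new: [Use1] ... [UseN] [User].
  static void *operator new(std::size_t Size, unsigned Us) {
    void *Storage = ::operator new(Us * sizeof(Use) + Size);
    Use *Start = static_cast<Use *>(Storage);
    for (unsigned Iter = 0; Iter < Us; Iter++)
      new (Start + Iter) Use(Us - Iter);
    return Start + Us;
  }

  // The operands sit immediately before the User object.
  Use *getOperandList() { return reinterpret_cast<Use *>(this) - NumOperands; }
  Use &getOperand(unsigned I) { return getOperandList()[I]; }

  // The block must be freed from its start (the first Use), so release
  // objects with destroy() rather than plain delete.
  static void destroy(User *U) {
    Use *Start = U->getOperandList();
    U->~User();
    ::operator delete(Start); // Use is trivially destructible
  }

private:
  unsigned NumOperands;
};
```

Since the memory block starts at the first Use rather than at the User, a plain `delete` would free the wrong address, which is why this sketch routes deallocation through `destroy`.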

The Context

The Context is a fairly large object that holds the uniquified Type and Value instances. It also keeps track of Externals, the external C functions that are provided as part of a "standard library". The unique llvm::Builder and llvm::Context objects, as well as the DiagnosticPrinter, are exposed as member variables. Finally, the Context is needed for symbol resolution, and holds the ResolutionMap.

Symbol resolution

src/Transform/Resolve is an example of a pass that uses the UseDef graph embedded in the AST.

  B = A + 2

creates an UnresolvedValue for A, an AddInst, and a MallocInst that takes the string "B" and the AddInst as operands.

The transform walks over every Instruction in the BasicBlock, resolves each UnresolvedValue, and sets the corresponding Use to the resolved value. In other words, it replaces the Value underneath the Use, and since an Instruction references its operands through Use instances, no dangling references are left behind.

if (auto S = K->Map.get(V, Block)) {
  /// %S = 2;
  ///  ^
  /// Came from here (MallocInst, Argument, or Prototype)
  ///
  /// Foo(%S);
  ///      ^
  ///  UnresolvedValue; replace with %S
  if (auto M = dyn_cast<MallocInst>(S)) {
    // A StoreInst writes to the malloc'd value itself, so point its Use at M
    if (isa<StoreInst>(U.getUser()))
      U.set(M);
  }
}
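A minimal sketch of the resolution walk, under simplifying assumptions: names resolve through one flat map rather than the block-aware `K->Map.get(V, Block)` lookup, the StoreInst special case is dropped, and all class names are hypothetical. What it shows is that only the Use is rewritten, so the instruction holding it never changes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified classes (not rhine's API).
struct Value {
  std::string Name;
  bool Unresolved = false; // true for an UnresolvedValue placeholder
};

// Instructions reach their operands only through Uses.
struct Use {
  Value *Val;
  void set(Value *V) { Val = V; }
};

struct Instruction {
  std::vector<Use> Operands;
};

// Name -> definition (in rhine: a MallocInst, Argument, or Prototype).
using ResolutionMap = std::map<std::string, Value *>;

// Retarget every Use that points at a resolvable UnresolvedValue.
// Returns how many Uses were rewritten.
int resolveBlock(std::vector<Instruction> &Block, const ResolutionMap &Map) {
  int Resolved = 0;
  for (Instruction &I : Block) {
    for (Use &U : I.Operands) {
      if (!U.Val->Unresolved)
        continue;
      auto It = Map.find(U.Val->Name);
      if (It == Map.end())
        continue; // unknown name: leave the Use as it is
      U.set(It->second); // the Instruction object itself is untouched
      ++Resolved;
    }
  }
  return Resolved;
}
```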

Type Inference

Type inference is very simple: the visit function is overloaded for every possible Value class.

Type *TypeInfer::visit(MallocInst *V) {
  V->setType(visit(V->getVal()));
  assert(!V->isUnTyped() && "unable to type infer MallocInst");
  return VoidType::get(K);
}
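The overloading scheme can be sketched in a few lines. Everything here is hypothetical: types are enum tags instead of uniqued Type objects, and a generic `visit(Value *)` dispatches on a kind tag to the specific overloads, one per Value class.

```cpp
#include <cassert>

// Hypothetical miniature of the overloaded-visit scheme. Types are plain enum
// tags here rather than uniqued Type objects such as VoidType::get(K).
enum class Type { UnTyped, Int, Void };

struct Value {
  enum Kind { ConstantIntKind, AddKind, MallocKind };
  explicit Value(Kind VK) : K(VK) {}
  virtual ~Value() = default;
  bool isUnTyped() const { return Ty == Type::UnTyped; }
  Kind K;
  Type Ty = Type::UnTyped;
};

struct ConstantInt : Value {
  explicit ConstantInt(int V) : Value(ConstantIntKind), Val(V) { Ty = Type::Int; }
  int Val;
};

struct AddInst : Value {
  AddInst(Value *L, Value *R) : Value(AddKind), LHS(L), RHS(R) {}
  Value *LHS, *RHS;
};

struct MallocInst : Value {
  explicit MallocInst(Value *V) : Value(MallocKind), Val(V) {}
  Value *Val;
};

struct TypeInfer {
  // Generic entry point: dispatch on the dynamic kind to an overload below.
  Type visit(Value *V) {
    switch (V->K) {
    case Value::ConstantIntKind: return visit(static_cast<ConstantInt *>(V));
    case Value::AddKind:         return visit(static_cast<AddInst *>(V));
    case Value::MallocKind:      return visit(static_cast<MallocInst *>(V));
    }
    return Type::UnTyped;
  }
  Type visit(ConstantInt *C) { return C->Ty; }
  Type visit(AddInst *A) {
    Type L = visit(A->LHS), R = visit(A->RHS);
    A->Ty = (L == R) ? L : Type::UnTyped; // operands must agree
    return A->Ty;
  }
  // Same shape as TypeInfer::visit(MallocInst *): type the value, return Void.
  Type visit(MallocInst *M) {
    M->Ty = visit(M->Val);
    assert(!M->isUnTyped() && "unable to type infer MallocInst");
    return Type::Void;
  }
};
```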

Building

The desired directory structure is:

bin/ ; if you downloaded the tarball for this
    cmake
    ninja
    flex
src/
    rhine/
            README.md
            llvm/ ; git submodule update --init to get the sources
            llvm-build/
                        bin/
                            llvm-config ; you need to call this to build
            rhine-build/
                        rhine ; the executable

On OS X, where you already have everything else:

$ brew install flex
$ brew link --force flex
$ git submodule update --init
$ cd llvm-build
# rhine is buggy; without debugging symbols, you can't report a useful bug
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm
$ ninja
$ export PATH=`pwd`/bin:$PATH
$ cd ../rhine-build
$ cmake -GNinja ..
# this runs the package's unit tests, which should all pass
$ ninja check

On Linux, where you have nothing installed (no root privileges are required):

Get git-lfs, and fetch cmake-ninja-flex.tar.bz2 into the working tree:

$ git lfs pull

Untar it and set up environment variables.

$ tar xf cmake-ninja-flex.tar.bz2
$ cd cmake-ninja-flex

# for bash/zsh
$ export TOOLS_ROOT=`pwd`
$ export PATH=$TOOLS_ROOT:$PATH
# for csh
$ setenv TOOLS_ROOT `pwd`
$ setenv PATH $TOOLS_ROOT:$PATH

Then,

$ git submodule update --init
$ cd llvm-build
# rhine is buggy; without debugging symbols, you can't report a useful bug
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm
$ ninja
$ export PATH=`pwd`/bin:$PATH
$ cd ../rhine-build
# flex isn't picked up from $PATH
$ cmake -GNinja -DTOOLS_ROOT=$TOOLS_ROOT -DFLEX_EXECUTABLE=$TOOLS_ROOT/flex ..
# if there are build (usually link) errors, please open an issue
# tests are currently failing on Linux, need to look into it
$ ninja check

Commentary

An inefficient untyped language is easy to implement. Making println accept both 23 and "twenty three" is simply a matter of switching on the value's type once it is unboxed. There is no need to rewrite the value in IR, and certainly no need to come up with an overloading scheme.
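To make the point concrete, here is a sketch of the cheap untyped approach: a hypothetical `Boxed` value carries a runtime tag, and a single `println` switches on it. The cost is that every value is boxed and every call pays for the tag check, which is where the inefficiency comes from.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical boxed representation for an untyped language: every value
// carries a runtime tag, and println inspects it after unboxing.
struct Boxed {
  enum Tag { Int, String } T;
  long I = 0;    // valid when T == Int
  std::string S; // valid when T == String
};

Boxed boxInt(long V) { return Boxed{Boxed::Int, V, ""}; }
Boxed boxString(std::string V) { return Boxed{Boxed::String, 0, std::move(V)}; }

// One println for every argument: no IR rewriting and no overloading scheme,
// just a switch on the tag. Returns the line instead of printing it.
std::string println(const Boxed &B) {
  switch (B.T) {
  case Boxed::Int:    return std::to_string(B.I) + "\n";
  case Boxed::String: return B.S + "\n";
  }
  return "\n";
}
```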

Crystal made a good decision in writing its first compiler in Ruby. If you intend to self-host, the efficiency of the original implementation language does not matter; all you need is good generated assembly, which LLVM makes easy.


Stars: ✭ 157
