ghaiklor / llvm-kaleidoscope

Licence: other
LLVM Tutorial: Kaleidoscope (Implementing a Language with LLVM)

Kaleidoscope: Implementing a Language with LLVM

How to build it

On macOS (tested on 10.11.6).

# Install llvm (version 4.0, though @3.9 also works if you modify the llvm path in the Makefile)
brew install llvm@4
make
./main
# This should bring up a simple repl

Why?

Self-education...

I'm interested in LLVM and want to try simple things with it. That's why I started working through the official LLVM tutorial, Kaleidoscope.

What's it all about?

This tutorial runs through the implementation of a simple language, showing how fun and easy it can be. This tutorial will get you up and started as well as help to build a framework you can extend to other languages. The code in this tutorial can also be used as a playground to hack on other LLVM specific things.

The goal of this tutorial is to progressively unveil our language, describing how it is built up over time. This will let us cover a fairly broad range of language design and LLVM-specific usage issues, showing and explaining the code for it all along the way, without overwhelming you with tons of details up front.

It is useful to point out ahead of time that this tutorial is really about teaching compiler techniques and LLVM specifically, not about teaching modern and sane software engineering principles. In practice, this means that we’ll take a number of shortcuts to simplify the exposition. For example, the code uses global variables all over the place, doesn’t use nice design patterns like visitors, etc... but it is very simple. If you dig in and use the code as a basis for future projects, fixing these deficiencies shouldn’t be hard.

How does it all work together?

Lexer

The first piece is the lexer. The lexer is responsible for reading a stream of characters and translating it into a sequence of tokens.

A lexer is a software program that performs lexical analysis. Lexical analysis is the process of separating a stream of characters into different words, which in computer science we call 'tokens'.

Token identifiers are defined in lexer/token.h and the lexer itself is implemented in lexer/lexer.cpp.

Tokens are just an enum, in which each token identifier has a number assigned to it. This lets us identify tokens during lexical analysis.

The actual reading of the stream is implemented in lexer/lexer.cpp. The gettok function reads characters one by one from stdin and groups them into tokens. So, basically, gettok reads characters and returns numbers (tokens).
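A minimal sketch of that idea follows. It is not the code from lexer/lexer.cpp: the Lexer struct, its member names, and reading from a std::istream instead of stdin are assumptions made so the example is self-contained, and the token names follow the upstream LLVM tutorial. Comment handling is left out for brevity.

```cpp
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Token codes are negative so they never collide with ordinary characters,
// which gettok returns as their (non-negative) character value.
enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
  tok_number = -5
};

// Hypothetical wrapper; the project may well use a free function and globals.
struct Lexer {
  std::istream &in;
  int lastChar = ' ';
  std::string identifierStr; // filled in when tok_identifier is returned
  double numVal = 0.0;       // filled in when tok_number is returned

  explicit Lexer(std::istream &input) : in(input) {}

  int gettok() {
    // Skip whitespace between tokens.
    while (std::isspace(lastChar))
      lastChar = in.get();

    // Identifier or keyword: [a-zA-Z][a-zA-Z0-9]*
    if (std::isalpha(lastChar)) {
      identifierStr.clear();
      do {
        identifierStr += static_cast<char>(lastChar);
        lastChar = in.get();
      } while (std::isalnum(lastChar));
      if (identifierStr == "def") return tok_def;
      if (identifierStr == "extern") return tok_extern;
      return tok_identifier;
    }

    // Number: [0-9][0-9.]*
    if (std::isdigit(lastChar)) {
      std::string numStr;
      do {
        numStr += static_cast<char>(lastChar);
        lastChar = in.get();
      } while (std::isdigit(lastChar) || lastChar == '.');
      numVal = std::strtod(numStr.c_str(), nullptr);
      return tok_number;
    }

    if (lastChar == EOF) return tok_eof;

    // Anything else (operators, parentheses) is returned as its character code.
    int thisChar = lastChar;
    lastChar = in.get();
    return thisChar;
  }
};
```

Passing std::cin to the constructor would give the stdin behaviour described above; a std::istringstream is handy for trying it out on a fixed string.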

Next, the parser consumes these tokens (syntactic analysis).

AST (Abstract Syntax Tree)

Before diving into the parser, though, we need to implement the AST nodes that the parser will build.

The base class of every AST node is ExprAST, which is defined in ast/ExprAST.h. All other nodes extend ExprAST.

Each AST node must implement one method, codegen(). It is responsible for generating LLVM IR for that node through the LLVM API (mostly IRBuilder), and that's all.

As you can see in ast folder, we have implemented the following AST nodes with appropriate code generation into LLVM IR:

  • Binary Expressions;
  • Call Expressions;
  • Function Expressions;
  • Number Expressions;
  • Prototype Expressions;
  • Variable Expressions;

Each of these nodes has a constructor that initializes all mandatory values. codegen() then uses these values to build LLVM IR.
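The shape of such a node hierarchy can be sketched as below. This is an illustration rather than the contents of the ast folder: in the project codegen() returns llvm::Value*, whereas here it returns a std::string (an assumption made so the example compiles without LLVM installed), and the member names follow the upstream LLVM tutorial.

```cpp
#include <memory>
#include <string>
#include <utility>

// Base class for all expression nodes. A std::string stands in for
// llvm::Value* so that this sketch has no LLVM dependency.
class ExprAST {
public:
  virtual ~ExprAST() = default;
  virtual std::string codegen() const = 0;
};

// Numeric literal such as "1.0".
class NumberExprAST : public ExprAST {
  double Val;

public:
  explicit NumberExprAST(double Val) : Val(Val) {}
  std::string codegen() const override { return std::to_string(Val); }
};

// Reference to a variable such as "x".
class VariableExprAST : public ExprAST {
  std::string Name;

public:
  explicit VariableExprAST(std::string Name) : Name(std::move(Name)) {}
  std::string codegen() const override { return "%" + Name; }
};

// Binary operator; owns its two operand subtrees.
class BinaryExprAST : public ExprAST {
  char Op;
  std::unique_ptr<ExprAST> LHS, RHS;

public:
  BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS,
                std::unique_ptr<ExprAST> RHS)
      : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}

  // Generates code for both operands first, then combines the results.
  std::string codegen() const override {
    return "(" + LHS->codegen() + " " + Op + " " + RHS->codegen() + ")";
  }
};
```

The point of the sketch is the pattern: each constructor stores everything the node needs, and codegen() on a parent recursively calls codegen() on its children.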

The simplest one is the Number Expression. Its codegen() just asks LLVM for a floating-point constant holding the node's value:

llvm::Value *NumberExprAST::codegen() {
  return llvm::ConstantFP::get(TheContext, llvm::APFloat(Val));
}

We now have two parts of the compiler, the lexer and the AST, that we can combine.

Parser

The parser is where the lexer and the AST come together. It is implemented in parser/parser.cpp.

The parser asks the lexer for a stream of tokens and uses them to build an AST out of our AST node classes.

So, in general, when the parser sees a known token, e.g. a number token, it tries to create the matching node, here a NumberExprAST.

When parsing is done, i.e. the last token has been read from the stream, we have an AST representation of our code. We can then generate LLVM IR from it by calling codegen() on the AST nodes.
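To make the token-to-AST step concrete, here is a small sketch of an operator-precedence parser in the style of the upstream LLVM tutorial. It is not the code from parser/parser.cpp: the Parser class, the inline mini-lexer over a string, the dump() helper, and the precedence table are all assumptions chosen to keep the example short and free of LLVM dependencies. It handles only numbers, parentheses, and the binary operators <, +, - and *.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Minimal AST: just enough node types to show what the parser builds.
struct ExprAST {
  virtual ~ExprAST() = default;
  virtual std::string dump() const = 0; // hypothetical debugging helper
};
struct NumberExprAST : ExprAST {
  double Val;
  explicit NumberExprAST(double V) : Val(V) {}
  std::string dump() const override { return std::to_string(Val); }
};
struct BinaryExprAST : ExprAST {
  char Op;
  std::unique_ptr<ExprAST> LHS, RHS;
  BinaryExprAST(char O, std::unique_ptr<ExprAST> L, std::unique_ptr<ExprAST> R)
      : Op(O), LHS(std::move(L)), RHS(std::move(R)) {}
  std::string dump() const override {
    return "(" + LHS->dump() + " " + Op + " " + RHS->dump() + ")";
  }
};

enum Token { tok_eof = -1, tok_number = -2 };

class Parser {
  std::string Src;
  std::size_t Pos = 0;
  int CurTok = 0;      // the token the parser is currently looking at
  double NumVal = 0.0; // value of the last tok_number

  // Tiny inline lexer: numbers become tok_number, other characters pass through.
  int getNextToken() {
    while (Pos < Src.size() && std::isspace(static_cast<unsigned char>(Src[Pos])))
      ++Pos;
    if (Pos >= Src.size()) return CurTok = tok_eof;
    if (std::isdigit(static_cast<unsigned char>(Src[Pos]))) {
      char *End = nullptr;
      NumVal = std::strtod(Src.c_str() + Pos, &End);
      Pos = static_cast<std::size_t>(End - Src.c_str());
      return CurTok = tok_number;
    }
    return CurTok = static_cast<unsigned char>(Src[Pos++]);
  }

  // Precedence of the current token, or -1 if it is not a binary operator.
  int precedence() const {
    static const std::map<char, int> Prec = {
        {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};
    if (CurTok < 0) return -1;
    auto It = Prec.find(static_cast<char>(CurTok));
    return It == Prec.end() ? -1 : It->second;
  }

  std::unique_ptr<ExprAST> parsePrimary() {
    if (CurTok == tok_number) {
      auto Node = std::make_unique<NumberExprAST>(NumVal);
      getNextToken(); // consume the number
      return Node;
    }
    if (CurTok == '(') {
      getNextToken(); // consume '('
      auto Inner = parseExpression();
      if (!Inner || CurTok != ')') return nullptr;
      getNextToken(); // consume ')'
      return Inner;
    }
    return nullptr; // unexpected token
  }

  // Keep folding "operator primary" pairs into LHS while the next operator
  // binds at least as tightly as MinPrec.
  std::unique_ptr<ExprAST> parseBinOpRHS(int MinPrec, std::unique_ptr<ExprAST> LHS) {
    while (true) {
      int TokPrec = precedence();
      if (TokPrec < MinPrec) return LHS;
      char Op = static_cast<char>(CurTok);
      getNextToken(); // consume the operator
      auto RHS = parsePrimary();
      if (RHS && TokPrec < precedence())
        RHS = parseBinOpRHS(TokPrec + 1, std::move(RHS));
      if (!RHS) return nullptr;
      LHS = std::make_unique<BinaryExprAST>(Op, std::move(LHS), std::move(RHS));
    }
  }

public:
  explicit Parser(std::string S) : Src(std::move(S)) { getNextToken(); }

  // Returns nullptr on a syntax error.
  std::unique_ptr<ExprAST> parseExpression() {
    auto LHS = parsePrimary();
    return LHS ? parseBinOpRHS(0, std::move(LHS)) : nullptr;
  }
};
```

For example, Parser("1 + 2 * 3").parseExpression() yields a tree in which the multiplication is nested under the addition, because * has the higher precedence in the table.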

This is done in main.cpp, which is where all the parts come together.
