ForceBru / PyVM

Licence: MIT license
A virtual machine written in Python that executes x86 binaries according to the Intel Software Developer Manual

PyVM - execute x86 bytecode in pure Python!

PyVM executes x86 (IA-32) bytecode in pure Python, without any dependencies.

It can run multiple types of executables:

  • Raw bytecode (interprets bytes and bytearrays as bytecode)
  • Flat binaries (for example, those produced by default by NASM; interprets a file's contents as bytecode)
  • ELF binaries (any statically linked ELF binary)

Features:

  • x86 CPU (files: VM/Registers.py, VM/CPU.py, VM/fetchLoop.py, VM/misc.py)
    • General-purpose registers: 32-bit, 16-bit, 8-bit. See file #1;
    • Segment registers: ES, CS, SS, DS, FS, GS. See file #1;
    • EFLAGS register. See file #1;
    • Stack. See file #2;
    • The "fetch-decode-execute" cycle (instruction cycle). Includes prefixes handling. See file #3;
    • ModRM and SIB bytes parser. See file #4 ...which should be refactored;
  • x87 FPU (files: VM/FPU.py)
    • The extended precision floating-point type, a.k.a. binary80;
    • 8 data registers: ST(0), ST(1), ..., ST(7);
    • control, status and flag registers;
  • RAM (files: VM/Memory.py)
    • Allows reading 8-bit, 16-bit and 32-bit integers, 32-bit, 64-bit and 80-bit floating-point numbers, and raw bytes at any valid address;
    • Provides basic bounds checking (so that Python doesn't segfault when a program tries to access an invalid address);
    • Provides basic segmented access via segment registers;
  • x86 instruction set (files: VM/instructions/*)
    • Bitwise operations: and, or, xor, test, neg, not, sal, sar, shl, shr, shld, shrd;
    • Control flow operations: nop, jmp, jcc, setcc, cmovcc, bt, int, call, ret, enter, leave, cpuid;
    • Floating-point operations: fld, fst, fstp, fist, fistp, fmul, fmulp, fimul, faddp, fdiv, fdivp, fucom, fucomp, fucompp, fcomi, fcomip, fucomip, fucomipp, fldcw, fstcw, fnstcw, fxch;
    • Integer arithmetic operations: add, sub, cmp, adc, sbb, inc, dec, mul, imul, div, idiv;
    • Memory management operations: mov, movs, movsx, movsxd, movzx, push, pop, lea, xchg, cmpxchg, cbw, cwde, cwd, cdq, cmc, clc, cld, stc, std, bsf, bsr;
    • Repeatable operations: stos.
  • Linux system calls (files: VM/kernel/kernel.py, VM/kernel/kernel_filesystem.py, VM/kernel/kernel_memory.py, VM/kernel/kernel_sys.py)
    • Syscall registration and execution. See file #1 and VM/__init__.py:VM.interrupt;
    • Input-output: sys_read, sys_write, sys_writev, sys_open, sys_close, sys_unlink, sys_llseek. See file #2;
    • Memory management: brk, sys_set_thread_area, sys_set_tid_address, mmap, munmap. See file #3;
    • System management: sys_exit, sys_exit_group, sys_clock_gettime, sys_ioctl, sys_newuname. See file #4.
  • A debugger that prints the instructions and syscalls that are being executed in a (relatively) human-readable format.
  • Ability to run binaries from command line (files: VM/__main__.py)
    1. Change directory to PyVM-master (or wherever you downloaded PyVM);
    2. Execute your command (for example, ./C/real_life/nasm -h) like this: python3 -OO -m VM 'C/real_life/nasm -h'
    3. ...
    4. Profit!
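
To illustrate the "ModRM and SIB bytes parser" item in the feature list above, here is a minimal sketch of how a ModRM byte splits into its three fields under 32-bit addressing. This is not the code from VM/misc.py; the names ModRM, split_modrm and describe_rm are hypothetical and exist only for this example. The bit layout itself follows the Intel manual.

```python
from typing import NamedTuple

# Register numbering used by the ModRM "reg" and "r/m" fields (32-bit operands).
REG32 = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


class ModRM(NamedTuple):
    mod: int  # bits 7..6: addressing mode
    reg: int  # bits 5..3: register operand or opcode extension
    rm: int   # bits 2..0: register or memory operand


def split_modrm(byte: int) -> ModRM:
    """Split one ModRM byte into its mod, reg and r/m fields."""
    return ModRM(mod=byte >> 6, reg=(byte >> 3) & 0b111, rm=byte & 0b111)


def describe_rm(byte: int) -> str:
    """Describe the r/m operand for 32-bit addressing (illustration only)."""
    m = split_modrm(byte)
    if m.mod == 0b11:
        return REG32[m.rm]                 # register-direct operand
    if m.mod == 0b00 and m.rm == 0b101:
        return "[disp32]"                  # absolute address, no base register
    disp = {0b00: "", 0b01: "+disp8", 0b10: "+disp32"}[m.mod]
    if m.rm == 0b100:
        return f"[SIB{disp}]"              # a SIB byte follows the ModRM byte
    return f"[{REG32[m.rm]}{disp}]"


# 0xC8 is the ModRM byte of "89 c8" (mov eax, ecx) in the example program.
print(split_modrm(0xC8))   # ModRM(mod=3, reg=1, rm=0)
print(describe_rm(0xC8))   # eax
```

For opcode 89 (mov r/m32, r32) the r/m field names the destination and the reg field names the source, which is why 0xC8 decodes to mov eax, ecx. The sketch does not decode the SIB byte itself.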

How to use

Simple example:

import VM  # import the module

def parse_code(code: str) -> bytes:
    # This just converts the prettified code below to the raw, ugly bytecode. You can ignore this function.
    import re
    
    binary = ''
    regex = re.compile(r"[0-9a-f]+:\s+([^;]+)\s*;.*", re.DOTALL)

    for i, line in enumerate(code.strip().splitlines(keepends=False)):
        if line.startswith(';'):
            continue
        match = regex.match(line)
        assert match is not None, f"Could not parse code (line {i})"
        
        binary += match.group(1)

    return bytes.fromhex(binary)


if __name__ == "__main__":
    # This is the bytecode we'll run
    code = """
;                           section .text
;                           _start:
0:  b8 04 00 00 00          ;mov    eax,0x4   ; SYS_WRITE
5:  bb 01 00 00 00          ;mov    ebx,0x1   ; STDOUT
a:  b9 29 00 00 00          ;mov    ecx,0x29  ; address of the message
f:  ba 0e 00 00 00          ;mov    edx,0xe   ; length of the message
14: cd 80                   ;int    0x80      ; interrupt kernel
16: e9 02 00 00 00          ;jmp    0x1d      ; _exit
1b: 89 c8                   ;mov    eax,ecx   ; this is here to mess things up if JMP doesn't work
;                           _exit:
1d: b8 01 00 00 00          ;mov    eax,0x1   ; SYS_EXIT
22: bb 00 00 00 00          ;mov    ebx,0x0   ; EXIT_SUCCESS
27: cd 80                   ;int    0x80      ; interrupt kernel
; section .data
29: 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0A ; "Hello, world!",10
             """

    vm = VM.VMKernel(500)  # Initialize the VM with the Linux kernel and give it 500 bytes of memory.

    # EXECUTE IT!
    vm.execute(
        VM.ExecutionStrategy.BYTES,  # We're executing raw bytecode
        parse_code(code)  # This is the actual bytecode
    )

Output:

Hello, world!
[!] Process exited with code 0
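
The jmp at offset 0x16 in the listing above is encoded as e9 02 00 00 00, yet it lands on 0x1d. The displacement of a near jmp is relative to the address of the next instruction, which the following small sketch works through. The helper name jmp_rel32_target is made up for this example and is not part of PyVM.

```python
import struct


def jmp_rel32_target(address: int, encoded: bytes) -> int:
    """Return the target of a near `jmp rel32` (opcode E9) located at `address`."""
    if len(encoded) != 5 or encoded[0] != 0xE9:
        raise ValueError("expected a 5-byte E9 jmp instruction")
    # The 4 bytes after the opcode are a signed little-endian displacement.
    (rel,) = struct.unpack("<i", encoded[1:5])
    # The displacement is added to the address of the *next* instruction.
    return (address + len(encoded) + rel) & 0xFFFFFFFF


target = jmp_rel32_target(0x16, bytes.fromhex("e9 02 00 00 00"))
print(hex(target))  # 0x1d
```

Here 0x16 + 5 (instruction length) + 2 (displacement) = 0x1d, so the mov eax,ecx at 0x1b is skipped.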

Please see example_BYTES.py, example_FLAT.py and example_ELF.py for more examples of usage.

Also see READMEs in other directories: for example, VM/instructions, VM/kernel and many more.

What this is

  • A toy emulator of the x86 CPU that actually works;
  • A small toy Linux Kernel written in Python that kinda works, but is mostly a giant stub;
  • A learning resource that shows how an Intel CPU may work;
    • The instructions and the basic architecture are implemented according to the Intel Software Developer Manual, but the algorithms for parsing opcodes, finding the appropriate instruction implementation for an opcode, parsing the ModRM and SIB bytes, accessing registers and memory are custom, because they aren't exactly specified in the Manual;
    • The implementation of instructions may be buggy, so take the code with a grain of salt.
  • A learning resource that shows how a teeny-tiny and buggy replica of the Linux kernel may work;
    • Individual implementations of system calls should have links to the docs that explain how a given syscall works;
    • Some of the syscalls were roughly translated from the C code of the Linux kernel.
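
As noted above, the Manual does not prescribe how to find the implementation for an opcode. One simple possibility is a dictionary that maps opcode bytes to handler methods, driven by a fetch-decode-execute loop. The sketch below is a toy written for this explanation, not PyVM's actual design: the class ToyCPU and all of its methods are hypothetical, it supports only the three instruction forms used by the "Hello, world!" program, and its write handler ignores the file descriptor and just collects the bytes.

```python
import struct

REGS = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


class ToyCPU:
    """Hypothetical miniature CPU; an illustration, not PyVM's classes."""

    def __init__(self, memory: bytes):
        self.mem = bytearray(memory)
        self.eip = 0
        self.reg = dict.fromkeys(REGS, 0)
        self.running = True
        self.output = bytearray()  # bytes "written" by the program
        self.exit_code = None
        # Dispatch table: opcode byte -> handler.
        self.handlers = {0xE9: self.jmp_rel32, 0xCD: self.int_imm8}
        for i in range(8):  # B8..BF encode `mov r32, imm32`
            self.handlers[0xB8 + i] = self.mov_r32_imm32

    def fetch(self, size: int) -> bytes:
        """Read `size` bytes at EIP and advance EIP, with a bounds check."""
        if self.eip + size > len(self.mem):
            raise MemoryError(f"fetch past end of memory at {self.eip:#x}")
        data = bytes(self.mem[self.eip:self.eip + size])
        self.eip += size
        return data

    def mov_r32_imm32(self, opcode: int) -> None:
        (imm,) = struct.unpack("<I", self.fetch(4))
        self.reg[REGS[opcode - 0xB8]] = imm

    def jmp_rel32(self, opcode: int) -> None:
        (rel,) = struct.unpack("<i", self.fetch(4))
        self.eip = (self.eip + rel) & 0xFFFFFFFF

    def int_imm8(self, opcode: int) -> None:
        vector = self.fetch(1)[0]
        if vector != 0x80:
            raise NotImplementedError(f"interrupt {vector:#x}")
        if self.reg["eax"] == 4:    # SYS_WRITE (file descriptor ignored here)
            start, length = self.reg["ecx"], self.reg["edx"]
            self.output += self.mem[start:start + length]
        elif self.reg["eax"] == 1:  # SYS_EXIT
            self.exit_code = self.reg["ebx"]
            self.running = False
        else:
            raise NotImplementedError(f"syscall {self.reg['eax']}")

    def run(self) -> None:
        while self.running:
            opcode = self.fetch(1)[0]           # fetch
            handler = self.handlers.get(opcode)  # decode
            if handler is None:
                raise NotImplementedError(f"opcode {opcode:#04x}")
            handler(opcode)                      # execute


# The same bytes as the "Hello, world!" example in "How to use".
program = bytes.fromhex(
    "b804000000 bb01000000 b929000000 ba0e000000 cd80"
    "e902000000 89c8"
    "b801000000 bb00000000 cd80"
    "48656c6c6f2c20776f726c64210a"
)
cpu = ToyCPU(program)
cpu.run()
print(cpu.output.decode(), end="")
print("exit code:", cpu.exit_code)
```

A real decoder also has to handle prefixes, two-byte opcodes and opcode extensions in the ModRM reg field, which is where a flat dictionary stops being sufficient.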

What this is not

  • A fast emulator. It's actually super slow. I mean, this is pure Python, what did you expect?!
    • That said, version 0.1-beta is almost twice as fast as commit 453fb47617f269fd8fa4ebe7c8cb28cc0611ede0 on master.
  • A fast (or properly implemented, or safe) Linux kernel.
    • At this point (version 0.1-beta) it's a huge stub that has a minimal set of syscalls that allow basic programs to work.
  • Something that's written by an expert in either CPUs or kernels. There are lots of TODOs and bugs.
    • If your program crashes, chances are the fault lies with PyVM rather than with your program, although a legitimate segmentation fault is still possible. Please open an issue about the crash.

Found a bug? Have an idea?

You're welcome to contribute! Open issues or pull requests, or contact me via Twitter or Reddit. Learn more about the x86 architecture and the Linux kernel and have fun!
