kelthuzadx / Yvm
This is a homemade Java virtual machine written in C++. It supports most Java language features and includes a mark-sweep-based concurrent garbage collector. The main components of this VM conform to the Java Virtual Machine Specification 8. It is runnable, and more language features will be added progressively. I don't have enough time to write full-coverage unit tests to ensure that all aspects of yvm work well, so if you find any bugs, you can open an issue or fix them in place and send a pull request directly.
Available language features
Advanced language features will be supported later; you can also send a PR to contribute your awesome code.
- Java arithmetic, flow control, object-oriented programming (virtual methods, inheritance, etc.)
- Runtime type identification
- String concatenation
- Exception handling
- Async native threads
- Synchronized block with object lock
- Garbage collection (with mark-and-sweep policy)
Build and run
- Prerequisite
  - Boost (>=1.65). Please set the Boost root directory in CMakeLists.txt manually if automatic CMake detection fails
  - C++14
  - gcc/msvc/mingw
- Stereotype
$ cd yvm
$ cmake .
$ make -j4
$ make test
$ ./yvm --help
Usage:
--help List help documentations and usages.
--runtime arg Attach java runtime libraries where yvm would lookup
classes at
--run arg Program which would be executed soon
You must specify the "runtime" flag to tell yvm where it can find the JDK classes; the program name is also required.
$ ./yvm --runtime=C:\Users\Cthulhu\Desktop\yvm\bytecode ydk.test.QuickSort
Running snapshots
- helloworld
- quick sort
- print stack trace when exception occurred
- native multithreading
- multithreading with synchronized(){}
- Garbage Collection
Developing and hacking
1. From bytecode to an object
MethodArea is used to handle the complete lifecycle of a JavaClass; its APIs are self-explanatory:
class MethodArea {
public:
// Pass runtime library paths to tell the virtual machine
// where to look up dependent classes
MethodArea(const vector<string>& libPaths);
~MethodArea();
// check whether the class already exists or is absent
JavaClass* findJavaClass(const string& jcName);
// load the class specified by jcName
bool loadJavaClass(const string& jcName);
// remove the class specified by jcName (used for gc only)
bool removeJavaClass(const string& jcName);
// link the class specified by jcName, initialize its fields
void linkJavaClass(const string& jcName);
// initialize the class specified by jcName, call the static{} block
void initJavaClass(Interpreter& exec, const string& jcName);
public:
// auxiliary functions
JavaClass* loadClassIfAbsent(const string& jcName);
void linkClassIfAbsent(const string& jcName);
void initClassIfAbsent(Interpreter& exec, const string& jcName);
};
For example, suppose we have a bytecode file named Test.class. It becomes available to the JVM only after the following steps have finished:
Test.class[in the disk]
-> loadJavaClass("Test.class")[in memory]
-> linkJavaClass("Test.class")
-> initJavaClass("Test.class")
Now we can create corresponding objects as soon as the above steps are accomplished:
// yrt is a global runtime variable, ma stands for the MethodArea module, jheap stands for the JavaHeap module
JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);
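The load -> link -> init ordering above can be sketched as a small state machine. This is only an illustration: `TinyMethodArea`, `ClassState`, `isUsable` and `staticInitRuns` are hypothetical names that do not exist in yvm, and the assumption that each `*IfAbsent` step triggers the previous one is mine, not something the source states.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for class lifecycle tracking.
enum class ClassState { Loaded, Linked, Initialized };

class TinyMethodArea {
public:
    // Returns true if the class was newly loaded, false if it was already present.
    bool loadClassIfAbsent(const std::string& name) {
        return classes.emplace(name, ClassState::Loaded).second;
    }

    // Linking needs a loaded class; it does nothing if the class is already linked.
    void linkClassIfAbsent(const std::string& name) {
        loadClassIfAbsent(name);
        ClassState& state = classes.at(name);
        if (state == ClassState::Loaded) state = ClassState::Linked;
    }

    // Initialization needs a linked class; the static{} block would run here, once.
    void initClassIfAbsent(const std::string& name) {
        linkClassIfAbsent(name);
        ClassState& state = classes.at(name);
        if (state == ClassState::Linked) {
            ++staticInitRuns;  // stands in for executing static{}
            state = ClassState::Initialized;
        }
    }

    // A class can be instantiated only after all three steps have finished.
    bool isUsable(const std::string& name) const {
        auto it = classes.find(name);
        return it != classes.end() && it->second == ClassState::Initialized;
    }

    int staticInitRuns = 0;

private:
    std::unordered_map<std::string, ClassState> classes;
};
```

Calling `initClassIfAbsent("Test")` twice on a fresh `TinyMethodArea` leaves the class usable while the static initializer counter stays at 1, which is the property the "IfAbsent" naming suggests.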
2.1 Inside the object
The JVM stack only holds basic numeric data and object/array references, which we call JObject/JArray. They have the following structure:
struct JObject {
std::size_t offset = 0;
const JavaClass* jc{};
};
offset identifies an object; all operations on an object in the heap require this offset. jc refers to the object's JavaClass.
Every object in the heap is stored as an <offset, fields> pair:
[1] -> [field_a, field_b, field_c]
[2] -> []
[3] -> [field_a,field_b]
[4] -> [field_a]
[..] -> [...]
Once we have an object's offset, we can perform any operation on that object indirectly through it.
An array is almost the same as an object; it has a length field instead of jc, since an array does not need to hold a meta class reference.
struct JArray {
int length = 0;
std::size_t offset = 0;
};
[1] -> <3, [field_a, field_b, field_c]>
[2] -> <0, []>
[3] -> <2, [field_a,field_b]>
[4] -> <1, [field_a]>
[..] -> <..,[...]>
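The two tables above can be modelled with ordinary maps keyed by offset. The sketch below is an assumption-laden illustration, not yvm's JavaHeap: `TinyHeap`, `ObjHandle` and `ArrHandle` are hypothetical names, fields are plain `int` values instead of `JType*`, and objects and arrays use separate offset counters only because the two tables above are numbered independently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical handles; the real JObject also carries a JavaClass pointer.
struct ObjHandle { std::size_t offset = 0; };
struct ArrHandle { int length = 0; std::size_t offset = 0; };

class TinyHeap {
public:
    // Allocate an object with fieldCount zeroed fields and return its handle.
    ObjHandle createObject(std::size_t fieldCount) {
        ObjHandle h;
        h.offset = nextObjectOffset++;
        objects[h.offset] = std::vector<int>(fieldCount, 0);
        return h;
    }

    // Allocate an array; the length is stored next to the elements.
    ArrHandle createArray(int length) {
        ArrHandle h;
        h.length = length;
        h.offset = nextArrayOffset++;
        arrays[h.offset] = std::make_pair(
            length, std::vector<int>(static_cast<std::size_t>(length), 0));
        return h;
    }

    // Every access goes through the offset; at() throws on an unknown offset or index.
    int getField(const ObjHandle& h, std::size_t index) const {
        return objects.at(h.offset).at(index);
    }
    void putField(const ObjHandle& h, std::size_t index, int value) {
        objects.at(h.offset).at(index) = value;
    }
    int getElement(const ArrHandle& a, std::size_t index) const {
        return arrays.at(a.offset).second.at(index);
    }
    void putElement(const ArrHandle& a, std::size_t index, int value) {
        arrays.at(a.offset).second.at(index) = value;
    }

    // Returns true if an object was stored at this offset and has been removed.
    bool removeObject(std::size_t offset) { return objects.erase(offset) == 1; }

private:
    std::size_t nextObjectOffset = 1;
    std::size_t nextArrayOffset = 1;
    std::map<std::size_t, std::vector<int>> objects;
    std::map<std::size_t, std::pair<int, std::vector<int>>> arrays;
};
```

Because a handle is only an offset (plus a length for arrays), copying it is cheap, and the heap remains the single owner of the field storage.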
2.2 From object creation to extinction
As mentioned above, a JObject holds offset and jc. MethodArea is responsible for managing the JavaClass referenced by jc, while the offset field refers to the object's data, which is under the control of JavaHeap. JavaHeap provides a large number of self-explanatory APIs:
class JavaHeap {
public:
// create an object/array
JObject* createObject(const JavaClass& javaClass);
JArray* createObjectArray(const JavaClass& jc, int length);
// get/set field
auto getFieldByName(const JavaClass* jc, const string& name,
const string& descriptor, JObject* object);
void putFieldByName(const JavaClass* jc, const string& name,
const string& descriptor, JObject* object,
JType* value);
// get/set specific element in the array
void putElement(const JArray& array, size_t index, JType* value);
auto getElement(const JArray& array, size_t index);
// remove an array/object from heap
void removeArray(size_t offset);
void removeObject(size_t offset);
};
Back to the above example again, assume its corresponding Java class structure is as follows:
public class Test{
public int k;
private String hello;
}
In the first step, we already obtained testClass; now we can do more things with it:
const JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);
// get the field hello
JObject* helloField = yrt.jheap->getFieldByName(testClass,"hello","Ljava/lang/String;",testInstance);
// set the field k; kValue is a placeholder for a JType* holding the new int value
yrt.jheap->putFieldByName(testClass,"k","I",testInstance,kValue);
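To illustrate the "extinction" side, here is a minimal single-threaded mark-and-sweep sketch over an offset-indexed heap. It is not yvm's collector (which the introduction describes as concurrent); `RefHeap`, `mark` and `sweep` are hypothetical names, and the heap is reduced to a map from each offset to the offsets its reference fields point to. In a real VM, the sweep step is where calls such as removeObject/removeArray would presumably happen.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical heap model: offset -> offsets referenced by that object's fields.
using RefHeap = std::map<std::size_t, std::vector<std::size_t>>;

// Mark phase: collect every offset reachable from the roots (e.g. stack references).
std::set<std::size_t> mark(const RefHeap& heap, const std::vector<std::size_t>& roots) {
    std::set<std::size_t> marked;
    std::vector<std::size_t> worklist(roots);
    while (!worklist.empty()) {
        std::size_t current = worklist.back();
        worklist.pop_back();
        // Skip dangling offsets and objects that were already visited.
        if (heap.count(current) == 0 || !marked.insert(current).second) continue;
        for (std::size_t ref : heap.at(current)) worklist.push_back(ref);
    }
    return marked;
}

// Sweep phase: erase every unmarked object and return how many were removed.
std::size_t sweep(RefHeap& heap, const std::set<std::size_t>& marked) {
    std::size_t removed = 0;
    for (auto it = heap.begin(); it != heap.end();) {
        if (marked.count(it->first) == 0) {
            it = heap.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Tracing from roots rather than counting references means that a cycle, such as two objects that only reference each other, is still reclaimed once nothing on the stack reaches it.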
I. About JDK
No Java virtual machine can run a Java program without the Java libraries. As you may know, some opcodes such as ldc, monitorenter/monitorexit and athrow internally require the virtual machine to operate on JDK classes (java.lang.Class, java.lang.String, java.lang.Throwable, etc.). Hence, I had to rewrite some JDK classes to build a runnable VM, because the original JDK classes are so complicated that they are inconvenient for early development.
The rewritten JDK classes are as follows:
java.lang.String
java.lang.StringBuilder
java.lang.Throwable
java.lang.Math(::random())
java.lang.Runnable
java.lang.Thread
II. Structure of source code
~/yvm/src$ tree .
.
├── classfile
│   ├── AccessFlag.h        # Access flags of class, method, field
│   ├── ClassFile.h         # Corresponding structures for .class file
│   └── FileReader.h        # Read .class file
├── gc
│   ├── Concurrent.cpp      # Concurrency utilities
│   ├── Concurrent.hpp
│   ├── GC.cpp              # Garbage collector
│   └── GC.h
├── interpreter
│   ├── CallSite.cpp        # Call site to denote a concrete call
│   ├── CallSite.h
│   ├── Internal.h          # Types widely used inside the VM
│   ├── Interpreter.cpp     # Interpret opcodes
│   ├── Interpreter.hpp
│   ├── MethodResolve.cpp   # Resolve the called method
│   └── MethodResolve.h
├── misc
│   ├── Debug.cpp           # Debugging utilities
│   ├── Debug.h
│   ├── NativeMethod.cpp    # Implementations of Java native methods
│   ├── NativeMethod.h
│   ├── Option.h            # VM arguments and options
│   ├── Utils.cpp           # Tools and utilities
│   └── Utils.h
├── runtime
│   ├── JavaClass.cpp       # Representation of a Java class
│   ├── JavaClass.h
│   ├── JavaException.cpp   # Exception handling
│   ├── JavaException.h
│   ├── JavaFrame.cpp       # Runtime frame
│   ├── JavaFrame.hpp
│   ├── JavaHeap.cpp        # Runtime heap, used to manage objects and arrays
│   ├── JavaHeap.hpp
│   ├── JavaType.h          # Java primitive types and reference type definitions
│   ├── MethodArea.cpp      # Method area, responsible for managing JavaClass objects
│   ├── MethodArea.h
│   ├── ObjectMonitor.cpp   # synchronized syntax implementation
│   ├── ObjectMonitor.h
│   ├── RuntimeEnv.cpp      # Definitions of runtime structures
│   └── RuntimeEnv.h
└── vm
    ├── Main.cpp            # Parse command line arguments
    ├── YVM.cpp             # Abstraction of virtual machine
    └── YVM.h
6 directories, 39 files
For more development documentation, see the Wiki or the source code comments (recommended), which cover its structures, usages, design principles, etc.
License
Code licensed under the MIT License.