All Projects → stephenrkell → Libcrunch

stephenrkell / Libcrunch

Runtime + instrumentation adding precise, accurate and fash(-ish) dynamic checking to C (bounds errors, type confusion, ...).

Programming Languages

ocaml
1615 projects

libcrunch is a system for fast dynamic type and bounds checking in unsafe languages -- currently C, although languages are fairly pluggable in the design.

It is somewhat inaccurately named, in that it is nowadays both a runtime library and some toolchain extensions (compiler wrapper, linker plugin, auxiliary tools).

"Dynamic type checking" mostly means checking pointer casts. There is limited checking of other things like va_arg and union use; more to add in due course.

Bounds checking means probably what you think it means. The innovation of libcrunch is to do fine-grained bounds checking (sensitive to subobjects, such as arrays-in-structs), over all allocators (static, stack, heap and custom), with very few false positives. The key to doing this is run-time type information and a run-time model of allocators. Currently, bounds checking performs about the same as ASan, but does finer-grained checking. It's also comparable to SoftBound, but doesn't suffer the kind of false positives that fat-pointer systems do when they lose track of bounds.

I have some plans for temporal checking too, including a garbage collector (which masks errors) and a mostly-timely checker (which catches errors), but nothing concrete yet.

The medium-term goal is a proof-of-concept implementation of C that is dynamically safe... and runs most source code unmodified, is binary-compatible even with uninstrumented code (albeit sacrificing safety guarantees), and performs usably well (hopefully no worse than half native speed, usually better).

To get good performance, I have some plans for exploiting hardware assistance (various kinds of tagged memory that are springing up) and also speculative/dynamic optimisations. Again, nothing concrete yet (but feel free to ask).

All this is built on top of my other project, liballocs, which you should build (and probably understand) first. In a nutshell, liballocs provides the type information and other dynamic run-time services; its goal is "Smalltalk-style dynamism for Unix processes".

Building is non-trivial... but you can do it! Overall, the build looks something like this.

$ git clone https://github.com/stephenrkell/liballocs.git $ cat liballocs/README (and follow those instructions, then...) $ export LIBALLOCS=pwd/liballocs $ git clone https://github.com/stephenrkell/libcrunch.git $ cd libcrunch $ make -jn # for your favourite n $ make -C test # if this succeeds, be amazed $ frontend/c/bin/crunchcc -o hello /path/to/hello.c # your code here $ LD_PRELOAD=pwd/lib/libcrunch_preload.so ./hello # marvel!

Tips for non-Debian or non-jessie users:

  • You must have Dave Anderson's (ex-SGI) libdwarf, not elfutils's (libdw1) version. The libdwarfpp build will, by default, look for its dwarf.h and libdwarf.h in /usr/include. If this libdwarf's headers are not in /usr/include (some distros put them in /usr/include/libdwarf instead), set LIBDWARFPP_CONFIGURE_FLAGS to "--with-libdwarf-includes=/path/to/includes" so that liballocs's contrib build process will configure libdwarfpp appropriately.

  • Some problems have been reported with gcc 5.x and later. See gcc bug 78407. For now the recommended gcc is the 4.9 series, although 7.2.x fixes that bug and seems to work. Bug reports for build errors occurring on other versions are welcome.

  • Be careful of build skew with libelf. Again, there are two versions: libelf0 and libelf1. It doesn't much matter which you use, but you should use the same at all times.

  • On *BSD: you must first install g++, and build boost 1.55 from source using it. Add the relevant prefix to CFLAGS, CXXFLAGS and LDFLAGS. This is for library/symbol reasons not compiler reasons: mixing libstdc++ and libc++ in one process doesn't work, and libc++fileno doesn't work with libc++ at present (relevant feature request: a fileno() overload for ofstream/ifstream objects). Note that currently, the liballocs runtime doesn't build or run on the BSDs; however, the tools should do.

  • Changes with cxxabi: again, build skew with these can be problematic, especially if you're relying on a system-supplied build of some C++ library such as libboost* -- since it needn't be built using the same ABI that your currently-installed C++ compiler is using. If you get link errors with C++ symbol names, chances are you have a mismatch of ABI. This is another reason to use g++ 4.9.x for everything (including your own build of boost, as appropriate), since it predates the new cxxabi.

Liballocs models programs during execution in terms of /typed allocations/. It reifies data types, providing fast access to per-allocation metadata.

Libcrunch extends this with check functions, thereby allowing assertions such as

assert(__is_aU(p, &__uniqtype_Widget));

to assert that p points to a Widget, and so on.

For bounds errors, libcrunch instruments /pointer derivation/. This includes array indexing and pointer arithmetic, but not pointer dereference which can safely proceed unchecked. Bad pointer uses are caught and reported in a segfault handler.

A compiler wrapper inserts these checks (and some others) automatically at particular points. The effect is to provide clean error messages on bad pointer casts, bad pointer uses and other operations that would otherwise be corrupting failure (undefined behaviour, in C). Language-wise, libcrunch slightly narrows standard C, such that all live, allocated storage has a well-defined type at any moment (cf. C99 "effective type" which is more liberal). This can be a source of false positives in the quirkiest code; there are some mitigations.

Instrumentation is currently done with CIL. There is also a clang front-end which is less mature (lacks a bounds checker) and currently rather out-of-date, but will be revived at some point.

Type-checking usually only slows execution by about 5--35%. You can also run type-check-instrumented code without the library loaded; in that case the slowdown is usually minimal (a few percent at most).

Usability quirks

  • requires manual identification of alloc functions (or rather, liballocs does)

  • check-on-cast is too eager for some C programming styles ("trap pointer" mechanism for casts in the works; bounds checks already work this way)

  • higher-order (indirect, pointer-to-function) checks are slightly conservative (i.e. a few false positives are possible in these cases)

  • plain crunchcc assumes memory-correct execution and checks only types (use crunchxcc for bounds checking too; temporal correctness is assumed, i.e. use-after-free can break us)

Limitations of metadata

  • no metadata (debug info) for actual parameters passed in varargs (need to maintain a shadow stack for this; am working on it)

  • no metadata (debug info) for address-taken temporaries (significant for C++, but not for C; needs compiler fixes)

  • sizeof scraping is not completely reliable (but is pretty good!)

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].