All Projects → seahorn → Sea Dsa

seahorn / Sea Dsa

Licence: other
A new context, field, and array-sensitive heap analysis for LLVM bitcode based on DSA.

Projects that are alternatives of or similar to Sea Dsa

Seahorn
SeaHorn Verification Framework
Stars: ✭ 270 (+200%)
Mutual labels:  static-analysis, llvm, verification
Clang Power Tools
Bringing clang-tidy magic to Visual Studio C++ developers.
Stars: ✭ 285 (+216.67%)
Mutual labels:  static-analysis, llvm
progge.rs
Program analysis playground for a simple, imperative language
Stars: ✭ 29 (-67.78%)
Mutual labels:  llvm, static-analysis
Codechecker
CodeChecker is an analyzer tooling, defect database and viewer extension for the Clang Static Analyzer and Clang Tidy
Stars: ✭ 1,209 (+1243.33%)
Mutual labels:  static-analysis, llvm
OCCAM
OCCAM: Object Culling and Concretization for Assurance Maximization
Stars: ✭ 20 (-77.78%)
Mutual labels:  llvm, static-analysis
clam
Static Analyzer for LLVM bitcode based on Abstract Interpretation
Stars: ✭ 180 (+100%)
Mutual labels:  llvm, static-analysis
Tool lists
Links to tools by subject
Stars: ✭ 270 (+200%)
Mutual labels:  static-analysis, verification
Svf
Static Value-Flow Analysis Framework for Source Code
Stars: ✭ 540 (+500%)
Mutual labels:  static-analysis, llvm
Phasar
A LLVM-based static analysis framework.
Stars: ✭ 503 (+458.89%)
Mutual labels:  static-analysis, llvm
Clangkit
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are actually implemented.
Stars: ✭ 330 (+266.67%)
Mutual labels:  static-analysis, llvm
flextool
C++ compile-time programming (serialization, reflection, code modification, enum to string, better enum, enum to json, extend or parse language, etc.)
Stars: ✭ 32 (-64.44%)
Mutual labels:  llvm, static-analysis
Stoat
STatic (LLVM) Object file Analysis Tool
Stars: ✭ 44 (-51.11%)
Mutual labels:  static-analysis, llvm
klever
Read-only mirror of the Klever Git repository
Stars: ✭ 18 (-80%)
Mutual labels:  static-analysis, verification
jayhorn
Static checker for Java
Stars: ✭ 54 (-40%)
Mutual labels:  static-analysis, verification
SCAF
A Speculation-Aware Collaborative Dependence Analysis Framework
Stars: ✭ 25 (-72.22%)
Mutual labels:  llvm, static-analysis
Dg
[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
Stars: ✭ 242 (+168.89%)
Mutual labels:  static-analysis, llvm
Crab Llvm
Static Analyzer for LLVM bitcode based on Abstract Interpretation
Stars: ✭ 143 (+58.89%)
Mutual labels:  static-analysis, llvm
Sys
Sys: A Static/Symbolic Tool for Finding Good Bugs in Good (Browser) Code
Stars: ✭ 149 (+65.56%)
Mutual labels:  static-analysis, llvm
Smack
SMACK Software Verifier and Verification Toolchain
Stars: ✭ 305 (+238.89%)
Mutual labels:  llvm, verification
Domtresat
Dominator Tree LLVM Pass to Test Satisfiability
Stars: ✭ 42 (-53.33%)
Mutual labels:  static-analysis, llvm

SeaDsa: A Points-to Analysis for Verification of Low-level C/C++

SeaDsa is a context-, field-, and array-sensitive unification-based points-to analysis for LLVM bitcode inspired by DSA. SeaDsa is an order of magnitude more scalable and precise than Dsa and a previous implementation of SeaDsa thanks to improved handling of context sensitivity, addition of partial flow-sensitivity, and type-awareness.

Although SeaDsa can analyze arbitrary LLVM bitcode, it has been tailored for use in program verification of C/C++ programs. It can be used as a stand-alone tool or together with the SeaHorn verification framework and its analyses.

Requirements

SeaDsa is written in C++ and uses the Boost library. The main requirements are:

  • C++ compiler supporting c++11
  • Boost >= 1.55
  • LLVM 5.0

To run tests, install the following packages:

  • sudo pip install lit OutputCheck
  • sudo easy_install networkx
  • sudo apt-get install libgraphviz-dev
  • sudo easy_install pygraphviz

Project Structure

  1. The main Points-To Graph data structures, Graph, Cell, and Node, are defined in include/Graph.hh and src/Graph.cc.
  2. The Local analysis is in include/Local.hh and src/DsaLocal.cc.
  3. The Bottom-Up analysis is in include/BottomUp.hh and src/DsaBottomUp.cc.
  4. The Top-Down analysis is in include/TopDown.hh and src/DsaTopDown.cc.
  5. The interprocedural node cloner is in include/Cloner.hh and src/Clonner.cc.
  6. Type handling code is in include/FieldType.hh, include/TypeUtils.hh, src/FieldType.cc, and src/TypeUtils.cc.
  7. The allocator function discovery is in include/AllocWrapInfo.hh and src/AllocWrapInfo.cc.

Compilation and Usage

Program Verification benchmarks

Instructions on running program verification benchmarks, together with recipes for building real-world projects and our results, can be found in tea-dsa-extras.

Integration in other C++ projects (for users)

SeaDsa contains two directories: include and src. Since SeaDsa analyzes LLVM bitcode, LLVM header files and libraries must be accessible when building with SeaDsa.

If your project uses cmake then you just need to add in your project's CMakeLists.txt:

 include_directories(sea_dsa/include)
 add_subdirectory(sea_dsa)

Standalone (for developers)

If you already installed llvm-5.0 on your machine:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run -DLLVM_DIR=__here_llvm-5.0__/share/llvm/cmake  ..
cmake --build . --target install

Otherwise:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run ..
cmake --build . --target install

To run tests:

cmake --build . --target test-sea-dsa

Visualizing Memory Graphs and Complete Call Graphs

Consider a C program called tests/c/simple.c:

#include <stdlib.h>

typedef struct S {
  int** x;
  int** y;  
} S;

int g;

int main(int argc, char** argv){

  S s1, s2;

  int* p1 = (int*) malloc(sizeof(int));
  int* q1 = (int*) malloc(sizeof(int));  
  s1.x = &p1;
  s1.y = &q1;    
  *(s1.x) = &g;
  
  return 0;
}   

  1. Generate bitcode:

     clang -O0 -c -emit-llvm -S tests/c/simple.c -o simple.ll
    

The option -O0 is used to disable clang optimizations. In general, it is a good idea to enable clang optimizations. However, for trivial examples like simple.c, clang simplifies too much so nothing useful would be observed. The options -c -emit-llvm -S generate bitcode in human-readable format.

  1. Run sea-dsa on the bitcode and print memory graphs to dot format:

     seadsa -sea-dsa=butd-cs -sea-dsa-type-aware -sea-dsa-dot  simple.ll
    

The options -sea-dsa=butd-cs -sea-dsa-type-aware enable the analysis implemented in our FMCAD'19 paper (see References). This command will generate a FUN.mem.dot file for each function FUN in the bitcode program. In our case, the only function is main and thus, there is one file named main.mem.dot. The file is generated in the current directory. If you want to store the .dot files in a different directory DIR then add the option -sea-dsa-dot-outdir=DIR

  1. Visualize main.mem.dot by transforming it to a pdf file:

     dot -Tpdf main.mem.dot -o main.mem.pdf
     open main.mem.pdf  // replace with you favorite pdf viewer 
    

Example of a memory graph

In our memory model, a pointer is represented by a cell which is a pair of a memory object and offset. Memory objects are represented as nodes in the memory graph. Edges are between cells.

Each node field represents a cell (i.e., an offset in the node). For instance, the node fields <0,i32**> and <8,i32**> pointed by %6 and %15, respectively are two different cells from the same memory object. The field <8,i32**> represents the cell at offset 8 in the corresponding memory object and its type is i32**. Since edges are between cells, they are labeled with a number that represents the offset in the destination node. Blue edges connect formal parameters of the function with a cell. Purple edges connect LLVM pointer variables with cells. Nodes can have markers such as S (stack allocated memory), H (heap allocate memory), M (modified memory), R (read memory), E (externally allocated memory), etc. If a node is red then it means that the analysis lost field sensitivity for that node. The label {void} is used to denote that the node has been allocated but it has not been used by the program.

sea-dsa can also resolve indirect calls. An indirect call is a call where the callee is not known statically. sea-dsa identifies all possible callees of an indirect call and generates a LLVM call graph as output.

Consider this example in tests/c/complete_callgraph_5.c:

struct class_t;
typedef int (*FN_PTR)(struct class_t *, int);
typedef struct class_t {
  FN_PTR m_foo;
  FN_PTR m_bar;
} class_t;

int foo(class_t *self, int x)
{
  if (x > 10) {
    return self->m_bar(self, x + 1);
  } else
    return x;
}

int bar (class_t *self, int y) {
  if (y < 100) {
    return y + self->m_foo(self, 10);
  } else
    return y - 5;
}

int main(void) {
  class_t obj;
  obj.m_foo = &foo;
  obj.m_bar = &bar;
  int res;
  res = obj.m_foo(&obj, 42);
  return 0;
}

Type the commands:

clang -c -emit-llvm -S tests/c/complete_callgraph_5.c  -o ex.ll
sea-dsa --sea-dsa-callgraph-dot ex.ll

It generates a .dot file called callgraph.dot in the current directory. Again, the .dot file can be converted to a .pdf file and opened with the commands:

dot -Tpdf callgraph.dot -o callgraph.pdf
open callgraph.pdf  

Example of a call graph

sea-dsa can also print some statistics about the call graph resolution process (note that you need to call clang with -g to print file,line, and column information):

sea-dsa --sea-dsa-callgraph-stats ex.ll


=== Sea-Dsa CallGraph Statistics === 
** Total number of indirect calls 0
** Total number of resolved indirect calls 3

%16 = call i32 %12(%struct.class_t* %13, i32 %15) at tests/c/complete_callgraph_5.c:14:12
RESOLVED
Callees:
  i32 bar(%struct.class_t*,i32)
  
%15 = call i32 %13(%struct.class_t* %14, i32 10) at tests/c/complete_callgraph_5.c:23:16
RESOLVED
Callees:
  i32 foo(%struct.class_t*,i32)
  
%11 = call i32 %10(%struct.class_t* %2, i32 42) at tests/c/complete_callgraph_5.c:36:9
RESOLVED
Callees:
  i32 foo(%struct.class_t*,i32)

Dealing with C/C++ library and external calls

The pointer semantics of external calls can be defined by writing a wrapper that calls any of these functions defined in sea_dsa/sea_dsa.h:

  • extern void sea_dsa_alias(const void *p, ...);
  • extern void sea_dsa_collapse(const void *p);
  • extern void sea_dsa_mk_seq(const void *p, unsigned sz);

sea_dsa_alias unifies all argument's cells, sea_dsa_collapse tells sea-dsa to collapse (i.e., loss of field-sensitivity) the cell pointed by p, and sea_dsa_mk_seq tells sea-dsa to mark as sequence the node pointed by p with size sz.

For instance, consider an external call foo defined as follows:

extern void* foo(const void*p1, void *p2, void *p3);

Suppose that the returned pointer should be unified to p2 but not to p1. In addition, we would like to collapse the cell corresponding to p3. Then, we can replace the above prototype of foo with the following definition:

#include "sea_dsa/sea_dsa.h"
void* foo(const void*p1, void *p2, void*p3) {
	void* r = sea_dsa_new();
	sea_dsa_alias(r,p2);
	sea_dsa_collapse(p3);
	return r;
}

References

  1. "A Context-Sensitive Memory Model for Verification of C/C++ Programs" by A. Gurfinkel and J. A. Navas. In SAS'17. (Paper) | (Slides)

  2. "Unification-based Pointer Analysis without Oversharing" by J. Kuderski, J. A. Navas and A. Gurfinkel. In FMCAD'19. (Paper)

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].