autosoft-dev / tree-hugger

Licence: MIT license
A light-weight, extendable, high level, universal code parser built on top of tree-sitter

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects
javascript
184084 projects - #8 most used programming language
PHP
23972 projects - #3 most used programming language
java
68154 projects - #9 most used programming language
C++
36643 projects - #6 most used programming language

Projects that are alternatives of or similar to tree-hugger

Uaiso
A multi-language parsing infrastructure with an unified AST
Stars: ✭ 86 (-10.42%)
Mutual labels:  parsing, ast
Down
Blazing fast Markdown / CommonMark rendering in Swift, built upon cmark.
Stars: ✭ 1,895 (+1873.96%)
Mutual labels:  parsing, ast
Libdparse
Library for lexing and parsing D source code
Stars: ✭ 91 (-5.21%)
Mutual labels:  parsing, ast
Estree
The ESTree Spec
Stars: ✭ 3,867 (+3928.13%)
Mutual labels:  parsing, ast
Tree Sitter
An incremental parsing system for programming tools
Stars: ✭ 7,083 (+7278.13%)
Mutual labels:  tree-sitter, parsing
Meriyah
A 100% compliant, self-hosted javascript parser - https://meriyah.github.io/meriyah
Stars: ✭ 690 (+618.75%)
Mutual labels:  parsing, ast
Yacep
yet another csharp expression parser
Stars: ✭ 107 (+11.46%)
Mutual labels:  parsing, ast
kolasu
Kotlin Language Support – AST Library
Stars: ✭ 45 (-53.12%)
Mutual labels:  parsing, ast
SwiftTreeSitter
Swift wrappers for the tree-sitter incremental parsing system
Stars: ✭ 116 (+20.83%)
Mutual labels:  tree-sitter, parsing
Rosie Pattern Language
Rosie Pattern Language (RPL) and the Rosie Pattern Engine have MOVED!
Stars: ✭ 146 (+52.08%)
Mutual labels:  data-mining, parsing
inmemantlr
ANTLR as a libray for JVM based languages
Stars: ✭ 87 (-9.37%)
Mutual labels:  parsing, ast
codeparser
Parse Wolfram Language source code as abstract syntax trees (ASTs) or concrete syntax trees (CSTs)
Stars: ✭ 84 (-12.5%)
Mutual labels:  parsing, ast
hxjsonast
Parse JSON into position-aware AST with Haxe!
Stars: ✭ 28 (-70.83%)
Mutual labels:  parsing, ast
Esprima
ECMAScript parsing infrastructure for multipurpose analysis
Stars: ✭ 6,391 (+6557.29%)
Mutual labels:  parsing, ast
node-typescript-parser
Parser for typescript (and javascript) files, that compiles those files and generates a human understandable AST.
Stars: ✭ 121 (+26.04%)
Mutual labels:  parsing, ast
Graphql Go Tools
Tools to write high performance GraphQL applications using Go/Golang.
Stars: ✭ 96 (+0%)
Mutual labels:  parsing, ast
cppcombinator
parser combinator and AST generator in c++17
Stars: ✭ 20 (-79.17%)
Mutual labels:  parsing, ast
kataw
An 100% spec compliant ES2022 JavaScript toolchain
Stars: ✭ 303 (+215.63%)
Mutual labels:  parsing, ast
Escaya
An blazing fast 100% spec compliant, incremental javascript parser written in Typescript
Stars: ✭ 217 (+126.04%)
Mutual labels:  parsing, ast
ltreesitter
Standalone tree sitter bindings for the Lua language
Stars: ✭ 62 (-35.42%)
Mutual labels:  tree-sitter, parsing

Code mining at scale - tree hugger

tree-hugger

Mine source code repositories at scale. Easily. Tree-hugger is a light-weight, high-level library that provides Pythonic APIs to mine through Git repositories (it actually works on any collection of supported code files).

Tree-hugger is built on top of tree-sitter.

Covered languages:

  • Python
  • PHP
  • Java
  • JavaScript
  • C++

System Requirement: Python 3.6

Installation

From pip:

pip install -U tree-hugger PyYAML

From Source:

git clone https://github.com/autosoft-dev/tree-hugger.git

cd tree-hugger

pip install -e .

The installation process is tested on macOS Mojave. We have a separate docker binding for compiling the libraries for Linux, and this library will soon be integrated into it as well.

You may need to install libgit2. If you are on a Mac, just use brew install libgit2.

Setup

Getting your .so files

Update - 19.11.2021 -

We are no longer able to support the S3-based download, so the download_libs command does not work. The pre-built libraries are available via this release instead: https://github.com/autosoft-dev/tree-hugger/releases/tag/0.10.1. Please download the required zip file from there.

Please note that building the libraries has been tested under macOS Mojave with Apple LLVM version 10.0.1 (clang-1001.0.46.4). However, they should work on all mainstream Linux systems. We have not tested them on Windows.
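Once the zip file is downloaded, unpacking it could look like the sketch below. This is only an illustration: the helper name extract_libs, the archive file name and the tslibs target directory are assumptions, and the layout inside the release archive is not documented here, so the code simply searches for .so files recursively.

```python
import zipfile
from pathlib import Path


def extract_libs(archive_path, target_dir="tslibs"):
    """Unpack a downloaded release archive and return the .so files found in it.

    Hypothetical helper; it is not part of tree-hugger.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Search recursively, because the archive layout is an unknown here.
    return sorted(target.rglob("*.so"))
```

A call such as extract_libs("downloaded_libs.zip") (the file name is a placeholder) would return the paths of the extracted libraries, which can then be used for the setup described in the next section.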

Environment variables

You can set the TS_LIB_PATH environment variable to the tree-sitter lib path (the .so files you just downloaded), and the library will then use them automatically. Alternatively, you can pass the path when creating any Parser object.
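The two options can be pictured with the small sketch below. It does not call tree-hugger itself: resolve_ts_lib_path is a hypothetical helper, the "tslibs" value is only an example, and the precedence shown (an explicit path wins over the environment variable) is an assumption rather than documented behaviour.

```python
import os


def resolve_ts_lib_path(explicit_path=None):
    """Return the explicit path if given, else the TS_LIB_PATH value, else None.

    Hypothetical helper; the precedence order is an assumption.
    """
    if explicit_path:
        return explicit_path
    return os.environ.get("TS_LIB_PATH")


# Set the variable for the current process only; "tslibs" is an example value.
os.environ["TS_LIB_PATH"] = "tslibs"

print(resolve_ts_lib_path())                # falls back to the environment variable
print(resolve_ts_lib_path("/opt/my-libs"))  # an explicit path takes precedence
```

Setting the variable in your shell profile instead has the same effect and keeps the path out of your code.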

Hello world example

  1. Generate the libraries: run the command above to generate the libraries.

    In our settings we use the -c flag to copy the generated tree-sitter library's .so file to our workspace. Once copied, we place it under a directory called tslibs (it is in the .gitignore).

    If you are using Linux, you will need to use our tree-sitter-docker image and manually copy the final .so file, unless you are on a Debian-based distro, in which case you should probably use our pre-compiled version as described above (note that the download_libs command no longer works, so download the zip file from the release page instead).

  2. Set up the environment variable (optional). Assuming that you have the necessary environment variable set up, the following lines of code will create a Parser object for the language you want to analyse:

Python

# Python
from tree_hugger.core import PythonParser
pp = PythonParser()
pp.parse_file("tests/assets/file_with_different_functions.py")
pp.get_all_function_names()
Out[4]:
['first_child', 'second_child', 'say_whee', 'wrapper', 'my_decorator', 'parent']

PHP

# PHP
from tree_hugger.core import PHPParser
phpp = PHPParser()
phpp.parse_file("tests/assets/file_with_different_functions.php")
phpp.get_all_function_names() 
Out[5] :
['foo', 'test', 'simple_params', 'variadic_param' ]

Java

# Java 
from tree_hugger.core import JavaParser
jp = JavaParser()
jp.parse_file("tests/assets/file_with_different_methods.java")
jp.get_all_class_names() 
Out[6] :
['HelloWorld','Animal', 'Dog' ]

JavaScript

# JavaScript
from tree_hugger.core import JavascriptParser
jsp = JavascriptParser()
jsp.parse_file("tests/assets/file_with_different_functions.js")
jsp.get_all_function_names() 
Out[7] :
['test', 'utf8_to_b64', 'sum', 'multiply']

C++

# C++
from tree_hugger.core import CPPParser
cp = CPPParser()
cp.parse_file("tests/assets/file_with_different_functions.cpp")
cp.get_all_function_names() 
Out[8] :
['foo', 'test', 'simple_params', 'variadic_param' ]
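The examples above parse one file at a time. To mine a whole directory, the same two calls (parse_file and get_all_function_names) can be wrapped in a loop. The sketch below is an illustration only: collect_function_names is a hypothetical helper, and it assumes that a single parser object can be reused across files.

```python
from pathlib import Path


def collect_function_names(parser, root, suffix):
    """Map each file under `root` ending in `suffix` to its function names.

    `parser` is any object that offers parse_file() and
    get_all_function_names(), as the parsers shown above do.
    """
    results = {}
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        parser.parse_file(str(path))
        results[str(path)] = parser.get_all_function_names()
    return results


# Possible usage, with tree-hugger installed and the libraries set up:
# from tree_hugger.core import PythonParser
# names = collect_function_names(PythonParser(), "tests/assets", ".py")
```

Because the helper takes the parser as an argument, the same loop works for any of the covered languages by passing a different parser and file suffix.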

API reference

Python
  • Functions: get_all_function_names, get_all_function_docstrings, get_all_function_names_and_params, get_all_function_bodies
  • Methods: get_all_class_method_names, get_all_method_docstrings, get_all_method_documentations, get_all_class_method_bodies
  • Classes: get_all_class_names, get_all_class_docstrings

PHP
  • Functions: get_all_function_names, get_all_function_names_with_params, get_all_function_bodies, get_all_function_docstrings, get_all_function_documentations
  • Methods: get_all_class_method_names, get_all_method_docstrings, get_all_method_documentations, get_all_class_method_bodies
  • Classes: get_all_class_names, get_all_class_docstrings, get_all_class_documentations

Java
  • Methods: get_all_class_method_names, get_all_method_names_with_params, get_all_method_bodies, get_all_method_javadocs, get_all_method_documentations
  • Classes: get_all_class_names, get_all_class_javadocs, get_all_class_documentations

JavaScript
  • Functions: get_all_function_names, get_all_function_names_with_params, get_all_function_bodies, get_all_function_jsdocs, get_all_function_documentations
  • Methods: get_all_class_method_names, get_all_method_jsdocs, get_all_method_documentations
  • Classes: get_all_class_names, get_all_class_jsdocs, get_all_class_documentations

C++
  • Functions: get_all_function_names, get_all_function_names_with_params, get_all_function_commentdocs, get_all_function_documentations, get_all_function_bodies
  • Methods: get_all_class_method_names
  • Classes: get_all_class_names, get_all_class_commentdocs, get_all_class_documentations
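Since some method names differ between languages (for example docstrings, javadocs, jsdocs and commentdocs), code that handles several parsers may want to probe for the first method that exists. The sketch below shows one way to do that; call_first_available is a hypothetical helper, not part of tree-hugger.

```python
def call_first_available(parser, method_names):
    """Call the first method in `method_names` that `parser` provides.

    Returns a (method name, result) pair, or raises AttributeError
    if the parser offers none of the candidates.
    """
    for name in method_names:
        method = getattr(parser, name, None)
        if callable(method):
            return name, method()
    raise AttributeError(f"parser has none of: {', '.join(method_names)}")


# Candidate names taken from the reference above.
DOC_METHODS = [
    "get_all_function_docstrings",
    "get_all_function_jsdocs",
    "get_all_function_commentdocs",
]
```

With a parser that has already parsed a file, call_first_available(parser, DOC_METHODS) would return whichever documentation getter that parser exposes, together with its result.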

Extending tree-hugger

Extending tree-hugger to other languages, or adding more functionality to the languages already provided, is easy.

  1. Adding languages:

Support for new languages can be added by deriving a parser class from the BaseParser class. The only mandatory argument that a parser class must pass to the parent is the language, given as a lower-case string such as python. Each parser class must also accept, in its constructor, the path of the tree-sitter library (the .so file used to parse the code) and the path to the queries yaml file.

The BaseParser class does a few things:

  • Loading and preparing the .so file with respect to the language you just mentioned.
  • Loading, preparing and parsing the query yaml file. (for the queries, we internally use an extended UserDict class)
  • Providing an API to parse a file and prepare it for query. BaseParser.parse_file

It also gives you another API, _run_query_and_get_captures (most likely not to be exposed outside the class), which lets you run any query and returns the matched results (if any) from the parsed tree.

We use those APIs once we have called parse_file and parsed the file.
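The shape of such a parser class can be sketched as follows. To stay runnable without the libraries, the sketch uses a stand-in base class rather than tree-hugger's real BaseParser; every class, method and parameter name in it (SketchBaseParser, QueryStore, RubyParser, library_loc, query_file_path, queries, function_names_query) is an assumption made for illustration, and the real constructor signature may differ.

```python
from collections import UserDict


class QueryStore(UserDict):
    """Hypothetical query container: query name -> s-expression string."""

    def __missing__(self, name):
        raise KeyError(f"no query named {name!r}")


class SketchBaseParser:
    """Stand-in for the role BaseParser plays; NOT tree-hugger's real class."""

    def __init__(self, language, library_loc=None, query_file_path=None, queries=None):
        self.language = language                # lower-case string, e.g. "python"
        self.library_loc = library_loc          # path to the tree-sitter .so file
        self.query_file_path = query_file_path  # path to the queries yaml file
        self.queries = QueryStore(queries or {})
        self.parsed_file = None

    def parse_file(self, path):
        # The real class builds a syntax tree; this sketch only records the path.
        self.parsed_file = path
        return True


class RubyParser(SketchBaseParser):
    """Hypothetical new-language parser: only the language string is mandatory."""

    def __init__(self, library_loc=None, query_file_path=None, queries=None):
        super().__init__("ruby", library_loc, query_file_path, queries)

    def function_names_query(self):
        if self.parsed_file is None:
            raise RuntimeError("call parse_file() first")
        # A real parser would run this query against the parsed tree.
        return self.queries["all_function_names"]
```

The point of the pattern is that the subclass fixes the language and forwards the two optional paths, while everything language-independent stays in the base class.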

  2. Adding queries:

Queries run on source code are s-expressions. They are listed in a queries.yml file for each parser class, so tree-hugger gives you a way to write the queries for each parsed language in a yaml file.

Query structure: the name of a query, followed by the query itself written as an s-expression. Example:

all_function_docstrings:
        "
        (
            function_definition
            name: (identifier) @function.def
            body: (block(expression_statement(string))) @function.docstring
        )
        "

You have to follow yaml grammar while writing these queries. You can read more about writing these queries in the tree-sitter documentation.
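When writing a new query, two quick checks can catch common mistakes before the query reaches tree-sitter: listing the @capture names and verifying that the parentheses balance. The sketch below applies both to the example query above. The helper names are hypothetical, and the checks are deliberately naive (they do not account for parentheses or @ signs inside string literals or comments).

```python
import re

# The s-expression from the example above, without the yaml key and quotes.
QUERY = """
(
    function_definition
    name: (identifier) @function.def
    body: (block(expression_statement(string))) @function.docstring
)
"""


def capture_names(query):
    """Return the @capture names of a query, in order of appearance."""
    return re.findall(r"@([\w.]+)", query)


def has_balanced_parens(query):
    """Cheap sanity check: every '(' is closed and none closes too early."""
    depth = 0
    for char in query:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


print(capture_names(QUERY))        # ['function.def', 'function.docstring']
print(has_balanced_parens(QUERY))  # True
```

The capture names are what a query result is keyed on, so listing them is a quick way to see what a query will hand back.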

Some example queries, that you will find in the yaml file (and their corresponding API from the PythonParser class) -

* all_function_names => get_all_function_names()

* all_function_docstrings => get_all_function_documentations()

* all_class_methods => get_all_class_method_names()

Roadmap

  • Documentation: a tutorial on writing queries

  • Write *Parser classes for other languages

Languages Status-Finished Author
Python Shubhadeep
PHP Clément
Java Clément
JavaScript Clément
C++ Clément

If you are using tree-hugger in your project, please consider putting a "parser: tree-hugger" badge in your project :)