JetBrains-Research / Lupa

License: Apache-2.0
A framework for the large-scale analysis of programming language usage.

Lupa 🔍

Lupa 🔍 is an extendable framework for analyzing fine-grained language usage, built on top of the IntelliJ Platform. It is a command line tool that uses the IntelliJ Platform under the hood to perform code analysis with the same industry-level tools that are employed in IntelliJ-based IDEs, such as IntelliJ IDEA, PyCharm, or CLion.

Currently, the framework supports two languages: Python (a mature language most popular in data science and machine learning) and Kotlin (a relatively young but quickly growing language).

How it works

Lupa 🔍 is a platform for large-scale analysis of programming language usage. Specifically, it is implemented as a plugin for the IntelliJ Platform that reuses the platform's API to launch the IDE in the background (without a user interface) and run the necessary analysis on every project in the given dataset.
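The "run the analysis on every project in the dataset" idea can be illustrated with a minimal Python sketch. This is not Lupa's actual runner, which launches a headless IDE through the IntelliJ Platform; it only mirrors the overall shape of the loop. The names iter_projects, run_analysis, and count_python_files are hypothetical, and the layout (one subdirectory per project) is an assumption.

```python
from pathlib import Path
from typing import Callable, Dict, List


def iter_projects(dataset_dir: Path) -> List[Path]:
    """Return each immediate subdirectory of the dataset, treated as one project.

    Assumption: the dataset is a directory with one subdirectory per project.
    """
    return sorted(p for p in dataset_dir.iterdir() if p.is_dir())


def run_analysis(dataset_dir: Path, analyzer: Callable[[Path], int]) -> Dict[str, int]:
    """Apply the analyzer to every project and collect results keyed by project name."""
    return {project.name: analyzer(project) for project in iter_projects(dataset_dir)}


def count_python_files(project_dir: Path) -> int:
    """Toy analyzer (hypothetical): count the .py files in a project."""
    return sum(1 for _ in project_dir.rglob("*.py"))
```

Calling run_analysis(Path("dataset"), count_python_files) would return a dictionary mapping each project directory name to its number of Python files. A real analyzer would replace the toy function with logic that inspects the code itself.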

The main pipeline of Lupa 🔍 is shown below:

[Figure: the operating pipeline of the tool]

To perform the analysis, the tool needs two components: a dataset and analyzers, i.e., sets of instructions that specify which PSI (Program Structure Interface) tree nodes to analyze and how. For more information about data collection, see the data_collection module. The repository contains several core modules:

  • lupa-core - functions common to all modules and analyzers;
  • lupa-test - common test architecture for all modules;
  • lupa-runner - the module with runners for all analyzers;
  • scripts - common functionality for data gathering, processing and visualization (written in Python).

It also contains several example analyzers that we used for our own purposes:

  1. Kotlin analyzers:
    • clones - functionality related to clone analysis in Kotlin projects;
    • dependencies - functionality related to dependency analysis in Kotlin projects;
    • gradle - functionality related to code analysis of the Gradle files in Kotlin projects;
    • statistic - functionality related to various code analyses in Kotlin projects, such as range analysis;
  2. Python analyzers:
    • callExpressions - functionality related to the analysis of call expressions (functions, classes, decorators) in Python projects;
    • imports - functionality related to import analysis in Python projects.

For more information, see these modules (each of them has a README file).
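To give a rough feel for what an analyzer such as imports does, here is a minimal sketch that walks a syntax tree and counts imported names. It uses Python's standard ast module as a stand-in: Lupa's own analyzers work on PSI trees through the IntelliJ Platform, and their real implementation may differ substantially. The function name collect_imports is hypothetical.

```python
import ast
from collections import Counter


def collect_imports(source: str) -> Counter:
    """Count the fully qualified names imported by a piece of Python source.

    Illustration only: uses the standard-library ast module, not PSI.
    """
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b as c` is recorded as "a.b"
            counts.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # `from a import b` is recorded as "a.b";
            # relative imports keep their leading dots, e.g. ".utils"
            module = "." * node.level + (node.module or "")
            separator = "." if node.module else ""
            for alias in node.names:
                counts[f"{module}{separator}{alias.name}"] += 1
    return counts
```

For example, collect_imports("import os\nfrom collections import Counter\n") counts one occurrence each of os and collections.Counter. Counting qualified names rather than raw statements is one possible design choice; it makes results from different files easy to sum.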

Installation

Clone the repository by running git clone https://github.com/JetBrains-Research/Lupa.git.

For the analyzer modules and the core architecture, you need Kotlin 1.5.21 or later. For the data gathering, processing, and visualization functionality (the scripts module), you need Python 3+ and must also run:

  • pip install -r scripts/requirements.txt
  • pip install -r scripts/requirements-test.txt - for tests (optional)
  • pip install -r scripts/requirements-code-style.txt - for code style checkers (optional)

Usage

  1. For analyzers:
  2. For functionality for data gathering, processing and visualization:

Contribution

Please be sure to review the project's contributing guidelines to learn how to help the project.