
saltudelft / type4py

Licence: Apache-2.0 license
Type4Py: Deep Similarity Learning-Based Type Inference for Python

Programming Languages

python
shell

Projects that are alternatives to or similar to type4py

Deep Learning With Tensorflow Book
An introductory open-source deep learning book with hands-on case studies, based on the TensorFlow 2.0 framework.
Stars: ✭ 12,105 (+29424.39%)
Mutual labels:  machinelearning, deeplearning
Clearml Server
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, ML-Ops and Data-Management
Stars: ✭ 186 (+353.66%)
Mutual labels:  machinelearning, deeplearning
Mariana
The Cutest Deep Learning Framework which is also a wonderful Declarative Language
Stars: ✭ 151 (+268.29%)
Mutual labels:  machinelearning, deeplearning
datascience-mashup
In this repo I will try to gather all of the projects related to data science with clean datasets and high accuracy models to solve real world problems.
Stars: ✭ 36 (-12.2%)
Mutual labels:  machinelearning, deeplearning
awesome-conformal-prediction
A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.
Stars: ✭ 998 (+2334.15%)
Mutual labels:  machinelearning, deeplearning
Real Time Ml Project
A curated list of applied machine learning and data science notebooks and libraries across different industries.
Stars: ✭ 143 (+248.78%)
Mutual labels:  machinelearning, deeplearning
Best ai paper 2020
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code
Stars: ✭ 2,140 (+5119.51%)
Mutual labels:  machinelearning, deeplearning
Fasttext.js
FastText for Node.js
Stars: ✭ 127 (+209.76%)
Mutual labels:  machinelearning, deeplearning
Netron
Visualizer for neural network, deep learning, and machine learning models
Stars: ✭ 17,193 (+41834.15%)
Mutual labels:  machinelearning, deeplearning
Awesome Deep Learning And Machine Learning Questions
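[Updated irregularly] A curated collection of valuable questions about deep learning, machine learning, reinforcement learning, and data science, gathered from sites such as Zhihu, Quora, Reddit, and Stack Exchange.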
Stars: ✭ 203 (+395.12%)
Mutual labels:  machinelearning, deeplearning
All4nlp
All For NLP, especially Chinese.
Stars: ✭ 141 (+243.9%)
Mutual labels:  machinelearning, deeplearning
Nearest-Celebrity-Face
Tensorflow Implementation of FaceNet: A Unified Embedding for Face Recognition and Clustering to find the celebrity whose face matches the closest to yours.
Stars: ✭ 30 (-26.83%)
Mutual labels:  machinelearning, deeplearning
Xlearning
AI on Hadoop
Stars: ✭ 1,709 (+4068.29%)
Mutual labels:  machinelearning, deeplearning
Machine Learning Tutorials
machine learning and deep learning tutorials, articles and other resources
Stars: ✭ 11,692 (+28417.07%)
Mutual labels:  machinelearning, deeplearning
Horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Stars: ✭ 11,943 (+29029.27%)
Mutual labels:  machinelearning, deeplearning
Java Deep Learning Cookbook
Code for Java Deep Learning Cookbook
Stars: ✭ 156 (+280.49%)
Mutual labels:  machinelearning, deeplearning
Kamonohashi
KAMONOHASHI, an AI development platform
Stars: ✭ 80 (+95.12%)
Mutual labels:  machinelearning, deeplearning
Monk gui
A Graphical user Interface for deep learning and computer vision over Monk Libraries
Stars: ✭ 120 (+192.68%)
Mutual labels:  machinelearning, deeplearning
Clearml
ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management
Stars: ✭ 2,868 (+6895.12%)
Mutual labels:  machinelearning, deeplearning
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-53.66%)
Mutual labels:  machinelearning, deeplearning

Type4Py: Deep Similarity Learning-Based Type Inference for Python


This repository contains the implementation of Type4Py and instructions for reproducing the results of the paper.

Dataset

For Type4Py, we use the ManyTypes4Py (MT4Py) dataset. You can download the latest version of the dataset here. Also, note that the dataset is already de-duplicated.

Code De-duplication

If you want to use your own dataset, it is essential to de-duplicate the dataset by using a tool like CD4Py.
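CD4Py's own algorithm is not described in this README, so the following is only a rough illustration of what de-duplication means, not a replacement for CD4Py. This minimal sketch detects *exact* duplicates after normalization: each file is tokenized with Python's standard `tokenize` module, comments and whitespace are ignored, and files whose normalized token streams hash to the same value are grouped. Near-duplicates (e.g., renamed variables) are not caught by this simplified approach. The function names `normalized_fingerprint` and `find_exact_duplicates` are hypothetical.

```python
import hashlib
import io
import tokenize
from collections import defaultdict
from typing import Dict, List


def normalized_fingerprint(source: str) -> str:
    """Hash a file's token stream, ignoring comments, blank lines and spacing."""
    parts: List[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # comments and non-logical newlines do not affect equality
        if tok.type == tokenize.INDENT:
            parts.append("<INDENT>")  # keep block structure, drop exact whitespace
        elif tok.type == tokenize.DEDENT:
            parts.append("<DEDENT>")
        elif tok.type == tokenize.NEWLINE:
            parts.append("<NEWLINE>")  # "\n" vs. missing final newline look the same
        else:
            parts.append(tok.string)
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


def find_exact_duplicates(files: Dict[str, str]) -> List[List[str]]:
    """Group file paths whose normalized token streams are identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, source in files.items():
        groups[normalized_fingerprint(source)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Files with syntax errors can make `tokenize` raise `tokenize.TokenError`; a real pipeline would need to catch and skip those.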

Installation Guide

Requirements

Here are the recommended system requirements for training Type4Py on the MT4Py dataset:

  • Linux-based OS (Ubuntu 18.04 or newer)
  • Python 3.6 or newer
  • A high-end NVIDIA GPU (w/ at least 8GB of VRAM)
  • A CPU with 16 or more threads (w/ at least 64GB of RAM)
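A quick way to compare a machine against the recommended requirements listed above is a small standard-library script. This is a sketch, not part of Type4Py: the helper names `total_ram_gb` and `check_requirements` are hypothetical, RAM detection relies on POSIX `os.sysconf` (available on Linux), and the GPU/VRAM requirement is not checked because the standard library cannot query it.

```python
import os
import sys
from typing import List, Optional, Tuple

# Thresholds copied from the recommended requirements above (GPU/VRAM is not checked).
MIN_PYTHON = (3, 6)
MIN_THREADS = 16
MIN_RAM_GB = 64  # decimal gigabytes (10**9 bytes)


def total_ram_gb() -> Optional[float]:
    """Total physical memory in GB via POSIX sysconf (Linux); None if unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 10**9
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements(
    python: Tuple[int, int], threads: Optional[int], ram_gb: Optional[float]
) -> List[str]:
    """Return one warning per recommended requirement that is not met."""
    warnings = []
    if tuple(python) < MIN_PYTHON:
        warnings.append("Python {}.{} is older than 3.6".format(python[0], python[1]))
    if threads is None:
        warnings.append("could not detect the number of CPU threads")
    elif threads < MIN_THREADS:
        warnings.append("{} CPU threads found; {}+ recommended".format(threads, MIN_THREADS))
    if ram_gb is None:
        warnings.append("could not detect total RAM")
    elif ram_gb < MIN_RAM_GB:
        warnings.append("{:.1f} GB RAM found; {}+ GB recommended".format(ram_gb, MIN_RAM_GB))
    return warnings


if __name__ == "__main__":
    for warning in check_requirements(sys.version_info[:2], os.cpu_count(), total_ram_gb()):
        print("WARNING:", warning)
```

The script deliberately avoids f-strings so that it still parses, and can report the version problem, on interpreters older than Python 3.6.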

Quick Install

git clone https://github.com/saltudelft/type4py.git && cd type4py
pip install .

Usage Guide

Follow the steps below to train and evaluate the Type4Py model.

1. Extraction

NOTE: Skip this step if you're using the ManyTypes4Py dataset.

$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES

Description:

  • $DATA_PATH: The path to the Python corpus or dataset.
  • $OUTPUT_DIR: The path to store processed projects.
  • $DUP_FILES: The path to the duplicate files, i.e., the *.jsonl.gz file produced by CD4Py. [Optional]
  • $CORES: Number of CPU cores to use for processing projects.

2. Preprocessing

$ type4py preprocess --o $OUTPUT_DIR --l $LIMIT

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects. For the MT4Py dataset, use the directory in which the dataset is extracted.
  • $LIMIT: The number of projects to be processed. [Optional]

3. Vectorizing

$ type4py vectorize --o $OUTPUT_DIR

Description:

  • $OUTPUT_DIR: The path that was used in the previous step to store processed projects.

4. Learning

$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE

Description:

  • $OUTPUT_DIR: The path that was used in the previous step to store processed projects.

  • --c: Trains the complete model. Use type4py learn -h to see other configurations.

  • --p $PARAM_FILE: The path to user-provided hyper-parameters for the model. See this file as an example. [Optional]

5. Testing

$ type4py predict --o $OUTPUT_DIR --c

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects.
  • --c: Predicts using the complete model. Use type4py predict -h to see other configurations.
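As we understand the accompanying paper, Type4Py is a similarity-learning approach: the trained model maps each code element to a vector, elements with the same type form "type clusters" in that space, and candidate types are ranked by looking at the element's nearest neighbours. The sketch below shows that ranking idea with a brute-force k-nearest-neighbour search over labelled points. It is an illustration under assumptions, not Type4Py's implementation: the inverse-distance weighting, the function name `knn_type_predictions`, and the toy 2-D vectors are all hypothetical, and a real system would use model embeddings and an approximate nearest-neighbour index rather than a full scan.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_type_predictions(
    query: Vector,
    cluster_points: List[Tuple[Vector, str]],
    k: int = 10,
    eps: float = 1e-9,
) -> List[Tuple[str, float]]:
    """Rank candidate types for `query` using its k nearest labelled points.

    Each neighbour votes for its type with weight 1/distance (an assumed
    weighting); scores are normalised to sum to 1 and sorted descending.
    """
    nearest = sorted(
        ((euclidean(query, vec), type_name) for vec, type_name in cluster_points),
        key=lambda pair: pair[0],
    )[:k]
    scores: Dict[str, float] = defaultdict(float)
    for distance, type_name in nearest:
        scores[type_name] += 1.0 / (distance + eps)  # eps avoids division by zero
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(type_name, score / total) for type_name, score in ranked]
```

For example, with points `((0.0, 0.0), "int")`, `((0.1, 0.0), "int")`, `((1.0, 1.0), "str")` and a query at `(0.05, 0.0)`, `"int"` is ranked first because both of its points are much closer than the `"str"` point.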

6. Evaluating

$ type4py eval --o $OUTPUT_DIR --t c --tp 10

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects.
  • --t: Evaluates the model on different prediction tasks. E.g., --t c considers all prediction tasks, i.e., parameter, return, and variable types. [Default: c]
  • --tp 10: Considers the Top-10 predictions for evaluation. For this argument, you can choose a positive integer between 1 and 10. [Default: 10]
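The `--tp` option follows the usual Top-k idea: a prediction counts as correct if the ground-truth type appears anywhere among the first k ranked suggestions. The sketch below computes plain exact-match Top-k accuracy to illustrate the metric. It is a simplified assumption and does not reproduce `type4py eval`'s actual scoring or per-task breakdowns; the function name `top_k_accuracy` is hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ground_truth: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int = 10,
) -> float:
    """Fraction of samples whose true type appears among their first k predictions."""
    if not 1 <= k <= 10:
        raise ValueError("k must be between 1 and 10, mirroring the --tp range")
    if len(ground_truth) != len(ranked_predictions):
        raise ValueError("need one ranked prediction list per ground-truth type")
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for truth, preds in zip(ground_truth, ranked_predictions) if truth in preds[:k]
    )
    return hits / len(ground_truth)
```

With truths `["int", "str", "List[int]"]` and predictions `[["int", "float"], ["bytes", "str"], ["Dict[str, int]", "List[str]"]]`, Top-1 accuracy is 1/3 and Top-2 accuracy is 2/3, which shows why larger k never lowers the score.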

Use type4py eval -h to see other options.
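Steps 1 to 6 can also be chained from a small Python driver. This sketch only assembles the commands and flags documented above and runs them in order with the standard `subprocess` module; the helper names `pipeline_commands` and `run_pipeline`, and the default of 4 worker cores, are hypothetical choices for illustration. Extraction is skipped when no `data_path` is given, as for the ManyTypes4Py dataset.

```python
import subprocess
from typing import List, Optional


def pipeline_commands(
    output_dir: str,
    data_path: Optional[str] = None,
    dup_files: Optional[str] = None,
    cores: int = 4,  # arbitrary default, only used by the extract step
    limit: Optional[int] = None,
    param_file: Optional[str] = None,
    top_k: int = 10,
) -> List[List[str]]:
    """Build the documented type4py CLI invocations for steps 1-6, in order."""
    commands: List[List[str]] = []
    if data_path is not None:  # step 1: skip for the ManyTypes4Py dataset
        extract = ["type4py", "extract", "--c", data_path, "--o", output_dir]
        if dup_files is not None:
            extract += ["--d", dup_files]
        extract += ["--w", str(cores)]
        commands.append(extract)
    preprocess = ["type4py", "preprocess", "--o", output_dir]
    if limit is not None:
        preprocess += ["--l", str(limit)]
    commands.append(preprocess)
    commands.append(["type4py", "vectorize", "--o", output_dir])
    learn = ["type4py", "learn", "--o", output_dir, "--c"]
    if param_file is not None:
        learn += ["--p", param_file]
    commands.append(learn)
    commands.append(["type4py", "predict", "--o", output_dir, "--c"])
    commands.append(["type4py", "eval", "--o", output_dir, "--t", "c", "--tp", str(top_k)])
    return commands


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in sequence; check=True stops at the first failing step."""
    for cmd in commands:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(pipeline_commands("/path/to/mt4py"))` would run steps 2 to 6 on an extracted MT4Py dataset (the path is a placeholder). Passing arguments as a list, rather than a single shell string, avoids quoting problems with paths that contain spaces.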

Reduce

To reduce the dimension of the type clusters created in step 5, run the following command:

Note: The reduced version of type clusters causes a slight performance loss in type prediction.

$ type4py reduce --o $OUTPUT_DIR --d $DIMENSION

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects.
  • $DIMENSION: Reduces the dimension of type clusters to the specified value [Default: 256]
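This README does not state which dimensionality-reduction method `type4py reduce` uses. To illustrate the general trade-off (smaller vectors in exchange for slightly distorted distances, hence the slight loss in prediction performance noted above), the sketch below uses a different, deliberately simple technique: a Gaussian random projection, which preserves squared vector norms in expectation. It is a stand-in for illustration only; the function names `random_projection_matrix` and `project` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def random_projection_matrix(in_dim: int, out_dim: int, seed: int = 0) -> List[List[float]]:
    """in_dim x out_dim matrix with entries ~ N(0, 1/out_dim).

    With this scaling, the expected squared norm of a projected vector equals
    the squared norm of the original vector.
    """
    rng = random.Random(seed)  # seeded so the same projection can be reused
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)] for _ in range(in_dim)]


def project(vectors: Sequence[Sequence[float]], matrix: List[List[float]]) -> List[List[float]]:
    """Map each in_dim vector to out_dim by multiplying it with the matrix."""
    out_dim = len(matrix[0])
    result = []
    for vec in vectors:
        if len(vec) != len(matrix):
            raise ValueError("vector length must equal the matrix's input dimension")
        result.append(
            [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(out_dim)]
        )
    return result
```

The same matrix must be applied to the stored type-cluster vectors and to every query vector, otherwise nearest-neighbour distances between them are meaningless.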

Converting Type4Py to ONNX

To convert the pre-trained Type4Py model to the ONNX format, use the following command:

$ type4py to_onnx --o $OUTPUT_DIR

Description:

  • $OUTPUT_DIR: The path that was used in the usage section to store processed projects and the model.

VSCode Extension


Type4Py can be used in VSCode through an extension that provides ML-based type auto-completion for Python files. Type4Py's VSCode extension can be installed from the VS Marketplace here.

Using Local Pre-trained Model

Type4Py's pre-trained model can be queried locally by using provided Docker images. See here for usage info.

Type4Py Server


The Type4Py server is deployed on our server, which exposes a public API and powers the VSCode extension. If you would like to deploy the Type4Py server on your own machine, you can adapt the server code here. For questions about deployment, using the pre-trained Type4Py model, or training your own model, please feel free to reach out to us by creating an issue.

Citing Type4Py

@inproceedings{mir2022type4py,
  title={Type4Py: practical deep similarity learning-based type inference for python},
  author={Mir, Amir M and Lato{\v{s}}kinas, Evaldas and Proksch, Sebastian and Gousios, Georgios},
  booktitle={Proceedings of the 44th International Conference on Software Engineering},
  pages={2241--2252},
  year={2022}
}