
EhsanMashhadi / MSR2021-ProgramRepair

Licence: other
Code for our paper "Applying CodeBERT for Automated Program Repair of Java Simple Bugs", accepted at MSR 2021.

Programming Languages

python
shell

Projects that are alternatives to, or similar to, MSR2021-ProgramRepair

SZZUnleashed
An implementation of the SZZ algorithm, i.e., an approach to identify bug-introducing commits.
Stars: ✭ 90 (+246.15%)
Mutual labels:  mining-software-repositories
german-sentiment
A data set and model for German sentiment classification.
Stars: ✭ 37 (+42.31%)
Mutual labels:  bert-model
bert-movie-reviews-sentiment-classifier
Build a Movie Reviews Sentiment Classifier with Google's BERT Language Model
Stars: ✭ 12 (-53.85%)
Mutual labels:  bert-model
kGenProg
A high-performance, highly extensible, and highly portable APR system
Stars: ✭ 47 (+80.77%)
Mutual labels:  program-repair
german-sentiment-lib
An easy-to-use Python package for deep learning-based German sentiment classification.
Stars: ✭ 33 (+26.92%)
Mutual labels:  bert-model
gender-unbiased BERT-based pronoun resolution
Source code for the ACL workshop paper and Kaggle competition by Google AI team
Stars: ✭ 42 (+61.54%)
Mutual labels:  bert-model
GenPat
This is an automated transformation inference tool that leverages a big code corpus to guide the abstraction of transformation patterns.
Stars: ✭ 19 (-26.92%)
Mutual labels:  program-repair
spiral
A Python 3 module that provides functions for splitting identifiers found in source code files.
Stars: ✭ 37 (+42.31%)
Mutual labels:  mining-software-repositories
RECCON
This repository contains the dataset and the PyTorch implementations of the models from the paper Recognizing Emotion Cause in Conversations.
Stars: ✭ 126 (+384.62%)
Mutual labels:  bert-model
bert-tensorflow-pytorch-spacy-conversion
Instructions for converting a BERT TensorFlow model to work with HuggingFace's pytorch-transformers and spaCy. This walk-through uses DeepPavlov's RuBERT as an example.
Stars: ✭ 26 (+0%)
Mutual labels:  bert-model
R-BERT
Pytorch re-implementation of R-BERT model
Stars: ✭ 59 (+126.92%)
Mutual labels:  bert-model
logifix
Fixing static analysis violations in Java source code using Datalog
Stars: ✭ 17 (-34.62%)
Mutual labels:  program-repair
CPR
CPR: An automated program repair technique based on concolic execution that works on patch abstraction, with the sub-optimal goal of refining patches so they over-fit the initial test cases less.
Stars: ✭ 22 (-15.38%)
Mutual labels:  program-repair
Transformer Temporal Tagger
Code and data from the paper BERT Got a Date: Introducing Transformers to Temporal Tagging
Stars: ✭ 55 (+111.54%)
Mutual labels:  bert-model
codeprep
A toolkit for pre-processing large source code corpora
Stars: ✭ 39 (+50%)
Mutual labels:  mining-software-repositories
FinBERT-QA
Financial Domain Question Answering with pre-trained BERT Language Model
Stars: ✭ 70 (+169.23%)
Mutual labels:  bert-model
text-classification-transformers
Easy text classification for everyone : Bert based models via Huggingface transformers (KR / EN)
Stars: ✭ 32 (+23.08%)
Mutual labels:  bert-model
siamese-BERT-fake-news-detection-LIAR
Triple Branch BERT Siamese Network for fake news classification on LIAR-PLUS dataset in PyTorch
Stars: ✭ 96 (+269.23%)
Mutual labels:  bert-model
BIFI
[ICML 2021] Break-It-Fix-It: Unsupervised Learning for Program Repair
Stars: ✭ 74 (+184.62%)
Mutual labels:  program-repair
taipei-QA-BERT
A Taipei QA question-answering bot (built with BERT and ALBERT)
Stars: ✭ 38 (+46.15%)
Mutual labels:  bert-model

MSR2021-ProgramRepair

Paper

You can find the paper here: https://arxiv.org/abs/2103.11626

Data

The data folder contains the following folders and files:

  • repetition folder contains MSR datasets WITH <buggy code, fixed code> duplicate pairs
  • unique folder contains MSR datasets WITHOUT <buggy code, fixed code> duplicate pairs
  • sstubs(Large|Small).json files contain the dataset in JSON format
  • sstubs(Large|Small)-(train|test|val).json files contain the dataset splits in JSON format
  • split/(large|small) folders contain the dataset in the text format that CodeBERT works with
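
As a sketch of how the JSON splits might be consumed, the snippet below loads one split and builds <buggy code, fixed code> pairs. The field names (sourceBeforeFix, sourceAfterFix) are assumptions based on the ManySStuBs4J-style format, not documented in this README:

```python
import json

# Hypothetical example entries mimicking a sstubsSmall-train.json split;
# the field names are assumptions, not taken from this repository.
sample = [
    {"sourceBeforeFix": "if (a = b) {", "sourceAfterFix": "if (a == b) {"},
    {"sourceBeforeFix": "return x;", "sourceAfterFix": "return y;"},
]
with open("sstubsSmall-train.json", "w") as f:
    json.dump(sample, f)

# Load the split and build <buggy code, fixed code> pairs.
with open("sstubsSmall-train.json") as f:
    bugs = json.load(f)

pairs = [(b["sourceBeforeFix"], b["sourceAfterFix"]) for b in bugs]
print(len(pairs))  # 2
```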

Running CodeBERT Experiments

  1. Clone the repository
    • git lfs install
    • git clone https://github.com/EhsanMashhadi/MSR2021-ProgramRepair.git
  2. Download the CodeBERT model
    • cd MSR2021-ProgramRepair
    • git clone https://huggingface.co/microsoft/codebert-base
    • set the pretrained_model variable in the script files to the downloaded model's directory path
  3. Install dependencies
    • pip install torch==1.4.0
    • pip install transformers==2.5.0
  4. Train the model with MSR data
    • bash ./scripts/codebert/train.sh
  5. Evaluate the model
    • bash ./scripts/codebert/test.sh
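
The text format mentioned above (one example per line in parallel buggy/fixed files) is what seq2seq pipelines like this one typically consume. A minimal sketch of producing it from the JSON pairs follows; the field names and output file names are assumptions for illustration, not the exact names the scripts use:

```python
import json

# Hypothetical input; field names are assumed, as in the data section above.
examples = [
    {"sourceBeforeFix": "int x = 0 ;", "sourceAfterFix": "int x = 1 ;"},
]

def write_split(examples, prefix):
    # Write parallel source/target files: line i of prefix.buggy is the
    # buggy code, line i of prefix.fixed the corresponding fix.
    with open(prefix + ".buggy", "w") as src, open(prefix + ".fixed", "w") as tgt:
        for ex in examples:
            # Collapse whitespace so each example stays on a single line.
            src.write(" ".join(ex["sourceBeforeFix"].split()) + "\n")
            tgt.write(" ".join(ex["sourceAfterFix"].split()) + "\n")

write_split(examples, "train")
```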

Running Simple LSTM Experiments

  1. Install OpenNMT-py
    • pip install OpenNMT-py==2.2.0
    • If you face conflicts between the PyTorch and CUDA versions, you can follow this link
  2. Preprocess the MSR data
    • bash ./scripts/simple-lstm/build_vocab.sh
  3. Train the model
    • bash ./scripts/simple-lstm/train.sh
  4. Evaluate the model
    • bash ./scripts/simple-lstm/test.sh
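
Both test.sh scripts produce model predictions that are compared against the reference fixes. A common metric for this task is exact-match accuracy, sketched below; the prediction/reference lists are hypothetical stand-ins for the files the scripts emit:

```python
# Hypothetical outputs; in practice these would be read line-by-line from
# the prediction and reference files produced by test.sh.
predictions = ["return y ;", "if ( a == b )", "x ++ ;"]
references  = ["return y ;", "if ( a != b )", "x ++ ;"]

# Exact match: a prediction counts as correct only if it is identical
# to the reference fix after trimming surrounding whitespace.
correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
accuracy = correct / len(references)
print(f"exact match: {accuracy:.2%}")  # 66.67%
```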

Running Simple LSTM Experiments using the legacy version of OpenNMT-py

(This is the original version used to run the simple LSTM experiments in the paper.)

  1. Install OpenNMT-py legacy
    • pip install OpenNMT-py==1.2.0
  2. Preprocess the MSR data
    • bash ./scripts/simple-lstm/legacy/preprocess.sh
  3. Train the model
    • bash ./scripts/simple-lstm/legacy/train.sh
  4. Evaluate the model
    • bash ./scripts/simple-lstm/legacy/test.sh

How to run all experiments?

  • You can change the values of the size and type variables in the script files to run the different experiments (large | small, unique | repetition).
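
The full set of configurations can be sketched as a simple loop. The variable names mirror the ones described above; iterating like this is illustrative, since in practice you edit the variables inside the scripts:

```python
from itertools import product

# The two script variables and their possible values, per the README.
sizes = ["large", "small"]
types = ["unique", "repetition"]

# Build the four experiment configurations.
configs = [{"size": s, "type": t} for s, t in product(sizes, types)]
for cfg in configs:
    print(f"size={cfg['size']} type={cfg['type']}")
```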

Have trouble running on GPU?

  1. Check the CUDA and PyTorch version compatibility.
  2. Assign the correct values to CUDA_VISIBLE_DEVICES, gpu_rank, and world_size in all scripts, based on the number of GPUs you have.
  3. To run on CPU instead, remove the gpu_rank and world_size options in all scripts.
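
As an illustration of how the three settings relate, the sketch below derives them for a two-GPU machine. The values are examples, not taken from the scripts:

```python
# Example for a machine with two GPUs: CUDA_VISIBLE_DEVICES lists the
# visible device ids, world_size is the number of participating GPUs,
# and each process gets one 0-based gpu_rank.
num_gpus = 2
env = {
    "CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in range(num_gpus)),
    "world_size": num_gpus,
    "gpu_ranks": list(range(num_gpus)),  # one rank per GPU process
}
print(env["CUDA_VISIBLE_DEVICES"])  # 0,1
```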