
EhsanMashhadi / MSR2021-ProgramRepair

Licence: other
Code for our paper "Applying CodeBERT for Automated Program Repair of Java Simple Bugs", accepted at MSR 2021.

Programming Languages

python
shell

Projects that are alternatives to, or similar to, MSR2021-ProgramRepair

SZZUnleashed
An implementation of the SZZ algorithm, i.e., an approach to identify bug-introducing commits.
Stars: ✭ 90 (+246.15%)
Mutual labels:  mining-software-repositories
german-sentiment
A data set and model for German sentiment classification.
Stars: ✭ 37 (+42.31%)
Mutual labels:  bert-model
bert-movie-reviews-sentiment-classifier
Build a Movie Reviews Sentiment Classifier with Google's BERT Language Model
Stars: ✭ 12 (-53.85%)
Mutual labels:  bert-model
kGenProg
A high-performance, highly extensible, and highly portable APR system
Stars: ✭ 47 (+80.77%)
Mutual labels:  program-repair
german-sentiment-lib
An easy-to-use Python package for deep learning-based German sentiment classification.
Stars: ✭ 33 (+26.92%)
Mutual labels:  bert-model
gender-unbiased BERT-based pronoun resolution
Source code for the ACL workshop paper and Kaggle competition by Google AI team
Stars: ✭ 42 (+61.54%)
Mutual labels:  bert-model
GenPat
This is an automated transformation inference tool that leverages a big code corpus to guide the abstraction of transformation patterns.
Stars: ✭ 19 (-26.92%)
Mutual labels:  program-repair
spiral
A Python 3 module that provides functions for splitting identifiers found in source code files.
Stars: ✭ 37 (+42.31%)
Mutual labels:  mining-software-repositories
RECCON
This repository contains the dataset and the PyTorch implementations of the models from the paper Recognizing Emotion Cause in Conversations.
Stars: ✭ 126 (+384.62%)
Mutual labels:  bert-model
bert-tensorflow-pytorch-spacy-conversion
Instructions for converting a BERT TensorFlow model to work with HuggingFace's pytorch-transformers and spaCy. This walk-through uses DeepPavlov's RuBERT as an example.
Stars: ✭ 26 (+0%)
Mutual labels:  bert-model
R-BERT
Pytorch re-implementation of R-BERT model
Stars: ✭ 59 (+126.92%)
Mutual labels:  bert-model
logifix
Fixing static analysis violations in Java source code using Datalog
Stars: ✭ 17 (-34.62%)
Mutual labels:  program-repair
CPR
CPR: An automated program repair technique based on concolic execution that works on patch abstraction, with the sub-optimal goal of refining patches so they over-fit the initial test cases less.
Stars: ✭ 22 (-15.38%)
Mutual labels:  program-repair
Transformer Temporal Tagger
Code and data from the paper BERT Got a Date: Introducing Transformers to Temporal Tagging
Stars: ✭ 55 (+111.54%)
Mutual labels:  bert-model
codeprep
A toolkit for pre-processing large source code corpora
Stars: ✭ 39 (+50%)
Mutual labels:  mining-software-repositories
FinBERT-QA
Financial Domain Question Answering with pre-trained BERT Language Model
Stars: ✭ 70 (+169.23%)
Mutual labels:  bert-model
text-classification-transformers
Easy text classification for everyone : Bert based models via Huggingface transformers (KR / EN)
Stars: ✭ 32 (+23.08%)
Mutual labels:  bert-model
siamese-BERT-fake-news-detection-LIAR
Triple Branch BERT Siamese Network for fake news classification on LIAR-PLUS dataset in PyTorch
Stars: ✭ 96 (+269.23%)
Mutual labels:  bert-model
BIFI
[ICML 2021] Break-It-Fix-It: Unsupervised Learning for Program Repair
Stars: ✭ 74 (+184.62%)
Mutual labels:  program-repair
taipei-QA-BERT
A Taipei QA question-answering bot (built with BERT and ALBERT)
Stars: ✭ 38 (+46.15%)
Mutual labels:  bert-model

MSR2021-ProgramRepair

Paper

You can find the paper here: https://arxiv.org/abs/2103.11626

Data

The data folder contains the following folders and files:

  • repetition folder contains MSR datasets WITH <buggy code, fixed code> duplicate pairs
  • unique folder contains MSR datasets WITHOUT <buggy code, fixed code> duplicate pairs
  • sstubs(Large|Small).json files contain the dataset in JSON format
  • sstubs(Large|Small)-(train|test|val).json files contain the dataset splits in JSON format
  • split/(large|small) folders contain the dataset in the text format that CodeBERT works with
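
As a sketch of how the JSON splits might be consumed, the snippet below loads one split and builds <buggy code, fixed code> pairs. The field names (sourceBeforeFix, sourceAfterFix) are assumptions based on the ManySStuBs4J-style format, not documented in this README:

```python
import json

# Hypothetical example entries mimicking a sstubsSmall-train.json split;
# the field names are assumptions, not taken from this repository.
sample = [
    {"sourceBeforeFix": "if (a = b) {", "sourceAfterFix": "if (a == b) {"},
    {"sourceBeforeFix": "return x;", "sourceAfterFix": "return y;"},
]
with open("sstubsSmall-train.json", "w") as f:
    json.dump(sample, f)

# Load the split and build <buggy code, fixed code> pairs.
with open("sstubsSmall-train.json") as f:
    bugs = json.load(f)

pairs = [(b["sourceBeforeFix"], b["sourceAfterFix"]) for b in bugs]
print(len(pairs))  # 2
```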

Running CodeBERT Experiments

  1. Clone the repository
    • git lfs install
    • git clone https://github.com/EhsanMashhadi/MSR2021-ProgramRepair.git
  2. Download the CodeBERT model
    • cd MSR2021-ProgramRepair
    • git clone https://huggingface.co/microsoft/codebert-base
    • set the pretrained_model variable in the script files to the downloaded model's directory path
  3. Install dependencies
    • pip install torch==1.4.0
    • pip install transformers==2.5.0
  4. Train the model with MSR data
    • bash ./scripts/codebert/train.sh
  5. Evaluate the model
    • bash ./scripts/codebert/test.sh
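
The text format mentioned above (one example per line in parallel buggy/fixed files) is what seq2seq pipelines like this one typically consume. A minimal sketch of producing it from the JSON pairs follows; the field names and output file names are assumptions for illustration, not the exact names the scripts use:

```python
import json

# Hypothetical input; field names are assumed, as in the data section above.
examples = [
    {"sourceBeforeFix": "int x = 0 ;", "sourceAfterFix": "int x = 1 ;"},
]

def write_split(examples, prefix):
    # Write parallel source/target files: line i of prefix.buggy is the
    # buggy code, line i of prefix.fixed the corresponding fix.
    with open(prefix + ".buggy", "w") as src, open(prefix + ".fixed", "w") as tgt:
        for ex in examples:
            # Collapse whitespace so each example stays on a single line.
            src.write(" ".join(ex["sourceBeforeFix"].split()) + "\n")
            tgt.write(" ".join(ex["sourceAfterFix"].split()) + "\n")

write_split(examples, "train")
```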

Running Simple LSTM Experiments

  1. Install OpenNMT-py
    • pip install OpenNMT-py==2.2.0
    • If you face conflicts between the PyTorch and CUDA versions, you can follow this link
  2. Preprocess the MSR data
    • bash ./scripts/simple-lstm/build_vocab.sh
  3. Train the model
    • bash ./scripts/simple-lstm/train.sh
  4. Evaluate the model
    • bash ./scripts/simple-lstm/test.sh
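
Both test.sh scripts produce model predictions that are compared against the reference fixes. A common metric for this task is exact-match accuracy, sketched below; the prediction/reference lists are hypothetical stand-ins for the files the scripts emit:

```python
# Hypothetical outputs; in practice these would be read line-by-line from
# the prediction and reference files produced by test.sh.
predictions = ["return y ;", "if ( a == b )", "x ++ ;"]
references  = ["return y ;", "if ( a != b )", "x ++ ;"]

# Exact match: a prediction counts as correct only if it is identical
# to the reference fix after trimming surrounding whitespace.
correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
accuracy = correct / len(references)
print(f"exact match: {accuracy:.2%}")  # 66.67%
```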

Running Simple LSTM Experiments using the legacy version of OpenNMT-py

(This is the original version used to run the simple LSTM experiments in the paper.)

  1. Install OpenNMT-py legacy
    • pip install OpenNMT-py==1.2.0
  2. Preprocess the MSR data
    • bash ./scripts/simple-lstm/legacy/preprocess.sh
  3. Train the model
    • bash ./scripts/simple-lstm/legacy/train.sh
  4. Evaluate the model
    • bash ./scripts/simple-lstm/legacy/test.sh

How to run all experiments?

  • You can change the values of the size and type variables in the script files to run the different experiments (large | small, unique | repetition).
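
The full set of configurations can be sketched as a simple loop. The variable names mirror the ones described above; iterating like this is illustrative, since in practice you edit the variables inside the scripts:

```python
from itertools import product

# The two script variables and their possible values, per the README.
sizes = ["large", "small"]
types = ["unique", "repetition"]

# Build the four experiment configurations.
configs = [{"size": s, "type": t} for s, t in product(sizes, types)]
for cfg in configs:
    print(f"size={cfg['size']} type={cfg['type']}")
```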

Have trouble running on GPU?

  1. Check the CUDA and PyTorch version compatibility.
  2. Assign the correct values to CUDA_VISIBLE_DEVICES, gpu_rank, and world_size in all scripts, based on the number of GPUs you have.
  3. To run on CPU instead, remove the gpu_rank and world_size options in all scripts.
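
As an illustration of how the three settings relate, the sketch below derives them for a two-GPU machine. The values are examples, not taken from the scripts:

```python
# Example for a machine with two GPUs: CUDA_VISIBLE_DEVICES lists the
# visible device ids, world_size is the number of participating GPUs,
# and each process gets one 0-based gpu_rank.
num_gpus = 2
env = {
    "CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in range(num_gpus)),
    "world_size": num_gpus,
    "gpu_ranks": list(range(num_gpus)),  # one rank per GPU process
}
print(env["CUDA_VISIBLE_DEVICES"])  # 0,1
```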