wanyao1992 / code_summarization_public

Licence: other
Source code for "Improving Automatic Source Code Summarization via Deep Reinforcement Learning"

Note: This repository is no longer maintained. If you are interested in deep learning for program analysis (e.g., code summarization, code retrieval, code completion, and type inference), please refer to our new project NaturalCC (https://github.com/CGCL-codes/naturalcc).

Requirements

This repo was developed in the following environment:

  • Python 2.7
  • PyTorch 0.2

Data folder structure

/media/BACKUP/ghproj_d/code_summarization/github-python/ is the folder that holds all the data for this project; replace it with your own folder. The data files are organized as follows on my machine:

/media/BACKUP/ghproj_d/code_summarization/github-python
|-- original     (raw data)
|   |-- data_ps.declbodies
|   |-- data_ps.descriptions
|-- processed    (preprocessed data)
|   |-- all.code
|   |-- all.comment
|-- result       (results)
|-- train        (data files prepared before training)

You need to get these files before you start training our model. Here I put the original folder in the dataset folder of this project; you should copy it to your own folder.
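
The folder layout above can be reproduced under your own root with a short Python 3 sketch (ROOT below is a placeholder path, not the path used in the README; note the repo itself targets Python 2.7, where os.makedirs has no exist_ok argument):

```python
import os

# Hypothetical root -- replace with your own data folder, as described above.
ROOT = "/tmp/code_summarization/github-python"

# Create the four subfolders the README expects.
for sub in ("original", "processed", "result", "train"):
    os.makedirs(os.path.join(ROOT, sub), exist_ok=True)
```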

Data preprocess

cd script/github
python python_process.py -train_portion 0.6 -dev_portion 0.2 > log.python_process
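
The -train_portion and -dev_portion flags suggest a 60/20/20 train/dev/test split. A minimal sketch of such a split, assuming the remainder becomes the test set (split_portions is a hypothetical helper for illustration, not the script's actual code):

```python
def split_portions(pairs, train_portion=0.6, dev_portion=0.2):
    """Split (code, comment) pairs into train/dev/test slices.

    Assumption: the test split is whatever remains after the train
    and dev portions; python_process.py may do this differently.
    """
    n = len(pairs)
    n_train = int(n * train_portion)
    n_dev = int(n * dev_portion)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_dev],
            pairs[n_train + n_dev:])

pairs = [("code_%d" % i, "comment_%d" % i) for i in range(10)]
train, dev, test = split_portions(pairs)
print(len(train), len(dev), len(test))  # -> 6 2 2
```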

Training

Return to the project root folder

cd ../..

Get the data for training

python run.py preprocess

Training

python run.py train_a2c 10 30 10 hybrid 1 0

Testing

python run.py test_a2c hybrid 1 0

TODO

  • To build the AST, during data preprocessing I parse the AST into JSON, and then parse that JSON back into an AST during training. This approach is not elegant.
  • During training, I don't know how to batch the ASTs, so I put them into a list and encode them one by one. This is inefficient, making one training epoch take about 2-3 hours. Please let me know if you have a better way to accelerate this process.
  • On the encoder side, I am working on applying Tree-CNN and Graph-CNN to represent the code in a better way.
  • On the decoder side, a GAN will also be considered for the code summarization task.
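
The AST-to-JSON round trip mentioned in the first item can be sketched as follows; this is a simplified illustration using Python's standard ast and json modules, not the repo's actual serialization code:

```python
import ast
import json

def ast_to_dict(node):
    """Recursively convert a Python AST node into a JSON-serializable dict."""
    if isinstance(node, ast.AST):
        return {"type": type(node).__name__,
                "fields": {name: ast_to_dict(value)
                           for name, value in ast.iter_fields(node)}}
    if isinstance(node, list):
        return [ast_to_dict(item) for item in node]
    return node  # leaf values: str, int, None, ...

tree = ast.parse("def add(a, b):\n    return a + b")
blob = json.dumps(ast_to_dict(tree))  # written to disk at preprocessing time
restored = json.loads(blob)           # read back at training time
print(restored["type"])               # -> Module
```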

Acknowledgement

This repo is based on https://github.com/khanhptnk/bandit-nmt.

Please cite our paper if you use this repo.

Bibtex:
@inproceedings{wan2018improving,
  title={Improving automatic source code summarization via deep reinforcement learning},
  author={Wan, Yao and Zhao, Zhou and Yang, Min and Xu, Guandong and Ying, Haochao and Wu, Jian and Yu, Philip S},
  booktitle={Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering},
  pages={397--407},
  year={2018},
  organization={ACM}
}
