
wangcongcong123 / auto_coding

License: Apache-2.0
A basic and simple tool for code auto completion

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to auto_coding

typed python
An llvm-based framework for generating and calling into high-performance native code from Python.
Stars: ✭ 178 (+323.81%)
Mutual labels:  python-programming
naru
Neural Relation Understanding: neural cardinality estimators for tabular data
Stars: ✭ 76 (+80.95%)
Mutual labels:  generative-model
PREREQ-IAAI-19
Inferring Concept Prerequisite Relations from Online Educational Resources (IAAI-19)
Stars: ✭ 22 (-47.62%)
Mutual labels:  generative-model
90 Python Examples
The best way to learn Python is by practicing examples. The repository contains examples of basic concepts of Python. You are advised to take the references from these examples and try them on your own.
Stars: ✭ 190 (+352.38%)
Mutual labels:  python-programming
GPT2-Telegram-Chatbot
GPT-2 Telegram Chat bot
Stars: ✭ 67 (+59.52%)
Mutual labels:  gpt-2
worlds
Building Virtual Reality Worlds using Three.js
Stars: ✭ 23 (-45.24%)
Mutual labels:  generative-model
Wgan
Tensorflow Implementation of Wasserstein GAN (and Improved version in wgan_v2)
Stars: ✭ 228 (+442.86%)
Mutual labels:  generative-model
feed forward vqgan clip
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
Stars: ✭ 135 (+221.43%)
Mutual labels:  generative-model
glico-learning-small-sample
Generative Latent Implicit Conditional Optimization when Learning from Small Sample ICPR 20'
Stars: ✭ 20 (-52.38%)
Mutual labels:  generative-model
caffe-simnets
The SimNets Architecture's Implementation in Caffe
Stars: ✭ 13 (-69.05%)
Mutual labels:  generative-model
NPTEL-The-Joy-of-Computing-using-Python
Study materials related to this course.
Stars: ✭ 29 (-30.95%)
Mutual labels:  python-programming
Awesome Python Books
📚 Directory of Python books
Stars: ✭ 3,154 (+7409.52%)
Mutual labels:  python-programming
InpaintNet
Code accompanying ISMIR'19 paper titled "Learning to Traverse Latent Spaces for Musical Score Inpainting"
Stars: ✭ 48 (+14.29%)
Mutual labels:  generative-model
Plagiarism-checker-Python
A python project for checking plagiarism of documents based on cosine similarity
Stars: ✭ 114 (+171.43%)
Mutual labels:  python-programming
AC-VRNN
PyTorch code for CVIU paper "AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction"
Stars: ✭ 21 (-50%)
Mutual labels:  generative-model
Sgan
Stacked Generative Adversarial Networks
Stars: ✭ 240 (+471.43%)
Mutual labels:  generative-model
pistoBot
Create an AI that chats like you
Stars: ✭ 121 (+188.1%)
Mutual labels:  gpt-2
texturize
🤖🖌️ Generate photo-realistic textures based on source images. Remix, remake, mashup! Useful if you want to create variations on a theme or elaborate on an existing texture.
Stars: ✭ 495 (+1078.57%)
Mutual labels:  generative-model
eccv16 attr2img
Torch Implemention of ECCV'16 paper: Attribute2Image
Stars: ✭ 93 (+121.43%)
Mutual labels:  generative-model
trVAE
Conditional out-of-distribution prediction
Stars: ✭ 47 (+11.9%)
Mutual labels:  generative-model

AutoCoder

Contributions welcome

A basic and simple tool for code auto-completion, fine-tuned from PyTorch pre-trained GPT-2 variants offered by the awesome 🤗 transformers library.

Demo

(demo GIF)

Play with it on 🤗 HF's Model Hub.

Features

  • Write with Python or Java.

Blog linked to this project

Quick Start

Three quick-start options are provided below.

Load from 🤗transformers models

Two fine-tuned models have now been uploaded to the 🤗 transformers model hub. They can be used easily as long as you pip install transformers:

from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("congcongwang/gpt2_medium_fine_tuned_coder")
model = AutoModelWithLMHead.from_pretrained("congcongwang/gpt2_medium_fine_tuned_coder")
# or the lighter distilled variant:
# tokenizer = AutoTokenizer.from_pretrained("congcongwang/distilgpt2_fine_tuned_coder")
# model = AutoModelWithLMHead.from_pretrained("congcongwang/distilgpt2_fine_tuned_coder")

use_cuda = True
context = "def factorial"
lang = "python"  # can be "java" as well

if use_cuda:
    model.to("cuda")

# Prepend the language control token so the model knows which language to complete.
prefix = "<python> " if lang == "python" else "<java> "
input_ids = tokenizer.encode(prefix + context, return_tensors="pt")

outputs = model.generate(input_ids=input_ids.to("cuda") if use_cuda else input_ids,
                         max_length=128,
                         temperature=0.7,
                         num_return_sequences=1)

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
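
Note that generate only applies temperature when sampling is enabled; with the defaults above it decodes greedily. A minimal sampling variant (the sampling parameters here are illustrative, not settings recommended by the project):

# do_sample=True makes temperature (and top_k/top_p) take effect during generation.
sampled = model.generate(input_ids=input_ids.to("cuda") if use_cuda else input_ids,
                         max_length=128,
                         do_sample=True,
                         temperature=0.7,
                         top_k=50,
                         top_p=0.95,
                         num_return_sequences=1)
print(tokenizer.decode(sampled[0], skip_special_tokens=True))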

Ready-to-go Interaction

git clone https://github.com/wangcongcong123/auto_coding.git
pip install -r requirements.txt
  1. Download the fine-tuned models; two versions are provided.
  2. Unzip the model and move it to ./model (create the directory first)
  3. Run the interactive script: python interact.py

Fine-tuning yours

git clone <this repository>
pip install -r requirements.txt
  1. Prepare the dataset (see the illustrative sketch below)
  2. Start fine-tuning the model: python train.py --model_select distilgpt2
  3. After fine-tuning, the model will be saved to ./model/distilgpt2_fine_tuned_coder/0_GPTSingleHead, which is exactly the fine-tuned version provided in Ready-to-go Interaction.

* For more training parameter options, run python train.py -h
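
To make step 1 (dataset preparation) more concrete, here is a minimal, illustrative sketch; the project's actual preprocessing script may differ. Each training example is a source snippet prefixed with the same language control token ("<python>" / "<java>") used at inference time. The build_examples helper and the raw_data/ path are hypothetical.

# Illustrative only: collect source files and prefix each with its language tag,
# mirroring the "<python> " / "<java> " prefixes used during generation.
from pathlib import Path

def build_examples(source_dir, lang):
    tag, ext = ("<python>", ".py") if lang == "python" else ("<java>", ".java")
    examples = []
    for path in Path(source_dir).rglob("*" + ext):
        code = path.read_text(encoding="utf-8", errors="ignore")
        examples.append(tag + " " + code)
    return examples

# e.g. examples = build_examples("raw_data/python", "python")  # hypothetical path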

Generative examples

Good Python generation examples by the fine-tuned GPT2-medium

-------------Example 1--------------------------------------
Context code: def factorial
Generated: 
def factorial(n):
    return 1 if n == 1 else n * factorial(n - 1)

--------------Example 2-------------------------------------
Context code: def sum_of_series(num_of_terms:int, power: int)
Generated:
def sum_of_series(num_of_terms:int, power: int) -> float:
    """Calculate Sum of Series.
    >>> sum_of_series(1, 1, 10)
    55.0
    >>> sum_of_series(1, 10, 100)
    49600.0
    """
    sum = (num_of_terms / 2) * power
    return sum

--------------Example 3-------------------------------------
Context code: def quick_sort(collection)
Generated:
def quick_sort(collection):
    """
    >>> quick_sort([0, 5, 3, 2, 2])
    [0, 2, 2, 3, 5]
    >>> quick_sort([])
    []
    >>> quick_sort([-2, -5, -45])
    [-45, -5, -2]
    """
    length = len(collection)
    if length <= 1:
        return collection
    else:
        # Use the last element as the first pivot
        pivot = collection.pop()
        # Put elements greater than pivot in greater list
        # Put elements lesser than pivot in lesser list
        greater, lesser = [], []
        for element in collection:
            if element > pivot:
                ...

Good Java generation examples by the fine-tuned GPT2-medium

--------------Example 1-------------------------------------
Context code: Scanner input= new Scanner(System.in)
Generated:
Scanner input= new Scanner(System.in)
System.out.println("Enter the Octal number: ");
// Take octal number as input from user in a string
String oct = input.next();

--------------Example 2-------------------------------------
Context code: private static int CountCharacters(String str)
Generated:
private static int CountCharacters(String str) {
        return str.replaceAll("\\s", "").length();
}

* Although some generated examples look good, the model's actual performance should be judged with a grain of salt: it may simply "remember" existing code from the training set well.
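
One rough way to probe this is to check whether a generated snippet appears near-verbatim in the training data. A minimal sketch, assuming the training examples are available as plain strings (read_training_code is a hypothetical helper; a real contamination study would use n-gram overlap instead):

# Rough memorization check: does any training snippet contain the generated code
# after whitespace normalization?
def normalize(code):
    return " ".join(code.split())

def appears_in_training(generated, training_snippets):
    g = normalize(generated)
    return any(g in normalize(t) for t in training_snippets)

# training_snippets = read_training_code("dataset/")  # hypothetical helper
# print(appears_in_training(decoded, training_snippets))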

TODO list

  • Expand the dataset (and construct it more carefully) and increase the context window. Try larger generative models such as GPT-2 large, or even the recently proposed GPT-3 variants, if computational resources allow.
  • Remove overlap between training examples and dev examples for contamination studies, i.e., to determine to what extent the model memorizes examples rigidly or relies on surface heuristics learned during training.
  • Try some adversarial examples (more complicated ones, to probe the model's reasoning capability) to test the robustness of the model.
  • Integrate this into a real-life use case such as a code editor (e.g., Sublime Text), where a joint-probability threshold for code snippet recommendations may need to be studied.
  • Try some ideas for location-aware code generation. For example, if a human coder is writing a comment, the autocoder should be aware of the coder's context (left and right, if available) to help complete the corresponding content.
  • Model size and inference efficiency are a problem for real-life use cases.
  • Survey this problem domain to get a general idea of what work has been done in the literature on this particular problem.

Extra notes

  • Multi-GPU training only works with torch==1.4.0; it does not work with torch==1.5.0. No fix has been found for this issue so far.