saltudelft / ml4se

Licence: other
A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering

Machine Learning for Software Engineering

This repository contains a curated list of papers, datasets, and tools that are devoted to research on Machine Learning for Software Engineering. The papers are organized into popular research areas so that researchers can find recent papers and state-of-the-art approaches easily.

Please feel free to send a pull request to add papers and relevant content that are not listed here.

Content

Papers

Type Inference

  • Type Inference as Optimization (2021), NeurIPS'21 AIPLANS, Pandi, Irene Vlassi, et al. [pdf]
  • SimTyper: Sound Type Inference for Ruby using Type Equality Prediction (2021), OOPSLA'21, Kazerounian, Milod, et al.
  • Learning type annotation: is big data enough? (2021), FSE 2021, Jesse, Kevin, et al.
  • Cross-Lingual Adaptation for Type Inference (2021), arxiv 2021, Li, Zhiming, et al. [pdf]
  • PYInfer: Deep Learning Semantic Type Inference for Python Variables (2021), arxiv 2021, Cui, Siwei, et al. [pdf]
  • HiTyper: A Hybrid Static Type Inference Framework with Neural Prediction (2021), arxiv 2021, Peng, Yun, et al. [pdf]
  • Type4Py: Deep Similarity Learning-Based Type Inference for Python (2021), arxiv 2021, Mir, Amir, et al. [pdf]
  • Advanced Graph-Based Deep Learning for Probabilistic Type Inference (2020), arxiv 2020, Ye, Fangke, et al. [pdf]
  • Typilus: Neural Type Hints (2020), PLDI 2020, Allamanis, Miltiadis, et al. [pdf]
  • LambdaNet: Probabilistic Type Inference using Graph Neural Networks (2020), arxiv 2020, Wei, Jiayi, et al. [pdf]
  • TypeWriter: Neural Type Prediction with Search-based Validation (2019), arxiv 2019, Pradel, Michael, et al. [pdf]
  • NL2Type: Inferring JavaScript Function Types from Natural Language Information (2019), ICSE 2019, Malik, Rabee S., et al. [pdf]
  • Deep Learning Type Inference (2018), ESEC/FSE 2018, Hellendoorn, Vincent J., et al. [pdf]
  • Python Probabilistic Type Inference with Natural Language Support (2016), FSE 2016, Xu, Zhaogui, et al.
  • Predicting Program Properties from “Big Code” (2015), ACM SIGPLAN 2015, Raychev, Veselin, et al. [pdf]

Code Completion

  • CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences (2022), ICSE'22, Izadi, Maliheh, et al. [pdf] [code]
  • Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs (2021), AAAI'21, Wang, Yanlin, et al. [pdf]
  • Code Prediction by Feeding Trees to Transformers (2021), ICSE'21, Kim, Seohyun, et al. [pdf]
  • Fast and Memory-Efficient Neural Code Completion (2020), arxiv 2020, Svyatkovskiy, Alexey, et al. [pdf]
  • Pythia: AI-assisted Code Completion System (2019), KDD'19, Svyatkovskiy, Alexey, et al. [pdf]
  • Code Completion with Neural Attention and Pointer Networks (2018), arxiv 2018, Li, Jian, et al. [pdf]

Code Generation

  • Predictive Synthesis of API-Centric Code (2022), arxiv 2022, Nam, Daye, et al. [pdf]
  • Evaluating Large Language Models Trained on Code (2021), arxiv 2021, Chen, Mark, et al. [pdf] [code]
  • Code Prediction by Feeding Trees to Transformers (2020), arxiv 2020, Kim, Seohyun, et al. [pdf]
  • TreeGen: A Tree-Based Transformer Architecture for Code Generation (2019), arxiv 2019, Zhu, Qihao, et al. [pdf]
  • A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation (2017), arxiv 2017, Barone, Antonio V. M., et al. [pdf]

Code Summarization

  • Project-Level Encoding for Neural Source Code Summarization of Subroutines (2021), ICPC 2021, Bansal, Aakash, et al. [pdf]
  • Code Structure Guided Transformer for Source Code Summarization (2021), arxiv 2021, Gao, Shuzheng, et al. [pdf]
  • Source Code Summarization Using Attention-Based Keyword Memory Networks (2020), IEEE BigComp 2020, Choi, YunSeok, et al.
  • A Transformer-based Approach for Source Code Summarization (2020), arxiv 2020, Ahmad, Wasi Uddin, et al. [pdf]
  • Learning to Represent Programs with Graphs (2018), ICLR'18, Allamanis, Miltiadis, et al. [pdf]
  • A Convolutional Attention Network for Extreme Summarization of Source Code (2016), ICML 2016, Allamanis, Miltiadis, et al. [pdf]

Code Embeddings

  • SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations (2022), ICSE'22, Niu, Changan, et al. [pdf]
  • TreeCaps: Tree-Based Capsule Networks for Source Code Processing (2021), AAAI'21, Bui, Nghi DQ, et al. [pdf] [code]
  • Language-Agnostic Representation Learning of Source Code from Structure and Context (2021), ICLR'21, Zügner, Daniel, et al. [pdf]
  • Learning and Evaluating Contextual Embedding of Source Code (2020), ICML 2020, Kanade, Aditya, et al. [pdf]
  • Learning Semantic Program Embeddings with Graph Interval Neural Network (2020), OOPSLA'20, Wang, Yu, et al.
  • Contrastive Code Representation Learning (2020), arxiv 2020, Jain, Paras, et al. [pdf]
  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages (2020), arxiv 2020, Feng, Zhangyin, et al. [pdf]
  • SCELMo: Source Code Embeddings from Language Models (2020), arxiv 2020, Karampatsis, Rafael-Michael, et al. [pdf]
  • code2vec: Learning Distributed Representations of Code (2019), ACM POPL 2019, Alon, Uri, et al. [pdf]
  • COSET: A Benchmark for Evaluating Neural Program Embeddings (2019), arxiv 2019, Wang, Ke, et al. [pdf]
  • A Literature Study of Embeddings on Source Code (2019), arxiv 2019, Chen, Zimin, et al. [pdf]
  • code2seq: Generating Sequences from Structured Representations of Code (2018), arxiv 2018, Alon, Uri, et al. [pdf]
  • Neural Code Comprehension: A Learnable Representation of Code Semantics (2018), NIPS 2018, Ben-Nun, Tal, et al. [pdf]

Code Changes

  • Commit2Vec: Learning Distributed Representations of Code Changes (2021), SN Computer Science, Lozoya, Rocío Cabrera, et al.
  • CODIT: Code Editing with Tree-Based Neural Models (2020), TSE 2020, Chakraborty, Saikat, et al.
  • On learning meaningful code changes via neural machine translation (2019), ICSE 2019, Tufano, Michele, et al.

Bug/Vulnerability Detection

  • Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies in Jupyter Notebooks (2022), ICSE'22, Patra, Jibesh, et al. [pdf]
  • Hoppity: Learning graph transformations to detect and fix bugs in programs (2020), ICLR 2020, Dinella, Elizabeth, et al. [pdf]
  • Deep Learning based Software Defect Prediction (2020), Neurocomputing, Qiao, Lei, et al.
  • Software Vulnerability Discovery via Learning Multi-domain Knowledge Bases (2019), IEEE TDSC, Lin, Guanjun, et al.
  • Neural Bug Finding: A Study of Opportunities and Challenges (2019), arxiv 2019, Habib, Andrew, et al. [pdf]
  • Automated Vulnerability Detection in Source Code Using Deep Representation Learning (2018), ICMLA 2018, Russell, Rebecca, et al.
  • DeepBugs: A Learning Approach to Name-based Bug Detection (2018), ACM PL 2018, Pradel, Michael, et al. [pdf]
  • Automatically Learning Semantic Features for Defect Prediction (2016), ICSE 2016, Wang, Song, et al.

Source Code Modeling

  • Multilingual training for Software Engineering (2022), ICSE'22, Ahmed, Toufique, et al. [pdf]
  • Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code (2020), ICSE'20, Karampatsis, Rafael-Michael, et al.
  • Maybe Deep Neural Networks are the Best Choice for Modeling Source Code (2019), arxiv 2019, Karampatsis, Rafael-Michael, et al. [pdf]
  • Are Deep Neural Networks the Best Choice for Modeling Source Code? (2017), FSE 2017, Hellendoorn, Vincent J., et al. [pdf]

Program Repair

  • TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer (2021), ICML'21, Berabi, Berkay, et al. [pdf]
  • Neural Transfer Learning for Repairing Security Vulnerabilities in C Code (2021), Chen, Zimin, et al. [pdf]
  • Generating Bug-Fixes Using Pretrained Transformers (2021), arxiv 2021, Drain, Dawn, et al. [pdf]
  • Global Relational Models of Source Code (2020), ICLR'20, Hellendoorn, Vincent J., et al. [pdf]
  • Neural Program Repair by Jointly Learning to Localize and Repair (2019), arxiv 2019, Vasic, Marko, et al. [pdf]

Program Translation

  • Multilingual Code Snippets Training for Program Translation (2022), AAAI'22, Zhu, Ming, et al. [pdf]
  • Semantics-Recovering Decompilation through Neural Machine Translation (2021), arxiv 2021, Liang, Ruigang, et al. [pdf]
  • Unsupervised Translation of Programming Languages (2020), arxiv 2020, Lachaux, Marie-Anne, et al. [pdf]

Code Clone Detection

  • funcGNN: A Graph Neural Network Approach to Program Similarity (2020), ESEM'20, Nair, Aravind, et al. [pdf]
  • Cross-Language Clone Detection by Learning Over Abstract Syntax Trees (2019), MSR'19, Perez, Daniel, et al.
  • The Adverse Effects of Code Duplication in Machine Learning Models of Code (2019), Onward! 2019, Allamanis, Miltiadis [pdf]

Empirical Studies

  • Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code (2021), ASE'21, Paltenghi, M. & Pradel, M.
  • An Empirical Study of Transformers for Source Code (2021), FSE'21, Chirkova, N., & Troshin, S.
  • An Empirical Study on the Usage of Transformer Models for Code Completion (2021), MSR'21, Ciniselli, Matteo, et al.

Surveys

  • A Survey on Machine Learning Techniques for Source Code Analysis (2021), arxiv 2021, Sharma, Tushar, et al. [pdf]
  • Deep Learning & Software Engineering: State of Research and Future Directions (2020), arxiv 2020, Devanbu, Prem, et al. [pdf]
  • A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research (2020), arxiv 2020, Watson, Cody, et al. [pdf]
  • Machine Learning for Software Engineering: A Systematic Mapping (2020), arxiv 2020, Shafiq, Saad, et al. [pdf]
  • Synergy between Machine/Deep Learning and Software Engineering: How Far Are We? (2020), arxiv 2020, Wang, Simin, et al. [pdf]
  • Software Engineering Meets Deep Learning: A Literature Review (2020), arxiv 2020, Ferreira, Fabio, et al. [pdf]
  • Software Vulnerability Detection Using Deep Neural Networks: A Survey (2020), Proceedings of the IEEE, Lin, Guanjun, et al.
  • Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges (2020), arxiv 2020, Le, Triet HM, et al. [pdf]
  • A Survey of Machine Learning for Big Code and Naturalness (2018), ACM Computing Surveys, Allamanis, Miltiadis, et al. [pdf]

PhD Theses

  • Learning to Find Bugs in Programs and their Documentation (2021), Andrew Habib [pdf]
  • Machine Learning and the Science of Software Engineering (2020), Vincent Hellendoorn
  • Deep learning for compilers (2020), Christopher E. Cummins [pdf]
  • Deep Learning in Software Engineering (2020), Cody Watson [pdf]
  • Learning Code Transformations via Neural Machine Translation (2019), Michele Tufano [pdf]
  • Improving the Usability of Static Analysis Tools Using Machine Learning (2019), Ugur Koc [pdf]
  • Learning Natural Coding Conventions (2016), Miltiadis Allamanis [pdf]

Talks

  • Machine Learning for Software Engineering: AMA, MSR 2020 [video]
  • Understanding Source Code with Deep Learning, FOSDEM 2019 [video]

Datasets

Tools

Research Groups

Venues

  • ICSE, the International Conference on Software Engineering
  • FSE, the Symposium on the Foundations of Software Engineering
  • ASE, the International Conference on Automated Software Engineering
  • MSR, the Mining Software Repositories conference
  • ICLR, the International Conference on Learning Representations
  • ICML, the International Conference on Machine Learning
  • AAAI, the AAAI Conference on Artificial Intelligence
  • OOPSLA, the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications
  • TSE, the IEEE Transactions on Software Engineering