mauricioaniche / Repodriller

A tool to support researchers in mining software repositories studies



(Before looking into RepoDriller, I suggest you check out PyDriller, a Python version of RepoDriller, which is now faster and easier to use! I am keeping this repo here for historical purposes, but I don't plan to update it anymore.)

RepoDriller

RepoDriller is a Java framework that helps developers mine software repositories. With it, you can easily extract information from any Git repository, such as commits, developers, modifications, diffs, and source code, and quickly export it to CSV files.

Take a look at our manual folder and our many examples, or talk to us on our mailing list.
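
To give a rough feel for the workflow described above (walk over commits, pull out a few fields, write one CSV row per commit), here is a minimal self-contained sketch. It deliberately does not use RepoDriller's API: `SketchCommit`, `SketchVisitor`, and `CsvMiningSketch` are hypothetical names invented for this illustration, and the commits are hard-coded in memory rather than read from Git. For the real classes and method names, consult the manual folder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: every type here is a hypothetical stand-in,
// not part of RepoDriller's real API.
final class CsvMiningSketch {

    // Quotes a field when it contains a comma, a double quote, or a newline.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Calls the visitor once per commit and turns each result into one CSV row.
    static List<String> mine(List<SketchCommit> commits, SketchVisitor visitor) {
        List<String> rows = new ArrayList<>();
        for (SketchCommit commit : commits) {
            List<String> escaped = new ArrayList<>();
            for (String field : visitor.process(commit)) {
                escaped.add(escape(field));
            }
            rows.add(String.join(",", escaped));
        }
        return rows;
    }

    public static void main(String[] args) {
        // In-memory sample data; a real study would read commits from a Git repository.
        List<SketchCommit> commits = Arrays.asList(
                new SketchCommit("a1b2c3", "Doe, Jane", 3),
                new SketchCommit("d4e5f6", "John Roe", 1));

        // The visitor decides which fields end up in the CSV.
        SketchVisitor visitor = c ->
                Arrays.asList(c.hash, c.author, String.valueOf(c.modifiedFiles));

        for (String row : mine(commits, visitor)) {
            System.out.println(row);
        }
    }
}

// Hypothetical, heavily simplified view of a commit.
final class SketchCommit {
    final String hash;
    final String author;
    final int modifiedFiles;

    SketchCommit(String hash, String author, int modifiedFiles) {
        this.hash = hash;
        this.author = author;
        this.modifiedFiles = modifiedFiles;
    }
}

// Hypothetical visitor: returns the CSV fields for one commit.
interface SketchVisitor {
    List<String> process(SketchCommit commit);
}
```

Running `main` prints `a1b2c3,"Doe, Jane",3` followed by `d4e5f6,John Roe,1`. Separating the visitor (what to extract) from the loop (which commits to walk) is the design idea this sketch tries to convey.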

Advice to researchers

Difficulties in mining git

You should read this paper:

  • Bird, Christian, et al. "The promises and perils of mining git." Mining Software Repositories, 2009. MSR'09. 6th IEEE International Working Conference on. IEEE, 2009.

FAQs

Why use an MSR framework?

There's no question that Mining Software Repositories (MSR) studies benefit from automation. The datasets are too large to analyze manually.

So the choice is whether to use an MSR framework or to write your own scripts. An MSR framework offers two benefits:

  • The researcher can focus on their questions and not on the infrastructure.
  • Coding against a framework improves standardization and therefore reproducibility (see Robles, Gregorio. "Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings." Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on. IEEE, 2010.).

How is RepoDriller different from other MSR frameworks?

RepoDriller is a minimalist's MSR framework, a lightweight tool for flexible analysis.

  • RepoDriller is lightweight:
    1. It's a straightforward Java framework with the APIs you need -- no more, no less.
    2. You pay for storage and computation when you need to. No significant pre-processing stage, no giant database.
  • RepoDriller is flexible:
    1. Write arbitrary analyses in the popular Java programming language.
    2. RepoDriller has the right knobs -- tune which commits you visit, how much concurrency you want, etc.
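
As a hedged illustration of what such knobs amount to, the sketch below selects commits with a predicate and processes them on a configurable number of threads. It uses only the Java standard library; `KnobsSketch`, `mine`, and the parameter names are hypothetical and are not RepoDriller's actual configuration API, which is documented in the manual folder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustration only: hypothetical names, not RepoDriller's real API.
final class KnobsSketch {

    // Visits only the commits accepted by `range`, using `threads` worker threads.
    // Results are returned in the same order as the accepted commits.
    static <R> List<R> mine(List<String> commitHashes, Predicate<String> range,
                            int threads, Function<String, R> visitor) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Knob 1: which commits to visit.
            List<Future<R>> pending = new ArrayList<>();
            for (String hash : commitHashes) {
                if (range.test(hash)) {
                    pending.add(pool.submit(() -> visitor.apply(hash)));
                }
            }
            // Knob 2 (thread count) only affects speed, not the result order,
            // because the futures are collected in submission order.
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                try {
                    results.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> hashes = Arrays.asList("a1", "b2", "a3", "c4");
        // Visit only hashes starting with "a", with two worker threads.
        List<String> visited = mine(hashes, h -> h.startsWith("a"), 2, h -> "visited:" + h);
        System.out.println(visited);
    }
}
```

With the sample data, `main` prints `[visited:a1, visited:a3]`. Raising or lowering the thread count changes throughput only, which is the kind of trade-off a concurrency knob exposes.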

Here's how it compares to some other MSR frameworks and tools:

  • GHTorrent lets you query GitHub events.
    1. You are restricted to querying projects on GitHub.
    2. You are restricted to the information exposed in a GitHub API.
  • Boa lets you query ASTs on a pre-defined set of repositories.
    1. You are restricted to the repositories tracked by Boa.
    2. You must write queries in the Boa language, largely against ASTs.
    3. If you roll your own Boa cluster, you are restricted to repositories with languages that Boa can import (i.e. parse into ASTs).
  • Alitheia Core is a scalable platform for MSR.
    1. Alitheia Core is a heavyweight approach. You pay substantial up-front costs (configuration, pre-processing, etc.) in exchange for a scalable analysis. If you're doing exploratory research, the overhead may not be worth it.
    2. Alitheia Core is no longer being maintained.

How do I cite RepoDriller?

For now, cite the repository.

Is there a discussion forum?

You can subscribe to our mailing list: https://groups.google.com/forum/#!forum/repodriller.

How do I contribute?

Required: Git, Maven.

git clone https://github.com/mauricioaniche/repodriller.git
cd repodriller/test-repos
unzip \*.zip

Then, you can:

  • compile : mvn clean compile
  • test : mvn test
  • eclipse : mvn eclipse:eclipse
  • build : mvn clean compile assembly:single

License

This software is licensed under the Apache 2.0 License.
