lorenzwalthert / gitsum

Licence: MIT license
parse and summarise git repository history

This package is a work in progress! If you encounter errors or problems, please file an issue or make a PR.

Introduction

This package parses a git repository history to collect comprehensive information about the activity in the repo. The parsed data is made available to the user in a tabular format. The package can also generate reports based on the parsed data. You can install the development version from GitHub.

remotes::install_github("lorenzwalthert/gitsum")

There are two main functions for parsing the history; both return tabular data:

  • parse_log_simple() is a relatively fast parser that returns a tibble with one commit per row. There is no file-specific information.
  • parse_log_detailed() returns a nested tibble that, for each commit, also contains the names of the changed files, the number of lines changed, etc. This function is slower.

report_git() creates an HTML, PDF, or Word report from the parsed log data according to a template. You can create your own template or use one that ships with the gitsum package.

Let’s see the package in action.

library("gitsum")
library("tidyverse")
library("forcats")

We can obtain a parsed log like this:

init_gitsum()
tbl <- parse_log_detailed() %>%
  select(short_hash, short_message, total_files_changed, nested)
tbl 
#> # A tibble: 149 x 4
#>    short_hash short_message        total_files_changed nested           
#>    <chr>      <chr>                              <int> <list>           
#>  1 243f       initial commit                         7 <tibble [7 × 5]> 
#>  2 f8ee       add log example data                   1 <tibble [1 × 5]> 
#>  3 6328       add parents                            3 <tibble [3 × 5]> 
#>  4 dfab       intermediate                           1 <tibble [1 × 5]> 
#>  5 7825       add licence                            1 <tibble [1 × 5]> 
#>  6 2ac3       add readme                             2 <tibble [2 × 5]> 
#>  7 7a2a       document log data                      1 <tibble [1 × 5]> 
#>  8 943c       add helpfiles                         10 <tibble [10 × 5]>
#>  9 917e       update infrastructur                   3 <tibble [3 × 5]> 
#> 10 4fc0       remove garbage                         6 <tibble [6 × 5]> 
#> # ... with 139 more rows

Since we used parse_log_detailed(), there is detailed file-specific information available for every commit:

tbl$nested[[3]]
#> # A tibble: 3 x 5
#>   changed_file edits insertions deletions is_exact
#>   <chr>        <int>      <int>     <int> <lgl>   
#> 1 DESCRIPTION      6          5         1 T       
#> 2 NAMESPACE        3          2         1 T       
#> 3 R/get_log.R     19         11         8 T
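
To make the nested structure more concrete, here is a minimal base R sketch that flattens a list-column by hand. It does not use gitsum or the tidyverse. The toy_log object is a hand-made stand-in: commit 6328 reuses the values printed above, while commit "aaaa" and its README.md entry are invented for illustration.

```r
# Toy stand-in for a detailed log: one row per commit, with a list-column
# holding one data frame of file-level changes per commit.
toy_log <- data.frame(short_hash = c("6328", "aaaa"), stringsAsFactors = FALSE)
toy_log$nested <- list(
  # Values copied from the output printed above.
  data.frame(
    changed_file = c("DESCRIPTION", "NAMESPACE", "R/get_log.R"),
    insertions = c(5L, 2L, 11L),
    deletions = c(1L, 1L, 8L),
    stringsAsFactors = FALSE
  ),
  # Hypothetical commit, made up for this sketch.
  data.frame(
    changed_file = "README.md",
    insertions = 4L,
    deletions = 0L,
    stringsAsFactors = FALSE
  )
)

# Flatten by hand: repeat each commit hash once per changed file.
pieces <- Map(
  function(hash, files) data.frame(short_hash = hash, files, stringsAsFactors = FALSE),
  toy_log$short_hash,
  toy_log$nested
)
flat <- do.call(rbind, unname(pieces))
rownames(flat) <- NULL

# Per-commit totals recovered from the file-level rows.
totals <- aggregate(cbind(insertions, deletions) ~ short_hash, data = flat, FUN = sum)
print(flat)
print(totals)
```

The flat table has one row per commit and file, which is the shape most plotting and summarising code expects.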

Since the data has such a high resolution, various graphs, tables etc. can be produced from it to provide insights into the git history.

Examples

Since the output of parse_log_detailed() is a nested tibble, you can work on it as you would on any other tibble. Let us first have a look at who committed to this repository:

log <- parse_log_detailed()
log %>%
  group_by(author_name) %>%
  summarize(n = n())
#> # A tibble: 3 x 2
#>   author_name         n
#>   <chr>           <int>
#> 1 Jon Calder          2
#> 2 jonmcalder          6
#> 3 Lorenz Walthert   141
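
A contributor can appear under more than one name in a git log, which splits their commits across several rows. The base R sketch below shows one way to map aliases onto a canonical name before counting. The author vector is rebuilt from the counts printed above; the aliases table is hypothetical, and treating "jonmcalder" and "Jon Calder" as the same person is an assumption made only for illustration.

```r
# Author column rebuilt from the counts printed above.
author_name <- rep(
  c("Jon Calder", "jonmcalder", "Lorenz Walthert"),
  times = c(2, 6, 141)
)

# Hypothetical alias table: names are the aliases, values the canonical names.
# That these two names belong to one person is an assumption.
aliases <- c("jonmcalder" = "Jon Calder")

# Replace known aliases and leave all other names untouched.
is_alias <- author_name %in% names(aliases)
canonical <- author_name
canonical[is_alias] <- unname(aliases[author_name[is_alias]])

# Count commits per canonical author, most active first.
commits_per_author <- sort(table(canonical), decreasing = TRUE)
print(commits_per_author)
```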

We can also investigate how the number of lines of each file in the R directory evolved. For that, we probably want to treat files whose names changed as one file. Also, we probably don’t want to see boring plots for files that were changed only a few times. Let’s focus on files that were changed in at least ten commits.

lines <- log %>%
  unnest_log() %>%
  set_changed_file_to_latest_name() %>%
  add_line_history()
#> The following name changes were identified (11 in total):
#> ● man/{get_log.Rd => get_log_simple.Rd}
#> ● man/{parse_log.Rd => parse_log_one.Rd}
#> ● man/{get_pattern.Rd => get_pattern_multiple.Rd}
#> ● man/{get_log_regex.Rd => git_log_detailed.Rd}
#> ● man/{rmd_simple.Rd => git_report.Rd}
#> ● R/{gitsum.R => gitsum-package.R}
#> ● man/{git_log_detailed.Rd => parse_log_detailed.Rd}
#> ● man/{git_log_simple.Rd => parse_log_simple.Rd}
#> ● man/{ensure_gitusm_repo.Rd => ensure_gitsum_repo.Rd}
#> ● man/{log.Rd => gitsumlog.Rd}
#> ● man/{git_report.Rd => report_git.Rd}

r_files <- grep("^R/", lines$changed_file, value = TRUE)

to_plot <- lines %>%
  filter(changed_file %in% r_files) %>%
  add_n_times_changed_file() %>%
  filter(n_times_changed_file >= 10)
ggplot(to_plot, aes(x = date, y = current_lines)) + 
  geom_step() + 
  scale_y_continuous(name = "Number of Lines", limits = c(0, NA)) + 
  facet_wrap(~changed_file, scales = "free_y")
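
A rough mental model for the line history plotted above is a per-file running total of insertions minus deletions. The base R sketch below illustrates that idea on made-up data. It is an assumption about the concept, not gitsum's actual implementation of add_line_history(); the file names, values, and the running_lines and n_changes columns are hypothetical.

```r
# Made-up file-level log, ordered from oldest to newest change.
file_log <- data.frame(
  changed_file = c("R/a.R", "R/b.R", "R/a.R", "R/a.R", "R/b.R"),
  insertions = c(10L, 4L, 3L, 0L, 6L),
  deletions = c(0L, 0L, 5L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Net change in lines contributed by each row.
file_log$net <- file_log$insertions - file_log$deletions

# Running total of net changes, computed separately for each file.
file_log$running_lines <- ave(file_log$net, file_log$changed_file, FUN = cumsum)

# Number of times each file was changed, useful for filtering out rarely touched files.
file_log$n_changes <- ave(file_log$net, file_log$changed_file, FUN = length)

print(file_log)
```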

Next, we want to see which files were changed in the most commits:

log %>%
  unnest_log() %>%
  mutate(changed_file = fct_lump(fct_infreq(changed_file), n = 10)) %>%
  filter(changed_file != "Other") %>%
  ggplot(aes(x = changed_file)) + geom_bar() + coord_flip() + 
  theme_minimal()

We can also easily get a visual overview of the number of insertions & deletions in commits over time:

commit.dat <- data.frame(
  edits = rep(c("Insertions", "Deletions"), each = nrow(log)),
  commit = rep(seq_len(nrow(log)), 2),
  count = c(log$total_insertions, -log$total_deletions)
)

ggplot(commit.dat, aes(x = commit, y = count, fill = edits)) + 
  geom_bar(stat = "identity", position = "identity") +  
  theme_minimal()
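
The data frame built above is in long format: each commit appears twice, once with its insertions as a positive count and once with its deletions as a negative count, so that deletions are drawn below the axis. The sketch below reproduces that layout in base R with three made-up commits; the totals are hypothetical, and commit_dat and net are names used only here.

```r
# Hypothetical per-commit totals, invented for this sketch.
total_insertions <- c(120L, 8L, 30L)
total_deletions <- c(0L, 3L, 12L)
n <- length(total_insertions)

# Long format: first all insertion rows, then all deletion rows.
commit_dat <- data.frame(
  edits = rep(c("Insertions", "Deletions"), each = n),
  commit = rep(seq_len(n), times = 2),
  count = c(total_insertions, -total_deletions),
  stringsAsFactors = FALSE
)

# Because deletions are stored as negative counts, summing per commit
# gives the net change in lines.
net <- tapply(commit_dat$count, commit_dat$commit, sum)

print(commit_dat)
print(net)
```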

Or the number of commits broken down by day of the week:

log %>%
  mutate(weekday = factor(weekday, c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))) %>% 
  ggplot(aes(x = weekday)) + geom_bar() + 
  theme_minimal()
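
The explicit factor levels in the call above matter because character values are otherwise ordered alphabetically. A minimal base R sketch with a made-up weekday vector shows the difference; week_levels is a name introduced only for this example.

```r
# Made-up weekday abbreviations for five commits.
weekday <- c("Sun", "Mon", "Fri", "Mon", "Wed")

# Without explicit levels, categories are sorted alphabetically.
alphabetical <- table(weekday)

# With explicit levels, categories follow the calendar week, and days
# without any commits are kept with a count of zero.
week_levels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
by_week <- table(factor(weekday, levels = week_levels))

print(alphabetical)
print(by_week)
```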
