dplyr: A grammar of data manipulation


dplyr


Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  • mutate() adds new variables that are functions of existing variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by() which allows you to perform any operation “by group”. You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
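The two-table verbs mentioned above can be sketched briefly. This is a minimal illustration, assuming the small example tibbles band_members and band_instruments that ship with dplyr; the variable names joined and only_with_instrument are made up for this sketch.

```r
library(dplyr)

# band_members and band_instruments are example tibbles bundled with dplyr.
# A mutating join: keep every row of band_members and add matching columns
# from band_instruments (unmatched rows get NA).
joined <- band_members %>%
  left_join(band_instruments, by = "name")

# A filtering join: keep only the band_members rows that have a match
# in band_instruments, without adding any columns.
only_with_instrument <- band_members %>%
  semi_join(band_instruments, by = "name")

joined
only_with_instrument
```

Passing by explicitly is optional here, since the tables share only the name column, but it avoids the message dplyr prints when it has to guess the join key.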

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.

Backends

In addition to data frames/tibbles, dplyr makes working with other computational backends accessible and efficient. Below is a list of alternative backends:

  • dtplyr: for large, in-memory datasets. Translates your dplyr code to high performance data.table code.

  • dbplyr: for data stored in a relational database. Translates your dplyr code to SQL.

  • sparklyr: for very large datasets stored in Apache Spark.
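As a rough sketch of how a backend is used, the example below runs an ordinary dplyr pipeline through dtplyr. It assumes the dtplyr and data.table packages are installed and uses the built-in mtcars data; the names query and result are illustrative only.

```r
library(dplyr)
library(dtplyr)

# Wrap a data frame in a lazy data.table; nothing is computed yet.
cars_dt <- lazy_dt(mtcars)

# Ordinary dplyr verbs build up a query instead of running immediately.
query <- cars_dt %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg))

# Inspect the data.table code that dtplyr generated.
show_query(query)

# Collecting into a tibble triggers the actual computation.
result <- as_tibble(query)
result
```

The other backends follow a similar pattern: the dplyr code stays the same, and the backend decides how and when it is executed.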

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

Development version

To get a bug fix or to use a feature from the development version, you can install the development version of dplyr from GitHub.

# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")


Usage

library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 6 × 14
#>   name   height  mass hair_color skin_color  eye_color birth_year sex   gender  
#>   <chr>   <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>   
#> 1 C-3PO     167    75 <NA>       gold        yellow           112 none  masculi…
#> 2 R2-D2      96    32 <NA>       white, blue red               33 none  masculi…
#> 3 R5-D4      97    32 <NA>       white, red  red               NA none  masculi…
#> 4 IG-88     200   140 none       metal       red               15 none  masculi…
#> 5 R4-P17     96    NA none       silver, red red, blue         NA none  feminine
#> # … with 1 more row, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 × 4
#>   name           hair_color skin_color  eye_color
#>   <chr>          <chr>      <chr>       <chr>    
#> 1 Luke Skywalker blond      fair        blue     
#> 2 C-3PO          <NA>       gold        yellow   
#> 3 R2-D2          <NA>       white, blue red      
#> 4 Darth Vader    none       white       yellow   
#> 5 Leia Organa    brown      light       brown    
#> # … with 82 more rows

starwars %>% 
  mutate(name, bmi = mass / ((height / 100)^2)) %>%
  select(name:mass, bmi)
#> # A tibble: 87 × 4
#>   name           height  mass   bmi
#>   <chr>           <int> <dbl> <dbl>
#> 1 Luke Skywalker    172    77  26.0
#> 2 C-3PO             167    75  26.9
#> 3 R2-D2              96    32  34.7
#> 4 Darth Vader       202   136  33.3
#> 5 Leia Organa       150    49  21.8
#> # … with 82 more rows

starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 × 14
#>   name    height  mass hair_color skin_color  eye_color  birth_year sex   gender
#>   <chr>    <int> <dbl> <chr>      <chr>       <chr>           <dbl> <chr> <chr> 
#> 1 Jabba …    175  1358 <NA>       green-tan,… orange          600   herm… mascu…
#> 2 Grievo…    216   159 none       brown, whi… green, ye…       NA   male  mascu…
#> 3 IG-88      200   140 none       metal       red              15   none  mascu…
#> 4 Darth …    202   136 none       white       yellow           41.9 male  mascu…
#> 5 Tarfful    234   136 brown      brown       blue             NA   male  mascu…
#> # … with 82 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(
    n > 1,
    mass > 50
  )
#> # A tibble: 8 × 3
#>   species      n  mass
#>   <chr>    <int> <dbl>
#> 1 Droid        6  69.8
#> 2 Gungan       3  74  
#> 3 Human       35  82.8
#> 4 Kaminoan     2  88  
#> 5 Mirialan     2  53.1
#> # … with 3 more rows
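
Grouping is not limited to summarise(). As a small sketch of a different "by group" operation, the example below keeps the heaviest character (or characters, if tied) within each species; the name heaviest is chosen for illustration.

```r
library(dplyr)

heaviest <- starwars %>%
  filter(!is.na(mass)) %>%        # drop rows with unknown mass first
  group_by(species) %>%
  filter(mass == max(mass)) %>%   # max() is evaluated within each species
  ungroup() %>%
  select(name, species, mass)

heaviest
```

Calling ungroup() at the end returns a plain tibble, so later verbs are not applied per species by accident.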

Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com or the manipulatr mailing list.


Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
