pbeshai / Tidy

Licence: MIT
Tidy up your data with JavaScript, inspired by dplyr and the tidyverse

tidy.js

Tidy up your data with JavaScript! Inspired by dplyr and the tidyverse, tidy.js attempts to bring the ergonomics of data manipulation from R to JavaScript (and TypeScript). The primary goals of the project are:

  • Readable code. Tidy.js prioritizes making your data transformations readable, so future you and your teammates can get up and running quickly.

  • Standard transformation verbs. Tidy.js is built using battle-tested verbs from the R community that can handle any data wrangling need.

  • Work with plain JS objects. No wrapper classes needed — all tidy.js needs is an array of plain old-fashioned JS objects to get started. Simple in, simple out.

As a secondary goal, this project aims to provide reasonable TypeScript types for its functions.

Related work

Be sure to check out a very similar project, Arquero, from UW Data.

Getting started

To start using tidy, your best bet is to install from npm:

npm install @tidyjs/tidy
# or
yarn add @tidyjs/tidy

Then import the functions you need:

import { tidy, mutate, arrange, desc } from '@tidyjs/tidy'

Note: if you just want to try tidy in a browser, you can use the UMD version hosted on unpkg (codesandbox example):

<script src="https://d3js.org/d3-array.v2.min.js"></script>
<script src="https://www.unpkg.com/@tidyjs/tidy/dist/umd/tidy.min.js"></script>
<script>
  const { tidy, mutate, arrange, desc } = Tidy;
  // ...
</script>  

And use them on an array of objects:

const data = [
  { a: 1, b: 10 }, 
  { a: 3, b: 12 }, 
  { a: 2, b: 10 }
]

const results = tidy(
  data, 
  mutate({ ab: d => d.a * d.b }),
  arrange(desc('ab'))
)

The output is:

[
  { a: 3, b: 12, ab: 36 },
  { a: 2, b: 10, ab: 20 },
  { a: 1, b: 10, ab: 10 }
]
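
For comparison, here is a rough plain-JavaScript equivalent of the flow above. It is only a sketch of what the two verbs do conceptually, not how tidy.js implements them; the variable names `mutated` and `results` are chosen for illustration.

```javascript
// Plain-JavaScript sketch of the example flow (no tidy.js required).
const data = [
  { a: 1, b: 10 },
  { a: 3, b: 12 },
  { a: 2, b: 10 },
];

// Like mutate({ ab: d => d.a * d.b }): copy each item and add a derived field.
const mutated = data.map((d) => ({ ...d, ab: d.a * d.b }));

// Like arrange(desc('ab')): sort a copy in descending order of ab.
const results = [...mutated].sort((x, y) => y.ab - x.ab);

console.log(results);
```

The tidy version expresses the same two steps as named verbs, which is where the readability goal comes from.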

All tidy.js code is wrapped in a tidy flow via the tidy() function. The first argument is the array of data, followed by the transformation verbs to run on the data. The actual functions passed to tidy() can be anything so long as they fit the form:

(items: object[]) => object[]

For example, the following is valid:

tidy(
  data, 
  items => items.filter((d, i) => i % 2 === 0),
  arrange(desc('value'))
)

All tidy verbs fit this style, with the exception of exports from groupBy, discussed below.
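
To make the "flow" idea concrete, the sketch below shows one way a runner for such functions could work: each step receives the output of the previous one. `runFlow`, `keepEvenIndexes`, and `sortByValueDesc` are hypothetical names for this illustration and are not part of tidy.js.

```javascript
// Hypothetical helper (not the tidy.js implementation):
// feed the items through each step in order.
function runFlow(items, ...steps) {
  return steps.reduce((current, step) => step(current), items);
}

// Two hand-written steps of the form (items) => items.
const keepEvenIndexes = (items) => items.filter((d, i) => i % 2 === 0);
const sortByValueDesc = (items) => [...items].sort((x, y) => y.value - x.value);

const flowData = [{ value: 1 }, { value: 5 }, { value: 3 }, { value: 4 }];
const flowResult = runFlow(flowData, keepEvenIndexes, sortByValueDesc);

console.log(flowResult); // [ { value: 3 }, { value: 1 } ]
```

Because every step has the same shape, custom functions and library verbs can be mixed freely in one flow.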

Grouping data with groupBy

Besides manipulating flat lists of data, tidy provides facilities for wrangling grouped data via the groupBy() function.

import { tidy, summarize, sum, groupBy } from '@tidyjs/tidy'

const data = [
  { key: 'group1', value: 10 }, 
  { key: 'group2', value: 9 }, 
  { key: 'group1', value: 7 }
]

tidy(
  data,
  groupBy('key', [
    summarize({ total: sum('value') })
  ])
)

The output is:

[
  { "key": "group1", "total": 17 },
  { "key": "group2", "total": 9 }
]
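
Conceptually, the grouped summarize above buckets the rows by key, reduces each bucket to one row, and flattens the result back into a list. The following plain-JavaScript sketch illustrates that idea under those assumptions; it does not reflect tidy.js internals, and `rows`, `groups`, and `totals` are illustrative names.

```javascript
const rows = [
  { key: 'group1', value: 10 },
  { key: 'group2', value: 9 },
  { key: 'group1', value: 7 },
];

// Step 1: bucket rows by key. A Map keeps groups in first-seen order.
const groups = new Map();
for (const row of rows) {
  if (!groups.has(row.key)) groups.set(row.key, []);
  groups.get(row.key).push(row);
}

// Step 2: reduce each bucket to a single summary row and flatten to a list.
const totals = [...groups].map(([key, items]) => ({
  key,
  total: items.reduce((acc, d) => acc + d.value, 0),
}));

console.log(totals);
// [ { key: 'group1', total: 17 }, { key: 'group2', total: 9 } ]
```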

The groupBy() function works similarly to tidy() in that it takes a flow of functions as its second argument (wrapped in an array). Things get really fun when you use groupBy's third argument for exporting the grouped data into different shapes.

For example, exporting data as a nested object, we can use groupBy.object() as the third argument to groupBy().

const data = [
  { g: 'a', h: 'x', value: 5 },
  { g: 'a', h: 'y', value: 15 },
  { g: 'b', h: 'x', value: 10 },
  { g: 'b', h: 'x', value: 20 },
  { g: 'b', h: 'y', value: 30 },
]

tidy(
  data,
  groupBy(
    ['g', 'h'], 
    [
      mutate({ key: d => `${d.g}${d.h}`})
    ], 
    groupBy.object() // <-- specify the export
  )
);

The output is:

{
  "a": {
    "x": [{"g": "a", "h": "x", "value": 5, "key": "ax"}],
    "y": [{"g": "a", "h": "y", "value": 15, "key": "ay"}]
  },
  "b": {
    "x": [
      {"g": "b", "h": "x", "value": 10, "key": "bx"},
      {"g": "b", "h": "x", "value": 20, "key": "bx"}
    ],
    "y": [{"g": "b", "h": "y", "value": 30, "key": "by"}]
  }
}
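
The nested shape can be understood as grouping by the first key, then recursively grouping each bucket by the remaining keys. Below is a minimal sketch of that recursion in plain JavaScript. `nestAsObject` is a hypothetical helper written for this illustration, not the function tidy.js uses, and it skips the mutate step.

```javascript
// Hypothetical helper: nest items into an object, one level per grouping key.
function nestAsObject(items, keys) {
  if (keys.length === 0) return items; // no keys left: this is a leaf array
  const [firstKey, ...restKeys] = keys;

  // Bucket the items by the value of the first key.
  const buckets = new Map();
  for (const item of items) {
    const groupValue = item[firstKey];
    if (!buckets.has(groupValue)) buckets.set(groupValue, []);
    buckets.get(groupValue).push(item);
  }

  // Recurse into each bucket with the remaining keys.
  const nestedLevel = {};
  for (const [groupValue, groupItems] of buckets) {
    nestedLevel[groupValue] = nestAsObject(groupItems, restKeys);
  }
  return nestedLevel;
}

const nestData = [
  { g: 'a', h: 'x', value: 5 },
  { g: 'a', h: 'y', value: 15 },
  { g: 'b', h: 'x', value: 10 },
  { g: 'b', h: 'x', value: 20 },
  { g: 'b', h: 'y', value: 30 },
];

const byGroup = nestAsObject(nestData, ['g', 'h']);
console.log(JSON.stringify(byGroup, null, 2));
```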

Or alternatively as { key, values } entries-objects via groupBy.entriesObject():

tidy(data,
  groupBy(
    ['g', 'h'], 
    [
      mutate({ key: d => `${d.g}${d.h}`})
    ], 
    groupBy.entriesObject() // <-- specify the export
  )
);

The output is:

[
  {
    "key": "a",
    "values": [
      {"key": "x", "values": [{"g": "a", "h": "x", "value": 5, "key": "ax"}]},
      {"key": "y", "values": [{"g": "a", "h": "y", "value": 15, "key": "ay"}]}
    ]
  },
  {
    "key": "b",
    "values": [
      {
        "key": "x",
        "values": [
          {"g": "b", "h": "x", "value": 10, "key": "bx"},
          {"g": "b", "h": "x", "value": 20, "key": "bx"}
        ]
      },
      {"key": "y", "values": [{"g": "b", "h": "y", "value": 30, "key": "by"}]}
    ]
  }
]
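
The two export shapes carry the same information. As an illustration, this sketch converts a nested object of the kind shown earlier into { key, values } entries. `toEntriesObject` is a hypothetical helper for this example, not a tidy.js function, and the sample rows are abbreviated.

```javascript
// Hypothetical helper: turn a nested object export into { key, values } entries.
function toEntriesObject(node) {
  if (Array.isArray(node)) return node; // leaf: keep the array of rows as-is
  return Object.entries(node).map(([key, child]) => ({
    key,
    values: toEntriesObject(child),
  }));
}

const nestedExample = {
  a: {
    x: [{ g: 'a', h: 'x', value: 5 }],
    y: [{ g: 'a', h: 'y', value: 15 }],
  },
};

const entriesExample = toEntriesObject(nestedExample);
console.log(JSON.stringify(entriesExample, null, 2));
```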

It's common to be left with a single item in each leaf of a groupBy set, especially after running summarize(). To keep the exported leaf values from being wrapped in an array, pass the single option to the export:

tidy(data,
  groupBy(['g', 'h'], [
    summarize({ total: sum('value') })
  ], groupBy.object({ single: true }))
);

The output is:

{
  "a": {
    "x": {"total": 5, "g": "a", "h": "x"},
    "y": {"total": 15, "g": "a", "h": "y"}
  },
  "b": {
    "x": {"total": 30, "g": "b", "h": "x"},
    "y": {"total": 30, "g": "b", "h": "y"}
  }
}
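
The effect of the single option can be pictured as replacing every leaf array with its first element. The sketch below assumes that behavior and applies it to an already-nested object. `unwrapLeaves` is a hypothetical helper for illustration only.

```javascript
// Hypothetical helper: replace each leaf array with its first item.
// Assumption: every leaf holds exactly one row, as after summarize().
function unwrapLeaves(node) {
  if (Array.isArray(node)) return node[0];
  const unwrappedLevel = {};
  for (const [key, child] of Object.entries(node)) {
    unwrappedLevel[key] = unwrapLeaves(child);
  }
  return unwrappedLevel;
}

const wrapped = {
  a: {
    x: [{ total: 5, g: 'a', h: 'x' }],
    y: [{ total: 15, g: 'a', h: 'y' }],
  },
};

const unwrapped = unwrapLeaves(wrapped);
console.log(JSON.stringify(unwrapped, null, 2));
```

If a leaf could hold more than one row, taking the first element would silently drop the rest, so this shape is best reserved for summarized data.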

Visit the API reference docs to learn more about how each function works and the options it takes. Be sure to check out the levels export, which lets you mix and match different export types based on the depth of the data. For quick reference, the available groupBy exports are:

  • groupBy.entries()
  • groupBy.entriesObject()
  • groupBy.grouped()
  • groupBy.levels()
  • groupBy.object()
  • groupBy.keys()
  • groupBy.map()
  • groupBy.values()

Developing

clone the repo:

git clone git@github.com:pbeshai/tidy.git

install dependencies:

yarn

initialize lerna:

lerna bootstrap

build tidy:

yarn run build

test all of tidy:

yarn run test

run test:watch for a single package:

yarn workspace @tidyjs/tidy test:watch

Conventional commits

This library uses conventional commits, following the Angular convention. Prefixes are:

  • build: Changes that affect the build system or external dependencies (example scopes: yarn, npm)
  • ci: Changes to our CI configuration files and scripts (e.g. CircleCI)
  • chore
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • revert
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
  • test: Adding missing tests or correcting existing tests
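
As a rough aid, a commit header can be checked against the prefixes above with a small script. This is a sketch that assumes the common `type(scope): subject` layout with an optional scope. The repository's real tooling may enforce different or stricter rules, and `isConventionalHeader` is a hypothetical name.

```javascript
// Prefixes taken from the list above.
const COMMIT_TYPES = [
  'build', 'ci', 'chore', 'docs', 'feat', 'fix',
  'perf', 'refactor', 'revert', 'style', 'test',
];

// Assumed layout: type, optional "(scope)", colon, space, non-empty subject.
const COMMIT_HEADER = new RegExp(
  `^(${COMMIT_TYPES.join('|')})(\\([^()\\s]+\\))?: \\S.*$`
);

// Hypothetical helper: true when the first line of a commit message matches.
function isConventionalHeader(header) {
  return COMMIT_HEADER.test(header);
}

console.log(isConventionalHeader('feat(groupBy): add levels export')); // true
console.log(isConventionalHeader('feature: add levels export'));       // false
```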

Docs website

start the local site:

yarn start:web

build the site:

yarn build:web

deploy the site via GitHub Pages:

USE_SSH=true GIT_USER=pbeshai yarn workspace @tidyjs/tidy-website deploy

Ideally we can automate this via GitHub Actions one day!


Shout out to Netflix

I want to give a big shout out to Netflix, my current employer, for giving me the opportunity to work on this project and to open source it. It's a great place to work and if you enjoy tinkering with data-related things, I'd strongly recommend checking out our analytics department. – Peter Beshai
