uwdata / Arquero

Licence: bsd-3-clause
Query processing and transformation of array-backed data tables.

Programming Languages

javascript

Projects that are alternatives of or similar to Arquero

Aresdb
A GPU-powered real-time analytics storage and query engine.
Stars: ✭ 2,814 (+632.81%)
Mutual labels:  query, database, data
Sheetjs
📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
Stars: ✭ 28,479 (+7316.41%)
Mutual labels:  database, data, table
Postgui
A React web application to query and share any PostgreSQL database.
Stars: ✭ 260 (-32.29%)
Mutual labels:  database, data
Inquiry Deprecated
[DEPRECATED]: Prefer Room by Google, or SQLDelight by Square.
Stars: ✭ 264 (-31.25%)
Mutual labels:  query, database
Awesome Cybersecurity Datasets
A curated list of amazingly awesome Cybersecurity datasets
Stars: ✭ 380 (-1.04%)
Mutual labels:  dataframe, data
json-to-html-converter
Converts JSON data to HTML table with collapsible details view for nested objects.
Stars: ✭ 13 (-96.61%)
Mutual labels:  table, transform
DataFrame
DataFrame Library for Java
Stars: ✭ 51 (-86.72%)
Mutual labels:  table, dataframe
Altair
✨⚡️ A beautiful feature-rich GraphQL Client for all platforms.
Stars: ✭ 3,827 (+896.61%)
Mutual labels:  database, data
Datasheets
Read data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (+54.43%)
Mutual labels:  dataframe, data
Django Smuggler
Django Smuggler is a pluggable application for Django Web Framework that helps you to import/export fixtures via the automatically-generated administration interface.
Stars: ✭ 350 (-8.85%)
Mutual labels:  database, data
Pystore
Fast data store for Pandas time-series data
Stars: ✭ 325 (-15.36%)
Mutual labels:  dataframe, database
Semanticmediawiki
🔗 Semantic MediaWiki turns MediaWiki into a knowledge management platform with query and export capabilities
Stars: ✭ 359 (-6.51%)
Mutual labels:  query, database
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-60.94%)
Mutual labels:  dataframe, database
Pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+68.49%)
Mutual labels:  dataframe, data
fastener
Functional Zipper for manipulating JSON
Stars: ✭ 54 (-85.94%)
Mutual labels:  query, transform
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+59.11%)
Mutual labels:  dataframe, data
Preql
An interpreted relational query language that compiles to SQL.
Stars: ✭ 257 (-33.07%)
Mutual labels:  query, database
React Query
⚛️ Hooks for fetching, caching and updating asynchronous data in React
Stars: ✭ 24,427 (+6261.2%)
Mutual labels:  query, data
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+53.65%)
Mutual labels:  dataframe, data
Pandasvault
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
Stars: ✭ 316 (-17.71%)
Mutual labels:  dataframe, table

Arquero

Arquero is a JavaScript library for query processing and transformation of array-backed data tables. Following the relational algebra and inspired by the design of dplyr, Arquero provides a fluent API for manipulating column-oriented data frames. Arquero supports a range of data transformation tasks, including filter, sample, aggregation, window, join, and reshaping operations.

  • Fast: process data tables with million+ rows.
  • Flexible: query over arrays, typed arrays, array-like objects, or Apache Arrow columns.
  • Full-Featured: perform a variety of wrangling and analysis tasks.
  • Extensible: add new column types or functions, including aggregate & window operations.
  • Lightweight: small size, minimal dependencies.

To get up and running, start with the Introducing Arquero tutorial, part of the Arquero notebook collection.

Arquero is Spanish for "archer": if datasets are arrows, Arquero helps their aim stay true. 🏹 Arquero also refers to a goalkeeper: safeguard your data from analytic "own goals"! 🥅 ✋ ⚽

API Documentation

  • Top-Level API - All methods in the top-level Arquero namespace.
  • Table - Table access and output methods.
  • Verbs - Table transformation verbs.
  • Op Functions - All functions, including aggregate and window functions.
  • Expressions - Parsing and generation of table expressions.
  • Extensibility - Extend Arquero with new expression functions or table verbs.

Example

The core abstractions in Arquero are data tables, which model each column as an array of values, and verbs that transform data and return new tables. Verbs are table methods, allowing method chaining for multi-step transformations. Though each table is unique, many verbs reuse the underlying columns to limit duplication.

import { all, desc, op, table } from 'arquero';

// Average hours of sunshine per month, from https://usclimatedata.com/.
const dt = table({
  'Seattle': [69,108,178,207,253,268,312,281,221,142,72,52],
  'Chicago': [135,136,187,215,281,311,318,283,226,193,113,106],
  'San Francisco': [165,182,251,281,314,330,300,272,267,243,189,156]
});

// Sorted differences between Seattle and Chicago.
// Table expressions use arrow function syntax.
dt.derive({
    month: d => op.row_number(),
    diff:  d => d.Seattle - d.Chicago
  })
  .select('month', 'diff')
  .orderby(desc('diff'))
  .print();

// Is Seattle more correlated with San Francisco or Chicago?
// Operations accept column name strings outside a function context.
dt.rollup({
    corr_sf:  op.corr('Seattle', 'San Francisco'),
    corr_chi: op.corr('Seattle', 'Chicago')
  })
  .print();

// Aggregate statistics per city, as output objects.
// Reshape (fold) the data to a two column layout: city, sun.
dt.fold(all(), { as: ['city', 'sun'] })
  .groupby('city')
  .rollup({
    min:  d => op.min(d.sun), // functional form of op.min('sun')
    max:  d => op.max(d.sun),
    avg:  d => op.average(d.sun),
    med:  d => op.median(d.sun),
    // functional forms permit flexible table expressions
    skew: ({sun: s}) => (op.mean(s) - op.median(s)) / op.stdev(s) || 0
  })
  .objects();
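
The column-reuse idea described at the start of this section can be illustrated without Arquero at all. The following is a minimal sketch in plain JavaScript, assuming a hypothetical `ColumnTable` class invented for this illustration; it is not Arquero's implementation and its method signatures only loosely mirror the real verbs. It shows how a verb can return a new table while sharing unchanged column arrays by reference.

```javascript
// Hypothetical, simplified model of a column-oriented table.
// NOT Arquero's implementation; all names here are illustrative only.
class ColumnTable {
  constructor(columns) {
    this.columns = columns; // { columnName: array of values }
    const first = Object.values(columns)[0];
    this.numRows = first ? first.length : 0;
  }

  // Materialize row i as a plain object.
  row(i) {
    const obj = {};
    for (const name in this.columns) obj[name] = this.columns[name][i];
    return obj;
  }

  // Return a new table with extra columns computed row by row.
  // Existing column arrays are shared by reference, not copied.
  derive(defs) {
    const next = { ...this.columns };
    for (const [name, fn] of Object.entries(defs)) {
      const out = new Array(this.numRows);
      for (let i = 0; i < this.numRows; ++i) out[i] = fn(this.row(i));
      next[name] = out;
    }
    return new ColumnTable(next);
  }

  // Return a new table holding only the named columns (again shared).
  select(...names) {
    const next = {};
    for (const name of names) next[name] = this.columns[name];
    return new ColumnTable(next);
  }
}

const base = new ColumnTable({
  Seattle: [69, 108, 178],
  Chicago: [135, 136, 187]
});

// Each verb returns a new table, so calls can be chained.
const derived = base
  .derive({ diff: d => d.Seattle - d.Chicago })
  .select('Seattle', 'diff');

// The Seattle column is the same array object in both tables.
console.log(derived.columns.Seattle === base.columns.Seattle);
```

In this sketch `base` is never modified: `derive` and `select` build new tables, and only the `diff` column allocates new storage.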

Usage

In Browser

To use in the browser, you can load Arquero from a content delivery network:

<script src="https://cdn.jsdelivr.net/npm/arquero@latest"></script>

Arquero will be imported into the aq global object. The default browser bundle does not include the Apache Arrow library. To perform Arrow encoding using toArrow() or binary file loading using loadArrow(), import Apache Arrow first:

<script src="https://cdn.jsdelivr.net/npm/apache-arrow@latest"></script>
<script src="https://cdn.jsdelivr.net/npm/arquero@latest"></script>

Alternatively, you can build and import arquero.min.js from the dist directory, or build your own application bundle. When building custom application bundles for the browser, the module bundler should draw from the browser property of Arquero's package.json file. For example, if using rollup, pass the browser: true option to the node-resolve plugin.
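
As a rough illustration of the rollup setup mentioned above, a configuration file might look like the sketch below. It assumes the `@rollup/plugin-node-resolve` package is installed; the entry point, output path, and global name are hypothetical placeholders, not values taken from the Arquero repository.

```javascript
// rollup.config.js (illustrative sketch; paths and names are assumptions)
import { nodeResolve } from '@rollup/plugin-node-resolve';

export default {
  input: 'src/index.js',        // hypothetical application entry point
  output: {
    file: 'build/bundle.js',    // hypothetical output location
    format: 'iife',
    name: 'app'                 // hypothetical global name for the bundle
  },
  plugins: [
    // browser: true makes the resolver honor the "browser" field
    // in each dependency's package.json, including Arquero's.
    nodeResolve({ browser: true })
  ]
};
```

Other bundlers expose a comparable setting for preferring browser-specific entry points; consult the bundler's own documentation for the exact option.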

Arquero uses modern JavaScript features, and so will not work with some outdated browsers. To use Arquero with older browsers including Internet Explorer, set up your project with a transpiler such as Babel.

In Node.js or Application Bundles

First install arquero as a dependency, via npm install arquero --save or yarn add arquero. Arquero assumes Node version 12 or higher.

Import using CommonJS module syntax:

const aq = require('arquero');

Import using ES module syntax, collecting all exports into a single object:

import * as aq from 'arquero';

Import using ES module syntax, with targeted imports:

import { op, table } from 'arquero';

Build Instructions

To build and develop Arquero locally:
