datasweet / Datatable

Licence: apache-2.0
A Go in-memory table

datatable

datatable is a Go package for manipulating tabular data, much like an Excel spreadsheet. It is inspired by the pandas Python package and the R data.frame structure. Although it is production ready, be aware that we are still working on API improvements.

Installation

go get github.com/datasweet/datatable

Features

  • Create custom series (i.e. custom columns). Currently available: serie.Int, serie.String, serie.Time, serie.Float64.
  • Apply expressions
  • Selects (head, tail, subset)
  • Sorting
  • InnerJoin, LeftJoin, RightJoin, OuterJoin, Concats
  • Aggregate
  • Import from CSV
  • Export to map, slice

Creating a DataTable

package main

import (
	"fmt"

	"github.com/datasweet/datatable"
)

func main() {
	dt := datatable.New("test")
	dt.AddColumn("champ", datatable.String, datatable.Values("Malzahar", "Xerath", "Teemo"))
	dt.AddColumn("champion", datatable.String, datatable.Expr("upper(`champ`)"))
	dt.AddColumn("win", datatable.Int, datatable.Values(10, 20, 666))
	dt.AddColumn("loose", datatable.Int, datatable.Values(6, 5, 666))
	dt.AddColumn("winRate", datatable.Float64, datatable.Expr("`win` * 100 / (`win` + `loose`)"))
	dt.AddColumn("winRate %", datatable.String, datatable.Expr(" `winRate` ~ \" %\""))
	dt.AddColumn("sum", datatable.Float64, datatable.Expr("sum(`win`)"))

	fmt.Println(dt)
}

/*
CHAMP <NULLSTRING>      CHAMPION <NULLSTRING>   WIN <NULLINT>   LOOSE <NULLINT> WINRATE <NULLFLOAT64>   WINRATE % <NULLSTRING>  SUM <NULLFLOAT64> 
Malzahar                MALZAHAR                10              6               62.5                    62.5 %                  696              
Xerath                  XERATH                  20              5               80                      80 %                    696              
Teemo                   TEEMO                   666             666             50                      50 %                    696    
*/
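The values in the output above can be checked by hand. The following is a minimal sketch in plain Go, using only the standard library, of the arithmetic behind the `winRate` and `sum` columns. The helper names winRate and sum are hypothetical, and this is not how datatable evaluates expressions internally.

```go
package main

import "fmt"

// winRate mirrors the expression `win` * 100 / (`win` + `loose`) from the
// example above, using float64 arithmetic so the fractional part is kept.
// Illustration only; it does not call datatable's expression engine.
func winRate(win, loose int) float64 {
	total := win + loose
	if total == 0 {
		return 0 // assumption of this sketch: no games played yields 0
	}
	return float64(win) * 100 / float64(total)
}

// sum mirrors sum(`win`): a single total computed over the whole column,
// which is why every row of the SUM column shows the same value.
func sum(values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	wins := []int{10, 20, 666}
	looses := []int{6, 5, 666}
	for i := range wins {
		fmt.Printf("%v %%\n", winRate(wins[i], looses[i]))
	}
	fmt.Println(sum(wins))
}
```

Running it prints 62.5 %, 80 %, 50 % and 696, which matches the WINRATE %, and SUM columns shown above.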

Reading a CSV and aggregating

package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/datasweet/datatable"
	"github.com/datasweet/datatable/import/csv"
)

func main() {
	dt, err := csv.Import("csv", "phone_data.csv",
		csv.HasHeader(true),
		csv.AcceptDate("02/01/06 15:04"),
		csv.AcceptDate("2006-01"),
	)
	if err != nil {
		log.Fatalf("reading csv: %v", err)
	}

	dt.Print(os.Stdout, datatable.PrintMaxRows(24))

	dt2, err := dt.Aggregate(datatable.AggregateBy{Type: datatable.Count, Field: "index"})
	if err != nil {
		log.Fatalf("aggregate COUNT('index'): %v", err)
	}
	fmt.Println(dt2)

	groups, err := dt.GroupBy(datatable.GroupBy{
		Name: "year",
		Type: datatable.Int,
		Keyer: func(row datatable.Row) (interface{}, bool) {
			if d, ok := row["date"]; ok {
				if tm, ok := d.(time.Time); ok {
					return tm.Year(), true
				}
			}
			return nil, false
		},
	})
	if err != nil {
		log.Fatalf("GROUP BY 'year': %v", err)
	}
	dt3, err := groups.Aggregate(
		datatable.AggregateBy{Type: datatable.Sum, Field: "duration"},
		datatable.AggregateBy{Type: datatable.CountDistinct, Field: "network"},
	)
	if err != nil {
		log.Fatalf("Aggregate SUM('duration'), COUNT_DISTINCT('network') GROUP BY 'year': %v", err)
	}
	fmt.Println(dt3)
}
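To make clear what the grouped aggregation above computes, here is a minimal standard-library sketch of the same idea: parse each date with the Go layout "02/01/06 15:04" (day/month/two-digit year), group rows by year, then sum `duration` and count distinct `network` values per group. The call and yearStats types, the aggregateByYear helper and the sample rows are all hypothetical; the rows are not taken from phone_data.csv, and skipping rows with unparsable dates is an assumption of this sketch rather than documented datatable behavior.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// call is a hypothetical record holding the three columns used above.
type call struct {
	date     string
	duration float64
	network  string
}

// yearStats holds the two aggregates computed for each group.
type yearStats struct {
	sumDuration      float64
	distinctNetworks int
}

// aggregateByYear groups rows by the year of their parsed date, sums the
// durations and counts the distinct networks in each group.
// Rows whose date does not match the layout are skipped (sketch assumption).
func aggregateByYear(rows []call, layout string) map[int]yearStats {
	sums := map[int]float64{}
	networks := map[int]map[string]struct{}{}
	for _, r := range rows {
		tm, err := time.Parse(layout, r.date)
		if err != nil {
			continue
		}
		y := tm.Year()
		sums[y] += r.duration
		if networks[y] == nil {
			networks[y] = map[string]struct{}{}
		}
		networks[y][r.network] = struct{}{}
	}
	out := make(map[int]yearStats, len(sums))
	for y, s := range sums {
		out[y] = yearStats{sumDuration: s, distinctNetworks: len(networks[y])}
	}
	return out
}

func main() {
	// Made-up rows for illustration only.
	rows := []call{
		{"15/10/14 06:58", 34.5, "data"},
		{"15/10/14 14:46", 13, "Vodafone"},
		{"02/01/15 10:00", 4, "Vodafone"},
		{"not a date", 99, "Tesco"},
	}
	stats := aggregateByYear(rows, "02/01/06 15:04")

	// Map iteration order is random, so sort the years before printing.
	years := make([]int, 0, len(stats))
	for y := range stats {
		years = append(years, y)
	}
	sort.Ints(years)
	for _, y := range years {
		fmt.Println(y, stats[y].sumDuration, stats[y].distinctNetworks)
	}
}
```

With the sample rows this prints `2014 47.5 2` and `2015 4 1`.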

Creating a custom serie

To create a custom serie you must provide:

  • a caster function, which converts a generic value to your serie value. Its signature must be func(i interface{}) T
  • a comparator, which compares two serie values (it is used for sorting). Its signature must be func(a, b T) int

Example with a NullInt

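// Note: this snippet is written as if it lives in datatable's serie package
// (New, Serie, Eq, Lt, Gt); cast is assumed to be github.com/spf13/cast.

// IntN is an alias that creates the custom Serie managing NullInt values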
func IntN(v ...interface{}) Serie {
	s, _ := New(NullInt{}, asNullInt, compareNullInt)
	if len(v) > 0 {
		s.Append(v...)
	}
	return s
}

type NullInt struct {
	Int   int
	Valid bool
}

// Interface renders the current struct as a plain value.
// If it is not provided, serie.All() or serie.Get() will return the embedded value,
// i.e. NullInt{}
func (i NullInt) Interface() interface{} {
	if i.Valid {
		return i.Int
	}
	return nil
}

// asNullInt is our caster function
func asNullInt(i interface{}) NullInt {
	var ni NullInt
	if i == nil {
		return ni
	}

	if v, ok := i.(NullInt); ok {
		return v
	}

	if v, err := cast.ToIntE(i); err == nil {
		ni.Int = v
		ni.Valid = true
	}
	return ni
}

// compareNullInt is our comparator function
// used to sort
func compareNullInt(a, b NullInt) int {
	if !b.Valid {
		if !a.Valid {
			return Eq
		}
		return Gt
	}
	if !a.Valid {
		return Lt
	}
	if a.Int == b.Int {
		return Eq
	}
	if a.Int < b.Int {
		return Lt
	}
	return Gt
}
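The comparator's ordering is easiest to see when it drives a sort. Below is a minimal, self-contained sketch that uses only the standard library: it redeclares NullInt, assumes the values -1, 0 and 1 for Lt, Eq and Gt (the snippet above only uses the names), and sorts a slice with sort.SliceStable. The sortNullInts helper is hypothetical and is not part of datatable.

```go
package main

import (
	"fmt"
	"sort"
)

// Assumed values for the comparison results; the library may define them
// differently.
const (
	Lt = -1
	Eq = 0
	Gt = 1
)

// NullInt is an int that may be null (Valid == false).
type NullInt struct {
	Int   int
	Valid bool
}

// compareNullInt follows the same rules as the comparator above:
// null values are equal to each other and sort before any valid value.
func compareNullInt(a, b NullInt) int {
	switch {
	case !a.Valid && !b.Valid:
		return Eq
	case !a.Valid:
		return Lt
	case !b.Valid:
		return Gt
	case a.Int < b.Int:
		return Lt
	case a.Int > b.Int:
		return Gt
	}
	return Eq
}

// sortNullInts sorts values in ascending order using the comparator.
func sortNullInts(values []NullInt) {
	sort.SliceStable(values, func(i, j int) bool {
		return compareNullInt(values[i], values[j]) == Lt
	})
}

func main() {
	values := []NullInt{{Int: 3, Valid: true}, {}, {Int: 1, Valid: true}}
	sortNullInts(values)
	fmt.Println(values) // prints [{0 false} {1 true} {3 true}]
}
```

The null value ends up first because the comparator reports it as lower than every valid value.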

Who are we?

We are Datasweet, a French startup providing full-service (big) data solutions.

Questions? Problems? Suggestions?

If you find a bug or want to request a feature, please create a GitHub Issue.

Contributors


Cléo Rebert

License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.

Copyright 2017-2020 Datasweet <http://www.datasweet.fr>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.