
rocketlaunchr / Dataframe Go

Licence: other
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration


Projects that are alternatives of or similar to Dataframe Go

Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+280.08%)
Mutual labels:  data-science, statistics, pandas, pandas-dataframe
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+21.15%)
Mutual labels:  dataframe, data-science, pandas, pandas-dataframe
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+1610.27%)
Mutual labels:  data-science, statistics, pandas, pandas-dataframe
Datasheets
Read data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (+21.77%)
Mutual labels:  dataframe, data-science, pandas
Data Science Projects With Python
A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn
Stars: ✭ 198 (-59.34%)
Mutual labels:  data-science, pandas, pandas-dataframe
Prettypandas
A Pandas Styler class for making beautiful tables
Stars: ✭ 376 (-22.79%)
Mutual labels:  data-science, pandas, pandas-dataframe
10 Simple Hacks To Speed Up Your Data Analysis In Python
Some useful Tips and Tricks to speed up the data analysis process in Python.
Stars: ✭ 45 (-90.76%)
Mutual labels:  data-science, pandas, pandas-dataframe
Foxcross
AsyncIO serving for data science models
Stars: ✭ 18 (-96.3%)
Mutual labels:  dataframe, data-science, pandas
Smile
Statistical Machine Intelligence & Learning Engine
Stars: ✭ 5,412 (+1011.29%)
Mutual labels:  dataframe, data-science, statistics
Stats Maths With Python
General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python
Stars: ✭ 381 (-21.77%)
Mutual labels:  data-science, statistics, pandas
Tablesaw
Java dataframe and visualization library
Stars: ✭ 2,785 (+471.87%)
Mutual labels:  dataframe, data-science, statistics
Machine Learning With Python
Practice and tutorial-style notebooks covering wide variety of machine learning techniques
Stars: ✭ 2,197 (+351.13%)
Mutual labels:  data-science, statistics, pandas
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-74.33%)
Mutual labels:  data-science, pandas, pandas-dataframe
Danfojs
danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Stars: ✭ 1,304 (+167.76%)
Mutual labels:  dataframe, data-science, pandas
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+525.05%)
Mutual labels:  dataframe, data-science, pandas
Dataframe
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
Stars: ✭ 828 (+70.02%)
Mutual labels:  dataframe, data-science, pandas
Just Pandas Things
An ongoing list of pandas quirks
Stars: ✭ 660 (+35.52%)
Mutual labels:  data-science, pandas, pandas-dataframe
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-43.94%)
Mutual labels:  data-science, pandas, pandas-dataframe
Boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-95.28%)
Mutual labels:  dataframe, data-science, pandas
Algorithmic-Trading
I have been deeply interested in algorithmic trading and systematic trading algorithms. This Repository contains the code of what I have learnt on the way. It starts form some basic simple statistics and will lead up to complex machine learning algorithms.
Stars: ✭ 47 (-90.35%)
Mutual labels:  statistics, pandas-dataframe, pandas

dataframe-go

Dataframes are used for statistics, machine-learning, and data manipulation/exploration. You can think of a Dataframe as an Excel spreadsheet. This package is designed to be lightweight and intuitive.

⚠️ The package is production ready, but the API is not stable yet. Version 1.0.0 will be tagged once stability is reached. It is recommended that your package manager lock to a commit ID instead of tracking the master branch directly. ⚠️

Star the project to show your appreciation.

Features

  1. Importing from CSV, JSONL, Parquet, MySQL & PostgreSQL
  2. Exporting to CSV, JSONL, Excel, Parquet, MySQL & PostgreSQL
  3. Developer Friendly
  4. Flexible - Create custom Series (custom data types)
  5. Performant
  6. Interoperability with the gonum package
  7. pandas sub-package (help required)
  8. Fake data generation
  9. Interpolation (ForwardFill, BackwardFill, Linear, Spline, Lagrange)
  10. Time-series Forecasting (SES, Holt-Winters)
  11. Math functions
  12. Plotting (cross-platform)

See the Tutorial section below.

Installation

go get -u github.com/rocketlaunchr/dataframe-go
import dataframe "github.com/rocketlaunchr/dataframe-go"

DataFrames

Creating a DataFrame


s1 := dataframe.NewSeriesInt64("day", nil, 1, 2, 3, 4, 5, 6, 7, 8)
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2, nil, nil, 84.2, 72, 89)
df := dataframe.NewDataFrame(s1, s2)

fmt.Print(df.Table())
  
OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   1   |  50.3   |
| 1:  |   2   |  23.4   |
| 2:  |   3   |  56.2   |
| 3:  |   4   |   NaN   |
| 4:  |   5   |   NaN   |
| 5:  |   6   |  84.2   |
| 6:  |   7   |   72    |
| 7:  |   8   |   89    |
+-----+-------+---------+
| 8X2 | INT64 | FLOAT64 |
+-----+-------+---------+


Insert and Remove Row


df.Append(nil, 9, 123.6)

df.Append(nil, map[string]interface{}{
	"day":   10,
	"sales": nil,
})

df.Remove(0)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   2   |  23.4   |
| 1:  |   3   |  56.2   |
| 2:  |   4   |   NaN   |
| 3:  |   5   |   NaN   |
| 4:  |   6   |  84.2   |
| 5:  |   7   |   72    |
| 6:  |   8   |   89    |
| 7:  |   9   |  123.6  |
| 8:  |  10   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+
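The row operations above can be modeled with a much simpler structure. The sketch below uses a hypothetical, column-oriented stand-in. The `frame` type and its methods are illustrative and not part of dataframe-go. It shows how appending a row affects every column at once, whether the row is given as positional values or as a map keyed by column name. It also shows the same for removing a row.

```go
package main

import "fmt"

// frame is a hypothetical, column-oriented stand-in used only to illustrate
// the row operations above; it is not dataframe-go's DataFrame.
type frame struct {
	names []string
	cols  [][]interface{} // cols[i] holds the values of series names[i]; nil means missing
}

// appendRow adds one row, given either positional values (one per column)
// or a single map keyed by column name. Keys absent from the map become nil.
func (f *frame) appendRow(vals ...interface{}) {
	if len(vals) == 1 {
		if m, ok := vals[0].(map[string]interface{}); ok {
			for i, name := range f.names {
				f.cols[i] = append(f.cols[i], m[name])
			}
			return
		}
	}
	for i := range f.cols {
		f.cols[i] = append(f.cols[i], vals[i])
	}
}

// removeRow deletes row r from every column, shifting later rows up by one.
func (f *frame) removeRow(r int) {
	for i := range f.cols {
		f.cols[i] = append(f.cols[i][:r], f.cols[i][r+1:]...)
	}
}

func (f *frame) nRows() int { return len(f.cols[0]) }

func main() {
	f := &frame{
		names: []string{"day", "sales"},
		cols:  [][]interface{}{{1, 2, 3}, {50.3, 23.4, 56.2}},
	}
	f.appendRow(9, 123.6)
	f.appendRow(map[string]interface{}{"day": 10, "sales": nil})
	f.removeRow(0)
	fmt.Println(f.nRows(), f.cols[0]) // 4 [2 3 9 10]
}
```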


Update Row


df.UpdateRow(0, nil, map[string]interface{}{
	"day":   3,
	"sales": 45,
})

Sorting


sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   9   |  123.6  |
| 1:  |   8   |   89    |
| 2:  |   6   |  84.2   |
| 3:  |   7   |   72    |
| 4:  |   3   |  56.2   |
| 5:  |   2   |  23.4   |
| 6:  |  10   |   NaN   |
| 7:  |   5   |   NaN   |
| 8:  |   4   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+
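The ordering in the output above can be reproduced with the standard library's sort.SliceStable. This sketch uses a hypothetical `row` type holding the rows from the Insert and Remove Row output, ignoring the UpdateRow call. It sorts by sales descending, then by day descending, and places missing sales last, which is where the NaN rows appear in the table. It is an illustration, not dataframe-go's sorting implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical record; a nil sales pointer stands in for NaN.
type row struct {
	day   int64
	sales *float64
}

func ptr(v float64) *float64 { return &v }

// sortRows orders rows by sales descending, then by day descending.
// Missing sales values always sort last, regardless of direction.
func sortRows(rows []row) {
	sort.SliceStable(rows, func(i, j int) bool {
		a, b := rows[i].sales, rows[j].sales
		switch {
		case a == nil && b != nil:
			return false
		case a != nil && b == nil:
			return true
		case a != nil && b != nil && *a != *b:
			return *a > *b
		}
		return rows[i].day > rows[j].day // tie on sales: fall back to day
	})
}

func main() {
	rows := []row{
		{2, ptr(23.4)}, {3, ptr(56.2)}, {4, nil}, {5, nil}, {6, ptr(84.2)},
		{7, ptr(72)}, {8, ptr(89)}, {9, ptr(123.6)}, {10, nil},
	}
	sortRows(rows)
	for _, r := range rows {
		fmt.Print(r.day, " ")
	}
	fmt.Println() // days in order: 9 8 6 7 3 2 10 5 4
}
```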


Iterating

You can change the step and the starting row. It may be wise to lock the DataFrame before iterating.

Each row's values are returned as a map in which every value is keyed twice: by the series name (string) and by the series index (int).


iterator := df.ValuesIterator(dataframe.ValuesOptions{0, 1, true}) // Don't apply read lock because we are write locking from outside.

df.Lock()
for {
	row, vals, _ := iterator()
	if row == nil {
		break
	}
	fmt.Println(*row, vals)
}
df.Unlock()

OUTPUT:
0 map[day:1 0:1 sales:50.3 1:50.3]
1 map[sales:23.4 1:23.4 day:2 0:2]
2 map[day:3 0:3 sales:56.2 1:56.2]
3 map[1:<nil> day:4 0:4 sales:<nil>]
4 map[day:5 0:5 sales:<nil> 1:<nil>]
5 map[sales:84.2 1:84.2 day:6 0:6]
6 map[day:7 0:7 sales:72 1:72]
7 map[day:8 0:8 sales:89 1:89]
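The sketch below makes the shape of the returned values concrete with a hypothetical, simplified model of such an iterator. The `valuesIterator` function is illustrative and not the library's method. It honors a starting row and a step, and it returns a nil row pointer when it runs out of rows. It keys each value by both series name and series index, matching the maps printed above. Unlike the real iterator, it performs no locking.

```go
package main

import "fmt"

// valuesIterator is a hypothetical, simplified model of a row iterator.
// It starts at initialRow, advances by step (which must be non-zero), and
// keys each value by both series name (string) and series index (int).
// A nil row pointer signals that there are no more rows.
func valuesIterator(names []string, cols [][]interface{}, initialRow, step int) func() (*int, map[interface{}]interface{}) {
	next := initialRow
	return func() (*int, map[interface{}]interface{}) {
		if len(cols) == 0 || next < 0 || next >= len(cols[0]) {
			return nil, nil
		}
		row := next
		next += step
		vals := make(map[interface{}]interface{}, 2*len(names))
		for i, name := range names {
			vals[name] = cols[i][row]
			vals[i] = cols[i][row]
		}
		return &row, vals
	}
}

func main() {
	names := []string{"day", "sales"}
	cols := [][]interface{}{{1, 2, 3}, {50.3, nil, 56.2}}
	next := valuesIterator(names, cols, 0, 2) // every second row, starting at row 0
	for {
		row, vals := next()
		if row == nil {
			break
		}
		fmt.Println(*row, vals["day"], vals[1])
	}
}
```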


Statistics

You can easily calculate statistics for a Series using the gonum or montanaflynn/stats packages.

SeriesFloat64 and SeriesTime expose their underlying data through an exported Values field, so they interoperate seamlessly with external math-based packages.

Example

Some series provide easy conversion using the ToSeriesFloat64 method.

import "gonum.org/v1/gonum/stat"

s := dataframe.NewSeriesInt64("random", nil, 1, 2, 3, 4, 5, 6, 7, 8)
sf, _ := s.ToSeriesFloat64(ctx)

Mean

mean := stat.Mean(sf.Values, nil)

Median

import "github.com/montanaflynn/stats"
median, _ := stats.Median(sf.Values)

Standard Deviation

std := stat.StdDev(sf.Values, nil)

Plotting (cross-platform)

import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}

plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display(plot.None)
<-plt.Closed

Output: (image: "Monthly sales" plot)

Math Functions

import (
	"github.com/rocketlaunchr/dataframe-go/math/funcs"
	"github.com/rocketlaunchr/dataframe-go/utils"
)

res := 24
sx := dataframe.NewSeriesFloat64("x", nil, utils.Float64Seq(1, float64(res), 1))
sy := dataframe.NewSeriesFloat64("y", &dataframe.SeriesInit{Size: res})
df := dataframe.NewDataFrame(sx, sy)

fn := funcs.RegFunc("sin(2*𝜋*x/24)")
funcs.Evaluate(ctx, df, fn, 1)


Output: (image: plot of the resulting sine wave)

Importing Data

The imports sub-package supports importing from CSV, JSONL, and Parquet, and directly from a SQL database. The DictateDataType option can be set to specify the true underlying data type of each column. Alternatively, the InferDataTypes option can be set.

CSV

csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))

OUTPUT:
+-----+----------------+------------+-------+---------+-------+
|     |    COUNTRY     |    DATE    |  AGE  | AMOUNT  |  ID   |
+-----+----------------+------------+-------+---------+-------+
| 0:  | United States  | 2012-02-01 |  50   |  112.1  | 1234  |
| 1:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 2:  | United Kingdom | 2012-02-01 |  17   |  18.2   | 12345 |
| 3:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 4:  | United Kingdom | 2012-05-07 |  NaN  |  18.2   | 12345 |
| 5:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 6:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 7:  |     Spain      | 2012-02-01 |  66   | 555.42  |  241  |
+-----+----------------+------------+-------+---------+-------+
| 8X5 |     STRING     |    TIME    | INT64 | FLOAT64 | INT64 |
+-----+----------------+------------+-------+---------+-------+


Exporting Data

The exports sub-package supports exporting to CSV, JSONL, Parquet, and Excel, and directly to a SQL database.

Optimizations

  • If you know the number of rows in advance, you can set the capacity of a series' underlying slice using SeriesInit{}. This preallocates memory and avoids repeated reallocation as rows are added, which can improve speed.
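Reserving capacity in advance is a general property of Go slices. This standard-library sketch contrasts a slice created with `make([]float64, 0, n)` against one grown by repeated append calls. The helper names are illustrative. The exact number of reallocations depends on the Go runtime's growth policy.

```go
package main

import "fmt"

// fillPreallocated builds n values into a slice whose capacity is reserved up
// front, so none of the appends below needs to reallocate.
func fillPreallocated(n int) []float64 {
	vals := make([]float64, 0, n) // length 0, capacity n
	for i := 0; i < n; i++ {
		vals = append(vals, float64(i))
	}
	return vals
}

// countReallocations appends n values to a nil slice and counts how many
// times the backing array had to grow.
func countReallocations(n int) int {
	var vals []float64
	grows := 0
	for i := 0; i < n; i++ {
		before := cap(vals)
		vals = append(vals, float64(i))
		if cap(vals) != before {
			grows++
		}
	}
	return grows
}

func main() {
	v := fillPreallocated(1000)
	fmt.Println(len(v), cap(v))
	fmt.Println("reallocations without preallocation:", countReallocations(1000))
}
```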

Generic Series

Out of the box, string, time.Time, float64, and int64 are supported. Support for float32 and all integer types is automatic. A convenience function is provided for dealing with bool, and complex128 is supported inside the xseries sub-package.

There may be times that you want to use your own custom data types. You can either implement your own Series type (more performant) or use the Generic Series (more convenient).

civil.Date

import "time"
import "cloud.google.com/go/civil"

sg := dataframe.NewSeriesGeneric("date", civil.Date{}, nil, civil.Date{Year: 2018, Month: time.May, Day: 1}, civil.Date{Year: 2018, Month: time.May, Day: 2}, civil.Date{Year: 2018, Month: time.May, Day: 3})
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2)

df := dataframe.NewDataFrame(sg, s2)

OUTPUT:
+-----+------------+---------+
|     |    DATE    |  SALES  |
+-----+------------+---------+
| 0:  | 2018-05-01 |  50.3   |
| 1:  | 2018-05-02 |  23.4   |
| 2:  | 2018-05-03 |  56.2   |
+-----+------------+---------+
| 3X2 | CIVIL DATE | FLOAT64 |
+-----+------------+---------+

Tutorial

Create some fake data

Let's create a list of 8 "fake" employees with a name, title and base hourly wage rate.

import "time"
import "golang.org/x/exp/rand"
import "github.com/rocketlaunchr/dataframe-go/utils/faker"

src := rand.NewSource(uint64(time.Now().UTC().UnixNano()))
df := faker.NewDataFrame(8, src, faker.S("name", 0, "Name"), faker.S("title", 0.5, "JobTitle"), faker.S("base rate", 0, "Number", 15, 50))
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    42     |
| 1:  | Nickolas Emard |      NaN       |    22     |
| 2:  | Hollis Dickens | Representative |    22     |
| 3:  | Stacy Dietrich |      NaN       |    43     |
| 4:  |  Aleen Legros  |    Officer     |    21     |
| 5:  |  Adelia Metz   |   Architect    |    18     |
| 6:  | Sunny Gerlach  |      NaN       |    28     |
| 7:  | Austin Hackett |      NaN       |    39     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+

Apply Function

Let's give everyone a promotion by doubling their base rate.

s := df.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |
| 1:  | Nickolas Emard |      NaN       |    44     |
| 2:  | Hollis Dickens | Representative |    44     |
| 3:  | Stacy Dietrich |      NaN       |    86     |
| 4:  |  Aleen Legros  |    Officer     |    42     |
| 5:  |  Adelia Metz   |   Architect    |    36     |
| 6:  | Sunny Gerlach  |      NaN       |    56     |
| 7:  | Austin Hackett |      NaN       |    78     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+
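The assertion `val.(int64)` panics if a value is missing (nil). The minimal sketch below makes the in-place semantics concrete. Its `applyInPlace` helper is hypothetical and is not dataframe.Apply. It skips nil values so that the function can safely assume a concrete type. This README does not say whether dataframe.Apply passes nil values through, so a similar guard in your own ApplySeriesFn is a reasonable precaution when a series can contain missing values.

```go
package main

import "fmt"

// applyInPlace replaces each non-nil value with fn(value, row, nRows).
// Missing (nil) values are left untouched.
func applyInPlace(vals []interface{}, fn func(val interface{}, row, nRows int) interface{}) {
	for row, v := range vals {
		if v == nil {
			continue
		}
		vals[row] = fn(v, row, len(vals))
	}
}

func main() {
	baseRate := []interface{}{int64(42), int64(22), nil}
	applyInPlace(baseRate, func(val interface{}, row, nRows int) interface{} {
		return 2 * val.(int64)
	})
	fmt.Println(baseRate) // [84 44 <nil>]
}
```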

Create a Time series

Let's schedule a meeting with each employee on a separate, sequential day.

import "github.com/rocketlaunchr/dataframe-go/utils/utime"

mts, _ := utime.NewSeriesTime(ctx, "meeting time", "1D", time.Now().UTC(), false, utime.NewSeriesTimeOptions{Size: &[]int{8}[0]})
df.AddSeries(mts, nil)
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Nickolas Emard |      NaN       |    44     |   2020-02-03 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  | Stacy Dietrich |      NaN       |    86     |   2020-02-05 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 4:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 5:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 6:  | Sunny Gerlach  |      NaN       |    56     |   2020-02-08 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 7:  | Austin Hackett |      NaN       |    78     |   2020-02-09 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 8X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+

Filtering

Let's keep only our senior employees (those with titles), for no particular reason.

filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
	if vals["title"] == nil {
		return dataframe.DROP, nil
	}
	return dataframe.KEEP, nil
})

seniors, _ := dataframe.Filter(ctx, df, filterFn)
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 4X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+
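The filter function returns an action for each row. Here is a minimal, standard-library model of that contract. The names `filterRows`, `keep`, and `drop` are hypothetical stand-ins for dataframe.Filter, KEEP, and DROP. Rows are passed as maps keyed by column name, dropped rows are omitted from the result, and the input slice is left unchanged.

```go
package main

import "fmt"

// filterAction is a local stand-in for dataframe.FilterAction.
type filterAction int

const (
	keep filterAction = iota
	drop
)

type rowFilter func(vals map[interface{}]interface{}, row, nRows int) (filterAction, error)

// filterRows passes each row to fn and returns only the rows marked keep.
// The input slice is not modified; the first error from fn aborts the filter.
func filterRows(rows []map[interface{}]interface{}, fn rowFilter) ([]map[interface{}]interface{}, error) {
	var kept []map[interface{}]interface{}
	for i, vals := range rows {
		action, err := fn(vals, i, len(rows))
		if err != nil {
			return nil, err
		}
		if action == keep {
			kept = append(kept, vals)
		}
	}
	return kept, nil
}

func main() {
	employees := []map[interface{}]interface{}{
		{"name": "Cordia Jacobi", "title": "Consultant"},
		{"name": "Nickolas Emard", "title": nil},
		{"name": "Hollis Dickens", "title": "Representative"},
	}
	seniors, err := filterRows(employees, func(vals map[interface{}]interface{}, row, nRows int) (filterAction, error) {
		if vals["title"] == nil {
			return drop, nil
		}
		return keep, nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(seniors)) // 2
}
```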

Other useful packages

  • awesome-svelte - Resources for killing react
  • dbq - Zero boilerplate database operations for Go
  • electron-alert - SweetAlert2 for Electron Applications
  • google-search - Scrape google search results
  • igo - A Go transpiler with cool new syntax such as fordefer (defer for for-loops)
  • mysql-go - Properly cancel slow MySQL queries
  • react - Build front end applications using Go
  • remember-go - Cache slow database queries
  • testing-go - Testing framework for unit testing

Legal Information

The license is a modified MIT license. Refer to LICENSE file for more details.

© 2018-21 PJ Engineering and Business Solutions Pty. Ltd.
