All Projects → frictionlessdata → tableschema-go

frictionlessdata / tableschema-go

Licence: MIT License
A Go library for working with Table Schema.

Programming Languages

go
31211 projects - #10 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to tableschema-go

datapackage-go
A Go library for working with Data Package.
Stars: ✭ 22 (-46.34%)
Mutual labels:  csv, tabular-data, table-schema, frictionlessdata
Tableqa
AI Tool for querying natural language on tabular data.
Stars: ✭ 109 (+165.85%)
Mutual labels:  csv, tabular-data
Tsv Utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Stars: ✭ 1,215 (+2863.41%)
Mutual labels:  csv, tabular-data
Miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Stars: ✭ 4,633 (+11200%)
Mutual labels:  csv, tabular-data
Rows
A common, beautiful interface to tabular data, no matter the format
Stars: ✭ 739 (+1702.44%)
Mutual labels:  csv, tabular-data
Faster Than Csv
Faster CSV on Python 3
Stars: ✭ 52 (+26.83%)
Mutual labels:  csv, tabular-data
Tad
A desktop application for viewing and analyzing tabular data
Stars: ✭ 2,275 (+5448.78%)
Mutual labels:  csv, tabular-data
Csvreader
csvreader library / gem - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)
Stars: ✭ 169 (+312.2%)
Mutual labels:  csv, tabular-data
fastapi-csv
🏗️ Create APIs from CSV files within seconds, using fastapi
Stars: ✭ 46 (+12.2%)
Mutual labels:  csv, tabular-data
tv
📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
Stars: ✭ 1,763 (+4200%)
Mutual labels:  csv, tabular-data
tabular-stream
Detects tabular data (spreadsheets, dsv or json, 20+ different formats) and emits normalized objects.
Stars: ✭ 34 (-17.07%)
Mutual labels:  csv, tabular-data
Visidata
A terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+11134.15%)
Mutual labels:  csv, tabular-data
Meza
A Python toolkit for processing tabular data
Stars: ✭ 374 (+812.2%)
Mutual labels:  csv, tabular-data
Csvpack
csvpack library / gem - tools 'n' scripts for working with tabular data packages using comma-separated values (CSV) datafiles in text with meta info (that is, schema, datatypes, ..) in datapackage.json; download, read into and query CSV datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more
Stars: ✭ 71 (+73.17%)
Mutual labels:  csv, tabular-data
datapackage-m
Power Query M functions for working with Tabular Data Packages (Frictionless Data) in Power BI and Excel
Stars: ✭ 26 (-36.59%)
Mutual labels:  tabular-data, frictionlessdata
DataProfiler
What's in your data? Extract schema, statistics and entities from datasets
Stars: ✭ 843 (+1956.1%)
Mutual labels:  csv, tabular-data
awesome-csv
Awesome Comma-Separated Values (CSV) - What's Next? - Frequently Asked Questions (F.A.Q.s) - Libraries & Tools
Stars: ✭ 46 (+12.2%)
Mutual labels:  csv, frictionlessdata
x86-csv
A machine-readable representation of the Intel x86 Instruction Set Reference.
Stars: ✭ 20 (-51.22%)
Mutual labels:  csv
klar-EDA
A python library for automated exploratory data analysis
Stars: ✭ 15 (-63.41%)
Mutual labels:  csv
openmrs-module-initializer
The OpenMRS Initializer module is an API-only module that processes the content of the configuration folder when it is found inside OpenMRS' application data directory.
Stars: ✭ 18 (-56.1%)
Mutual labels:  csv

tableschema-go

Build Status Coverage Status Go Report Card Gitter chat GoDoc Sourcegraph Codebase Support

Table schema tooling in Go.

Getting started

Installation

This package uses semantic versioning 2.0.0.

Go version >= 1.11

Please use go modules if you're using a version that supports it. To know which version of Go you're running, please run:

$ go version
go version go1.14 linux/amd64

If you're running go1.13+, you're good to go!

If you can not upgrade right now, you need to make sure your environment is using Go modules by setting the GO111MODULE environment variable. In bash, that could be done with the following command:

$ export GO111MODULE=on

Go version >= 1.8 && < 1.11

$ dep init
$ dep ensure -add github.com/frictionlessdata/tableschema-go/csv@>=0.1

Main Features

Tabular Data Load

Have tabular data stored in local files? Remote files? Packages like the csv are going to help on loading the data you need and making it ready for processing.

package main

import "github.com/frictionlessdata/tableschema-go/csv"

func main() {
   tab, err := csv.NewTable(csv.Remote("myremotetable"), csv.LoadHeaders())
   // Error handling.
}

Supported physical representations:

You would like to use tableschema-go but the physical representation you use is not listed here? No problem! Please create an issue before start contributing. We will be happy to help you along the way.

Schema Inference and Configuration

Got that new dataset and wants to start getting your hands dirty ASAP? No problems, let the schema package try to infer the data types based on the table data.

package main

import (
   "github.com/frictionlessdata/tableschema-go/csv"
   "github.com/frictionlessdata/tableschema-go/schema"
)

func main() {
   tab, _ := csv.NewTable(csv.Remote("myremotetable"), csv.LoadHeaders())
   sch, _ := schema.Infer(tab)
   fmt.Printf("%+v", sch)
}

Want to go faster? Please give InferImplicitCasting a try and let us know how it goes.

There might be cases in which the inferred schema is not correct. One of those cases is when your data use strings like "N/A" to represent missing cells. That would usually make our inferential algorithm think the field is a string.

When that happens, you can manually perform those last minutes tweaks Schema.

   sch.MissingValues = []string{"N/A"}
   sch.GetField("ID").Type = schema.IntegerType

After all that, you could persist your schema to disk:

sch.SaveToFile("users_schema.json")

And use the local schema later:

sch, _ := sch.LoadFromFile("users_schema.json")

Finally, if your schema is saved remotely, you can also use it:

sch, _ := schema.LoadRemote("http://myfoobar/users/schema.json")

Processing Tabular Data

Once you have the data, you would like to process using language data types. schema.CastTable and schema.CastRow are your friends on this journey.

package main

import (
   "github.com/frictionlessdata/tableschema-go/csv"
   "github.com/frictionlessdata/tableschema-go/schema"
)

type user struct {
   ID   int
   Age  int
   Name string
}

func main() {
   tab, _ := csv.NewTable(csv.FromFile("users.csv"), csv.LoadHeaders())
   sch, _ := schema.Infer(tab)
   var users []user
   sch.CastTable(tab, &users)
   // Users slice contains the table contents properly raw into
   // language types. Each row will be a new user appended to the slice.
}

If you have a lot of data and can no load everything in memory, you can easily iterate trough it:

...
   iter, _ := sch.Iter()
   for iter.Next() {
      var u user
      sch.CastRow(iter.Row(), &u)
      // Variable u is now filled with row contents properly raw
      // to language types.
   }
...

If you store data in a GZIP file, you can load it compressed using the same csv.FromFile:

...
   tab, _ := csv.NewTable(csv.FromFile("users.csv.gz"), csv.LoadHeaders())
...

Even better if you could do it regardless the physical representation! The table package declares some interfaces that will help you to achieve this goal:

Field

Class represents field in the schema.

For example, data values can be castd to native Go types. Decoding a value will check if the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema.

{
    'name': 'birthday',
    'type': 'date',
    'format': 'default',
    'constraints': {
        'required': True,
        'minimum': '2015-05-30'
    }
}

The following example will raise exception the passed-in is less than allowed by minimum constraints of the field. Errors will be returned as well when the user tries to cast values which are not well formatted dates.

date, err := field.Cast("2014-05-29")
// uh oh, something went wrong

Values that can't be castd will return an error. Casting a value that doesn't meet the constraints will return an error.

Available types, formats and resultant value of the cast:

Type Formats Casting result
any default interface{}
object default interface{}
array default []interface{}
boolean default bool
duration default time.Time
geopoint default, array, object [float64, float64]
integer default int64
number default float64
string default, uri, email, binary string
date default, any, <PATTERN> time.Time
datetime default, any, <PATTERN> time.Time
time default, any, <PATTERN> time.Time
year default time.Time
yearmonth default time.Time

Saving Tabular Data

Once you're done processing the data, it is time to persist results. As an example, let us assume we have a remote table schema called summary, which contains two fields:

import (
   "github.com/frictionlessdata/tableschema-go/csv"
   "github.com/frictionlessdata/tableschema-go/schema"
)


type summaryEntry struct {
    Date time.Time
    AverageAge float64
}

func WriteSummary(summary []summaryEntry, path string) {
   sch, _ := schema.LoadRemote("http://myfoobar/users/summary/schema.json")

   f, _ := os.Create(path)
   defer f.Close()

   w := csv.NewWriter(f)
   defer w.Flush()

   w.Write([]string{"Date", "AverageAge"})
   for _, summ := range summary{
       row, _ := sch.UncastRow(summ)
       w.Write(row)
   }
}

API Reference and More Examples

More detailed documentation about API methods and plenty of examples is available at https://godoc.org/github.com/frictionlessdata/tableschema-go

Contributing

Found a problem and would like to fix it? Have that great idea and would love to see it in the repository?

Please open an issue before start working

That could save a lot of time from everyone and we are super happy to answer questions and help you alonge the way. Furthermore, feel free to join frictionlessdata Gitter chat room and ask questions.

This project follows the Open Knowledge International coding standards

  • Before start coding:

    • Fork and pull the latest version of the master branch
    • Make sure you have go 1.8+ installed and you're using it
    • Make sure you dep installed
  • Before sending the PR:

$ cd $GOPATH/src/github.com/frictionlessdata/tableschema-go
$ dep ensure
$ go test ./..

And make sure your all tests pass.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].