
frictionlessdata / datapackage-go

License: MIT
A Go library for working with Data Packages.

Programming Languages

go
31211 projects - #10 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to datapackage-go

tableschema-go
A Go library for working with Table Schema.
Stars: ✭ 41 (+86.36%)
Mutual labels:  csv, tabular-data, table-schema, frictionlessdata
datapackage-m
Power Query M functions for working with Tabular Data Packages (Frictionless Data) in Power BI and Excel
Stars: ✭ 26 (+18.18%)
Mutual labels:  tabular-data, frictionlessdata, datapackage
awesome-csv
Awesome Comma-Separated Values (CSV) - What's Next? - Frequently Asked Questions (F.A.Q.s) - Libraries & Tools
Stars: ✭ 46 (+109.09%)
Mutual labels:  csv, frictionlessdata
Tsv Utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Stars: ✭ 1,215 (+5422.73%)
Mutual labels:  csv, tabular-data
Tad
A desktop application for viewing and analyzing tabular data
Stars: ✭ 2,275 (+10240.91%)
Mutual labels:  csv, tabular-data
Rows
A common, beautiful interface to tabular data, no matter the format
Stars: ✭ 739 (+3259.09%)
Mutual labels:  csv, tabular-data
Faster Than Csv
Faster CSV on Python 3
Stars: ✭ 52 (+136.36%)
Mutual labels:  csv, tabular-data
Csvreader
csvreader library / gem - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)
Stars: ✭ 169 (+668.18%)
Mutual labels:  csv, tabular-data
Tableqa
AI Tool for querying natural language on tabular data.
Stars: ✭ 109 (+395.45%)
Mutual labels:  csv, tabular-data
fastapi-csv
🏗️ Create APIs from CSV files within seconds, using fastapi
Stars: ✭ 46 (+109.09%)
Mutual labels:  csv, tabular-data
tv
📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
Stars: ✭ 1,763 (+7913.64%)
Mutual labels:  csv, tabular-data
Visidata
A terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+20836.36%)
Mutual labels:  csv, tabular-data
Meza
A Python toolkit for processing tabular data
Stars: ✭ 374 (+1600%)
Mutual labels:  csv, tabular-data
Csvpack
csvpack library / gem - tools 'n' scripts for working with tabular data packages using comma-separated values (CSV) datafiles in text with meta info (that is, schema, datatypes, ..) in datapackage.json; download, read into and query CSV datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more
Stars: ✭ 71 (+222.73%)
Mutual labels:  csv, tabular-data
DataProfiler
What's in your data? Extract schema, statistics and entities from datasets
Stars: ✭ 843 (+3731.82%)
Mutual labels:  csv, tabular-data
Miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Stars: ✭ 4,633 (+20959.09%)
Mutual labels:  csv, tabular-data
tabular-stream
Detects tabular data (spreadsheets, dsv or json, 20+ different formats) and emits normalized objects.
Stars: ✭ 34 (+54.55%)
Mutual labels:  csv, tabular-data
gems
Ruby Football Week 2021, June 11th to June 17th - 7 Days of Ruby (Sports) Gems ++ Best of Ruby Gems Series
Stars: ✭ 76 (+245.45%)
Mutual labels:  csv, datapackage
Tabula
🈸 Pretty printer for maps/structs collections (Elixir)
Stars: ✭ 85 (+286.36%)
Mutual labels:  tabular-data
react-keyview
React components to display the list, table, and grid, without scrolling, use the keyboard keys to navigate through the data
Stars: ✭ 16 (-27.27%)
Mutual labels:  tabular-data

datapackage-go


A Go library for working with Data Packages.

Install

$ go get -u github.com/frictionlessdata/datapackage-go/...

Main Features

Loading and validating tabular data package descriptors

A data package is a collection of resources. The datapackage.Package type provides various capabilities, such as loading local or remote data packages, saving a data package descriptor, and more.

Suppose we have a local CSV file and a JSON descriptor in a data directory:

data/population.csv

city,year,population
london,2017,8780000
paris,2017,2240000
rome,2017,2860000

data/datapackage.json

{
    "name": "world",
    "resources": [
      {
        "name": "population",
        "path": "population.csv",
        "profile":"tabular-data-resource",
        "schema": {
          "fields": [
            {"name": "city", "type": "string"},
            {"name": "year", "type": "integer"},
            {"name": "population", "type": "integer"}
          ]
        }
      }
    ]
  }

Let's create a data package from this data using the datapackage.Package type:

pkg, err := datapackage.Load("data/datapackage.json")
// Check error.

Accessing data package resources

Once the data package is loaded, we can use the datapackage.Resource type to read the resource's contents:

resource := pkg.GetResource("population")
contents, _ := resource.ReadAll()
fmt.Println(contents)
// [[london 2017 8780000] [paris 2017 2240000] [rome 2017 2860000]]

Or you could cast to Go types, making it easier to perform further processing:

type Population struct {
    City       string `tableheader:"city"`
    Year       int    `tableheader:"year"`
    Population int    `tableheader:"population"`
}

var cities []Population
resource.Cast(&cities, csv.LoadHeaders())
fmt.Printf("%+v\n", cities)
// [{City:london Year:2017 Population:8780000} {City:paris Year:2017 Population:2240000} {City:rome Year:2017 Population:2860000}]

If the data is too big to be loaded at once, or if you would like to perform line-by-line processing, you can iterate through the resource contents:

iter, _ := resource.Iter(csv.LoadHeaders())
sch, _ := resource.GetSchema()
for iter.Next() {
    var p Population
    sch.CastRow(iter.Row(), &p)
    fmt.Printf("%+v\n", p)
}
// {City:london Year:2017 Population:8780000}
// {City:paris Year:2017 Population:2240000}
// {City:rome Year:2017 Population:2860000}

Or you might want to process specific columns, for instance to perform a statistical analysis:

var population []float64
resource.CastColumn("population", &population, csv.LoadHeaders())
fmt.Println(population)
// Output: [8780000 2240000 2860000]

Loading zip bundles

It is very common to store data in zip bundles containing the descriptor and data files. Those are natively supported by the datapackage.Load method. For example, let's say we have the following package.zip bundle:

|- package.zip
    |- datapackage.json
    |- data.csv

We could load this package simply with:

pkg, err := datapackage.Load("package.zip")
// Check error.

And the library will unzip the package contents to a temporary directory and wire everything up for us.

A complete example can be found here.

Creating a zip bundle with the data package

You can also easily create a zip file containing the descriptor and all the data resources. Given a datapackage.Package instance, simply call:

err := pkg.Zip("package.zip")
// Check error.

This call also downloads remote resources. A complete example can be found here.

CSV dialect support

Basic support for configuring the CSV dialect has been added. In particular, the delimiter, skipInitialSpace, and header fields are supported. For instance, let's assume the population file uses a different field delimiter:

data/population.csv

city;year;population
london;2017;8780000
paris;2017;2240000
rome;2017;2860000

One could easily parse it by adding the following dialect property to the population resource:

    "dialect":{
        "delimiter":";"
    }

A complete example can be found here.
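The delimiter property maps naturally onto the Comma field of Go's standard encoding/csv reader. Here is a stdlib-only sketch of the effect (parseSemicolonCSV is an illustrative helper, not part of the library):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseSemicolonCSV parses CSV text whose fields are separated by
// semicolons, mirroring a `"dialect": {"delimiter": ";"}` setting.
func parseSemicolonCSV(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = ';' // the dialect's delimiter, instead of the default ','
	return r.ReadAll()
}

func main() {
	rows, err := parseSemicolonCSV("city;year;population\nlondon;2017;8780000\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(rows)
	// [[city year population] [london 2017 8780000]]
}
```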

Loading multipart resources

Sometimes you have data scattered across many local or remote files. Datapackage-go offers an easy way to deal with all those files as one big file: we call it multipart resources. To use this feature, simply list your files in the path property of the resource. For example, let's say our population data is now split between the northern and southern hemispheres. To deal with this, we only need to change the package descriptor:

data/datapackage.json

{
    "name": "world",
    "resources": [
      {
        "name": "population",
        "path": ["north.csv","south.csv"],
        "profile":"tabular-data-resource",
        "schema": {
          "fields": [
            {"name": "city", "type": "string"},
            {"name": "year", "type": "integer"},
            {"name": "population", "type": "integer"}
          ]
        }
      }
    ]
  }

And the rest of the code would keep working.

A complete example can be found here.
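Conceptually, a multipart resource stitches several files into one logical stream before parsing. A stdlib-only approximation using io.MultiReader is sketched below; readMultipart is a hypothetical helper, and unlike the library it assumes header-less parts and local data:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readMultipart concatenates several CSV parts into one stream and
// parses the result as a single table.
func readMultipart(parts ...string) ([][]string, error) {
	readers := make([]io.Reader, len(parts))
	for i, p := range parts {
		readers[i] = strings.NewReader(p)
	}
	return csv.NewReader(io.MultiReader(readers...)).ReadAll()
}

func main() {
	north := "london,2017,8780000\nparis,2017,2240000\n"
	south := "sydney,2017,5130000\n"
	rows, err := readMultipart(north, south)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rows))
	// 3
}
```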

Loading non-tabular resources

A data package is a container format used to describe and package a collection of data. Even though the library offers additional support for dealing with tabular resources, it can be used to package any kind of data.

For instance, let's say a user needs to load JSON-LD information along with some tabular data (for more on this use case, please take a look at this issue). That can be packed together in a data package descriptor:

{
    "name": "carp-lake",
    "title": "Carp Lake Title",
    "description": "Tephra and Lithology from Carp Lake",
    "resources": [
      {
        "name":"data",
        "path": "data/carpLakeCoreStratigraphy.csv",
        "format": "csv",
        "schema": {
          "fields": [
            {"name": "depth", "type": "number"},
            {"name": "notes", "type": "text"},
            {"name": "core_segments", "type": "text"}
          ]
        }
      },
      {
        "name": "schemaorg",
        "path": "data/schemaorg-ld.json",
        "format": "application/ld+json"
      }
    ]
}

Loading the package proceeds as usual:

pkg, err := datapackage.Load("data/datapackage.json")
// Check error.

Once the data package is loaded, we can use the Resource.RawRead method to access the schemaorg resource's raw contents:

so := pkg.GetResource("schemaorg")
rc, _ := so.RawRead()
defer rc.Close()
contents, _ := ioutil.ReadAll(rc)
// Use contents. For instance, one could validate the JSON-LD schema and unmarshal it into a data structure.

data := pkg.GetResource("data")
dataContents, err := data.ReadAll()
// As data is a tabular resource, its content can be loaded as [][]string.
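Once the raw bytes are in hand, further processing is ordinary Go. As one possibility, the JSON-LD document can be decoded into a generic map with the standard encoding/json package (decodeJSONLD is an illustrative helper, not part of the library):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONLD unmarshals the raw bytes of a non-tabular JSON resource
// into a generic map for further inspection.
func decodeJSONLD(contents []byte) (map[string]interface{}, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(contents, &doc); err != nil {
		return nil, err
	}
	return doc, nil
}

func main() {
	// Stand-in for the bytes returned by reading the schemaorg resource.
	raw := []byte(`{"@context": "https://schema.org", "@type": "Dataset", "name": "Carp Lake"}`)
	doc, err := decodeJSONLD(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc["@type"])
	// Dataset
}
```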

Manipulating data packages programmatically

The datapackage-go library also makes it easy to save packages. Let's say you're writing a program that produces data packages and would like to add or remove resources:

descriptor := map[string]interface{}{
    "resources": []interface{}{
        map[string]interface{}{
            "name":    "books",
            "path":    "books.csv",
            "format":  "csv",
            "profile": "tabular-data-resource",
            "schema": map[string]interface{}{
                "fields": []interface{}{
                    map[string]interface{}{"name": "author", "type": "string"},
                    map[string]interface{}{"name": "title", "type": "string"},
                    map[string]interface{}{"name": "year", "type": "integer"},
                },
            },
        },
    },
}
pkg, err := datapackage.New(descriptor, ".", validator.InMemoryLoader())
if err != nil {
    panic(err)
}
// Removing resource.
pkg.RemoveResource("books")

// Adding new resource.
pkg.AddResource(map[string]interface{}{
    "name":    "cities",
    "path":    "cities.csv",
    "format":  "csv",
    "profile": "tabular-data-resource",
    "schema": map[string]interface{}{
        "fields": []interface{}{
            map[string]interface{}{"name": "city", "type": "string"},
            map[string]interface{}{"name": "year", "type": "integer"},
            map[string]interface{}{"name": "population", "type": "integer"},
        },
    },
})

// Printing resource contents.
cities, _ := pkg.GetResource("cities").ReadAll()
fmt.Println(cities)
// [[london 2017 8780000] [paris 2017 2240000] [rome 2017 2860000]]