All Projects → csvreader → Csvpack

csvreader / Csvpack

Licence: cc0-1.0
csvpack library / gem - tools 'n' scripts for working with tabular data packages using comma-separated values (CSV) datafiles in text with meta info (that is, schema, datatypes, ..) in datapackage.json; download, read into and query CSV datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more

Programming Languages

ruby
36898 projects - #4 most used programming language

Projects that are alternatives of or similar to Csvpack

Visidata
A terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+6387.32%)
Mutual labels:  csv, sqlite, tabular-data
Datasette
An open source multi-tool for exploring and publishing data
Stars: ✭ 5,640 (+7843.66%)
Mutual labels:  csv, sqlite
Spreadsheet architect
Spreadsheet Architect is a library that allows you to create XLSX, ODS, or CSV spreadsheets super easily from ActiveRecord relations, plain Ruby objects, or tabular data.
Stars: ✭ 1,160 (+1533.8%)
Mutual labels:  csv, activerecord
Sqlitebiter
A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
Stars: ✭ 601 (+746.48%)
Mutual labels:  csv, sqlite
Meza
A Python toolkit for processing tabular data
Stars: ✭ 374 (+426.76%)
Mutual labels:  csv, tabular-data
Pytablewriter
pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+494.37%)
Mutual labels:  csv, sqlite
Daff
align and compare tables
Stars: ✭ 598 (+742.25%)
Mutual labels:  csv, sqlite
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-57.75%)
Mutual labels:  csv, sqlite
Nano Sql
Universal database layer for the client, server & mobile devices. It's like Lego for databases.
Stars: ✭ 717 (+909.86%)
Mutual labels:  csv, sqlite
Rows
A common, beautiful interface to tabular data, no matter the format
Stars: ✭ 739 (+940.85%)
Mutual labels:  csv, tabular-data
Row boat
Import CSVs into your ActiveRecord models
Stars: ✭ 12 (-83.1%)
Mutual labels:  csv, activerecord
Active importer
Define importers that load tabular data from spreadsheets or CSV files into any ActiveRecord-like ORM.
Stars: ✭ 333 (+369.01%)
Mutual labels:  activerecord, tabular-data
Pgloader
Migrate to PostgreSQL in a single command!
Stars: ✭ 3,754 (+5187.32%)
Mutual labels:  csv, sqlite
csv-to-sqlite
A desktop app to convert CSV files to SQLite databases!
Stars: ✭ 68 (-4.23%)
Mutual labels:  csv, sqlite
Fifa Fut Data
Web-scraping script that writes the data of all players from FutHead and FutBin to a CSV file or a DB
Stars: ✭ 55 (-22.54%)
Mutual labels:  csv, dataset
Csvs To Sqlite
Convert CSV files into a SQLite database
Stars: ✭ 568 (+700%)
Mutual labels:  csv, sqlite
datapackage-go
A Go library for working with Data Package.
Stars: ✭ 22 (-69.01%)
Mutual labels:  csv, tabular-data
tableschema-go
A Go library for working with Table Schema.
Stars: ✭ 41 (-42.25%)
Mutual labels:  csv, tabular-data
Uhttbarcodereference
Universe-HTT barcode reference
Stars: ✭ 634 (+792.96%)
Mutual labels:  csv, dataset
Faster Than Csv
Faster CSV on Python 3
Stars: ✭ 52 (-26.76%)
Mutual labels:  csv, tabular-data

csvpack

tools 'n' scripts for working with tabular data packages using comma-separated values (CSV) datafiles in text with meta info (that is, schema, datatypes, ..) in datapackage.json; download, read into and query CSV datafiles with your SQL database (e.g. SQLite, PostgreSQL, ...) of choice and much more

Usage

What's a tabular data package?

Tabular Data Package is a simple structure for publishing and sharing tabular data with the following key features:

  • Data is stored in CSV (comma separated values) files
  • Metadata about the dataset both general (e.g. title, author) and the specific data files (e.g. schema) is stored in a single JSON file named datapackage.json which follows the Data Package format

(Source: Tabular Data Packages @ Frictionless Data Project • Data Hub.io • Open Knowledge Foundation • Data Protocols.org)

Here's a minimal example of a tabular data package holding two files, that is, data.csv and datapackage.json:

beer/data.csv:

Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
...

beer/datapackage.json:

{
  "name": "beer",
  "resources": [
    {
      "path": "data.csv",
      "schema": {
        "fields": [{ "name": "Brewery",   "type": "string" },
                   { "name": "City",      "type": "string" },
                   { "name": "Name",      "type": "string" },
                   { "name": "Abv",       "type": "number" }]
      }
    }
  ]
}

Where to find data packages?

For some 100+ real world examples see the Data Packages Listing (Sources, Registry) at the Data Hub.io • Frictionless Data Project website for a start. Tabular data packages include:

Name Comments
country-codes Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
language-codes ISO Language Codes (639-1 and 693-2)
currency-codes ISO 4217 Currency Codes
gdb Country, Regional and World GDP (Gross Domestic Product)
s-and-p-500-companies S&P 500 Companies with Financial Information
un-locode UN-LOCODE Codelist
bond-yields-uk-10y 10 Year UK Government Bond Yields (Long-Term Interest Rate)
gold-prices Gold Prices (Monthly in USD)
oil-prices Brent crude and WTI oil prices from US EIA
co2-emissions Annual info about co2 emissions per nation
co2-fossil-global Global CO2 Emissions from fossil-fuels annually since 1751 till 2014

and many more

Code, Code, Code - Script Your Data Workflow with Ruby

require 'csvpack'

CsvPack.import(
  's-and-p-500-companies',
  'gdb'
)

Using CsvPack.import will:

  1. download all data packages to the ./pack folder

  2. (auto-)add all tables to an in-memory SQLite database using SQL create_table commands via ActiveRecord migrations e.g.

create_table :constituents do |t|
  t.string :symbol          # Symbol         (string)
  t.string :name            # Name           (string)
  t.string :sector          # Sector         (string)
end
  1. (auto-)import all datasets using SQL inserts e.g.
INSERT INTO constituents
  (symbol,
   name,
   sector)
VALUES  
  ('MMM',
   '3M Company',
   'Industrials')
  1. (auto-)add ActiveRecord models for all tables.

So what? Now you can use all the "magic" of ActiveRecord to query the datasets. Example:

pp Constituent.count

# SELECT COUNT(*) FROM "constituents"
# => 496


pp Constituent.first

# SELECT  "constituents".* FROM "constituents" ORDER BY "constituents"."id" ASC LIMIT 1
# => #<Constituent:0x9f8cb78
#         id:     1,
#         symbol: "MMM",
#         name:   "3M Company",
#         sector: "Industrials">


pp Constituent.find_by!( symbol: 'MMM' )

# SELECT  "constituents".*
#         FROM "constituents"
#         WHERE "constituents"."symbol" = "MMM"
#         LIMIT 1
# => #<Constituent:0x9f8cb78
#         id:     1,
#         symbol: "MMM",
#         name:   "3M Company",
#         sector: "Industrials">


pp Constituent.find_by!( name: '3M Company' )

# SELECT  "constituents".*
#          FROM "constituents"
#          WHERE "constituents"."name" = "3M Company"
#          LIMIT 1
# => #<Constituent:0x9f8cb78
#         id:     1,
#         symbol: "MMM",
#         name:   "3M Company",
#         sector: "Industrials">


pp Constituent.where( sector: 'Industrials' ).count

# SELECT COUNT(*) FROM "constituents"
#         WHERE "constituents"."sector" = "Industrials"
# => 63


pp Constituent.where( sector: 'Industrials' ).all

# SELECT "constituents".*
#         FROM "constituents"
#         WHERE "constituents"."sector" = "Industrials"
# => [#<Constituent:0x9f8cb78
#          id:     1,
#          symbol: "MMM",
#          name:   "3M Company",
#          sector: "Industrials">,
#      #<Constituent:0xa2a4180
#          id:     8,
#          symbol: "ADT",
#          name:   "ADT Corp (The)",
#          sector: "Industrials">,...]

and so on

Frequently Asked Questions (F.A.Qs) and Answers

Q: How to dowload a data package ("by hand")?

Use the CsvPack::Downloader class to download a data package to your disk (by default data packages get stored in ./pack).

dl = CsvPack::Downloader.new
dl.fetch( 'language-codes' )
dl.fetch( 's-and-p-500-companies' )
dl.fetch( 'un-locode')

Will result in:

-- pack
   |-- language-codes
   |   |-- data
   |   |   |-- ietf-language-tags.csv
   |   |   |-- language-codes-3b2.csv
   |   |   |-- language-codes-full.csv
   |   |   `-- language-codes.csv
   |   `-- datapackage.json
   |-- s-and-p-500-companies
   |   |-- data
   |   |   `-- constituents.csv
   |   `-- datapackage.json
   `-- un-locode
       |-- data
       |   |-- code-list.csv
       |   |-- country-codes.csv
       |   |-- function-classifiers.csv
       |   |-- status-indicators.csv
       |   `-- subdivision-codes.csv
       `-- datapackage.json

Q: How to add and import a data package ("by hand")?

Use the CsvPack::Pack class to read-in a data package and add and import into an SQL database.

pack = CsvPack::Pack.new( './pack/un-locode/datapackage.json' )
pack.tables.each do |table|
  table.up!      # (auto-) add table  using SQL create_table via ActiveRecord migration
  table.import!  # import all records using SQL inserts
end

Q: How to connect to a different SQL database?

You can connect to any database supported by ActiveRecord. If you do NOT establish a connection in your script - the standard (default fallback) is using an in-memory SQLite3 database.

SQLite

For example, to create an SQLite3 database on disk - lets say mine.db - use in your script (before the CsvPack.import statement):

ActiveRecord::Base.establish_connection( adapter:  'sqlite3',
                                         database: './mine.db' )
PostgreSQL

For example, to connect to a PostgreSQL database use in your script (before the CsvPack.import statement):

require 'pg'       ##  pull-in PostgreSQL (pg) machinery

ActiveRecord::Base.establish_connection( adapter:  'postgresql'
                                         username: 'ruby',
                                         password: 'topsecret',
                                         database: 'database' )

Install

Just install the gem:

$ gem install csvpack

Alternatives

See the Tools and Plugins for working with Data Packages page at the Frictionless Data Project.

License

The csvpack scripts are dedicated to the public domain. Use it as you please with no restrictions whatsoever.

Questions? Comments?

Send them along to the wwwmake forum. Thanks!

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].