All Projects → csvspecs → awesome-csv

csvspecs / awesome-csv

Licence: CC0-1.0 license
Awesome Comma-Separated Values (CSV) - What's Next? - Frequently Asked Questions (F.A.Q.s) - Libraries & Tools

Projects that are alternatives of or similar to awesome-csv

Csvreader
csvreader library / gem - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)
Stars: ✭ 169 (+267.39%)
Mutual labels:  csv, tab
Flatfiles
Reads and writes CSV, fixed-length and other flat file formats with a focus on schema definition, configuration and speed.
Stars: ✭ 275 (+497.83%)
Mutual labels:  schema, csv
transferdb
TransferDB 支持异构数据库 schema 转换、全量数据导出导入以及增量数据同步功能( Oracle 数据库 -> MySQL/TiDB 数据库)
Stars: ✭ 30 (-34.78%)
Mutual labels:  schema, csv
Uhttbarcodereference
Universe-HTT barcode reference
Stars: ✭ 634 (+1278.26%)
Mutual labels:  csv, opendata
awesome-yaml
YAML awesomeness
Stars: ✭ 36 (-21.74%)
Mutual labels:  opendata, txt
Kalulu
Uganda Elections Tools and Resources
Stars: ✭ 24 (-47.83%)
Mutual labels:  csv, opendata
DataAnalyzer.app
✨🚀 DataAnalyzer.app - Convert JSON/CSV to Typed Data Interfaces - Automatically!
Stars: ✭ 23 (-50%)
Mutual labels:  schema, csv
json-next
json-next gem - read generation y / next generation json versions (HanSON, SON, JSONX/JSON11, etc.) with comments, unquoted keys, multi-line strings, trailing commas, optional commas, and more
Stars: ✭ 30 (-34.78%)
Mutual labels:  opendata, frictionlessdata
awesome-json-next
A Collection of What's Next for Awesome JSON (JavaScript Object Notation) for Structured (Meta) Data in Text - JSON5, HJSON, HanSON, TJSON, SON, CSON, USON, JSONX/JSON11 & Many More
Stars: ✭ 50 (+8.7%)
Mutual labels:  opendata, txt
Omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148 (+221.74%)
Mutual labels:  schema, csv
Visidata
A terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+9913.04%)
Mutual labels:  csv, opendata
yt-videos-list
Create and **automatically** update a list of all videos on a YouTube channel (in txt/csv/md form) via YouTube bot with end-to-end web scraping - no API tokens required. Multi-threaded support for YouTube videos list updates.
Stars: ✭ 64 (+39.13%)
Mutual labels:  csv, txt
tableschema-go
A Go library for working with Table Schema.
Stars: ✭ 41 (-10.87%)
Mutual labels:  csv, frictionlessdata
England
Football data for England (and Wales) incl. English Premier League, The Football League (Championship, League One, League Two), Football Conference etc.
Stars: ✭ 117 (+154.35%)
Mutual labels:  csv, opendata
datapackage-go
A Go library for working with Data Package.
Stars: ✭ 22 (-52.17%)
Mutual labels:  csv, frictionlessdata
data-models
Collection of various biomedical data models in parseable formats.
Stars: ✭ 23 (-50%)
Mutual labels:  schema, csv
Specs
Technical specifications and guidelines for implementing Frictionless Data.
Stars: ✭ 403 (+776.09%)
Mutual labels:  schema, csv
municipios-br
Dados em formato aberto sobre municípios e unidades federativas do Brasil.
Stars: ✭ 58 (+26.09%)
Mutual labels:  csv, opendata
gems
Ruby Football Week 2021, June 11th to June 17th - 7 Days of Ruby (Sports) Gems ++ Best of Ruby Gems Series
Stars: ✭ 76 (+65.22%)
Mutual labels:  csv, opendata
DDDToolbox
A set of Roslyn refactorings supporting DDD design
Stars: ✭ 31 (-32.61%)
Mutual labels:  records

Awesome Comma-Separated Values (CSV) - What's Next? - Frequently Asked Questions (F.A.Q.s) - Libraries & Tools

A collection about the comma-separated values (CSV) world for rich structured data in (plain) text

Contributions welcome. Anything missing? Send in a pull request. Thanks.

CSV Family

Formats, Formats, Formats

CSV RFC 4180 "Strict"CSV v1.0 "The Right Way"CSV <3 NumericsCSV <3 JSONCSV <3 YAMLCSV v1.1 "Modern"

Related

CSV RFC 4180 "Strict"

ID,Name,Code,Area,Pop
ca,Canada,CAN,9984670,34278406
us,United States,USA,9629091,314167157
mx,México [Mexico],MEX,1972550,112322757
...
  • IETF RFC #4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files - by Y. Shafranovich (SolidMatrix Technologies, Inc.), October 2005

Why the CSV RFC 4180 "Strict" Memo Is Dangerous?

People and (simplistic) CSV parser writers (and fanatics) use it to claim that it is the ultimate (and only) CSV format and use it to end all discussions if the code breaks when you add a space before a quote or mixed quotes and so on. It's way too simplistic (no spaces, no comments, no blank lines, no semicolon for separator, no modern two-byte characters, and so on).

Next time someone bring ups:

Have you read the [strict] RFC 4180 [CSV format memo]? The quoting rules are in there.

Why not ask back: Have you read it? :-) Let's start at the beginning (together):

This memo provides information for the internet community. It does not specify an internet standard of any kind. It does not specify an internet standard of any kind. It does not specify an internet standard of any kind.

Can I use __? (RFC 4180 "Strict")

Q: Can I use blank lines?

A: No. No. No. In the "simplistic" CSV RFC 4180 "Strict" format you CANNOT use blank lines. Why? Blank lines are "ambiguous". Might be a blank record or a blank line.

Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" practical common sense versions.

Q: Can I use comments?

A: No. No. No. In the "simplistic" CSV RFC 4180 "Strict" format you CANNOT use comments. Why? The original CSV format was intended just for machine reading and not for human mere mortals.

Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions.

Q: Can I use "literal" geo coordinates e.g. 48°51'24"N?

A: No. No. No. In the "simplistic" CSV RFC 4180 "Strict" format you CANNOT use "literal" double quotes (") e.g. 48°51'24"N - you MUST double quote the geo coordinates and double up the double quote ("") e.g. "48°51'24""N". Example:

New York City,"40°42'46""N","74°00'21""W"
Paris,"48°51'24""N","2°21'03""E"

Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions. Example:

New York City, 40°42'46"N, 74°00'21"W
Paris,         48°51'24"N, 2°21'03"E

Q: Can I use "unix-style" escaping with backslashes (e.g. \" for "")?

A: No. No. No. In the "simplistic" CSV RFC 4180 "Strict" format you MUST double up the double quote ("") inside double quotes. Period. Example:

1,"Hamlet says, ""Seems,"" madam! Nay it is; I know not ""seems."""

Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions. Use as you like it. Example:

1, "Hamlet says, \"Seems,\" madam! Nay it is; I know not \"seems.\""

Q: Can I use mixed quotes (that is, single '...' or double quotes "...")?

A: No. No. No. In the "simplistic" CSV RFC 4180 "Strict" format you MUST always use double quotes ("") and double up the double quote inside double quotes. Period.

Yes. Yes. Yes. See CSV v1.0 or CSV v1.1 for "modern" human versions. Use as you like it. Example:

1, "Hamlet says, 'Seems,' madam! Nay it is; I know not 'seems.'"
2, 'Hamlet says, "Seems," madam! Nay it is; I know not "seems."'

CSV v1.0 "The Right Way"

Q: What's CSV the right way? What best practices can I use?

Use best practices out-of-the-box with zero-configuration. Do you know how to skip blank lines or how to add # single-line comments? Or how to trim leading and trailing spaces? No worries. It's turned on by default.

Yes, you can. Use

#######
# try with some comments
#   and blank lines even before header (first row)

Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%

Bayerische Staatsbrauerei Weihenstephan,  Freising,  Hefe Weissbier,   5.4%
Brauerei Spezial,                         Bamberg,   Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu,                      München,   Münchner Dunkel,  5.0%
Staatliches Hofbräuhaus München,          München,   Hofbräu Oktoberfestbier, 6.3%

instead of strict "classic" (no blank lines, no comments, no leading and trailing spaces, etc.):

Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%

Or use the ARFF (attribute-relation file format)-like alternative style with % for comments and @-directives for "meta data" in the header (before any records):

%%%%%%%%%%%%%%%%%%
% try with some comments
%   and blank lines even before @-directives in header

@RELATION Beer

@ATTRIBUTE Brewery
@ATTRIBUTE City
@ATTRIBUTE Name
@ATTRIBUTE Abv

@DATA
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%

Bayerische Staatsbrauerei Weihenstephan,  Freising,  Hefe Weissbier,   5.4%
Brauerei Spezial,                         Bamberg,   Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu,                      München,   Münchner Dunkel,  5.0%
Staatliches Hofbräuhaus München,          München,   Hofbräu Oktoberfestbier, 6.3%

Or use the ARFF (attribute-relation file format)-like alternative style with @-directives inside comments (for easier backwards compatibility with old readers) for "meta data" in the header (before any records):

##########################
# try with some comments
#   and blank lines even before @-directives in header
#
# @RELATION Beer
#
# @ATTRIBUTE Brewery
# @ATTRIBUTE City
# @ATTRIBUTE Name
# @ATTRIBUTE Abv

Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%

Bayerische Staatsbrauerei Weihenstephan,  Freising,  Hefe Weissbier,   5.4%
Brauerei Spezial,                         Bamberg,   Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu,                      München,   Münchner Dunkel,  5.0%
Staatliches Hofbräuhaus München,          München,   Hofbräu Oktoberfestbier, 6.3%

Or use the CSV meta data in CSV style:

##########################
# try with some comments
#   and blank lines even header

Col, Name
1,   Brewery
2,   City
3,   Name
4,   Abv


Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%

Bayerische Staatsbrauerei Weihenstephan,  Freising,  Hefe Weissbier,   5.4%
Brauerei Spezial,                         Bamberg,   Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu,                      München,   Münchner Dunkel,  5.0%
Staatliches Hofbräuhaus München,          München,   Hofbräu Oktoberfestbier, 6.3%

CSV <3 Numerics

CSV Numerics Format - Comma-Separated Values (CSV) Line-by-Line Records with Auto-Converted Numerics (Float Numbers) Encoding Rules - A Modern (Simple) Tabular Data Format incl. Numbers, Comments and More

See csvspecs/csv-numerics »

CSV <3 JSON

CSV JSON Format - Comma-Separated Values (CSV) Line-by-Line Records with JSON Encoding Rules - A Modern (Simple) Tabular Data Format incl. Arrays, Numbers, Booleans, Nulls, Nested Structures, Comments and More

Examples:

# "Vanilla" CSV <3 JSON

1,"John","12 Totem Rd. Aspen",true
2,"Bob",null,false
3,"Sue","Bigsby, 345 Carnival, WA 23009",false

or

# "Vanilla" CSV <3 JSON (Pretty Printed)

1, "John", "12 Totem Rd. Aspen",            true
2, "Bob",  null,                            false
3, "Sue", "Bigsby, 345 Carnival, WA 23009", false

See csvspecs/csv-json »

CSV <3 YAML

CSV YAML Format - Comma-Separated Values (CSV) Line-by-Line Records with YAML Encoding Rules - A Modern (Simple) Tabular Data Format incl. Arrays, Numbers, Booleans, Nulls, Nested Structures, Comments and More

Examples:

# "Vanilla" CSV <3 YAML

1,John,12 Totem Rd. Aspen,true
2,Bob,null,false
3,Sue,"Bigsby, 345 Carnival, WA 23009",false

or

# "Vanilla" CSV <3 YAML (Pretty Printed)

1, John, 12 Totem Rd. Aspen,               true
2, Bob,  null,                             false
3, Sue,  "Bigsby, 345 Carnival, WA 23009", false

See csvspecs/csv-yaml »

CSV v1.1 "Modern"

#####################
# North America

# area (in sq km), pop(ulation)

ca, Canada,          CAN,   9 984 670 km²,  34 278 406
us, United States,   USA,   9 629 091 km², 314 167 157
mx, México [Mexico], MEX,   1 972 550 km², 112 322 757
...

Can I use __ ? (v1.1)

Q: Can I use blank lines?

A: Yes, of course. A blank line is just a blank line. Use freely to format (beautify) your data.

Q: Can I use comments?

A: Yes, of course. Use # for comments. See the example above.

Delimiter / Separator

Why use commas, commas, commas?

SpaceTabField Separator (FS)Other

Space

Did you know? In the English (or German) language the most popular word delimiter / separator is - surprise, surprise - space. You're looking at spaces in action right now and right here ;-)

Why not use spaces?

  • Spaces work great (and are the best) only if you do NOT use s p a c e s · i n · v a l u e s. For example, is United States one value or two? See?
  • If it's only one value than you need to quote it e.g. "United States".

By using commas you do NOT need to quote spaces in values, that is, use us, United States, USA instead of us "United States" USA.

Tab

In theory the tab (\t) separator is perfect. Values never use tabs, don't they? So why hasn't the tab separator taken off?

In practice tab separators are invisible or look like spaces and often you cannot tell if a space is a tab or not.

Thus, tab works great only and only (like space) if your values do NOT use spaces and you treat a tab like a space.

Field Separator (FS)

Again in theory the untypeable (unprintable) field separator (ASCII Code 31) is perfect. Values never use ASCII field separators.

In practice if the field separator is invisible and unprintable how do you type it on your keyboard?

Remember: The point of comma-separated values (CSV) is an easy-to-write and easy-to-read format for humans first (not for machines).

Other

  • Pipe (|)
  • Semicolon (;)
  • Colon (:)
  • Interpunct (·)
  • Double Interpunct
  • And many others

Articles

Wikipedia

More

Initiatives

Frictionless DataCSV on the WebCSV 1.1 / CSV Next

Frictionless Data

lightweight standards and tooling to make it effortless to get, share, and validate data

by Open Knowledge Foundation (OKFN)

web: frictionlessdata.io, github: frictionlessdata

CSV on the Web

by World Wide Web Consortium (W3C)

github: w3c/csvw

CSV v1.1 / CSV Next

  • CSV Next Notes - Better Comma-Separated Values (CSV) Notes; Adding Comments, Named Values, Multi-Line Records, and More.

Libraries & Tools

Ruby

  • CSV Reader Library - (Source) - modern alternative to the broken ruby csv standard library

  • Honey Format Library / Tool - (Source), (Doc) by Jacob Burenstam -- Makes working with CSVs as smooth as honey. Proper objects for CSV headers and rows, convert column values, filter columns and rows, small(-ish) perfomance overhead, no dependencies other than Ruby standard library.

csv_string = <<~CSV
  email,name,born,country
  [email protected],John,2000-03-03,SE
  [email protected],Jane,1970-03-03,SE
  [email protected],Chris,1980-03-03,DK
CSV

# Print all rows where born is before 1990 and country code is 'SE'
csv = HoneyFormat::CSV.new(csv_string, type_map: { born: :date })
csv_string = csv.to_csv(columns: %i[born country]) do |row|
  row.country == 'SE' && row.born < Date.new(1990, 1, 1)
end
puts csv_string
  • Ruby Toolbox CSV Category - (Link)

Python

Perl

JavaScript

Racket

Rust

Conferences

CSV,Conf

A conference for data makers everywhere

web: csvconf.com, github: csvconf, twitter: CSVConference

  • csv,conf,v4 - 2019 - May 8+9 @ Portland, Oregon, United States
  • csv,conf,v3 - 2017 - May 2+3 @ Portland, Oregon, United States
  • csv,conf,v2 - 2016 - May 3+4 @ Berlin, Germany
  • csv,conf,v1 - 2014 - July 15 @ Berlin, Germany

Awesome Awesomeness

A curated list of awesome lists.

Meta

License

The awesome list is dedicated to the public domain. Use it as you please with no restrictions whatsoever.

Questions? Comments?

Post them to the wwwmake forum. Thanks!

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].