
jkeen / comma_splice

License: MIT
Fixes CSVs with unquoted commas in values

Programming languages: Ruby, Shell

Projects that are alternatives to or similar to comma_splice

Cursively
A CSV reader for .NET. Fast, RFC 4180 compliant, and fault tolerant. UTF-8 only.
Stars: ✭ 34 (-49.25%)
Mutual labels:  csv-files, csv-reading, csv-parser
gpx-converter
Python package for manipulating GPX files and easily converting GPX to other formats
Stars: ✭ 54 (-19.4%)
Mutual labels:  csv-converter, csv-files, csv-parser
Clevercsv
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
Stars: ✭ 887 (+1223.88%)
Mutual labels:  csv-files, csv-parser
Intellij Csv Validator
CSV validator, highlighter and formatter plugin for JetBrains Intellij IDEA, PyCharm, WebStorm, ...
Stars: ✭ 198 (+195.52%)
Mutual labels:  csv-files, csv-parser
csvlixir
A CSV reading/writing application for Elixir.
Stars: ✭ 32 (-52.24%)
Mutual labels:  csv-reading, csv-parser
CsvTextFieldParser
A simple CSV parser based on Microsoft.VisualBasic.FileIO.TextFieldParser.
Stars: ✭ 40 (-40.3%)
Mutual labels:  csv-reading, csv-parser
Awesomecsv
🕶️ A curated list of awesome tools for dealing with CSV.
Stars: ✭ 305 (+355.22%)
Mutual labels:  csv-files, csv-parser
Filehelpers
The FileHelpers are a free and easy to use .NET library to read/write data from fixed length or delimited records in files, strings or streams
Stars: ✭ 917 (+1268.66%)
Mutual labels:  csv-files, csv-parser
Csv File Validator
🔧🔦 Validation of CSV file against user defined schema (returns back object with data and invalid messages)
Stars: ✭ 60 (-10.45%)
Mutual labels:  csv-files, csv-parser
Importexportfree
Improve default Magento 2 Import / Export features - cron jobs, CSV , XML , JSON , Excel , mapping of any format, Google Sheet, data and price modification, improved speed and a lot more!
Stars: ✭ 160 (+138.81%)
Mutual labels:  csv-files
covid-19-usa-by-state
CSV files of COVID-19 total daily confirmed cases and deaths in the USA by state and county. All data from Johns Hopkins & NYT.
Stars: ✭ 35 (-47.76%)
Mutual labels:  csv-files
Winmerge
WinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
Stars: ✭ 2,358 (+3419.4%)
Mutual labels:  csv-files
po-csv
Convert gettext PO files from CSV files and merge them back in.
Stars: ✭ 32 (-52.24%)
Mutual labels:  csv-files
Adaptivetablelayout
Library that makes it possible to read, edit and write CSV files
Stars: ✭ 1,871 (+2692.54%)
Mutual labels:  csv-files
Csv2db
The CSV to database command line loader
Stars: ✭ 102 (+52.24%)
Mutual labels:  csv-files
datapackage-m
Power Query M functions for working with Tabular Data Packages (Frictionless Data) in Power BI and Excel
Stars: ✭ 26 (-61.19%)
Mutual labels:  csv-files
brain-brew
Automated Anki flashcard creation and extraction to/from Csv
Stars: ✭ 55 (-17.91%)
Mutual labels:  csv-converter
Csvimporter
Import CSV files line by line with ease
Stars: ✭ 120 (+79.1%)
Mutual labels:  csv-files
CSV2RDF
Streaming, transforming, SPARQL-based CSV to RDF converter. Apache license.
Stars: ✭ 48 (-28.36%)
Mutual labels:  csv-converter
Windmill
A library to parse or write Excel and CSV files through a fluent API
Stars: ✭ 19 (-71.64%)
Mutual labels:  csv-parser

Comma Splice

This gem tackles one very specific problem: CSVs that contain commas inside values that were never quoted. It determines which commas separate fields and which are part of a value, and then corrects the file.

For example, given the following CSV:

timestamp,artist,title,albumtitle,label
01-27-2019 @ 12:34:00,Lester Sterling, Lynn Taitt & The Jets,Check Point Charlie,Merritone Rock Steady 3: Bang Bang Rock Steady 1966-1968,Dub Store,
01-27-2019 @ 12:31:00,Lester Sterling,Lester Sterling Special,Merritone Rock Steady 2: This Music Got Soul 1966-1967,Dub Store,

which parses incorrectly as:

timestamp | artist | title | albumtitle | label
01-27-2019 @ 12:34:00 | Lester Sterling | Lynn Taitt & The Jets | Check Point Charlie | Merritone Rock Steady 3: Bang Bang Rock Steady 1966-1968
01-27-2019 @ 12:31:00 | Lester Sterling | Lester Sterling Special | Merritone Rock Steady 2: This Music Got Soul 1966-1967 | Dub Store
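The misparse comes from one extra field: the first data line splits into seven fields, while the well-formed line splits into six (both end in a trailing comma). A minimal sketch of spotting such lines by field count follows. It assumes no field in the file is quoted yet and that most lines are well-formed; `bad_line_indexes` is a hypothetical helper, not the gem's `bad_lines` implementation.

```ruby
# Hypothetical helper (not comma_splice's API): returns the 0-based indexes of
# data lines that contain more fields than the usual width.
# Assumes no field is quoted yet, so a plain split counts the fields.
def bad_line_indexes(lines, separator = ",")
  return [] if lines.empty?

  # A limit of -1 keeps trailing empty fields, so "Dub Store," counts as two fields.
  counts = lines.map { |line| line.split(separator, -1).size }

  # The most common width is taken as correct; ties go to the smaller width,
  # since an unquoted comma can only add fields, never remove them.
  expected = counts.tally.max_by { |width, freq| [freq, -width] }.first

  counts.each_index.select { |i| counts[i] > expected }
end

data_lines = [
  "01-27-2019 @ 12:34:00,Lester Sterling, Lynn Taitt & The Jets,Check Point Charlie,Merritone Rock Steady 3: Bang Bang Rock Steady 1966-1968,Dub Store,",
  "01-27-2019 @ 12:31:00,Lester Sterling,Lester Sterling Special,Merritone Rock Steady 2: This Music Got Soul 1966-1967,Dub Store,"
]
p bad_line_indexes(data_lines) # => [0]
```

Only data lines are passed in; the header has no trailing comma, so its width would not be comparable here.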

Running the command comma_splice correct /path/to/file returns this corrected content:

timestamp,artist,title,albumtitle,label
01-27-2019 @ 12:34:00,"Lester Sterling, Lynn Taitt & The Jets",Check Point Charlie,Merritone Rock Steady 3: Bang Bang Rock Steady 1966-1968,Dub Store,
01-27-2019 @ 12:31:00,Lester Sterling,Lester Sterling Special,Merritone Rock Steady 2: This Music Got Soul 1966-1967,Dub Store,

which parses correctly as:

timestamp | artist | title | albumtitle | label
01-27-2019 @ 12:34:00 | Lester Sterling, Lynn Taitt & The Jets | Check Point Charlie | Merritone Rock Steady 3: Bang Bang Rock Steady 1966-1968 | Dub Store
01-27-2019 @ 12:31:00 | Lester Sterling | Lester Sterling Special | Merritone Rock Steady 2: This Music Got Soul 1966-1967 | Dub Store
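comma_splice resolved this line without asking. One cue that could explain it, offered here as an assumption rather than the gem's documented logic, is that the comma inside "Lester Sterling, Lynn Taitt & The Jets" is followed by a space while the real separators are not. A minimal sketch of that heuristic uses a hypothetical `fix_line` helper and Ruby's standard `CSV.generate_line` to quote the merged value:

```ruby
require "csv"

# Hypothetical heuristic, not necessarily how comma_splice decides:
# a comma followed by a space is treated as part of a value,
# and every other comma as a field separator.
def fix_line(line)
  fields = line.split(/,(?! )/, -1)
  # nil (rather than "") makes CSV write empty fields bare instead of as "".
  fields = fields.map { |field| field.empty? ? nil : field }
  # generate_line quotes any field containing the separator; chomp drops the newline.
  CSV.generate_line(fields).chomp
end

broken = "01-27-2019 @ 12:34:00,Lester Sterling, Lynn Taitt & The Jets,Check Point Charlie,Merritone Rock Steady 3: Bang Bang Rock Steady 1966-1968,Dub Store,"
puts fix_line(broken)
```

This rule does nothing for a value like 10,000 Maniacs, where no space follows the comma; lines like that are where the interactive prompt comes in.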

If it can't determine where a comma belongs, it prompts you to choose among the possible options.

For example, given the following CSV:

playid,playtype,genre,timestamp,artist,title,albumtitle,label,prepost,programtype,iswebcast,isrequest
16851097,,,12-09-2017 @ 09:57:00,10,000 Maniacs and Michael Stipe,To Sir with Love,Campfire Songs,Rhino,post,live,y,
16851096,,,12-09-2017 @ 09:44:00,Fran Jeffries,Mine Eyes,Fran Can Really Hang You Up the Most,Warwick,post,live,y,

It prompts:

Which one of these is correct?

(1)  artist    : 10
     title     : 000 Maniacs and Michael Stipe
     albumtitle: To Sir with Love
     label     : "Campfire Songs,Rhino"

(2)  artist    : 10
     title     : 000 Maniacs and Michael Stipe
     albumtitle: "To Sir with Love,Campfire Songs"
     label     : Rhino

(3)  artist    : 10
     title     : "000 Maniacs and Michael Stipe,To Sir with Love"
     albumtitle: Campfire Songs
     label     : Rhino

(4)  artist    : "10,000 Maniacs and Michael Stipe"
     title     : To Sir with Love
     albumtitle: Campfire Songs
     label     : Rhino
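The four options are every way of regrouping the five comma-separated pieces between artist and label into four fields while keeping their order. A sketch of that enumeration follows; `candidate_joins` is a hypothetical helper, not comma_splice's API. For this input it yields the options in the same order as the prompt.

```ruby
# Hypothetical helper (not comma_splice's API): every way to regroup `pieces`,
# in order, into `width` fields by rejoining neighbouring pieces with the separator.
def candidate_joins(pieces, width, separator = ",")
  return [] if width < 1 || pieces.size < width
  return [[pieces.join(separator)]] if width == 1

  # The first field takes 1..(pieces.size - width + 1) pieces; the rest are
  # grouped recursively into the remaining width - 1 fields.
  (1..(pieces.size - width + 1)).flat_map do |take|
    head = pieces.first(take).join(separator)
    candidate_joins(pieces.drop(take), width - 1, separator).map { |rest| [head] + rest }
  end
end

# The pieces between artist and label in the 10,000 Maniacs line.
pieces = ["10", "000 Maniacs and Michael Stipe", "To Sir with Love", "Campfire Songs", "Rhino"]
candidate_joins(pieces, 4).each_with_index do |option, i|
  puts "(#{i + 1}) #{option.inspect}"
end
```

In general, n pieces spread over k fields give (n - 1 choose k - 1) candidates, so a line with several unquoted commas can produce a long list to choose from.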

Selecting option (4) returns:

playid,playtype,genre,timestamp,artist,title,albumtitle,label,prepost,programtype,iswebcast,isrequest
16851097,,,12-09-2017 @ 09:57:00,"10,000 Maniacs and Michael Stipe",To Sir with Love,Campfire Songs,Rhino,post,live,y,
16851096,,,12-09-2017 @ 09:44:00,Fran Jeffries,Mine Eyes,Fran Can Really Hang You Up the Most,Warwick,post,live,y,
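Once an option is chosen, the row can be rebuilt by swapping the chosen fields in for the ambiguous pieces and letting a CSV writer quote the merged value. A minimal sketch with Ruby's standard csv library; `apply_choice` and its `start`/`count` parameters (the index and number of ambiguous pieces) are hypothetical, not part of the gem:

```ruby
require "csv"

# Hypothetical helper (not comma_splice's API): replaces `count` ambiguous
# pieces starting at index `start` with the fields the user chose, then
# re-emits the row so any field containing the separator is quoted.
def apply_choice(line, start, count, chosen_fields, separator = ",")
  pieces = line.split(separator, -1)
  fields = pieces[0, start] + chosen_fields + pieces[(start + count)..]
  # nil keeps empty fields bare (e.g. the empty playtype and genre columns).
  fields = fields.map { |field| field.empty? ? nil : field }
  CSV.generate_line(fields, col_sep: separator).chomp
end

line = "16851097,,,12-09-2017 @ 09:57:00,10,000 Maniacs and Michael Stipe,To Sir with Love,Campfire Songs,Rhino,post,live,y,"
option4 = ["10,000 Maniacs and Michael Stipe", "To Sir with Love", "Campfire Songs", "Rhino"]
puts apply_choice(line, 4, 5, option4)
```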

Usage

You can use this in a Ruby program by installing the comma_splice gem, or you can install it on your system and use the comma_splice command-line utility.

Return the number of bad lines in a file

  CommaSplice::FileCorrector.new(file_path).bad_lines.size

  # you can specify another separator
  CommaSplice::FileCorrector.new(file_path, separator: ';').bad_lines.size

  # or, from the command line
  comma_splice bad_line_count /path/to/file.csv

Display the fixed contents

  CommaSplice::FileCorrector.new(file_path).corrected

  # you can specify another separator
  CommaSplice::FileCorrector.new(file_path, separator: ';').corrected

  # or, from the command line
  comma_splice correct /path/to/file.csv

Process a file and save the fixed version

  CommaSplice::FileCorrector.new(file_path).save(save_path)

  # you can specify another separator
  CommaSplice::FileCorrector.new(file_path, separator: ';').save(save_path)

  # or, from the command line
  comma_splice fix /path/to/file.csv /path/to/save

Installation

Add this line to your application's Gemfile:

gem 'comma_splice'

And then execute:

$ bundle

Or install it yourself as:

$ gem install comma_splice

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/jkeen/comma_splice.

License

The gem is available as open source under the terms of the MIT License.
