meta-csv

A smart reader for CSV files, spiritual successor to ultra-csv.

Features

  • Smart statistical heuristics to guess pretty much anything about your CSV file, from the delimiter to the quoting and whether a header is present
  • Handles the boring but dangerous stuff for you, such as encoding detection and BOM skipping (if a BOM is present), as well as embedded newlines and quote escaping
  • Coerces the numerical values that have been recognised. The types and coercions can be extended by the user to dates, phone numbers, etc.
  • Designed to be very easy to use in an exploratory way, to get a quick feel for the data, and then to be put into production with almost the same code

Installation

meta-csv is available as a Maven artifact from Clojars.

Add it to the dependencies in your project.clj for Leiningen; the dependency coordinates, including the current version, are shown on the project's Clojars page.

Usage

The easiest way to use it when hacking at the REPL is simply:

(require '[meta-csv.core :as csv])
(first (csv/read-csv "./dev-resources/samples/marine-economy-2007-18.csv"))

=> {:year 2007,
    :category "Fisheries and aquaculture",
    :variable "Cont. to ME Wage and salary earners",
    :units "Proportion",
    :magnitude "Actual",
    :source "LEED",
    :data_value 43.1,
    :flag "R"}

If the file has a header, this returns a lazy seq of maps of field names to values.

If any field name would be problematic as a keyword, then all field names will be strings instead:

(first (csv/read-csv "./dev-resources/samples/sales-records-sample.csv"))

=> {"Region" "Australia and Oceania",
    "Country" "Tuvalu",
    "Item Type" "Baby Food",
    "Sales Channel" "Offline",
    "Order Priority" "H",
    "Order Date" "5/28/2010",
    "Order ID" 669165933,
    "Ship Date" "6/27/2010",
    "Units Sold" 9925,
    "Unit Price" 255.28,
    "Unit Cost" 159.42,
    "Total Revenue" 2533654.0,
    "Total Cost" 1582243.5,
    "Total Profit" 951410.5}

The maps are array-maps, which means the order of the keys is the same as the order of the fields in the file.

If no header is present, the rows will be returned as a seq of vectors, in the same fashion as clojure.data.csv/read-csv.

Many options are available through an optional second argument, the spec. Check the docstring for a more or less exhaustive description.

This spec can itself be created by another noteworthy function, guess-spec.

(csv/guess-spec "./dev-resources/samples/marine-economy-2007-18.csv")

=> {:fields
    [{:field :year, :type :long}
     {:field :category, :type :string}
     {:field :variable, :type :string}
     {:field :units, :type :string}
     {:field :magnitude, :type :string}
     {:field :source, :type :string}
     {:field :data_value, :type :double}
     {:field :flag, :type :string}],
    :delimiter \,,
    :bom :none,
    :encoding "ISO-8859-1",
    :skip-analysis? true,
    :header? true,
    :quoted? false}

Then the :fields vector describing the processing on each field can be customized to produce exactly the right format of data. This spec can be used directly as the second argument to read-csv.
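As a minimal sketch of such a customization, the snippet below works on the spec as plain data, using the map printed by guess-spec above. The helper set-field-type is hypothetical (it is not part of meta-csv), and forcing :year to :string is only an illustrative choice; the docstrings remain the reference for the keys a field map accepts.

```clojure
;; Spec literal copied from the guess-spec output shown above.
(def guessed-spec
  {:fields [{:field :year, :type :long}
            {:field :category, :type :string}
            {:field :variable, :type :string}
            {:field :units, :type :string}
            {:field :magnitude, :type :string}
            {:field :source, :type :string}
            {:field :data_value, :type :double}
            {:field :flag, :type :string}]
   :delimiter \,
   :bom :none
   :encoding "ISO-8859-1"
   :skip-analysis? true
   :header? true
   :quoted? false})

;; Hypothetical helper, not part of meta-csv: returns the spec with the
;; :type of the field named `field-name` replaced by `new-type`.
(defn set-field-type
  [spec field-name new-type]
  (update spec :fields
          (fn [fields]
            (mapv (fn [f]
                    (if (= (:field f) field-name)
                      (assoc f :type new-type)
                      f))
                  fields))))

;; Keep the year as a string instead of coercing it to a long.
(def custom-spec (set-field-type guessed-spec :year :string))

;; With meta-csv on the classpath, the customized spec is passed as the
;; second argument:
;; (csv/read-csv "./dev-resources/samples/marine-economy-2007-18.csv" custom-spec)
```

Because the spec is an ordinary map, the usual Clojure data functions (update, assoc, mapv) are enough to adjust it; every other field is left unchanged.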

The useful functions are extensively documented in the docstrings of the API Documentation.

The test file also contains interesting examples.

Tips and tricks

Need output as vectors, like clojure.data.csv, but with type coercions? The :skip parameter skips the first line, and setting :header? to false returns each row as a vector.

(first (csv/read-csv "./dev-resources/samples/marine-economy-2007-18.csv" {:skip 1 :header? false}))

=> [2007
    "Fisheries and aquaculture"
    "Cont. to ME Wage and salary earners"
    "Proportion"
    "Actual"
    "LEED"
    43.1
    "R"]

One difference from ultra-csv is that meta-csv makes no attempt to validate output data. Validation is an important concern, but it should not be handled by the file format parser, even a smart one. In production, however, I recommend using something like spec-provider to generate specs and validating the data against them when it comes from a manual source.
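To illustrate the idea of validating rows outside the parser, here is a small sketch using clojure.spec.alpha directly. The specs are written by hand for three fields of the sample row shown earlier, rather than generated by spec-provider, and the names ::row and sample-row are assumptions made for this example.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hand-written specs for a few fields of the marine-economy sample.
(s/def ::year int?)
(s/def ::category string?)
(s/def ::data_value double?)

;; A row must contain these unqualified keys; extra keys are allowed.
(s/def ::row (s/keys :req-un [::year ::category ::data_value]))

;; The first row of the sample file, as returned by read-csv above.
(def sample-row
  {:year 2007
   :category "Fisheries and aquaculture"
   :variable "Cont. to ME Wage and salary earners"
   :units "Proportion"
   :magnitude "Actual"
   :source "LEED"
   :data_value 43.1
   :flag "R"})

(s/valid? ::row sample-row)
;; => true

;; A row whose year was not coerced to a number fails validation.
(s/valid? ::row (assoc sample-row :year "2007"))
;; => false
```

In a pipeline, the same check could be applied to each row of the seq returned by read-csv, for instance with filter or remove, so that bad rows are reported instead of silently processed.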

In the same spirit, I tend to use read-csv directly at the REPL when doing analysis work. When and if going to production, I generate a spec with guess-spec and use it with read-csv, which makes the process more reliable if the input file format presents problems at a future time.
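One possible way to carry a guessed spec into production is to save it as EDN and reload it at startup. This is a sketch, not a documented meta-csv workflow: it assumes the spec contains only plain data, as in the guess-spec output above (a spec customized with functions would not round-trip), and the temporary file stands in for wherever the project keeps its resources.

```clojure
(require '[clojure.edn :as edn])

;; Spec literal copied from the guess-spec output shown above.
(def guessed-spec
  {:fields [{:field :year, :type :long}
            {:field :category, :type :string}
            {:field :variable, :type :string}
            {:field :units, :type :string}
            {:field :magnitude, :type :string}
            {:field :source, :type :string}
            {:field :data_value, :type :double}
            {:field :flag, :type :string}]
   :delimiter \,
   :bom :none
   :encoding "ISO-8859-1"
   :skip-analysis? true
   :header? true
   :quoted? false})

;; Hypothetical location: a temp file is used here only so the sketch
;; runs anywhere.
(def spec-file (java.io.File/createTempFile "marine-economy-spec" ".edn"))

;; Write the spec once, for example at the end of an exploratory session.
(spit spec-file (pr-str guessed-spec))

;; Read it back in production code.
(def reloaded-spec (edn/read-string (slurp spec-file)))

;; With meta-csv on the classpath, the reloaded spec is then used as the
;; second argument:
;; (csv/read-csv "./dev-resources/samples/marine-economy-2007-18.csv" reloaded-spec)
```

Keeping the spec in a file under version control also makes any later change to the expected file format visible in a diff.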

License

Copyright © 2019-2020 Nils Grunwald

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.

This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.