juancarlospaco / Faster Than Csv

Licence: MIT
Faster CSV on Python 3

Faster-than-CSV

Library                  Time (Speed)
Pandas read_csv()        20.09
NumPy fromfile()         3.88
NumPy genfromtxt()       4.00
NumPy loadtxt()          1.26
csv (std lib)            0.40
csv (list)               0.38
csv (map)                0.37
Faster_than_csv          0.08
  • This CSV library is ~200 lines of code.
  • Benchmarks run on Docker, from the Dockerfile in this repo.
  • Speed is the real-world time to complete 10000 CSV parsings.
  • Lines of code counted using CLOC.
  • Direct dependencies of the package when ready to run.
  • Stats as of year 2019.
  • x86_64 64-bit AMD, SSD, Arch Linux.
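
The speed figures are described in the FAQ as the raw output of Python's timeit module for 10000 parsings. As a rough illustration of that kind of measurement, here is a minimal sketch that times the standard library csv reader on a small in-memory sample. The sample data and the `parse_once` helper are assumptions made for illustration; this is not the benchmark script shipped in the repo.

```python
import csv
import io
import timeit

# Tiny in-memory sample (an assumption; the real benchmark input is not shown here).
SAMPLE = "name,age\nAda,36\nLinus,54\n"


def parse_once():
    """Parse the sample once with the standard library reader."""
    return list(csv.reader(io.StringIO(SAMPLE)))


# timeit.timeit returns the total elapsed seconds for `number` calls as a float.
elapsed = timeit.timeit(parse_once, number=10000)
print(f"10000 parsings took {elapsed:.2f}")
```

Numbers from a sketch like this depend heavily on the machine and on the input, so they are not comparable with the table above.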

Use

import faster_than_csv as csv

csv.csv2list("example.csv")                     # See Docs for more info.
                                                # Custom Separators supported.
csv.csv2json("example.csv", indentation=4)      # CSV to JSON, Pretty-Printed.

csv.csv2htmltable("example.csv")                # CSV to HTML+CSS Table (No JavaScript).

csv.read_clipboard()                            # CSV from the Clipboard.

csv.diff_csvs("example.csv", "anotherfile.csv") # Diff optimized for CSVs.
  • Input: CSV, TSV, Clipboard, File, URL, Custom.
  • Output: CSV, TSV, HTML, JSON, NDJSON, Diff, File, Custom.

csv2dict()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns a list of dictionaries. This is very similar to pandas.read_csv(filename).

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • has_header: set to True for CSV with a header, bool type, optional, defaults to True.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.

Returns: Data from the CSV, dict type.
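
To make the "list of dictionaries" shape concrete, here is a minimal sketch built on the standard library csv module, not on this package. The `rows_as_dicts` helper is hypothetical; it borrows the `separator` and `quote` parameter names from the list above and maps them to the csv module's `delimiter` and `quotechar`.

```python
import csv
import io


def rows_as_dicts(text, separator=",", quote='"'):
    """Hypothetical helper: parse CSV text into a list of dictionaries.

    The first row is treated as the header, one dictionary per data row.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=separator, quotechar=quote)
    return [dict(row) for row in reader]


rows = rows_as_dicts('name,city\nAda,"London, UK"\n')
print(rows)
```

Each dictionary maps a header name to the string value of that column, so the quoted field keeps its embedded comma.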

read_clipboard()

Description: Reads a CSV string from the clipboard, processes the CSV, and returns a list of dictionaries. This is very similar to pandas.read_clipboard(). This works on Linux, Mac, and Windows.

Arguments:

  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.

Returns: Data from the CSV, dict type.

csv2json()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns JSON.

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.
  • indentation: pretty-printed or minified JSON output, int type, optional, 0 is minified, 4 is pretty-printed; you can use any integer to adjust the indentation.

Returns: Data from the CSV as a minified, single-line, computer-friendly JSON string, str type.
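
The difference between minified and pretty-printed output can be sketched with the standard library alone. The `csv_text_to_json` helper below is hypothetical and only mirrors the `indentation` idea described above (0 for minified, any other integer for that many spaces); it is not this package's implementation.

```python
import csv
import io
import json


def csv_text_to_json(text, indentation=0):
    """Hypothetical helper: convert CSV text to a JSON string."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if indentation == 0:
        # Compact separators give a single-line, minified string.
        return json.dumps(rows, separators=(",", ":"))
    # Any other value pretty-prints with that many spaces per level.
    return json.dumps(rows, indent=indentation)


minified = csv_text_to_json("a,b\n1,2\n")
pretty = csv_text_to_json("a,b\n1,2\n", indentation=4)
print(minified)
print(pretty)
```

Both strings decode to the same data; only the whitespace differs.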

csv2ndjson()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns NDJSON.

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • ndjson_file_path: path of the NDJSON file, str type, required, must not be an empty string.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.

Returns: None. Data from the CSV as NDJSON https://github.com/ndjson/ndjson-spec, str type.
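
For readers unfamiliar with NDJSON, the format is one JSON value per line. A minimal standard library sketch of the conversion could look like this; the `csv_text_to_ndjson` helper is hypothetical and works on strings rather than file paths to stay self-contained.

```python
import csv
import io
import json


def csv_text_to_ndjson(text):
    """Hypothetical helper: emit one JSON object per CSV data row, one per line."""
    reader = csv.DictReader(io.StringIO(text))
    return "".join(json.dumps(row) + "\n" for row in reader)


ndjson = csv_text_to_ndjson("name,age\nAda,36\nLinus,54\n")
print(ndjson, end="")
```

Because every line is a complete JSON document, a consumer can process the output line by line without loading the whole file.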

csv2htmltable()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns the data rendered as an HTML table.

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • html_file_path: path of the HTML file, str type, optional, defaults to ""; if it is an empty string then no file is written.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.
  • header_html: HTML header, str type, optional, defaults to Bulma CSS, can be an empty string.

Returns: Data from the CSV as HTML Table, str type, raw HTML (no style at all).
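
As an illustration of what rendering CSV to a raw, unstyled HTML table involves, here is a minimal standard library sketch. The `csv_text_to_html_table` helper and the exact markup are assumptions; the markup this package emits may differ.

```python
import csv
import io
from html import escape


def csv_text_to_html_table(text):
    """Hypothetical helper: render CSV text as an unstyled HTML table string."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    parts = ["<table>", "<thead><tr>"]
    # Escape every cell so CSV content cannot inject markup.
    parts += [f"<th>{escape(cell)}</th>" for cell in header]
    parts += ["</tr></thead>", "<tbody>"]
    for row in body:
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts += ["</tbody>", "</table>"]
    return "".join(parts)


table = csv_text_to_html_table("name,note\nAda,<b>hi</b>\n")
print(table)
```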

csv2karax()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns the data rendered as a Karax HTML table.

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.

Returns: Karax DSL, str type.

csv2terminal()

Description: Takes the path of a CSV file as a string, processes the CSV, and prints a colored, pretty-printed table to the terminal.

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • column_width: column width of the widest column, int type, required, must not be 0, must not be negative.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.

Returns: None.
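
The role of a fixed column width in a terminal table can be sketched as follows. The `format_table` helper is hypothetical, leaves out the coloring, and truncates cells longer than `column_width`, which is an assumption about behaviour rather than something the description above states.

```python
import csv
import io


def format_table(text, column_width):
    """Hypothetical helper: lay out CSV text as fixed-width columns."""
    lines = []
    for row in csv.reader(io.StringIO(text)):
        # Truncate long cells, pad short ones, so every column lines up.
        cells = (cell[:column_width].ljust(column_width) for cell in row)
        lines.append(" | ".join(cells))
    return "\n".join(lines)


print(format_table("a,b\nlonger,2\n", 4))
```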

csv2xml()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns a valid XML string. The output is guaranteed to always be valid XML.

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • separator: separator character of the CSV data, str type, optional, defaults to ',', must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.
  • header_xml: XML header of the XML string, str type, optional, can be an empty string, defaults to "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n".

Returns: XML, str type.
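
Producing XML that stays valid whatever the cell contents are mostly comes down to escaping. Here is a minimal sketch with the standard library's ElementTree; the `csv_text_to_xml` helper and the `csv`, `row`, and `field` element names are assumptions, and only the default header string is taken from the argument list above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Default header as quoted in the argument list above.
XML_HEADER = '<?xml version="1.0" encoding="UTF-8" ?>\n'


def csv_text_to_xml(text, header_xml=XML_HEADER):
    """Hypothetical helper: convert CSV text to an XML string."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, [])
    root = ET.Element("csv")  # element names are illustrative only
    for record in rows:
        row = ET.SubElement(root, "row")
        for name, value in zip(header, record):
            # Column names go into an attribute, so they never have to be
            # valid XML element names; ElementTree escapes the values.
            field = ET.SubElement(row, "field", name=name)
            field.text = value
    return header_xml + ET.tostring(root, encoding="unicode")


xml_text = csv_text_to_xml("name,note\nAda,a < b & c\n")
print(xml_text)
```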

tsv2csv()

Description: Takes the path of a CSV file as a string, processes the CSV, and returns it as TSV.

Arguments:

  • csv_file_path: path of the CSV file, str type, required, must not be an empty string.
  • columns: total column count, int type, optional, defaults to 0, ignored if 0; performance is faster when it is not 0.
  • separator1: separator character of the CSV data, str type, optional, must not be an empty string.
  • separator2: separator character of the CSV data, str type, optional, must not be an empty string.
  • quote: quote character of the CSV data, str type, optional, defaults to '"', must not be an empty string.
  • skipInitialSpace: set to True to ignore blank whitespace at the start of the CSV file, bool type, optional, defaults to False since such whitespace is not technically valid.
  • reversed: set to True for the opposite behaviour (TSV to CSV), bool type, optional, defaults to False.

Returns: Data from the CSV as TSV, str type.
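
Converting between separators is safer through a real parser than through a plain string replace, because quoted fields may contain the separator. A minimal standard library sketch follows; the `convert_separator` helper is hypothetical, and reading `separator1` as the input separator and `separator2` as the output separator is an assumption, since the list above does not say which is which.

```python
import csv
import io


def convert_separator(text, separator1=",", separator2="\t"):
    """Hypothetical helper: re-emit delimited text with another separator."""
    reader = csv.reader(io.StringIO(text), delimiter=separator1)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=separator2, lineterminator="\n")
    # The writer re-quotes fields only where the new separator requires it.
    writer.writerows(reader)
    return out.getvalue()


tsv = convert_separator('a,b\n"x,y",2\n')
print(tsv, end="")
```

Swapping the two arguments gives the opposite direction, for example `convert_separator(tsv, "\t", ",")`.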

diff_csvs()

Description: Takes the paths of 2 CSV files, processes them, and returns the diff of the 2 CSVs.

Arguments:

  • csv_file_path0: path of the CSV file, str type, required, must not be an empty string, file must exist.
  • csv_file_path1: path of the CSV file, str type, required, must not be an empty string, file must exist.

Returns: Diff.
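
The format of the returned diff is not specified above. For comparison, a generic line-based diff of two CSV texts can be sketched with the standard library's difflib; the `diff_csv_text` helper and the file labels are assumptions, and this is a plain unified diff, not the CSV-optimized diff this package provides.

```python
import difflib


def diff_csv_text(old, new):
    """Hypothetical helper: unified diff of two CSV texts, as a list of lines."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old.csv",
            tofile="new.csv",
            lineterm="",
        )
    )


changes = diff_csv_text("name,age\nAda,36\n", "name,age\nAda,37\n")
print("\n".join(changes))
```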

For more Examples check the Examples and Tests.

Instead of having a couple of functions with a lot of arguments that you must provide to make them work, we have tiny functions with very few arguments that each do one thing and do it as fast as possible.

Install

  • pip install faster_than_csv

Docker

  • Take a quick test drive on Docker!
$ ./build-docker.sh
$ ./run-docker.sh
$ ./run-benchmark.sh  # Inside Docker.

Dependencies

  • None

Platforms

  • ✅ Linux
  • ✅ Windows
  • ✅ Mac
  • ✅ Android
  • ✅ Raspberry Pi
  • ✅ BSD

Requisites

  • Python 3.
  • GCC.
  • 64 Bit.

Windows

  • If installation fails on Windows, just use the source code (see the win-compile screenshot in the repository).

FAQ

  • What's the idea, inspiration, reason, etc.?

Feel free to Fork, Clone, Download, Improve, Reimplement, Play with this Open Source. Make it 10 times faster, 10 times smaller.

  • Does this require Cython?

No.

  • Does this run on PyPy?

No.

  • Does this run on Python 2?

I dunno. (Not supported)

  • How can I install it?

https://github.com/juancarlospaco/faster-than-csv/releases

If you don't understand how to install it, you can just download, extract, and put the files in the same folder as your *.py file, and you are good to go.

  • How can it be faster than NumPy?

I dunno.

  • How can it be faster than Pandas?

I dunno.

  • Why does it need 64-bit?

It may work on 32-bit, but that is not supported: integer sizes are too small, and performance can be worse.

  • Why does it need Python 3?

It may work on Python 2, but that is not supported, and performance can be worse; we suggest migrating to Python 3.

  • Can I wrap the functions in a try: except: block?

Functions do not have internal try: except: blocks, so you can wrap them inside try: except: blocks if you need very resilient code.
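
A generic wrapper along those lines could look like the sketch below. The `call_safely` helper and the `broken_parser` stand-in are hypothetical, and catching the broad `Exception` is an assumption made because the exceptions these functions raise are not documented here.

```python
def call_safely(parser, *args, default=None, **kwargs):
    """Hypothetical helper: call a parsing function, return `default` on failure."""
    try:
        return parser(*args, **kwargs)
    except Exception as error:  # broad on purpose, see the note above
        print(f"parsing failed: {error}")
        return default


def broken_parser(path):
    """Stand-in for a parsing call that fails, used only for this example."""
    raise FileNotFoundError(path)


result = call_safely(broken_parser, "missing.csv", default=[])
print(result)
```

In real code it is usually better to narrow the `except` clause to the errors you actually expect once you know them.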

  • pip fails to install or fails to build the wheel?

Add this at the end of the pip install command:

--isolated --disable-pip-version-check --no-cache-dir --no-binary :all:

Not my Bug.

  • How to build the project?

build.sh

  • How to package the project?

package.sh

  • Does this require Nimble?

No.

  • What's the unit of measurement for speed?

Unmodified raw output of the Python timeit module.

Please send a Pull Request to Python to improve the output of timeit.