Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

Stars: ✭ 245 (+544.74%)

Mutual labels: parquet

Kglab

Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

Stars: ✭ 98 (+157.89%)

Mutual labels: parquet

Parquet Rs

Apache Parquet implementation in Rust

Stars: ✭ 144 (+278.95%)

Mutual labels: parquet

Parquet Go

Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

Stars: ✭ 114 (+200%)

Mutual labels: parquet

Sqlite Parquet Vtable

A SQLite vtable extension to read Parquet files

Stars: ✭ 167 (+339.47%)

Mutual labels: parquet

Parquet Mr

Apache Parquet

Stars: ✭ 1,278 (+3263.16%)

Mutual labels: parquet

jpopup

Simple, lightweight (<2 kB) JavaScript popup modal plugin

Stars: ✭ 27 (-28.95%)

Mutual labels: dependency-free

Awkward 0.x

Manipulate arrays of complex data structures as easily as Numpy.

Stars: ✭ 216 (+468.42%)

Mutual labels: parquet

Kartothek

A consistent table management library in python

Stars: ✭ 144 (+278.95%)

Mutual labels: parquet

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+268.42%)

Mutual labels: parquet

Amazon S3 Find And Forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

Stars: ✭ 115 (+202.63%)

Mutual labels: parquet

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+365.79%)

Mutual labels: parquet

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (+186.84%)

Mutual labels: parquet

openmrs-fhir-analytics

A collection of tools for extracting FHIR resources and analytics services on top of that data.

Stars: ✭ 55 (+44.74%)

Mutual labels: parquet

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (+155.26%)

Mutual labels: parquet

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+4221.05%)

Mutual labels: parquet

Parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (+281.58%)

Mutual labels: parquet

velox

The minimal PHP micro-framework.

Stars: ✭ 55 (+44.74%)

Mutual labels: dependency-free

denoliver

A simple, dependency free static file server for Deno with possibly the worst name ever.

Stars: ✭ 94 (+147.37%)

Mutual labels: dependency-free


miniparquet

miniparquet is a reader for a common subset of Parquet files. miniparquet only supports rectangular-shaped data structures (no nested tables) and only the Snappy compression scheme. miniparquet has no (zero, none, 0) external dependencies and is very lightweight. It compiles in seconds to a binary size of under 1 MB.
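To give a sense of what supporting "only the Snappy compression scheme" involves, here is a minimal pure-Python decoder for the raw (unframed) Snappy block format, which is the variant used for Parquet pages written with the SNAPPY codec. It is an illustration written from the public Snappy format description, not miniparquet's own C++ code, and `snappy_decompress` is a hypothetical name; a real decoder would avoid the byte-by-byte copy loop for speed.

```python
from typing import Tuple


def _read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a little-endian base-128 varint at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def snappy_decompress(buf: bytes) -> bytes:
    """Decompress one raw Snappy block (no framing format)."""
    # Preamble: the uncompressed length as a varint.
    expected_len, pos = _read_varint(buf, 0)
    out = bytearray()
    while pos < len(buf):
        tag = buf[pos]
        pos += 1
        kind = tag & 0x03  # low two bits select the element type
        if kind == 0:  # literal run
            length = tag >> 2
            if length >= 60:  # 60..63: (length - 1) is stored in the next 1..4 bytes
                nbytes = length - 59
                length = int.from_bytes(buf[pos:pos + nbytes], "little")
                pos += nbytes
            length += 1
            out += buf[pos:pos + length]
            pos += length
            continue
        if kind == 1:  # copy: 3 length bits in the tag, 11-bit offset
            length = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | buf[pos]
            pos += 1
        elif kind == 2:  # copy: 2-byte little-endian offset
            length = (tag >> 2) + 1
            offset = int.from_bytes(buf[pos:pos + 2], "little")
            pos += 2
        else:  # copy: 4-byte little-endian offset
            length = (tag >> 2) + 1
            offset = int.from_bytes(buf[pos:pos + 4], "little")
            pos += 4
        if offset == 0 or offset > len(out):
            raise ValueError("invalid copy offset")
        start = len(out) - offset
        # Copy one byte at a time: the source range may overlap the bytes being written.
        for i in range(length):
            out.append(out[start + i])
    if len(out) != expected_len:
        raise ValueError(f"expected {expected_len} bytes, got {len(out)}")
    return bytes(out)
```

For example, `snappy_decompress(b"\x0c\x08abc\x22\x03\x00")` expands the literal `abc` followed by a 9-byte overlapping copy from 3 bytes back into `b"abcabcabcabc"`.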

Installation

miniparquet comes as a C++ library, a Python package, and an R package. Install the R package like so:

devtools::install_github("hannesmuehleisen/miniparquet")

The C++ library can be built by running make.

The Python package is installed using:

python setup.py install

Usage

Use the R package like so:

df <- miniparquet::parquet_read("example.parquet")

Folders of similar-structured Parquet files (e.g. produced by Spark) can be read like this:

df <- data.table::rbindlist(lapply(Sys.glob("some-folder/part-*.parquet"), miniparquet::parquet_read))

If you find a file that should be supported but isn't, please open an issue here with a link to the file.

Use the Python package like so:

miniparquet.read('example.parquet')

You can convert the result to a pandas DataFrame like so:

pandas.DataFrame.from_dict(miniparquet.read('example.parquet'))
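The folder-reading one-liner for R above has no Python counterpart in this README. A minimal sketch of the same idea follows, assuming `miniparquet.read` returns a dict-like mapping of column names to column values (which the `from_dict` usage implies). `read_parquet_folder` is a hypothetical helper, not part of miniparquet; the reader is passed in as a function so the helper can be tried without miniparquet installed.

```python
import glob
import os
from typing import Callable, Dict, List, Mapping, Optional, Sequence

# Assumption: the reader returns a mapping of column name -> column values.
Reader = Callable[[str], Mapping[str, Sequence]]


def read_parquet_folder(folder: str, read: Reader,
                        pattern: str = "part-*.parquet") -> Dict[str, List]:
    """Read all files matching `pattern` in `folder` and stack them row-wise."""
    # glob returns files in arbitrary order, so sort for a deterministic row order.
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")

    combined: Optional[Dict[str, List]] = None
    for path in paths:
        part = read(path)
        if combined is None:
            # The first file defines the expected column names and their order.
            combined = {name: list(values) for name, values in part.items()}
            continue
        if list(part) != list(combined):
            raise ValueError(f"{path}: columns {list(part)} differ from {list(combined)}")
        for name, values in part.items():
            combined[name].extend(values)
    return combined
```

With miniparquet installed, usage would presumably look like `pandas.DataFrame.from_dict(read_parquet_folder("some-folder", miniparquet.read))`. Like the R version, this only works for similar-structured files; a file with different columns raises an error instead of being silently misaligned.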

Performance

miniparquet is quite fast: on my laptop (i7-4578U), it can read compressed Parquet files at over 200 MB/s using only a single thread. Previously, there was a comparison with the arrow package here, but it turned out that those results were caused by a bug, which has since been fixed.
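To put the 200 MB/s figure in context: at that rate, a 1 GB (1,000 MB) file takes about 5 seconds to read. The sketch below measures a comparable number on your own files using a hypothetical helper, `read_throughput_mb_s`. It assumes 1 MB = 10^6 bytes and divides the on-disk (compressed) file size by the wall-clock read time, which is one reading of the claim above; the README does not say which size it uses. Taking the best of several runs means the OS page cache is likely warm, so this mostly measures decoding speed rather than disk speed.

```python
import os
import time
from typing import Callable


def read_throughput_mb_s(path: str, read: Callable[[str], object],
                         repeats: int = 3) -> float:
    """Best-of-`repeats` read throughput for `path`, in MB/s (1 MB = 10**6 bytes)."""
    size_bytes = os.path.getsize(path)  # on-disk, i.e. compressed, size
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read(path)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero duration on very coarse clocks.
    return size_bytes / 1e6 / max(best, 1e-9)
```

A possible call is `read_throughput_mb_s("example.parquet", miniparquet.read)`; the single-thread condition comes from miniparquet itself, not from this helper.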


hannesmuehleisen / miniparquet
