
zrlio / albis

Licence: other
Albis: High-Performance File Format for Big Data Systems

Projects that are alternatives of or similar to albis

Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project will have sample programs for Spark in Scala.
Stars: ✭ 55 (+175%)
Mutual labels:  parquet, spark-sql
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-5%)
Mutual labels:  parquet, spark-sql
Gfa Spec
Graphical Fragment Assembly (GFA) Format Specification
Stars: ✭ 117 (+485%)
Mutual labels:  file-format
geospark
bring sf to spark in production
Stars: ✭ 53 (+165%)
Mutual labels:  spark-sql
miniparquet
Library to read a subset of Parquet files
Stars: ✭ 38 (+90%)
Mutual labels:  parquet
Matio
MATLAB MAT File I/O Library
Stars: ✭ 206 (+930%)
Mutual labels:  file-format
parquet-flinktacular
How to use Parquet in Flink
Stars: ✭ 29 (+45%)
Mutual labels:  parquet
Python Altium
Altium schematic format documentation, SVG converter and TK viewer
Stars: ✭ 112 (+460%)
Mutual labels:  file-format
parquet-extra
A collection of Apache Parquet add-on modules
Stars: ✭ 30 (+50%)
Mutual labels:  parquet
openmrs-fhir-analytics
A collection of tools for extracting FHIR resources and analytics services on top of that data.
Stars: ✭ 55 (+175%)
Mutual labels:  parquet
nix
Neuroscience information exchange format
Stars: ✭ 64 (+220%)
Mutual labels:  file-format
bigdatatutorial
bigdatatutorial
Stars: ✭ 34 (+70%)
Mutual labels:  spark-sql
Klog
A plain-text file format and command line tool for time tracking
Stars: ✭ 222 (+1010%)
Mutual labels:  file-format
zipdump
Analyze zipfile, either local, or from url
Stars: ✭ 25 (+25%)
Mutual labels:  file-format
Bitmap
C++ Bitmap Library
Stars: ✭ 125 (+525%)
Mutual labels:  file-format
Tweet-Analysis-With-Kafka-and-Spark
A real-time analytics dashboard to analyze the trending hashtags and @mentions at any location using Kafka and Spark Streaming.
Stars: ✭ 18 (-10%)
Mutual labels:  spark-sql
Cooler
A cool place to store your Hi-C
Stars: ✭ 112 (+460%)
Mutual labels:  file-format
Kaitai struct
Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Perl / PHP / Python / Ruby
Stars: ✭ 2,736 (+13580%)
Mutual labels:  file-format
qsv
CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+2090%)
Mutual labels:  parquet
dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+575%)
Mutual labels:  spark-sql

Albis

Albis: High-Performance File Format for Big Data Systems, Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, and Bernard Metzler, IBM Research, Zurich, https://www.usenix.org/conference/atc18/presentation/trivedi

Abstract: Over the last decade, a variety of external file formats such as Parquet, ORC, and Arrow have been developed to store large volumes of relational data in the cloud. Although high-performance networking and storage devices are now used pervasively to process this data in frameworks like Spark and Hadoop, we observe that none of the popular file formats can deliver data access rates close to what the hardware supports. Our analysis suggests that several antiquated notions about the nature of I/O in a distributed setting, together with a preference for "storage efficiency" over performance, are the key reasons for this gap.

In this paper we present Albis, a high-performance file format for storing relational data on modern hardware. Albis is built upon two key principles: (i) reduce the CPU cost by keeping the data/metadata storage format simple; (ii) use a binary API for efficient object management to avoid unnecessary object materialization. In our evaluation, we demonstrate that in micro-benchmarks Albis delivers 1.9-21.4x higher bandwidth than other formats. At the workload level, Albis in Spark/SQL reduces the runtimes of TPC-DS queries by up to 3x.
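To make principle (ii) more concrete, below is a minimal Java sketch of a "binary API" in the general sense: a reusable view object that points at a row's serialized bytes and decodes fields on demand, so a filter such as `name = 'albis'` can be evaluated without allocating a `String` or a row object per record. Everything here is a hypothetical illustration: the row layout (an 8-byte null bitmap, one 8-byte slot per column, then a variable-length tail) and the names `BinaryRowSketch`, `RowView`, `writeRow`, `pointTo`, and `stringEquals` are assumptions made for this example. This is not Albis's actual on-disk format or API, which this page does not describe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// HYPOTHETICAL row layout for illustration only (not Albis's real format):
//   [8-byte null bitmap][one 8-byte slot per column][variable-length bytes]
// Fixed-width values live directly in their slot; a variable-length value
// stores (offsetFromRowStart << 32 | length) in its slot.
public final class BinaryRowSketch {

    /** Serializes one row: column 0 = long id, column 1 = UTF-8 string name. */
    static ByteBuffer writeRow(long id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        int numCols = 2;
        int fixedSize = 8 + 8 * numCols;               // bitmap + slots
        ByteBuffer buf = ByteBuffer.allocate(fixedSize + nameBytes.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, 0L);                            // no null columns
        buf.putLong(8, id);                            // column 0: inline value
        buf.putLong(16, ((long) fixedSize << 32) | nameBytes.length); // column 1
        buf.position(fixedSize);
        buf.put(nameBytes);                            // variable-length tail
        buf.position(0);
        return buf;
    }

    /** A reusable view: re-pointed at each row, decodes fields lazily. */
    static final class RowView {
        private ByteBuffer buf;
        private int base;

        RowView pointTo(ByteBuffer buf, int base) {
            this.buf = buf;
            this.base = base;
            return this;
        }

        boolean isNull(int col) {
            return (buf.getLong(base) & (1L << col)) != 0;
        }

        long getLong(int col) {
            return buf.getLong(base + 8 + 8 * col);
        }

        /** Compares a string column to a literal without creating a String. */
        boolean stringEquals(int col, byte[] literal) {
            long slot = buf.getLong(base + 8 + 8 * col);
            int off = (int) (slot >>> 32);
            int len = (int) slot;                      // low 32 bits
            if (len != literal.length) return false;
            for (int i = 0; i < len; i++) {
                if (buf.get(base + off + i) != literal[i]) return false;
            }
            return true;
        }

        /** Materializes a String only when the caller really needs one. */
        String getString(int col) {
            long slot = buf.getLong(base + 8 + 8 * col);
            int off = (int) (slot >>> 32);
            int len = (int) slot;
            byte[] bytes = new byte[len];
            for (int i = 0; i < len; i++) bytes[i] = buf.get(base + off + i);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ByteBuffer row = writeRow(42L, "albis");
        RowView view = new RowView().pointTo(row, 0);
        byte[] needle = "albis".getBytes(StandardCharsets.UTF_8);
        System.out.println(view.getLong(0));              // 42
        System.out.println(view.stringEquals(1, needle)); // true
    }
}
```

The design choice being illustrated: a single `RowView` can be re-pointed at each row in turn, so scanning many rows allocates one view rather than one object per row, and a `String` is created only when `getString` is called.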

Source code

To appear here soon.
