
JDASoftwareGroup / Kartothek

License: MIT
A consistent table management library in Python

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Kartothek

graphique
GraphQL service for arrow tables and parquet data sets.
Stars: ✭ 28 (-80.56%)
Mutual labels:  arrow, parquet
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+70.14%)
Mutual labels:  parquet, arrow
Awkward 0.x
Manipulate arrays of complex data structures as easily as Numpy.
Stars: ✭ 216 (+50%)
Mutual labels:  parquet, arrow
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (+75.69%)
Mutual labels:  parquet, arrow
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-20.14%)
Mutual labels:  parquet
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-32.64%)
Mutual labels:  parquet
Arrow.jl
Pure Julia implementation of the apache arrow data format (https://arrow.apache.org/)
Stars: ✭ 92 (-36.11%)
Mutual labels:  arrow
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (+787.5%)
Mutual labels:  parquet
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-2.78%)
Mutual labels:  parquet
Pydata Chicago2016 Ml Tutorial
Machine learning with scikit-learn tutorial at PyData Chicago 2016
Stars: ✭ 128 (-11.11%)
Mutual labels:  pydata
Leader Line
Draw a leader line in your web page.
Stars: ✭ 1,872 (+1200%)
Mutual labels:  arrow
Kglab
Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Stars: ✭ 98 (-31.94%)
Mutual labels:  parquet
Drill
Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+1024.31%)
Mutual labels:  parquet
Pyvtreat
vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.
Stars: ✭ 92 (-36.11%)
Mutual labels:  pydata
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1040.28%)
Mutual labels:  parquet
Open Arrow
Open Arrow is an open-source font that contains 112 arrow symbols from U+2190 to U+21ff
Stars: ✭ 89 (-38.19%)
Mutual labels:  arrow
Blazingsql
BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
Stars: ✭ 1,652 (+1047.22%)
Mutual labels:  arrow
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-13.19%)
Mutual labels:  parquet
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-24.31%)
Mutual labels:  parquet
Pymapd
Python client for OmniSci GPU-accelerated SQL engine and analytics platform
Stars: ✭ 109 (-24.31%)
Mutual labels:  pydata

Kartothek


Kartothek is a Python library to manage (create, read, update, delete) large amounts of tabular data in a blob store. It stores data as datasets, which it presents to the user as pandas DataFrames. A dataset is a collection of files with the same schema that reside in a blob store. Kartothek uses a metadata definition to handle these datasets efficiently. For distributed access and manipulation of datasets, Kartothek offers a Dask interface.

Storing data distributed over multiple files in a blob store (S3, ABS, GCS, etc.) allows for a fast, cost-efficient and highly scalable data infrastructure. A downside of storing data solely in an object store is that the stores themselves offer little to no guarantees beyond the consistency of a single file. In particular, they cannot guarantee the consistency of your dataset. If we demand a consistent state of our dataset at all times, we need to track the state of the dataset ourselves. Kartothek frees us from having to do this manually.

The kartothek.io module provides building blocks to create and modify these datasets in data pipelines. Kartothek handles I/O, tracks dataset partitions and selects subsets of data transparently.

Installation

Installers for the latest released version are available on the Python Package Index (PyPI) and on conda-forge.

# Install with pip
pip install kartothek
# Install with conda
conda install -c conda-forge kartothek

What is a (real) Kartothek?

A Kartothek (or, in more modern terms, a Zettelkasten or Katalogkasten) is a card catalog: a tool for organizing high-level information extracted from a source of information.
