
sksamuel / centurion

Licence: Apache-2.0 License
Kotlin Bigdata Toolkit

Programming Languages: Kotlin, Java, Shell

Projects that are alternatives to, or similar to, centurion

columnify
Converts record-oriented data to columnar format.
Stars: ✭ 28 (-91.25%)
Mutual labels:  bigdata, parquet
Bigdata File Viewer
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, and more.
Stars: ✭ 86 (-73.12%)
Mutual labels:  bigdata, parquet
BigDataTools
tools for bigData
Stars: ✭ 36 (-88.75%)
Mutual labels:  bigdata
v6.dooring.public
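A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.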
Stars: ✭ 323 (+0.94%)
Mutual labels:  bigdata
taller SparkR
SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)
Stars: ✭ 12 (-96.25%)
Mutual labels:  bigdata
flokkr
Documentation placeholder and utilities for all the other containers.
Stars: ✭ 30 (-90.62%)
Mutual labels:  bigdata
room-renting
Scrapes property listings from Anjuke with Python and visualizes them with Amap (Gaode Maps).
Stars: ✭ 16 (-95%)
Mutual labels:  bigdata
UnROOT.jl
Native Julia I/O package to work with CERN ROOT files
Stars: ✭ 52 (-83.75%)
Mutual labels:  bigdata
meepo
Data migration across heterogeneous storage systems
Stars: ✭ 29 (-90.94%)
Mutual labels:  parquet
bqv
The simplest tool to manage views of BigQuery.
Stars: ✭ 22 (-93.12%)
Mutual labels:  bigdata
pulsar-user-group-loc-cn
Workspace for China local user group.
Stars: ✭ 19 (-94.06%)
Mutual labels:  bigdata
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (-82.19%)
Mutual labels:  bigdata
parquet-usql
A custom extractor designed to read parquet for Azure Data Lake Analytics
Stars: ✭ 13 (-95.94%)
Mutual labels:  parquet
datasphere-service
an open source dataworks platform
Stars: ✭ 20 (-93.75%)
Mutual labels:  bigdata
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample programs for Spark in Scala.
Stars: ✭ 55 (-82.81%)
Mutual labels:  parquet
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-89.37%)
Mutual labels:  bigdata
Exposure
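Exposure is a library for exposure (impression) tracking. It makes it easy to instrument exposure events with only minimal changes to existing code. It supports various scrolling layouts, including RecyclerView (RV) linear, grid, and staggered layouts, horizontally scrolling RVs, and ScrollView. The effective exposure area of an item is configurable.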
Stars: ✭ 51 (-84.06%)
Mutual labels:  bigdata
parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Stars: ✭ 157 (-50.94%)
Mutual labels:  parquet
ETL-Starter-Kit
📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.
Stars: ✭ 21 (-93.44%)
Mutual labels:  bigdata
learning notes
Study notes
Stars: ✭ 18 (-94.37%)
Mutual labels:  bigdata

Centurion


Introduction

Centurion is a JVM (written in Kotlin) toolkit for columnar and streaming formats.

This library allows you to read, write, and convert between the following formats: Avro, Parquet, ORC, and Arrow.

Readers and writers are compatible with data generated by Apache Spark, and they do not require you to start a cluster to perform I/O operations.

Schema Conversions

Centurion allows easy conversion of schemas between any of the supported formats, via Centurion's own internal format.

This internal format is a superset of the functionality of all the supported formats, and is intended only as an intermediate format that allows for conversions.
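
Routing every conversion through one internal model means each format only needs an adapter into and out of it. With n formats, direct pairwise conversion needs n(n-1) one-directional converters (12 for the four formats in the table below), while a hub needs 2n (8). The following is a minimal Kotlin sketch of that design, assuming type names are represented as plain strings taken from the table. `CenturionType`, `fromOrc`, `toOrc`, `toArrow`, and `orcToArrow` are hypothetical names for illustration, not Centurion's actual API, and only a subset of types is modelled.

```kotlin
// Hypothetical internal type model: a small subset of the Centurion types in the table.
enum class CenturionType { Strings, Booleans, Int64, Int32, Int16, Float64, TimestampMillis, TimestampMicros }

// "In" adapter: ORC type name -> internal type (null if this sketch does not model it).
fun fromOrc(orcType: String): CenturionType? = when (orcType) {
    "String" -> CenturionType.Strings
    "Boolean" -> CenturionType.Booleans
    "Long" -> CenturionType.Int64
    "Int" -> CenturionType.Int32
    "Short" -> CenturionType.Int16
    "Double" -> CenturionType.Float64
    "Timestamp" -> CenturionType.TimestampMillis
    else -> null
}

// "Out" adapter: internal type -> ORC type name; null marks a type the table lists as unsupported.
fun toOrc(type: CenturionType): String? = when (type) {
    CenturionType.Strings -> "String"
    CenturionType.Booleans -> "Boolean"
    CenturionType.Int64 -> "Long"
    CenturionType.Int32 -> "Int"
    CenturionType.Int16 -> "Short"
    CenturionType.Float64 -> "Double"
    CenturionType.TimestampMillis -> "Timestamp"
    CenturionType.TimestampMicros -> null
}

// "Out" adapter: internal type -> Arrow type name.
fun toArrow(type: CenturionType): String = when (type) {
    CenturionType.Strings -> "Utf8"
    CenturionType.Booleans -> "Bool"
    CenturionType.Int64 -> "Int64 Signed"
    CenturionType.Int32 -> "Int32 Signed"
    CenturionType.Int16 -> "Int16 Signed"
    CenturionType.Float64 -> "FloatingPointDouble"
    CenturionType.TimestampMillis -> "Timestamp (Millis)"
    CenturionType.TimestampMicros -> "Timestamp (Micros)"
}

// Any-to-any conversion is just an "in" adapter composed with an "out" adapter.
fun orcToArrow(orcType: String): String? = fromOrc(orcType)?.let(::toArrow)
```

The reverse direction can be lossy: in the table, ORC `String` is the target for Strings, UUID, and Enum alike, so a real implementation would need extra metadata to round-trip those types faithfully.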

The following table shows how types map between each of the formats.

| Centurion Type | Avro | Parquet | Orc | Arrow |
|---|---|---|---|---|
| Strings | String | Binary (String) | String | Utf8 |
| UUID | String (UUID) | Binary (String) | String | Utf8 |
| Booleans | Boolean | Boolean | Boolean | Bool |
| Int64 | Long | Int64 | Long | Int64 Signed |
| Int32 | Int | Int32 | Int | Int32 Signed |
| Int16 | N/A (Int) | Int32 (Signed Int16) | Short | Int16 Signed |
| Int8 | N/A (Int) | Int32 (Signed Int8) | Byte | Int8 Signed |
| Float64 | Double | Double | Double | FloatingPointDouble |
| Float32 | Float | Float | Float | FloatingPointSingle |
| Enum | Enum | Enum | String | String |
| Decimal | Binary / Fixed with annotation | Decimal | Decimal(precision, scale) | Decimal |
| Varchar | Fixed | N/A (String) | Varchar | N/A (String) |
| TimestampMillis | Long (TimestampMillis) | Int64 (Timestamp) | Timestamp | Timestamp (Millis) |
| TimestampMicros | Long (TimestampMicros) | Int64 (Timestamp) | Unsupported | Timestamp (Micros) |
| Map | Map | Map | Map | Map |
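
The Decimal row maps to Avro binary or fixed types with an annotation. Per the Avro specification, the `decimal` logical type stores the unscaled value as a big-endian two's-complement integer, while precision and scale live in the schema rather than in the data. The sketch below illustrates that encoding using only `java.math`; `encodeDecimal`, `encodeDecimalFixed`, and `decodeDecimal` are hypothetical helper names, and this is not Centurion's own implementation.

```kotlin
import java.math.BigDecimal
import java.math.BigInteger
import java.math.RoundingMode

// Encodes a decimal for an Avro `bytes` field annotated as decimal:
// the unscaled value as a minimal big-endian two's-complement integer.
// The scale is not written to the bytes; readers take it from the schema.
fun encodeDecimal(value: BigDecimal, scale: Int): ByteArray =
    // UNNECESSARY throws ArithmeticException instead of silently rounding.
    value.setScale(scale, RoundingMode.UNNECESSARY).unscaledValue().toByteArray()

// For an Avro `fixed` of `size` bytes, sign-extend the minimal encoding on the left.
fun encodeDecimalFixed(value: BigDecimal, scale: Int, size: Int): ByteArray {
    val minimal = encodeDecimal(value, scale)
    require(minimal.size <= size) { "value needs ${minimal.size} bytes but fixed size is $size" }
    val pad = (if (value.signum() < 0) -1 else 0).toByte()
    return ByteArray(size - minimal.size) { pad } + minimal
}

// Decodes either encoding, using the scale recorded in the schema.
fun decodeDecimal(bytes: ByteArray, scale: Int): BigDecimal =
    BigDecimal(BigInteger(bytes), scale)
```

For example, `12.34` with scale 2 has the unscaled value 1234 (`0x04D2`), so it is stored as the two bytes `0x04 0xD2`. Precision limits are not checked here; a fuller version would also reject values whose digit count exceeds the schema's precision.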