Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-17.24%)

Mutual labels: avro, parquet

Noproto

Flexible, Fast & Compact Serialization with RPC

Stars: ✭ 138 (+375.86%)

Mutual labels: avro, protocol-buffers

javascript-serialization-benchmark

Comparison and benchmark of JavaScript serialization libraries (Protocol Buffer, Avro, BSON, etc.)

Stars: ✭ 54 (+86.21%)

Mutual labels: avro, protocol-buffers

Cpp Serializers

Benchmark comparing various data serialization libraries (thrift, protobuf etc.) for C++

Stars: ✭ 533 (+1737.93%)

Mutual labels: avro, thrift

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+1300%)

Mutual labels: avro, parquet

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (+234.48%)

Mutual labels: avro, parquet

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (+196.55%)

Mutual labels: avro, parquet

columnify

Convert record-oriented data to columnar format.

Stars: ✭ 28 (-3.45%)

Mutual labels: avro, parquet

Mu Haskell

Mu (μ) is a purely functional framework for building micro services.

Stars: ✭ 215 (+641.38%)

Mutual labels: avro, protocol-buffers

parquet-extra

A collection of Apache Parquet add-on modules

Stars: ✭ 30 (+3.45%)

Mutual labels: avro, parquet

Ratatool

A tool for data sampling, data generation, and data diffing

Stars: ✭ 279 (+862.07%)

Mutual labels: avro, parquet

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (+0%)

Mutual labels: thrift, parquet

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (+100%)

Mutual labels: avro, parquet

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+510.34%)

Mutual labels: avro, parquet

Vscode Data Preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

Stars: ✭ 245 (+744.83%)

Mutual labels: avro, parquet

rules proto grpc

Bazel rules for building Protobuf and gRPC code and libraries from proto_library targets

Stars: ✭ 201 (+593.1%)

Mutual labels: protocol-buffers

kafka-scala-examples

Examples of Avro, Kafka, Schema Registry, Kafka Streams, Interactive Queries, KSQL, Kafka Connect in Scala

Stars: ✭ 53 (+82.76%)

Mutual labels: avro

parquet-flinktacular - How to use Parquet in Flink - Guide

The idea of this tutorial is to get you started as quickly as possible. To that end, I set up a GitHub repository containing sample Maven projects that can serve as templates for your own projects.

At the moment I provide templates for the following use cases:

Each project has two main folders: commons and flink.

In the commons folder you put your schema definition IDL file. The Maven commons/pom.xml is configured to build classes from the IDL file during compilation. This makes development more convenient, because you don't need to recompile the IDL file by hand whenever there is any minor change in your schema.
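As a rough illustration of how such a build step can be wired up, the fragment below sketches a plugin entry for commons/pom.xml. It assumes the schema is written in Avro IDL (an .avdl file) and uses the avro-maven-plugin with its idl-protocol goal. The source directory, the output directory, and the avro.version property are assumptions chosen for this sketch, not values taken from the template projects. A Thrift or Protocol Buffers schema would need a different plugin.

```xml
<!-- Sketch only: assumes an Avro IDL schema; paths and version property are hypothetical -->
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <!-- avro.version is assumed to be defined in the <properties> section of the POM -->
  <version>${avro.version}</version>
  <executions>
    <execution>
      <!-- Generate Java classes before compilation so they are always up to date -->
      <phase>generate-sources</phase>
      <goals>
        <goal>idl-protocol</goal>
      </goals>
      <configuration>
        <!-- Assumed location of the .avdl files -->
        <sourceDirectory>${project.basedir}/src/main/avro</sourceDirectory>
        <!-- Assumed location for the generated Java sources -->
        <outputDirectory>${project.build.directory}/generated-sources/avro</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a setup like this, the schema classes are regenerated on every Maven build, so a schema change is picked up without a separate manual step.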

The flink folder contains your Flink jobs, which read and write Parquet.

So choose your template project, download the corresponding folder and run:

$ mvn clean install package

The more detailed tutorial can be found here :)

FelixNeutatz / parquet-flinktacular
