FelixNeutatz / parquet-flinktacular

License: Apache-2.0
How to use Parquet in Flink

parquet-flinktacular - How to use Parquet in Flink - Guide

The idea of this tutorial is to get you started as quickly as possible. To that end, I set up a GitHub repository with sample Maven projects that can serve as templates for your own projects.

At the moment I provide templates for the following use cases:

  1. Parquet at Flink - using Java and Protocol Buffers schema definition
  2. Parquet at Flink - using Java and Thrift schema definition
  3. Parquet at Flink - using Java and Avro schema definition
  4. Parquet at Flink - using Scala and Protocol Buffers schema definition

Each project has two main folders: commons and flink.

In the commons folder you put your schema definition IDL file. The Maven commons/pom.xml is configured to build classes from the IDL file during compilation. This makes development more convenient, because you don't need to recompile the IDL file by hand whenever there is any minor change in your schema.
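As an illustration of how a pom.xml can generate classes from an IDL file at build time, here is a minimal sketch for the Protocol Buffers case. It is an assumption, not a copy of the configuration in this repository: it uses the `org.xolstice.maven.plugins:protobuf-maven-plugin` together with the `os-maven-plugin` extension, expects `.proto` files under that plugin's default `src/main/proto` directory, and the version numbers are only examples. The templates may use a different plugin or layout, so check the actual commons/pom.xml.

```xml
<!-- Sketch only: plugin choice, versions and source directory are assumptions -->
<build>
  <extensions>
    <!-- Detects the OS so the matching protoc binary can be downloaded -->
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.6.2</version>
    </extension>
  </extensions>
  <plugins>
    <plugin>
      <groupId>org.xolstice.maven.plugins</groupId>
      <artifactId>protobuf-maven-plugin</artifactId>
      <version>0.6.1</version>
      <configuration>
        <!-- protoc is fetched as a Maven artifact instead of being installed by hand -->
        <protocArtifact>com.google.protobuf:protoc:3.21.12:exe:${os.detected.classifier}</protocArtifact>
      </configuration>
      <executions>
        <execution>
          <goals>
            <!-- Generates Java sources from the .proto files before compilation -->
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a setup along these lines, editing the schema and rerunning the Maven build is enough to regenerate the classes that the Flink jobs depend on.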

The flink folder contains your Flink jobs, which read and write Parquet.

So choose your template project, download the corresponding folder and run:

$ mvn clean install

The more detailed tutorial can be found here :)
