All Projects → spotify → Gcs Tools

spotify / Gcs Tools

Licence: apache-2.0
GCS support for avro-tools, parquet-tools and protobuf

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Gcs Tools

Ratatool
A tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (+389.47%)
Mutual labels:  protobuf, avro, parquet
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+612.28%)
Mutual labels:  gcp, avro, parquet
Rq
Record Query - A tool for doing record analysis and transformation
Stars: ✭ 1,808 (+3071.93%)
Mutual labels:  protobuf, avro
Jackson Dataformats Binary
Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Stars: ✭ 221 (+287.72%)
Mutual labels:  protobuf, avro
columnify
Make record oriented data to columnar format.
Stars: ✭ 28 (-50.88%)
Mutual labels:  avro, parquet
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+329.82%)
Mutual labels:  avro, parquet
Sparksql Protobuf
Read SparkSQL parquet file as RDD[Protobuf]
Stars: ✭ 82 (+43.86%)
Mutual labels:  protobuf, parquet
parquet-extra
A collection of Apache Parquet add-on modules
Stars: ✭ 30 (-47.37%)
Mutual labels:  avro, parquet
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+1.75%)
Mutual labels:  avro, parquet
javascript-serialization-benchmark
Comparison and benchmark of JavaScript serialization libraries (Protocol Buffer, Avro, BSON, etc.)
Stars: ✭ 54 (-5.26%)
Mutual labels:  protobuf, avro
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+552.63%)
Mutual labels:  avro, parquet
Bigdata Playground
A complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+210.53%)
Mutual labels:  avro, parquet
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (+70.18%)
Mutual labels:  avro, parquet
Schema Registry
Confluent Schema Registry for Kafka
Stars: ✭ 1,647 (+2789.47%)
Mutual labels:  protobuf, avro
Bigdata File Viewer
A cross-platform (Windows, MAC, Linux) desktop application to view common bigdata binary format like Parquet, ORC, AVRO, etc. Support local file system, HDFS, AWS S3, Azure Blob Storage ,etc.
Stars: ✭ 86 (+50.88%)
Mutual labels:  avro, parquet
parquet-flinktacular
How to use Parquet in Flink
Stars: ✭ 29 (-49.12%)
Mutual labels:  avro, parquet
DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-57.89%)
Mutual labels:  avro, parquet
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+589.47%)
Mutual labels:  avro, parquet
Cpp Serializers
Benchmark comparing various data serialization libraries (thrift, protobuf etc.) for C++
Stars: ✭ 533 (+835.09%)
Mutual labels:  protobuf, avro
Csi Gcs
Kubernetes CSI driver for Google Cloud Storage
Stars: ✭ 44 (-22.81%)
Mutual labels:  gcp

GCS Tools

Build Status GitHub license

Raison d'être:

Light weight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-cli, proto-tools for Scio's Protobuf in Avro file, and magnolify-tools for Magnolify code generation, so that they can be used from regular workstations or laptops, outside of a Google Compute Engine (GCE) instance.

It uses your existing OAuth2 credentials and allows authentication via a browser.

Usage:

You can install the tools via our Homebrew tap on Mac.

brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-cli gcs-proto-tools gcs-magnolify-tools
avro-tools tojson <GCS_PATH>
parquet-cli cat <GCS_PATH>
proto-tools tojson <GCS_PATH>
magnolify-tools <avro|parquet> <GCS_PATH>

Or build them yourself.

sbt assembly
java -jar avro-tools/target/scala-2.13/avro-tools-*.jar tojson <GCS_PATH>
java -jar parquet-cli/target/scala-2.13/parquet-cli-*.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.13/proto-tools-*.jar cat <GCS_PATH>
java -jar magnolify-tools/target/scala-2.13/magnolify-tools-*.jar <avro|parquet> <GCS_PATH>

How it works:

To make avro-tools and parquet-cli work with GCS we need:

GCS connector won't pick up your local gcloud configuration, and instead expects settings in core-site.xml.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].