
Top 63 parquet open source projects

Vscode Data Preview
Data Preview extension for importing, viewing, slicing, dicing, charting & exporting large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Awkward 0.x
Manipulate arrays of complex data structures as easily as Numpy.
Parquetjs
Fully asynchronous, pure JavaScript implementation of the Parquet file format
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Sqlite Parquet Vtable
A SQLite vtable extension to read Parquet files
Parquetviewer
Simple Windows desktop application for viewing & querying Apache Parquet files
Parquet Rs
Apache Parquet implementation in Rust
Kartothek
A consistent table management library in Python
Eel Sdk
Big Data Toolkit for the JVM
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Parquet Go
Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Parquet Index
Spark SQL index for Parquet tables
Kglab
Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Parquet Mr
Apache Parquet
✭ 1,278
java, big-data, parquet
Bigdata File Viewer
A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Sparksql Protobuf
Read SparkSQL parquet file as RDD[Protobuf]
Petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Rumble
⛈ Rumble 1.11.0 "Banyan Tree" for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Gcs Tools
GCS support for avro-tools, parquet-tools and protobuf
Node Parquet
NodeJS module to access apache parquet format files
✭ 46
nodejs, parquet
Quilt
Quilt is a self-organizing data hub for S3
Pucket
Bucketing and partitioning system for Parquet
Parquet Generator
Parquet file generator
Parquet Format
Apache Parquet
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Skale
High performance distributed data processing engine
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Oap
Optimized Analytics Package for Spark* Platform
Parquet Cpp
Apache Parquet
Pystore
Fast data store for Pandas time-series data
Elasticsearch loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Ratatool
A tool for data sampling, data generation, and data diffing
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Drill
Apache Drill is a distributed MPP query layer for self describing data
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
centurion
Kotlin Bigdata Toolkit
meepo
Heterogeneous storage data migration
experiments
Code examples for my blog posts
graphique
GraphQL service for arrow tables and parquet data sets.
parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
parquet-usql
A custom extractor designed to read parquet for Azure Data Lake Analytics
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning or SQL workloads requiring fast iterative access to datasets. This project contains sample programs for Spark in Scala.
DaFlow
Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Parquet.jl
Julia implementation of Parquet columnar file format reader
wasp
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
IMCtermite
Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats
columnify
Convert record-oriented data to columnar format.
albis
Albis: High-Performance File Format for Big Data Systems
parquet-extra
A collection of Apache Parquet add-on modules