100 open source projects that are alternatives to or similar to parquet-usql

Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (+861.54%)
Mutual labels:  parquet
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (+784.62%)
Mutual labels:  parquet
Parquet Go
Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (+776.92%)
Mutual labels:  parquet
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (+738.46%)
Mutual labels:  parquet
Kglab
Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Stars: ✭ 98 (+653.85%)
Mutual labels:  parquet
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (+646.15%)
Mutual labels:  parquet
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (+9730.77%)
Mutual labels:  parquet
Bigdata File Viewer
A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats such as Parquet, ORC, and AVRO. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (+561.54%)
Mutual labels:  parquet
Sparksql Protobuf
Read SparkSQL parquet file as RDD[Protobuf]
Stars: ✭ 82 (+530.77%)
Mutual labels:  parquet
Petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+8423.08%)
Mutual labels:  parquet
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+346.15%)
Mutual labels:  parquet
Gcs Tools
GCS support for avro-tools, parquet-tools and protobuf
Stars: ✭ 57 (+338.46%)
Mutual labels:  parquet
Node Parquet
Node.js module to access Apache Parquet format files
Stars: ✭ 46 (+253.85%)
Mutual labels:  parquet
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+7646.15%)
Mutual labels:  parquet
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (+123.08%)
Mutual labels:  parquet
Parquet Generator
Parquet file generator
Stars: ✭ 16 (+23.08%)
Mutual labels:  parquet
Parquet Format
Apache Parquet
Stars: ✭ 800 (+6053.85%)
Mutual labels:  parquet
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+3023.08%)
Mutual labels:  parquet
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+2923.08%)
Mutual labels:  parquet
Skale
High performance distributed data processing engine
Stars: ✭ 390 (+2900%)
Mutual labels:  parquet
Choetl
ETL framework for .NET / C# (parser / writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)
Stars: ✭ 372 (+2761.54%)
Mutual labels:  parquet
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+2538.46%)
Mutual labels:  parquet
Parquet Cpp
Apache Parquet
Stars: ✭ 339 (+2507.69%)
Mutual labels:  parquet
Pystore
Fast data store for Pandas time-series data
Stars: ✭ 325 (+2400%)
Mutual labels:  parquet
Elasticsearch loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Stars: ✭ 300 (+2207.69%)
Mutual labels:  parquet
Ratatool
A tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (+2046.15%)
Mutual labels:  parquet
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (+2023.08%)
Mutual labels:  parquet
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (+1846.15%)
Mutual labels:  parquet
Drill
Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+12353.85%)
Mutual labels:  parquet
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+130.77%)
Mutual labels:  parquet
HybridBackend
Efficient training of deep recommenders on cloud.
Stars: ✭ 30 (+130.77%)
Mutual labels:  parquet
centurion
Kotlin Bigdata Toolkit
Stars: ✭ 320 (+2361.54%)
Mutual labels:  parquet
meepo
Data migration across heterogeneous storage systems
Stars: ✭ 29 (+123.08%)
Mutual labels:  parquet
experiments
Code examples for my blog posts
Stars: ✭ 21 (+61.54%)
Mutual labels:  parquet
graphique
GraphQL service for arrow tables and parquet data sets.
Stars: ✭ 28 (+115.38%)
Mutual labels:  parquet
parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Stars: ✭ 157 (+1107.69%)
Mutual labels:  parquet
Hudi
Upserts, Deletes And Incremental Processing on Big Data.
Stars: ✭ 2,586 (+19792.31%)
Mutual labels:  datalake
Leofs
The LeoFS Storage System
Stars: ✭ 1,439 (+10969.23%)
Mutual labels:  datalake
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+35138.46%)
Mutual labels:  datalake
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (+338.46%)
Mutual labels:  datalake