
100 open-source projects that are similar to, or alternatives to, parquet-usql

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Stars: ✭ 125 (+861.54%)

Mutual labels: parquet

Amazon S3 Find and Forget is a solution for handling data erasure requests against data lakes stored on Amazon S3, for example requests made pursuant to the European General Data Protection Regulation (GDPR).

Stars: ✭ 115 (+784.62%)

Mutual labels: parquet

Parquet Go

Go package to read and write Parquet files. Parquet is a file format for storing nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

Stars: ✭ 114 (+776.92%)

Mutual labels: parquet

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (+738.46%)

Mutual labels: parquet

Kglab

Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

Stars: ✭ 98 (+653.85%)

Mutual labels: parquet

Schemer

Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.

Stars: ✭ 97 (+646.15%)

Mutual labels: parquet

Parquet Mr

Apache Parquet: the Java implementation of the Parquet format

Stars: ✭ 1,278 (+9730.77%)

Mutual labels: parquet

Bigdata File Viewer

A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, and more.

Stars: ✭ 86 (+561.54%)

Mutual labels: parquet

Sparksql Protobuf

Read Spark SQL Parquet files as RDD[Protobuf]

Stars: ✭ 82 (+530.77%)

Mutual labels: parquet

Petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Stars: ✭ 1,108 (+8423.08%)

Mutual labels: parquet

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (+346.15%)

Mutual labels: parquet

Gcs Tools

GCS support for avro-tools, parquet-tools and protobuf

Stars: ✭ 57 (+338.46%)

Mutual labels: parquet

Node Parquet

Node.js module to access Apache Parquet format files

Stars: ✭ 46 (+253.85%)

Mutual labels: parquet

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+7646.15%)

Mutual labels: parquet

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (+123.08%)

Mutual labels: parquet

Parquet Generator

Parquet file generator

Stars: ✭ 16 (+23.08%)

Mutual labels: parquet

Parquet Format

Apache Parquet: the specification of the Parquet file format

Stars: ✭ 800 (+6053.85%)

Mutual labels: parquet

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+3023.08%)

Mutual labels: parquet

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+2923.08%)

Mutual labels: parquet

Skale

High performance distributed data processing engine

Stars: ✭ 390 (+2900%)

Mutual labels: parquet

Choetl

ETL framework for .NET / C# (parser/writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro files)