Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (+861.54%)
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).
Stars: ✭ 115 (+784.62%)
Parquet Go
Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (+776.92%)
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (+738.46%)
Kglab
Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Stars: ✭ 98 (+653.85%)
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schemas. Supports schema inference and a GraphQL API.
Stars: ✭ 97 (+646.15%)
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (+9730.77%)
Bigdata File Viewer
A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats like Parquet, ORC, AVRO, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (+561.54%)
Petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+8423.08%)
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+346.15%)
Gcs Tools
GCS support for avro-tools, parquet-tools and protobuf
Stars: ✭ 57 (+338.46%)
Node Parquet
Node.js module to access Apache Parquet format files
Stars: ✭ 46 (+253.85%)
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+7646.15%)
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (+123.08%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+3023.08%)
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+2923.08%)
Skale
High performance distributed data processing engine
Stars: ✭ 390 (+2900%)
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Stars: ✭ 372 (+2761.54%)
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+2538.46%)
Pystore
Fast data store for Pandas time-series data
Stars: ✭ 325 (+2400%)
Elasticsearch loader
A tool for batch loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch
Stars: ✭ 300 (+2207.69%)
Ratatool
A tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (+2046.15%)
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (+2023.08%)
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (+1846.15%)
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+12353.85%)
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+130.77%)
HybridBackend
Efficient training of deep recommenders on cloud.
Stars: ✭ 30 (+130.77%)
centurion
Kotlin Bigdata Toolkit
Stars: ✭ 320 (+2361.54%)
meepo
Data migration between heterogeneous storage systems
Stars: ✭ 29 (+123.08%)
experiments
Code examples for my blog posts
Stars: ✭ 21 (+61.54%)
graphique
GraphQL service for Arrow tables and Parquet data sets.
Stars: ✭ 28 (+115.38%)
parquet2
Fastest and safest Rust implementation of Parquet. `unsafe`-free. Integration-tested against pyarrow.
Stars: ✭ 157 (+1107.69%)
Hudi
Upserts, Deletes And Incremental Processing on Big Data.
Stars: ✭ 2,586 (+19792.31%)
Leofs
The LeoFS Storage System
Stars: ✭ 1,439 (+10969.23%)
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+35138.46%)