310 open source projects that are alternatives to or similar to Parquet Go

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-78.95%)

Mutual labels: hadoop, parquet

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+41.23%)

Mutual labels: hadoop, presto

dpkb

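A collection of big data material, covering distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse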

Stars: ✭ 123 (+7.89%)

Mutual labels: presto, hadoop

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+3918.42%)

Mutual labels: hadoop, presto

Haproxy Configs

80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.

Stars: ✭ 106 (-7.02%)

Mutual labels: hadoop, presto

Parquet Rs

Apache Parquet implementation in Rust

Stars: ✭ 144 (+26.32%)

Mutual labels: hadoop, parquet

Parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Stars: ✭ 125 (+9.65%)

Mutual labels: hadoop, parquet

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+4618.42%)

Mutual labels: hadoop, presto

Drill

Apache Drill is a distributed MPP query layer for self-describing data

Stars: ✭ 1,619 (+1320.18%)

Mutual labels: hadoop, parquet

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+11265.79%)

Mutual labels: hadoop, presto

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

Stars: ✭ 17 (-85.09%)

Mutual labels: hadoop, parquet

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+244.74%)

Mutual labels: hadoop, parquet

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-85.09%)

Mutual labels: presto, hadoop

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+1340.35%)

Mutual labels: hadoop, parquet

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-83.33%)

Mutual labels: hadoop, parquet

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+22.81%)

Mutual labels: hadoop, parquet

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+55.26%)

Mutual labels: hadoop, parquet

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+256.14%)

Mutual labels: hadoop, parquet

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+642.98%)

Mutual labels: hadoop, presto

Hive Funnel Udf

Hive UDFs for funnel analysis

Stars: ✭ 72 (-36.84%)

Mutual labels: hadoop

Hadoop Yarn Api Python Client

Python client for Hadoop® YARN API

Stars: ✭ 91 (-20.18%)

Mutual labels: hadoop

Src

A light-weight distributed stream computing framework for Golang

Stars: ✭ 67 (-41.23%)

Mutual labels: hadoop

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-47.37%)

Mutual labels: hadoop

Maha

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.

Stars: ✭ 101 (-11.4%)

Mutual labels: presto

Parquet Mr

Apache Parquet

Stars: ✭ 1,278 (+1021.05%)

Mutual labels: parquet

Likelike

An implementation of locality sensitive hashing with Hadoop

Stars: ✭ 58 (-49.12%)

Mutual labels: hadoop

Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Stars: ✭ 74 (-35.09%)

Mutual labels: hadoop

Wifi

A big data query and analysis system based on information captured over Wi-Fi

Stars: ✭ 93 (-18.42%)

Mutual labels: hadoop

Atsd

Axibase Time Series Database Documentation

Stars: ✭ 68 (-40.35%)

Mutual labels: hadoop

Pyhive

Python interface to Hive and Presto. 🐝

Stars: ✭ 1,378 (+1108.77%)

Mutual labels: presto

Jumbune

Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,

Stars: ✭ 64 (-43.86%)

Mutual labels: hadoop

Hadoop Mapreduce

Mirror of Apache Hadoop MapReduce

Stars: ✭ 88 (-22.81%)

Mutual labels: hadoop

Petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Stars: ✭ 1,108 (+871.93%)

Mutual labels: parquet

Waterdrop

Production Ready Data Integration Product, documentation：

Stars: ✭ 1,856 (+1528.07%)

Mutual labels: hadoop

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-24.56%)

Mutual labels: parquet

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-49.12%)

Mutual labels: parquet

Gcs Tools

GCS support for avro-tools, parquet-tools and protobuf

Stars: ✭ 57 (-50%)

Mutual labels: parquet

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-50%)

Mutual labels: hadoop

Bigdata Notebook

Stars: ✭ 100 (-12.28%)

Mutual labels: hadoop

Hadoop cookbook

Cookbook to install Hadoop 2.0+ using Chef

Stars: ✭ 82 (-28.07%)

Mutual labels: hadoop

Docker Hadoop

A Docker container with a full Hadoop cluster setup with Spark and Zeppelin

Stars: ✭ 54 (-52.63%)

Mutual labels: hadoop

Hadoop Solr

Code to index HDFS to Solr using MapReduce

Stars: ✭ 51 (-55.26%)

Mutual labels: hadoop

Sparksql Protobuf

Read SparkSQL parquet file as RDD[Protobuf]

Stars: ✭ 82 (-28.07%)

Mutual labels: parquet

Base

https://www.researchgate.net/profile/Rajah_Iyer

Stars: ✭ 48 (-57.89%)

Mutual labels: hadoop

Node Parquet

NodeJS module to access apache parquet format files

Stars: ✭ 46 (-59.65%)

Mutual labels: parquet

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (-3.51%)

Mutual labels: hadoop

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (-4.39%)

Mutual labels: parquet

Bigdata Notes

A getting-started guide to big data ⭐

Stars: ✭ 10,991 (+9541.23%)

Mutual labels: hadoop

Docker Hadoop Cluster

Multi-node cluster on Docker for self-development.

Stars: ✭ 82 (-28.07%)

Mutual labels: hadoop

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+799.12%)

Mutual labels: hadoop

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+783.33%)

Mutual labels: parquet

Camus

Mirror of Linkedin's Camus

Stars: ✭ 81 (-28.95%)

Mutual labels: hadoop

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Stars: ✭ 1,000 (+777.19%)

Mutual labels: hadoop

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (-67.54%)

Mutual labels: hadoop

Antsdb

AntsDB is a low latency, high concurrency, MySQL compliant SQL layer for HBase

Stars: ✭ 99 (-13.16%)

Mutual labels: hadoop

Docker Trino Cluster

Multi-node Presto cluster on Docker containers