572 open-source projects that are alternatives to or similar to Pucket

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+1300%)

Mutual labels: spark, parquet, hdfs

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (+100%)

Mutual labels: spark, parquet, hdfs

Learning Spark

Learning Spark from scratch; big data study

Stars: ✭ 37 (+27.59%)

Mutual labels: spark, hdfs

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+1255.17%)

Mutual labels: spark, parquet

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+5520.69%)

Mutual labels: hdfs, spark

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (+144.83%)

Mutual labels: spark, hdfs

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (+1082.76%)

Mutual labels: spark, parquet

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+455.17%)

Mutual labels: spark, hdfs

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-51.72%)

Mutual labels: spark, hdfs

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+5562.07%)

Mutual labels: spark, parquet

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-31.03%)

Mutual labels: spark, hdfs

Yandex Big Data Engineering

Stars: ✭ 17 (-41.38%)

Mutual labels: spark, hdfs

Repository

A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (+217.24%)

Mutual labels: spark, hdfs

experiments

Code examples for my blog posts

Stars: ✭ 21 (-27.59%)

Mutual labels: spark, parquet

Bigdata Interview

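🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered online, together with my own summarized answers. It currently covers interview topics for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.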

Stars: ✭ 857 (+2855.17%)

Mutual labels: spark, hdfs

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (+275.86%)

Mutual labels: spark, parquet

parquet-flinktacular

How to use Parquet in Flink

Stars: ✭ 29 (+0%)

Mutual labels: thrift, parquet

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-44.83%)

Mutual labels: spark, hdfs

Bigdata File Viewer

A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and AVRO. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (+196.55%)

Mutual labels: parquet, hdfs

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (+234.48%)

Mutual labels: spark, parquet

Bigdata Notes

A getting-started guide to big data ⭐

Stars: ✭ 10,991 (+37800%)

Mutual labels: spark, hdfs

wasp

WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-34.48%)

Mutual labels: hdfs, parquet

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+417.24%)

Mutual labels: spark, hdfs

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-44.83%)

Mutual labels: spark, parquet

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (+1151.72%)

Mutual labels: thrift, spark

leaflet heatmap

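A simple visualization of Huzhou phone-call data. Assuming the data volume is very large, the heat map cannot be drawn directly in the browser, so the heat-map rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to render the heat map, and leafletjs then loads an OpenStreetMap layer and the heat-map layer to provide good interactivity. Rendering is currently implemented with Apache Spark; perhaps because Apache Spark is not well suited to this kind of computation, or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heat-map rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .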

Stars: ✭ 13 (-55.17%)

Mutual labels: spark, hdfs

God Of Bigdata

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+20617.24%)

Mutual labels: spark, hdfs

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+1668.97%)

Mutual labels: spark, hdfs

Snakebite

A pure python HDFS client

Stars: ✭ 828 (+2755.17%)

Mutual labels: hdfs

Impala Java Client

Java client to connect directly to Impala using thrift

Stars: ✭ 26 (-10.34%)

Mutual labels: thrift

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-82.76%)

Mutual labels: hdfs

Bigdataguide

Big data learning from scratch, including study videos for each stage of learning big data, plus interview materials

Stars: ✭ 817 (+2717.24%)

Mutual labels: spark

Rpc proxy

A service registration and discovery framework based on Thrift

Stars: ✭ 13 (-55.17%)

Mutual labels: thrift

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-10.34%)

Mutual labels: spark

Zys

A high-performance service framework based on Yaf or Swoole

Stars: ✭ 812 (+2700%)

Mutual labels: thrift

Parquet Format

Apache Parquet

Stars: ✭ 800 (+2658.62%)

Mutual labels: parquet

Spark Swagger

Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)

Stars: ✭ 25 (-13.79%)

Mutual labels: spark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+2634.48%)

Mutual labels: spark

Spark Redis

A connector for Spark that allows reading and writing to/from Redis cluster

Stars: ✭ 773 (+2565.52%)

Mutual labels: spark

Interview Questions Collection

Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more

Stars: ✭ 21 (-27.59%)

Mutual labels: spark

Urhox

Urho3D extension library

Stars: ✭ 13 (-55.17%)

Mutual labels: spark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+3103.45%)

Mutual labels: spark

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+2572.41%)

Mutual labels: spark

Angel

A Flexible and Powerful Parameter Server for large-scale machine learning

Stars: ✭ 6,458 (+22168.97%)

Mutual labels: spark

Chronicler

Scala toolchain for InfluxDB

Stars: ✭ 24 (-17.24%)

Mutual labels: spark

Coding Now

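Study notes, along with eBooks and video resources I have gone through, and blogs, websites, and tools I have collected and consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.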

Stars: ✭ 750 (+2486.21%)

Mutual labels: spark

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+2468.97%)

Mutual labels: spark

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-58.62%)

Mutual labels: spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-20.69%)

Mutual labels: spark

Sparkctr

CTR prediction models based on Spark (LR, GBDT, DNN)

Stars: ✭ 740 (+2451.72%)

Mutual labels: spark

Cdhproject

How to use the various Hadoop components; continuously updated

Stars: ✭ 733 (+2427.59%)

Mutual labels: spark

Digitrecognizer

Java convolutional neural network example for handwritten digit recognition

Stars: ✭ 23 (-20.69%)

Mutual labels: spark

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+2410.34%)

Mutual labels: spark

Scrooge

A Thrift parser/generator

Stars: ✭ 724 (+2396.55%)

Mutual labels: thrift

Heracles

High performance HBase / Spark SQL engine

Stars: ✭ 27 (-6.9%)

Mutual labels: spark

Flint

A Time Series Library for Apache Spark