687 open source projects that are alternatives to or similar to Iceberg

Skills a big data architect should master

Stars: ✭ 400 (+1.78%)

Mutual labels: spark, hadoop

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+5510.18%)

Mutual labels: spark, hadoop

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (-90.59%)

Mutual labels: spark, hadoop

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-71.76%)

Mutual labels: spark, hadoop

experiments

Code examples for my blog posts

Stars: ✭ 21 (-94.66%)

Mutual labels: spark, parquet

Docker Spark

🚢 Docker image for Apache Spark

Stars: ✭ 78 (-80.15%)

Mutual labels: spark, hadoop

Bigdata Notebook

Stars: ✭ 100 (-74.55%)

Mutual labels: spark, hadoop

leaflet heatmap

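A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap-drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is used again to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to achieve good interactivity. The drawing is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .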

Stars: ✭ 13 (-96.69%)

Mutual labels: spark, hadoop

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-94.91%)

Mutual labels: spark, hadoop

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (-91.09%)

Mutual labels: spark, hadoop

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (-61.83%)

Mutual labels: spark, hadoop

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-91.86%)

Mutual labels: spark, avro

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (-12.72%)

Mutual labels: spark, parquet

Marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

Stars: ✭ 414 (+5.34%)

Mutual labels: spark, hadoop

Drill

Apache Drill is a distributed MPP query layer for self describing data

Stars: ✭ 1,619 (+311.96%)

Mutual labels: hadoop, parquet

Abris

Avro SerDe for Apache Spark structured APIs.

Stars: ✭ 130 (-66.92%)

Mutual labels: spark, avro

Vscode Data Preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

Stars: ✭ 245 (-37.66%)

Mutual labels: avro, parquet

wasp

WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest a huge amount of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-95.17%)

Mutual labels: hadoop, parquet

Parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Stars: ✭ 125 (-68.19%)

Mutual labels: hadoop, parquet

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+314.76%)

Mutual labels: hadoop, spark

Parquet Rs

Apache Parquet implementation in Rust

Stars: ✭ 144 (-63.36%)

Mutual labels: hadoop, parquet

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+133.08%)

Mutual labels: spark, hadoop

yuzhouwan

Code Library for My Blog

Stars: ✭ 39 (-90.08%)

Mutual labels: spark, hadoop

spark-util

low-level helpers for Apache Spark libraries and tests

Stars: ✭ 16 (-95.93%)

Mutual labels: spark, hadoop

Choetl

ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

Stars: ✭ 372 (-5.34%)

Mutual labels: avro, parquet

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-34.61%)

Mutual labels: spark

Cook

Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark

Stars: ✭ 314 (-20.1%)

Mutual labels: spark

Big Data Rosetta Code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Stars: ✭ 254 (-35.37%)

Mutual labels: spark

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (-94.15%)

Mutual labels: spark

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (-7.89%)

Mutual labels: spark

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (-21.12%)

Mutual labels: spark

laravel-spark-camera

Profile Photo Camera support for Laravel Spark

Stars: ✭ 30 (-92.37%)

Mutual labels: spark

schema-registry-plugin

Gradle plugin to interact with Confluent Schema-Registry.

Stars: ✭ 60 (-84.73%)

Mutual labels: avro

Hadoop Book

Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White

Stars: ✭ 3,317 (+744.02%)

Mutual labels: hadoop

sparkProjectTemplate.g8

Template for Spark Projects

Stars: ✭ 77 (-80.41%)

Mutual labels: spark

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (-92.88%)

Mutual labels: avro

Tensorflowonspark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

Stars: ✭ 3,748 (+853.69%)

Mutual labels: spark

Schema Registry Ui

Web tool for Avro Schema Registry

Stars: ✭ 358 (-8.91%)

Mutual labels: avro

Coolplayspark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

Stars: ✭ 3,318 (+744.27%)

Mutual labels: spark

Book

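This project collects some good books I have read or heard about over the years. I came across them while organizing my files and felt it would be a pity to delete them, yet keeping them wasted too much space. In the spirit of sharing, I am making them available, partly to provide these books to readers who need them and partly as a kind of accumulated knowledge base.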

Stars: ✭ 47 (-88.04%)

Mutual labels: spark

kafka-spark-streaming-zeppelin-docker

One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)

Stars: ✭ 82 (-79.13%)

Mutual labels: spark

Learningsparkv2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Stars: ✭ 307 (-21.88%)

Mutual labels: spark

spring-kafka-event-sourcing-sampler

Showcases how to build a small Event-sourced application using Spring Boot, Spring Kafka, Apache Avro and Apache Kafka

Stars: ✭ 33 (-91.6%)

Mutual labels: avro

Sparkstreaming

Real-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts

Stars: ✭ 349 (-11.2%)

Mutual labels: spark

Crayon

Simple framework agnostic UI router for SPAs

Stars: ✭ 310 (-21.12%)

Mutual labels: spark

spark-http-stream

spark structured streaming via HTTP communication

Stars: ✭ 17 (-95.67%)

Mutual labels: spark

daf-kylo

Kylo integration with PDND (previously DAF).

Stars: ✭ 20 (-94.91%)

Mutual labels: spark

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+893.13%)

Mutual labels: spark

dllib

dllib is a distributed deep learning library running on Apache Spark

Stars: ✭ 32 (-91.86%)

Mutual labels: spark

Redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Stars: ✭ 20,147 (+5026.46%)

Mutual labels: spark

Hive

Apache Hive

Stars: ✭ 4,031 (+925.7%)

Mutual labels: hadoop

Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Stars: ✭ 70 (-82.19%)

Mutual labels: spark

pulse

phData Pulse application log aggregation and monitoring

Stars: ✭ 13 (-96.69%)

Mutual labels: hadoop

spark learning

尚硅谷 (Atguigu) Big Data Spark, latest 2019 edition: Spark learning

Stars: ✭ 42 (-89.31%)

Mutual labels: spark

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (-22.9%)

Mutual labels: spark

spark-data-sources

Developing Spark External Data Sources using the V2 API

Stars: ✭ 36 (-90.84%)

Mutual labels: spark

hadoop-docker-lite

Docker build project to set up a lightweight Hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager

Stars: ✭ 24 (-93.89%)

Mutual labels: hadoop

Sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (-12.21%)

Mutual labels: spark

Awesome Ada

A curated list of awesome resources related to the Ada and SPARK programming languages

Stars: ✭ 299 (-23.92%)

Mutual labels: spark

prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (-86.26%)

Mutual labels: spark

61-120 of 687 similar projects