687 open source projects that are alternatives to or similar to Iceberg

Big data architect skills
Skills a big data architect should master
Stars: ✭ 400 (+1.78%)
Mutual labels:  spark, hadoop
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+5510.18%)
Mutual labels:  spark, hadoop
Weblogsanalysissystem
A big data platform for analyzing web access logs
Stars: ✭ 37 (-90.59%)
Mutual labels:  spark, hadoop
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-71.76%)
Mutual labels:  spark, hadoop
experiments
Code examples for my blog posts
Stars: ✭ 21 (-94.66%)
Mutual labels:  spark, parquet
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (-80.15%)
Mutual labels:  spark, hadoop
Bigdata Notebook
Stars: ✭ 100 (-74.55%)
Mutual labels:  spark, hadoop
leaflet heatmap
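A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. Apache Spark processes the data in parallel and then renders the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently done with Apache Spark, but the parallel computation is slower than a single-machine computation, either because Apache Spark is not well suited to this kind of work or because I did not design the algorithm well. The heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git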
Stars: ✭ 13 (-96.69%)
Mutual labels:  spark, hadoop
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-94.91%)
Mutual labels:  spark, hadoop
swordfish
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-91.09%)
Mutual labels:  spark, hadoop
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-61.83%)
Mutual labels:  spark, hadoop
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-91.86%)
Mutual labels:  spark, avro
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-12.72%)
Mutual labels:  spark, parquet
Marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+5.34%)
Mutual labels:  spark, hadoop
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+311.96%)
Mutual labels:  hadoop, parquet
Abris
Avro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (-66.92%)
Mutual labels:  spark, avro
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (-37.66%)
Mutual labels:  avro, parquet
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-95.17%)
Mutual labels:  hadoop, parquet
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-68.19%)
Mutual labels:  hadoop, parquet
Ibis
A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+314.76%)
Mutual labels:  hadoop, spark
Parquet Rs
Apache Parquet implementation in Rust
Stars: ✭ 144 (-63.36%)
Mutual labels:  hadoop, parquet
Kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+133.08%)
Mutual labels:  spark, hadoop
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (-90.08%)
Mutual labels:  spark, hadoop
spark-util
low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-95.93%)
Mutual labels:  spark, hadoop
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (-5.34%)
Mutual labels:  avro, parquet
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (-34.61%)
Mutual labels:  spark
Cook
Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (-20.1%)
Mutual labels:  spark
Big Data Rosetta Code
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (-35.37%)
Mutual labels:  spark
spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.0.0
Stars: ✭ 23 (-94.15%)
Mutual labels:  spark
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-7.89%)
Mutual labels:  spark
Clickhouse Native Jdbc
ClickHouse Native Protocol JDBC implementation
Stars: ✭ 310 (-21.12%)
Mutual labels:  spark
laravel-spark-camera
Profile Photo Camera support for Laravel Spark
Stars: ✭ 30 (-92.37%)
Mutual labels:  spark
schema-registry-plugin
Gradle plugin to interact with Confluent Schema-Registry.
Stars: ✭ 60 (-84.73%)
Mutual labels:  avro
Hadoop Book
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
Stars: ✭ 3,317 (+744.02%)
Mutual labels:  hadoop
sparkProjectTemplate.g8
Template for Spark Projects
Stars: ✭ 77 (-80.41%)
Mutual labels:  spark
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-92.88%)
Mutual labels:  avro
Tensorflowonspark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+853.69%)
Mutual labels:  spark
Schema Registry Ui
Web tool for Avro Schema Registry
Stars: ✭ 358 (-8.91%)
Mutual labels:  avro
Coolplayspark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+744.27%)
Mutual labels:  spark
Book
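This project collects some good books I have read or heard about over the years. I came across them while organizing my files; deleting them seemed a pity, but keeping them wasted space, so in the spirit of sharing I am making them available. On the one hand this provides the books to readers who need them, and on the other it serves as a kind of accumulated knowledge base.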
Stars: ✭ 47 (-88.04%)
Mutual labels:  spark
kafka-spark-streaming-zeppelin-docker
One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)
Stars: ✭ 82 (-79.13%)
Mutual labels:  spark
Learningsparkv2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (-21.88%)
Mutual labels:  spark
spring-kafka-event-sourcing-sampler
Showcases how to build a small Event-sourced application using Spring Boot, Spring Kafka, Apache Avro and Apache Kafka
Stars: ✭ 33 (-91.6%)
Mutual labels:  avro
Sparkstreaming
Real-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts
Stars: ✭ 349 (-11.2%)
Mutual labels:  spark
Crayon
Simple framework agnostic UI router for SPAs
Stars: ✭ 310 (-21.12%)
Mutual labels:  spark
spark-http-stream
spark structured streaming via HTTP communication
Stars: ✭ 17 (-95.67%)
Mutual labels:  spark
daf-kylo
Kylo integration with PDND (previously DAF).
Stars: ✭ 20 (-94.91%)
Mutual labels:  spark
Delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+893.13%)
Mutual labels:  spark
dllib
dllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-91.86%)
Mutual labels:  spark
Redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+5026.46%)
Mutual labels:  spark
Hive
Apache Hive
Stars: ✭ 4,031 (+925.7%)
Mutual labels:  hadoop
Spotify-Song-Recommendation-ML
UC Berkeley team's submission for RecSys Challenge 2018
Stars: ✭ 70 (-82.19%)
Mutual labels:  spark
pulse
phData Pulse application log aggregation and monitoring
Stars: ✭ 13 (-96.69%)
Mutual labels:  hadoop
spark learning
Spark learning materials from the 尚硅谷 (Atguigu) big data Spark course, latest 2019 edition
Stars: ✭ 42 (-89.31%)
Mutual labels:  spark
Zat
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (-22.9%)
Mutual labels:  spark
spark-data-sources
Developing Spark External Data Sources using the V2 API
Stars: ✭ 36 (-90.84%)
Mutual labels:  spark
hadoop-docker-lite
Docker build project to setup a lightweight hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager
Stars: ✭ 24 (-93.89%)
Mutual labels:  hadoop
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-12.21%)
Mutual labels:  spark
Awesome Ada
A curated list of awesome resources related to the Ada and SPARK programming languages
Stars: ✭ 299 (-23.92%)
Mutual labels:  spark
prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-86.26%)
Mutual labels:  spark