Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform performs complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.

Stars: ✭ 97 (-12.61%)

Mutual labels: big-data, spark

Mastering Spark Sql Book

The Internals of Spark SQL

Stars: ✭ 234 (+110.81%)

Mutual labels: spark, apache-spark

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (-18.02%)

Mutual labels: big-data, spark

BigData-News

A real-time big data system project for a news website, built on Spark 2.2

Stars: ✭ 36 (-67.57%)

Mutual labels: spark, hadoop

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-54.95%)

Mutual labels: spark, pyspark

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+121.62%)

Mutual labels: big-data, spark

Detecting-Malicious-URL-Machine-Learning

No description or website provided.

Stars: ✭ 47 (-57.66%)

Mutual labels: big-data, apache-spark

spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …

Stars: ✭ 23 (-79.28%)

Mutual labels: apache-spark, pyspark

v6.dooring.public

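A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.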

Stars: ✭ 323 (+190.99%)

Mutual labels: big-data, big-data-analytics

Parquetviewer

Simple windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (+30.63%)

Mutual labels: big-data, apache-spark

Calcite

Apache Calcite

Stars: ✭ 2,816 (+2436.94%)

Mutual labels: big-data, hadoop

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (+29.73%)

Mutual labels: big-data, apache-spark

spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Stars: ✭ 55 (-50.45%)

Mutual labels: apache-spark, pyspark

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+26.13%)

Mutual labels: big-data, hadoop

learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+31.53%)

Mutual labels: apache-spark, hadoop

awesome-tools

A curated list of awesome tools and libraries for specific domains

Stars: ✭ 31 (-72.07%)

Mutual labels: big-data, apache-spark

incubator-linkis

Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+2115.32%)

Mutual labels: spark, pyspark

Social-Network-Analysis-in-Python

Social Network Facebook Analysis (Python, Networkx)

Stars: ✭ 26 (-76.58%)

Mutual labels: big-data, analysis

Eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Stars: ✭ 235 (+111.71%)

Mutual labels: big-data, dataframe

learn-by-examples

Real-world Spark pipeline examples

Stars: ✭ 84 (-24.32%)

Mutual labels: apache-spark, pyspark

mmtf-spark

Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.

Stars: ✭ 20 (-81.98%)

Mutual labels: big-data, apache-spark

Sparkora

Powerful, rapid, automatic EDA and feature engineering library with a very easy-to-use API 🌟

Stars: ✭ 51 (-54.05%)

Mutual labels: apache-spark, pyspark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (-14.41%)

Mutual labels: big-data, spark

gan deeplearning4j

Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-82.88%)

Mutual labels: big-data, apache-spark

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (-39.64%)

Mutual labels: big-data, apache-spark

pyspark-ML-in-Colab

PySpark in Google Colab: a simple machine learning (linear regression) model

Stars: ✭ 32 (-71.17%)

Mutual labels: hadoop, pyspark

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-86.49%)

Mutual labels: big-data, hadoop

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+2026.13%)

Mutual labels: big-data, dataframe

jupyterlab-sparkmonitor

JupyterLab extension that enables monitoring of Apache Spark jobs launched from within a notebook

Stars: ✭ 78 (-29.73%)

Mutual labels: apache-spark, pyspark

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-85.59%)

Mutual labels: big-data, hadoop

DaFlow

Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-78.38%)

Mutual labels: apache-spark, hadoop

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (-53.15%)

Mutual labels: hadoop, pyspark

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-76.58%)

Mutual labels: big-data, hadoop

check-engine

Data validation library for PySpark 3.0.0