WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+1388%)

Mutual labels: spark

Casper

A compiler for automatically re-targeting sequential Java code to Apache Spark.

Stars: ✭ 45 (+80%)

Mutual labels: spark

Sparkmeasure

This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

Stars: ✭ 368 (+1372%)

Mutual labels: spark

Spark Mllib Twitter Sentiment Analysis

🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Stars: ✭ 113 (+352%)

Mutual labels: spark

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (+1352%)

Mutual labels: spark

Mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

Stars: ✭ 227 (+808%)

Mutual labels: spark

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+1348%)

Mutual labels: spark

Python Bigdata

Data science and Big Data with Python

Stars: ✭ 112 (+348%)

Mutual labels: spark

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (+1272%)

Mutual labels: spark

DataEngineering

This repo contains commands that data engineers use in day to day work.

Stars: ✭ 47 (+88%)

Mutual labels: pyspark

Scalnet

A Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs

Stars: ✭ 342 (+1268%)

Mutual labels: spark

Elephas

Distributed Deep learning with Keras & Spark

Stars: ✭ 1,521 (+5984%)

Mutual labels: spark

Ytk Learn

Ytk-learn is a distributed machine learning library that implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (+1248%)

Mutual labels: spark

Spark Workshop

Apache Spark™ and Scala Workshops

Stars: ✭ 224 (+796%)

Mutual labels: spark

Sparklint

A tool for monitoring and tuning Spark jobs for efficiency.

Stars: ✭ 316 (+1164%)

Mutual labels: spark

Waterdrop

Production Ready Data Integration Product, documentation:

Stars: ✭ 1,856 (+7324%)

Mutual labels: spark

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (+1140%)

Mutual labels: spark

Search Ads Web Service

Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]

Stars: ✭ 30 (+20%)

Mutual labels: spark

Learningsparkv2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Stars: ✭ 307 (+1128%)

Mutual labels: spark

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (+340%)

Mutual labels: spark

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+15512%)

Mutual labels: spark

Sagemaker Spark

A Spark library for Amazon SageMaker.

Stars: ✭ 219 (+776%)

Mutual labels: spark

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (+1112%)

Mutual labels: spark

Distributed Dataset

A distributed data processing framework in Haskell.

Stars: ✭ 108 (+332%)

Mutual labels: spark

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (+108%)

Mutual labels: pyspark

pyspark-ML-in-Colab

PySpark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (+28%)

Mutual labels: pyspark

Spark Structured Streaming Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Stars: ✭ 168 (+572%)

Mutual labels: spark

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (+184%)

Mutual labels: spark

Spark Notebook

Interactive and Reactive Data Science using Scala and Spark.

Stars: ✭ 3,081 (+12224%)

Mutual labels: spark

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (+1012%)

Mutual labels: spark

Datavec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

Stars: ✭ 272 (+988%)

Mutual labels: spark

Seldon Server

Machine Learning Platform and Recommendation Engine built on Kubernetes

Stars: ✭ 1,435 (+5640%)

Mutual labels: spark

Docker Spark Cluster

A simple Spark standalone cluster for your testing environment purposes

Stars: ✭ 261 (+944%)

Mutual labels: spark

BigData-News

A real-time big data system project for a news website, based on Spark 2.2

Stars: ✭ 36 (+44%)

Mutual labels: spark

Sk Dist

Distributed scikit-learn meta-estimators in PySpark

Stars: ✭ 260 (+940%)

Mutual labels: spark

Spark On K8s Operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Stars: ✭ 1,780 (+7020%)

Mutual labels: spark

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (+928%)

Mutual labels: spark

Hydro Serving

MLOps Platform

Stars: ✭ 213 (+752%)

Mutual labels: spark

Usersessionbehaviorofflineanalysis

Sichuan University 拓思爱诺 offline analysis project for user session behavior data

Stars: ✭ 69 (+176%)

Mutual labels: spark

Fast Mrmr

An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).

Stars: ✭ 67 (+168%)

Mutual labels: spark

Splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Stars: ✭ 105 (+320%)

Mutual labels: spark

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-44%)

Mutual labels: spark

pyspark-asyncactions

Asynchronous actions for PySpark

Stars: ✭ 30 (+20%)

Mutual labels: pyspark

Kontextfrei

Writing application logic for Spark jobs that can be unit-tested without a SparkContext

Stars: ✭ 67 (+168%)

Mutual labels: spark

docker-spark

Apache Spark docker container image (Standalone mode)

Stars: ✭ 34 (+36%)

Mutual labels: spark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+280%)

Mutual labels: spark

pyspark-for-data-processing

Code for my presentation: Using PySpark to Process Boat Loads of Data

Stars: ✭ 20 (-20%)

Mutual labels: pyspark
