Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (-84.53%)

Mutual labels: spark

Learningsparkv2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Stars: ✭ 307 (-12.03%)

Mutual labels: spark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-92.84%)

Mutual labels: spark

Sk Dist

Distributed scikit-learn meta-estimators in PySpark

Stars: ✭ 260 (-25.5%)

Mutual labels: spark

sparkProjectTemplate.g8

Template for Spark Projects

Stars: ✭ 77 (-77.94%)

Mutual labels: spark

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+604.58%)

Mutual labels: spark

Elasticluster

Create clusters of VMs on the cloud and configure them with Ansible.

Stars: ✭ 298 (-14.61%)

Mutual labels: spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-92.84%)

Mutual labels: spark

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (-11.17%)

Mutual labels: spark

spark learning

尚硅谷大数据Spark-2019版最新 Spark 学习

Stars: ✭ 42 (-87.97%)

Mutual labels: spark

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (-20.34%)

Mutual labels: spark

confluent-spark-avro

Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.

Stars: ✭ 18 (-94.84%)

Mutual labels: spark

Ytk Learn

Ytk-learn is a distributed machine learning library which implements most of popular machine learning algorithms(GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (-3.44%)

Mutual labels: spark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-90.26%)

Mutual labels: spark

Docker Spark Cluster

A simple spark standalone cluster for your testing environment purposses

Stars: ✭ 261 (-25.21%)

Mutual labels: spark

Casper

A compiler for automatically re-targeting sequential Java code to Apache Spark.

Stars: ✭ 45 (-87.11%)

Mutual labels: spark

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+1018.34%)

Mutual labels: spark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-68.19%)

Mutual labels: spark

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-26.36%)

Mutual labels: spark

laravel-spark-camera

Profile Photo Camera support for Laravel Spark

Stars: ✭ 30 (-91.4%)

Mutual labels: spark

frovedis

Framework of vectorized and distributed data analytics

Stars: ✭ 59 (-83.09%)

Mutual labels: spark

Awesome Ada

A curated list of awesome resources related to the Ada and SPARK programming language

Stars: ✭ 299 (-14.33%)

Mutual labels: spark

Book

本项目收藏这些年来看过或者听过的一些不错的书籍，在整理文件时看见这些，发现删掉有点可惜，放着又太浪费空间，本着分享的原则，就把它们共享出来，一方面给需要的读者提供这些书籍，另一方面也是一种像知识库的积累吧

Stars: ✭ 47 (-86.53%)

Mutual labels: spark

Cook

Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark

Stars: ✭ 314 (-10.03%)

Mutual labels: spark

spark-http-stream

spark structured streaming via HTTP communication

Stars: ✭ 17 (-95.13%)

Mutual labels: spark

Spark Hbase Connector

Connect Spark to HBase for reading and writing data with ease

Stars: ✭ 299 (-14.33%)

Mutual labels: spark

daf-kylo

Kylo integration with PDND (previously DAF).

Stars: ✭ 20 (-94.27%)

Mutual labels: spark

Iql

An ad hoc query service based on the spark sql engine.(基于spark sql引擎的即席查询服务)

Stars: ✭ 341 (-2.29%)

Mutual labels: spark

Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Stars: ✭ 70 (-79.94%)

Mutual labels: spark

Spark Druid Olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform(http://bit.ly/2oBJSpP) an Integrated BI platform on Apache Spark.

Stars: ✭ 282 (-19.2%)

Mutual labels: spark

spark-data-sources

Developing Spark External Data Sources using the V2 API

Stars: ✭ 36 (-89.68%)

Mutual labels: spark

Coolplayspark

酷玩 Spark: Spark 源代码解析、Spark 类库等

Stars: ✭ 3,318 (+850.72%)

Mutual labels: spark

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-95.42%)

Mutual labels: spark

Hbase Rdd

Spark RDD to read, write and delete from HBase

Stars: ✭ 277 (-20.63%)

Mutual labels: spark

Covid19Tracker

A Robinhood style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.

Stars: ✭ 65 (-81.38%)

Mutual labels: spark

Sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (-1.15%)

Mutual labels: spark

SparkV

🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.

Stars: ✭ 24 (-93.12%)

Mutual labels: spark

Helk

The Hunting ELK

Stars: ✭ 3,097 (+787.39%)

Mutual labels: spark

trembita

Model complex data transformation pipelines easily

Stars: ✭ 44 (-87.39%)

Mutual labels: spark

Crayon

Simple framework agnostic UI router for SPAs

Stars: ✭ 310 (-11.17%)

Mutual labels: spark

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-95.99%)

Mutual labels: spark

Around Dataengineering

A Data Engineering & Machine Learning Knowledge Hub

Stars: ✭ 257 (-26.36%)

Mutual labels: spark

smolder

HL7 Apache Spark Datasource

Stars: ✭ 33 (-90.54%)

Mutual labels: spark

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (-4.87%)

Mutual labels: spark

spark-demos

Collection of different demo applications using Apache Spark

Stars: ✭ 15 (-95.7%)

Mutual labels: spark

Spark Jupyter Aws

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (-25.79%)

Mutual labels: spark

tpch-spark

TPC-H queries in Apache Spark SQL using native DataFrames API

Stars: ✭ 63 (-81.95%)

Mutual labels: spark

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (-12.32%)

Mutual labels: spark

Big Data Rosetta Code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Stars: ✭ 254 (-27.22%)

Mutual labels: spark

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (-1.72%)

Mutual labels: spark

Scalnet

A Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs

Stars: ✭ 342 (-2.01%)

Mutual labels: spark

Sparklint

A tool for monitoring and tuning Spark jobs for efficiency.

Stars: ✭ 316 (-9.46%)

Mutual labels: spark

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (-13.18%)

Mutual labels: spark

spark-structured-streaming-examples

Spark structured streaming examples with using of version 3.0.0

Stars: ✭ 23 (-93.41%)

Mutual labels: spark

1-60 of 399 similar projects

›

next*5