Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+131.76%)

Mutual labels: pyspark

osm-parquetizer

A converter for the OSM PBFs to Parquet files

Stars: ✭ 71 (-93.31%)

Mutual labels: apache-spark

ai-deployment

Focused on bringing AI models into production and on model deployment

Stars: ✭ 149 (-85.96%)

Mutual labels: pyspark

spark-connector

A connector for Apache Spark to access Exasol

Stars: ✭ 13 (-98.77%)

Mutual labels: apache-spark

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-97.55%)

Mutual labels: pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-61.73%)

Mutual labels: pyspark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (-76.72%)

Mutual labels: apache-spark

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-96.8%)

Mutual labels: pyspark

Pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Stars: ✭ 231 (-78.23%)

Mutual labels: apache-spark

Dblink

Distributed Bayesian Entity Resolution in Apache Spark

Stars: ✭ 38 (-96.42%)

Mutual labels: apache-spark

machine-learning-course

Machine Learning Course @ Santa Clara University

Stars: ✭ 17 (-98.4%)

Mutual labels: pyspark

sparklygraphs

Old repo for R interface for GraphFrames

Stars: ✭ 13 (-98.77%)

Mutual labels: apache-spark

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (-79.74%)

Mutual labels: apache-spark

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (-65.03%)

Mutual labels: apache-spark

Analytics Zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray

Stars: ✭ 2,448 (+130.73%)

Mutual labels: apache-spark

DataEngineering

This repo contains commands that data engineers use in day-to-day work.

Stars: ✭ 47 (-95.57%)

Mutual labels: pyspark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (-12.44%)

Mutual labels: apache-spark

leaflet heatmap

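A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap-drawing step is moved offline for computation and analysis. Apache Spark first processes the data in parallel and then draws the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. The drawing is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, either because Apache Spark is not well suited to this kind of work or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .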

Stars: ✭ 13 (-98.77%)

Mutual labels: apache-spark

learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (-86.24%)

Mutual labels: apache-spark

Whylogs Java

Profile and monitor your ML data pipeline end-to-end

Stars: ✭ 164 (-84.54%)

Mutual labels: apache-spark

spark-transformers

Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.

Stars: ✭ 39 (-96.32%)

Mutual labels: apache-spark

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (-68.71%)

Mutual labels: apache-spark

Albedo

A recommender system for discovering GitHub repos, built with Apache Spark

Stars: ✭ 149 (-85.96%)

Mutual labels: apache-spark

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (-95.1%)

Mutual labels: pyspark

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

Stars: ✭ 46 (-95.66%)

Mutual labels: apache-spark

proxima-platform

The Proxima platform.

Stars: ✭ 17 (-98.4%)

Mutual labels: apache-spark

Scalable Data Science

Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath.

Stars: ✭ 142 (-86.62%)

Mutual labels: apache-spark

net.jgp.books.spark.ch01

Spark in Action, 2nd edition - chapter 1 - Introduction

Stars: ✭ 72 (-93.21%)

Mutual labels: apache-spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-87.09%)

Mutual labels: apache-spark

Coolplayspark

Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more

Stars: ✭ 3,318 (+212.72%)

Mutual labels: apache-spark

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+62.21%)

Mutual labels: apache-spark

spark-sql-internals

The Internals of Spark SQL

Stars: ✭ 331 (-68.8%)

Mutual labels: apache-spark

Scala Spark Tutorial

Project for James' Apache Spark with Scala course

Stars: ✭ 121 (-88.6%)

Mutual labels: apache-spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-97.83%)

Mutual labels: pyspark

Splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Stars: ✭ 105 (-90.1%)

Mutual labels: apache-spark

PysparkCheatsheet

PySpark Cheatsheet

Stars: ✭ 25 (-97.64%)

Mutual labels: apache-spark

Openscoring

REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models

Stars: ✭ 536 (-49.48%)

Mutual labels: apache-spark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-96.98%)

Mutual labels: pyspark

awesome-tools

Curated list of awesome tools and libraries for specific domains

Stars: ✭ 31 (-97.08%)

Mutual labels: apache-spark

Mist

Serverless proxy for Spark cluster

Stars: ✭ 309 (-70.88%)

Mutual labels: apache-spark

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (-92.18%)

Mutual labels: apache-spark

Real Time Stream Processing Engine

This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.

Stars: ✭ 37 (-96.51%)

Mutual labels: apache-spark

Sparkit Learn

PySpark + Scikit-learn = Sparkit-learn

Stars: ✭ 1,073 (+1.13%)

Mutual labels: apache-spark

spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/

Stars: ✭ 609 (-42.6%)

Mutual labels: apache-spark

Morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Stars: ✭ 303 (-71.44%)

Mutual labels: apache-spark

Spark-and-Kafka IoT-Data-Processing-and-Analytics

Final project for the IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature from IoT sensors in real time

Stars: ✭ 42 (-96.04%)

Mutual labels: pyspark

streamsx.kafka

Repository for integration with Apache Kafka