macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+10063.64%)

Mutual labels: spark

Roaringbitmap

A better compressed bitset in Java

Stars: ✭ 2,460 (+4372.73%)

Mutual labels: spark

Sparkflow

Easy-to-use library to bring TensorFlow to Apache Spark

Stars: ✭ 282 (+412.73%)

Mutual labels: dataframe

Sparkstreaming

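💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.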

Stars: ✭ 179 (+225.45%)

Mutual labels: spark

Spark Submit Ui

A Play Framework-based app for submitting Spark applications

Stars: ✭ 53 (-3.64%)

Mutual labels: spark

Spark Kafka Writer

Write your Spark data to Kafka seamlessly

Stars: ✭ 175 (+218.18%)

Mutual labels: spark

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (+405.45%)

Mutual labels: spark

Spark

Firely's open source FHIR server

Stars: ✭ 174 (+216.36%)

Mutual labels: spark

Optopsy

A nimble options backtesting library for Python

Stars: ✭ 373 (+578.18%)

Mutual labels: dataframe

scipp

Multi-dimensional data arrays with labeled dimensions

Stars: ✭ 55 (+0%)

Mutual labels: dataframe

Deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…

Stars: ✭ 12,277 (+22221.82%)

Mutual labels: spark

Datavec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

Stars: ✭ 272 (+394.55%)

Mutual labels: spark

Spark Structured Streaming Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Stars: ✭ 168 (+205.45%)

Mutual labels: spark

Bigdata Interview

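🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, with the author's own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.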

Stars: ✭ 857 (+1458.18%)

Mutual labels: spark

Geopyspark

GeoTrellis for PySpark

Stars: ✭ 167 (+203.64%)

Mutual labels: spark

Nimdata

DataFrame API written in Nim, enabling fast out-of-core data processing

Stars: ✭ 261 (+374.55%)

Mutual labels: dataframe

Big Whale

Scheduling of offline jobs for Spark, Flink, and similar engines, plus monitoring of real-time jobs

Stars: ✭ 163 (+196.36%)

Mutual labels: spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+9923.64%)

Mutual labels: spark

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+192.73%)

Mutual labels: spark

Docker Spark Cluster

A simple Spark standalone cluster for your testing environment

Stars: ✭ 261 (+374.55%)

Mutual labels: spark

Vue Info Card

Simple and beautiful card component with an elegant spark line, for VueJS.

Stars: ✭ 159 (+189.09%)

Mutual labels: spark

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (-32.73%)

Mutual labels: spark

Scalable Data Science Platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

Stars: ✭ 158 (+187.27%)

Mutual labels: spark

Sk Dist

Distributed scikit-learn meta-estimators in PySpark

Stars: ✭ 260 (+372.73%)

Mutual labels: spark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (+181.82%)

Mutual labels: spark

Pdpipe

Easy pipelines for pandas DataFrames.

Stars: ✭ 590 (+972.73%)

Mutual labels: dataframe

Quill

Compile-time Language Integrated Queries for Scala

Stars: ✭ 1,998 (+3532.73%)

Mutual labels: spark

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (+367.27%)

Mutual labels: spark

Powderkeg

Live-coding the cluster!

Stars: ✭ 152 (+176.36%)

Mutual labels: spark

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-52.73%)

Mutual labels: spark

Spark Ml Source Analysis

An analysis of the principles behind Spark ML algorithms and of their concrete source-code implementations

Stars: ✭ 1,873 (+3305.45%)

Mutual labels: spark

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (-58.18%)

Mutual labels: spark

Aztk

AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure

Stars: ✭ 152 (+176.36%)

Mutual labels: spark

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+9680%)

Mutual labels: spark

Pyspark Learning

Updated repository

Stars: ✭ 147 (+167.27%)

Mutual labels: spark

sparkProjectTemplate.g8

Template for Spark Projects

Stars: ✭ 77 (+40%)

Mutual labels: spark

Spark Cassandra Connector

DataStax Spark Cassandra Connector

Stars: ✭ 1,816 (+3201.82%)

Mutual labels: spark

Spark Tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

Stars: ✭ 45 (-18.18%)

Mutual labels: spark

Nd4j

Fast, Scientific and Numerical Computing for the JVM (NDArrays)

Stars: ✭ 1,742 (+3067.27%)

Mutual labels: spark

kafka-spark-streaming-zeppelin-docker

One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)

Stars: ✭ 82 (+49.09%)

Mutual labels: spark

Rasterframes

Geospatial Raster support for Spark DataFrames

Stars: ✭ 142 (+158.18%)

Mutual labels: spark

Sparklearning

Learning Apache Spark, including code and data. Most parts can run locally.

Stars: ✭ 558 (+914.55%)

Mutual labels: spark

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (+154.55%)

Mutual labels: spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-54.55%)

Mutual labels: spark

Julia-data-science

Data science and numerical computing with Julia

Stars: ✭ 54 (-1.82%)

Mutual labels: dataframe

dllib

dllib is a distributed deep learning library running on Apache Spark

Stars: ✭ 32 (-41.82%)

Mutual labels: spark

Pulsar Spark

When Apache Pulsar meets Apache Spark