ElasticlusterCreate clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (+441.82%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+5170.91%)
MareMaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Stars: ✭ 11 (-80%)
ScannsA scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+245.45%)
Spark NotebookInteractive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+5501.82%)
AzuredatabricksbestpracticesVersion 1 of Technical Best Practices of Azure Databricks, based on real-world customer and technical SME inputs
Stars: ✭ 186 (+238.18%)
Dev SetupmacOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Stars: ✭ 5,590 (+10063.64%)
RoaringbitmapA better compressed bitset in Java
Stars: ✭ 2,460 (+4372.73%)
SparkflowEasy-to-use library to bring TensorFlow to Apache Spark
Stars: ✭ 282 (+412.73%)
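Sparkstreaming💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.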
Stars: ✭ 179 (+225.45%)
Spark Submit UiA Play Framework-based UI for submitting Spark apps
Stars: ✭ 53 (-3.64%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (+405.45%)
SparkFirely's open source FHIR server
Stars: ✭ 174 (+216.36%)
OptopsyA nimble options backtesting library for Python
Stars: ✭ 373 (+578.18%)
scippMulti-dimensional data arrays with labeled dimensions
Stars: ✭ 55 (+0%)
Deeplearning4jSuite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+22221.82%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+394.55%)
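Bigdata Interview🎯 🌟 [Big data interview questions] Big-data interview questions collected from around the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.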
Stars: ✭ 857 (+1458.18%)
GeopysparkGeoTrellis for PySpark
Stars: ✭ 167 (+203.64%)
NimdataDataFrame API written in Nim, enabling fast out-of-core data processing
Stars: ✭ 261 (+374.55%)
Big WhaleScheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines
Stars: ✭ 163 (+196.36%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+9923.64%)
Docker Spark ClusterA simple Spark standalone cluster for your testing environment purposes
Stars: ✭ 261 (+374.55%)
Vue Info CardSimple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (+189.09%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (+372.73%)
PdpipeEasy pipelines for pandas DataFrames.
Stars: ✭ 590 (+972.73%)
QuillCompile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+3532.73%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (+367.27%)
PowderkegLive-coding the cluster!
Stars: ✭ 152 (+176.36%)
Tiledb VcfEfficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-52.73%)
AztkAZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (+176.36%)
AlluxioAlluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+9680%)
Spark TdaSparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.
Stars: ✭ 45 (-18.18%)
Nd4jFast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+3067.27%)
RasterframesGeospatial Raster support for Spark DataFrames
Stars: ✭ 142 (+158.18%)
SparklearningLearning Apache Spark, including code and data. Most of it can run locally.
Stars: ✭ 558 (+914.55%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+154.55%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-54.55%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-41.82%)
Pulsar SparkWhen Apache Pulsar meets Apache Spark
Stars: ✭ 55 (+0%)
Utils4sA collection of test cases and related materials gathered while using Scala and Spark
Stars: ✭ 1,070 (+1845.45%)
PixiedustPython Helper library for Jupyter Notebooks
Stars: ✭ 998 (+1714.55%)
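Coding NowStudy notes, along with eBooks, video resources, and a running collection of blogs, websites, and tools the author found useful. Covers the major big-data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.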
Stars: ✭ 750 (+1263.64%)
TensorflowonsparkTensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+6714.55%)
ladybug-pandas🐞 <3 🐼 A ladybug extension powered by pandas
Stars: ✭ 15 (-72.73%)