dllib: A distributed deep learning library running on Apache Spark.
Stars: ✭ 32 (-90.83%)
blog: Blog entries
Stars: ✭ 39 (-88.83%)
Datavec: ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-22.06%)
visions: Type System for Data Analysis in Python
Stars: ✭ 136 (-61.03%)
Spark Notebook: Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+782.81%)
prosto: Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions. It is an alternative to map-reduce and join-groupby.
Stars: ✭ 54 (-84.53%)
Learningsparkv2: The GitHub repo for the book "Learning Spark: Lightning-Fast Data Analytics" (2nd Edition)
Stars: ✭ 307 (-12.03%)
spark-extension: A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-92.84%)
Sk Dist: Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-25.5%)
incubator-linkis: Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB, ...) and exposes various interfaces (REST, JDBC, Java, ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+604.58%)
Elasticluster: Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-14.61%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-92.84%)
Cloudflow: Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-20.34%)
confluent-spark-avro: Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-94.84%)
Ytk Learn: Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-3.44%)
Docker Spark Cluster: A simple Spark standalone cluster for testing environments
Stars: ✭ 261 (-25.21%)
Casper: A compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-87.11%)
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+1018.34%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-68.19%)
Succinct: Enabling queries on compressed data.
Stars: ✭ 257 (-26.36%)
frovedis: Framework of vectorized and distributed data analytics
Stars: ✭ 59 (-83.09%)
Awesome Ada: A curated list of awesome resources related to the Ada and SPARK programming languages
Stars: ✭ 299 (-14.33%)
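Book: This project collects some good books I have read or heard about over the years. I came across them while tidying up my files; deleting them seemed a pity, but keeping them took up too much space, so in the spirit of sharing I am making them public. This both provides the books to readers who need them and serves as a kind of knowledge-base accumulation.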
Stars: ✭ 47 (-86.53%)
Cook: Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (-10.03%)
spark-http-stream: Spark Structured Streaming via HTTP communication
Stars: ✭ 17 (-95.13%)
Spark Hbase Connector: Connect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (-14.33%)
daf-kylo: Kylo integration with PDND (previously DAF).
Stars: ✭ 20 (-94.27%)
Iql: An ad hoc query service based on the Spark SQL engine.
Stars: ✭ 341 (-2.29%)
Spark Druid Olap: Sparkline BI Accelerator provides fast ad hoc query capability over Logical Cubes. It has been folded into the SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 282 (-19.2%)
spark-data-sources: Developing Spark External Data Sources using the V2 API
Stars: ✭ 36 (-89.68%)
Coolplayspark: 酷玩 Spark ("Cool Play Spark"), covering Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+850.72%)
bigkube: Minikube for big data with Scala and Spark
Stars: ✭ 16 (-95.42%)
Hbase Rdd: Spark RDD to read, write and delete from HBase
Stars: ✭ 277 (-20.63%)
Covid19Tracker: A Robinhood-style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (-81.38%)
Sparklens: Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-1.15%)
SparkV: 🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.
Stars: ✭ 24 (-93.12%)
Helk: The Hunting ELK
Stars: ✭ 3,097 (+787.39%)
trembita: Model complex data transformation pipelines easily
Stars: ✭ 44 (-87.39%)
Crayon: Simple framework-agnostic UI router for SPAs
Stars: ✭ 310 (-11.17%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-95.99%)
smolder: HL7 Apache Spark Datasource
Stars: ✭ 33 (-90.54%)
Wirbelsturm: Wirbelsturm is a Vagrant- and Puppet-based tool for 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-4.87%)
spark-demos: Collection of different demo applications using Apache Spark
Stars: ✭ 15 (-95.7%)
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-25.79%)
tpch-spark: TPC-H queries in Apache Spark SQL using the native DataFrame API
Stars: ✭ 63 (-81.95%)
Spline: Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (-12.32%)
Big Data Rosetta Code: Code snippets for solving common big data problems on various platforms. Inspired by Rosetta Code.
Stars: ✭ 254 (-27.22%)
Oap: Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-1.72%)
Scalnet: A Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Stars: ✭ 342 (-2.01%)
Sparklint: A tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (-9.46%)
Zat: Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (-13.18%)
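The listing does not say what the percentage after each star count is measured against. Assuming it compares every project with one shared baseline, so that stars = baseline × (1 + pct/100), the baseline can be recovered from any entry. The sketch below tests that assumption on a few (stars, percent) pairs copied from the list above. The function name `implied_baseline` is hypothetical and not part of any listed project.

```python
# Assumption (not stated in the listing): each "(±X%)" compares a project's
# stars with one shared baseline B, i.e. stars = B * (1 + X / 100).

def implied_baseline(stars: int, pct_change: float) -> float:
    """Solve stars = baseline * (1 + pct_change / 100) for the baseline."""
    return stars / (1 + pct_change / 100)

# (stars, percent) pairs copied from entries in the list above.
samples = {
    "dllib": (32, -90.83),
    "Datavec": (272, -22.06),
    "Spark Notebook": (3081, 782.81),
    "Delta": (3903, 1018.34),
    "Sparklens": (345, -1.15),
    "bigdata-fun": (14, -95.99),
}

for name, (stars, pct) in samples.items():
    # Print each project's implied baseline with one decimal place.
    print(f"{name:15} {implied_baseline(stars, pct):8.1f}")
```

Every sample comes out at roughly 349 stars. This is consistent with a single common baseline, though the page itself does not confirm what it is. Small projects give slightly noisier estimates because the percentages are rounded to two decimals.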