prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-84.16%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-92.67%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-23.75%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-90.62%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+621.11%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-20.23%)
blogblog entries
Stars: ✭ 39 (-88.56%)
ZatZeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (-11.14%)
visionsType System for Data Analysis in Python
Stars: ✭ 136 (-60.12%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-92.67%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-96.19%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-18.48%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+1044.57%)
confluent-spark-avroSpark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-94.72%)
Docker Spark ClusterA simple spark standalone cluster for your testing environment purposses
Stars: ✭ 261 (-23.46%)
CasperA compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-86.8%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-24.63%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-67.45%)
ElasticlusterCreate clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-12.61%)
Spark-PMoFSpark Shuffle Optimization with RDMA+AEP
Stars: ✭ 28 (-91.79%)
spark-http-streamspark structured streaming via HTTP communication
Stars: ✭ 17 (-95.01%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-90.62%)
Spark Druid OlapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform(http://bit.ly/2oBJSpP) an Integrated BI platform on Apache Spark.
Stars: ✭ 282 (-17.3%)
daf-kyloKylo integration with PDND (previously DAF).
Stars: ✭ 20 (-94.13%)
CrayonSimple framework agnostic UI router for SPAs
Stars: ✭ 310 (-9.09%)
Hbase RddSpark RDD to read, write and delete from HBase
Stars: ✭ 277 (-18.77%)
spark-data-sourcesDeveloping Spark External Data Sources using the V2 API
Stars: ✭ 36 (-89.44%)
CookFair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (-7.92%)
bigkubeMinikube for big data with Scala and Spark
Stars: ✭ 16 (-95.31%)
HelkThe Hunting ELK
Stars: ✭ 3,097 (+808.21%)
Covid19TrackerA Robinhood style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (-80.94%)
SplineData Lineage Tracking And Visualization Solution
Stars: ✭ 306 (-10.26%)
SparkV🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.
Stars: ✭ 24 (-92.96%)
trembitaModel complex data transformation pipelines easily
Stars: ✭ 44 (-87.1%)
WirbelsturmWirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-2.64%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-95.89%)
Spark Jupyter AwsA guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-24.05%)
smolderHL7 Apache Spark Datasource
Stars: ✭ 33 (-90.32%)
Awesome AdaA curated list of awesome resources related to the Ada and SPARK programming language
Stars: ✭ 299 (-12.32%)
spark-demosCollection of different demo applications using Apache Spark
Stars: ✭ 15 (-95.6%)
Big Data Rosetta CodeCode snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (-25.51%)
tpch-sparkTPC-H queries in Apache Spark SQL using native DataFrames API
Stars: ✭ 63 (-81.52%)
Coolplayspark酷玩 Spark: Spark 源代码解析、Spark 类库等
Stars: ✭ 3,318 (+873.02%)
frovedisFramework of vectorized and distributed data analytics
Stars: ✭ 59 (-82.7%)
Spark Hbase ConnectorConnect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (-12.32%)
Book本项目收藏这些年来看过或者听过的一些不错的书籍,在整理文件时看见这些,发现删掉有点可惜,放着又太浪费空间,本着分享的原则,就把它们共享出来,一方面给需要的读者提供这些书籍,另一方面也是一种像知识库的积累吧
Stars: ✭ 47 (-86.22%)
Ytk LearnYtk-learn is a distributed machine learning library which implements most of popular machine learning algorithms(GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-1.17%)
SparklintA tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (-7.33%)
Learningsparkv2This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (-9.97%)
Spark NotebookInteractive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+803.52%)