Spark-Ar
Resources for Spark AR
Stars: ✭ 43 (-85.62%)
trembita
Model complex data transformation pipelines easily
Stars: ✭ 44 (-85.28%)
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-93.31%)
Helk
The Hunting ELK
Stars: ✭ 3,097 (+935.79%)
spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (-82.94%)
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-95.32%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+918.06%)
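Book
This project collects some good books I have read or heard about over the years. I came across them while organizing my files; deleting them seemed a pity, and keeping them wasted too much space, so in the spirit of sharing I am making them available, both to provide them to readers who need them and to build up something like a knowledge base.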
Stars: ✭ 47 (-84.28%)
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-16.72%)
smolder
HL7 Apache Spark Datasource
Stars: ✭ 33 (-88.96%)
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-17.39%)
Spark Druid Olap
Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 282 (-5.69%)
Neo4j Spark Connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (-18.06%)
spark-demos
Collection of different demo applications using Apache Spark
Stars: ✭ 15 (-94.98%)
Recommendationsystem
Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (-18.39%)
spark-http-stream
Spark Structured Streaming via HTTP communication
Stars: ✭ 17 (-94.31%)
Hadoop Docker
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (-20.4%)
tpch-spark
TPC-H queries in Apache Spark SQL using native DataFrames API
Stars: ✭ 63 (-78.93%)
Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (-24.08%)
frovedis
Framework of vectorized and distributed data analytics
Stars: ✭ 59 (-80.27%)
Spark Workshop
Apache Spark™ and Scala Workshops
Stars: ✭ 224 (-25.08%)
daf-kylo
Kylo integration with PDND (previously DAF).
Stars: ✭ 20 (-93.31%)
Sagemaker Spark
A Spark library for Amazon SageMaker.
Stars: ✭ 219 (-26.76%)
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-27.76%)
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-89.3%)
Spark Knn
k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (-31.44%)
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+869.57%)
sentry-spark
Apache Spark Sentry Integration
Stars: ✭ 14 (-95.32%)
Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+660.54%)
Spark Jupyter Aws
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-13.38%)
Js Spark
Realtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-37.46%)
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-69.57%)
Kotlin Spark Api
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Stars: ✭ 183 (-38.8%)
spark-data-sources
Developing Spark External Data Sources using the V2 API
Stars: ✭ 36 (-87.96%)
spark-word2vec
A parallel implementation of word2vec based on Spark
Stars: ✭ 24 (-91.97%)
Xsql
Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-41.14%)
Hbase Rdd
Spark RDD to read, write and delete from HBase
Stars: ✭ 277 (-7.36%)
Kraps Rpc
An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (-41.47%)
shamash
Autoscaling for Google Cloud Dataproc
Stars: ✭ 31 (-89.63%)
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+742.14%)
bigkube
Minikube for big data with Scala and Spark
Stars: ✭ 16 (-94.65%)
Transmogrifai
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+596.99%)
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (-86.96%)
Big Data Rosetta Code
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (-15.05%)
spark-util
Low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-94.65%)
Elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-0.33%)
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+930.43%)
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-9.03%)
blog
Blog entries
Stars: ✭ 39 (-86.96%)
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (-68.23%)