Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+1085.08%)
Cube.js
📊 Cube: Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+349.14%)
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-99.21%)
Flink Sql Cookbook
The Apache Flink SQL Cookbook is a curated collection of examples, patterns, and use cases of Apache Flink SQL. Many of the recipes are completely self-contained and can be run in Ververica Platform as is.
Stars: ✭ 189 (-92.92%)
Spark Lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-95.73%)
Flint
A Time Series Library for Apache Spark
Stars: ✭ 878 (-67.09%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-94.3%)
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.
Stars: ✭ 14 (-99.48%)
Sparkling Titanic
Training models with Apache Spark and PySpark for the Titanic Kaggle competition
Stars: ✭ 12 (-99.55%)
Watermill
Building event-driven applications the easy way in Go.
Stars: ✭ 3,504 (+31.33%)
Liteflow
liteflow is a distributed task-flow scheduling system built on task versions
Stars: ✭ 112 (-95.8%)
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-99.59%)
Nmflibrary
MATLAB library for non-negative matrix factorization (NMF): Version 1.8.1
Stars: ✭ 153 (-94.27%)
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (-42.13%)
Hazelcast Jet
Distributed Stream and Batch Processing
Stars: ✭ 855 (-67.95%)
Scanns
A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (-92.88%)
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (-68.25%)
Elephas
Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (-42.99%)
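Javainterview
The most complete collection of Java technical knowledge points, plus Java source code analysis. Doing my part to contribute to open source.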
Stars: ✭ 154 (-94.23%)
Horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Stars: ✭ 11,943 (+347.64%)
Ds Cheatsheets
List of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+254.27%)
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-95.88%)
Sagemaker Spark
A Spark library for Amazon SageMaker.
Stars: ✭ 219 (-91.79%)
Digitrecognizer
Java Convolutional Neural Network example for Handwritten Digit Recognition
Stars: ✭ 23 (-99.14%)
10 Weeks
10 weeks of technology exploration
Stars: ✭ 22 (-99.18%)
Spark.jl
Julia binding for Apache Spark
Stars: ✭ 153 (-94.27%)
Shifu
An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (-92.24%)
Log Anomaly Detector
Log Anomaly Detection - Machine learning to detect abnormal event logs
Stars: ✭ 169 (-93.67%)
Dataspherestudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (-55.21%)
Books
Technical books and more
Stars: ✭ 110 (-95.88%)
Azuredatabricksbestpractices
Version 1 of Technical Best Practices of Azure Databricks based on real-world Customer and Technical SME inputs
Stars: ✭ 186 (-93.03%)
Flinkstreamsql
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Stars: ✭ 1,682 (-36.96%)
Powderkeg
Live-coding the cluster!
Stars: ✭ 152 (-94.3%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (-70.28%)
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-95.91%)
Simple It English
Simple-IT-English: smart wordbook from community for community
Stars: ✭ 233 (-91.27%)
Go Kafka Example
Golang Kafka consumer and producer example
Stars: ✭ 108 (-95.95%)
Lpa Detector
Optimize and improve the label propagation algorithm
Stars: ✭ 75 (-97.19%)
Twitwork
Monitor the Twitter stream
Stars: ✭ 133 (-95.01%)
Siddhi
Stream Processing and Complex Event Processing Engine
Stars: ✭ 1,185 (-55.58%)
Sparkctr
CTR prediction model based on Spark (LR, GBDT, DNN)
Stars: ✭ 740 (-72.26%)
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-95.95%)
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (-72.71%)
Roaringbitmap
A better compressed bitset in Java
Stars: ✭ 2,460 (-7.8%)
Samsara
Samsara is a real-time analytics platform
Stars: ✭ 132 (-95.05%)
Labs
Research on distributed systems
Stars: ✭ 73 (-97.26%)
Kamu Cli
Next-generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-97.41%)
Luigi Warehouse
A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-97.3%)
Example Spark
Spark, Spark Streaming and Spark SQL unit testing strategies
Stars: ✭ 205 (-92.32%)