MoonboxMoonbox is a DVtaaS (Data Virtualization as a Service) Platform
Stars: ✭ 424 (+155.42%)
DataspherestudioDataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+619.88%)
LearningsparkScala examples for learning to use Spark
Stars: ✭ 421 (+153.61%)
RasterframesGeospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-14.46%)
SparkleHaskell on Apache Spark.
Stars: ✭ 419 (+152.41%)
Lpa DetectorOptimize and improve the Label propagation algorithm
Stars: ✭ 75 (-54.82%)
Enterprise gatewayA lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Stars: ✭ 412 (+148.19%)
Spark SyntaxThis is a repo documenting the best practices in PySpark.
Stars: ✭ 412 (+148.19%)
LabsResearch on distributed systems
Stars: ✭ 73 (-56.02%)
Junos monitoring with healthbotHealthbot configuration examples. Scripts to manage Healthbot. Closed loop automation. Healthbot building blocks description and troubleshooting guide
Stars: ✭ 17 (-89.76%)
Big WhaleScheduling of offline jobs such as Spark and Flink, and monitoring of real-time jobs
Stars: ✭ 163 (-1.81%)
CurveAn Integrated Experimental Platform for time series data anomaly detection.
Stars: ✭ 408 (+145.78%)
Luigi WarehouseA Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-56.63%)
IcebergIceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+136.75%)
TeddySpark Streaming monitoring platform with support for job deployment, alerting, and automatic startup
Stars: ✭ 120 (-27.71%)
Docker practiceLearn and understand Docker technologies, with real DevOps practice!
Stars: ✭ 19,768 (+11808.43%)
TensorflowonsparkTensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+2157.83%)
PyoddsAn End-to-end Outlier Detection System
Stars: ✭ 141 (-15.06%)
Fast MrmrAn improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (-59.64%)
SidekickHigh Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+120.48%)
ElassandraElassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+869.88%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+117.47%)
ThingsboardOpen-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+6240.96%)
SparkstreamingReal-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts
Stars: ✭ 349 (+110.24%)
Kitnet PyKitNET is a lightweight online anomaly detection algorithm, which uses an ensemble of autoencoders.
Stars: ✭ 152 (-8.43%)
SparklensQubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (+107.83%)
Coursera Ml PyPython programming assignments for Machine Learning by Prof. Andrew Ng in Coursera
Stars: ✭ 1,140 (+586.75%)
ScalnetA Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Stars: ✭ 342 (+106.02%)
Cube.js📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+7118.67%)
Ytk LearnYtk-learn is a distributed machine learning library which implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (+103.01%)
Spark BigqueryGoogle BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Stars: ✭ 65 (-60.84%)
DeepadotsRepository of the paper "A Systematic Evaluation of Deep Anomaly Detection Methods for Time Series".
Stars: ✭ 335 (+101.81%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-15.66%)
WirbelsturmWirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+100%)
Deep Svdd PytorchA PyTorch implementation of the Deep SVDD anomaly detection method
Stars: ✭ 320 (+92.77%)
Spark LucenerddSpark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-31.33%)
SparklintA tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (+90.36%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+889.16%)
RepositoryA personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-44.58%)
RemixautomlR package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, data generation, and recommenders.
Stars: ✭ 159 (-4.22%)
Learningsparkv2This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+84.94%)
Repo 2017Python code for Machine Learning, NLP, Deep Learning and Reinforcement Learning with Keras and Theano
Stars: ✭ 1,123 (+576.51%)
CrayonSimple framework agnostic UI router for SPAs
Stars: ✭ 310 (+86.75%)
Pytorch cppDeep Learning sample programs using PyTorch in C++
Stars: ✭ 114 (-31.33%)
Spark GotchasSpark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Stars: ✭ 308 (+85.54%)
Silexsomething to help you spark
Stars: ✭ 61 (-63.25%)
SkylineAnomaly detection
Stars: ✭ 303 (+82.53%)
Sparkling GraphSparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-16.27%)
Sparkling WaterSparkling Water provides H2O functionality inside a Spark cluster
Stars: ✭ 887 (+434.34%)
LuminolAnomaly Detection and Correlation library
Stars: ✭ 888 (+434.94%)
Airflow PipelineAn Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-22.89%)
Big Data🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-43.98%)