Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+1552%)
Hadoop Docker基于Docker构建的Hadoop开发测试环境,包含Hadoop,Hive,HBase,Spark
Stars: ✭ 238 (+852%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+1556%)
TeddySpark Streaming监控平台,支持任务部署与告警、自启动
Stars: ✭ 120 (+380%)
Luigi WarehouseA luigi powered analytics / warehouse stack
Stars: ✭ 72 (+188%)
ElassandraElassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+6340%)
RedashMake Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+80488%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+15152%)
Spark LucenerddSpark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (+356%)
WedatasphereWeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+1388%)
CasperA compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (+80%)
SparkmeasureThis is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+1372%)
KyuubiKyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+1352%)
MydatascienceportfolioApplying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+808%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+1348%)
Python BigdataData science and Big Data with Python
Stars: ✭ 112 (+348%)
OapOptimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+1272%)
DataEngineeringThis repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (+88%)
ScalnetA Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Stars: ✭ 342 (+1268%)
ElephasDistributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+5984%)
Ytk LearnYtk-learn is a distributed machine learning library which implements most of popular machine learning algorithms(GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (+1248%)
SparklintA tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (+1164%)
WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+7324%)
Search Ads Web ServiceOnline search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (+20%)
Learningsparkv2This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+1128%)
BigdataclassTwo-day workshop that covers how to use R to interact databases and Spark
Stars: ✭ 110 (+340%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+15512%)
ZatZeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (+1112%)
pyspark-ML-in-ColabPyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+28%)
Spark NotebookInteractive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+12224%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (+1012%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+988%)
Seldon ServerMachine Learning Platform and Recommendation Engine built on Kubernetes
Stars: ✭ 1,435 (+5640%)
Docker Spark ClusterA simple spark standalone cluster for your testing environment purposses
Stars: ✭ 261 (+944%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (+940%)
Spark On K8s OperatorKubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+7020%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (+928%)
Fast MrmrAn improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (+168%)
SplashSplash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (+320%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-44%)
KontextfreiWriting application logic for Spark jobs that can be unit-tested without a SparkContext
Stars: ✭ 67 (+168%)
docker-sparkApache Spark docker container image (Standalone mode)
Stars: ✭ 34 (+36%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+280%)