H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+3621.05%)

Mutual labels: hadoop

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (-7.89%)

Mutual labels: hadoop

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+3438.82%)

Mutual labels: hadoop

Chukwa

Mirror of Apache Chukwa

Stars: ✭ 77 (-49.34%)

Mutual labels: hadoop

Bigdata

💎🔥 Big data learning notes

Stars: ✭ 488 (+221.05%)

Mutual labels: hadoop

Asakusafw

Asakusa Framework

Stars: ✭ 114 (-25%)

Mutual labels: hadoop

School Of Sre

At LinkedIn, we are using this curriculum for onboarding our entry-level talent into the SRE role.

Stars: ✭ 5,141 (+3282.24%)

Mutual labels: hadoop

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+686.18%)

Mutual labels: hadoop

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+14405.26%)

Mutual labels: hadoop

Airflow Pipeline

An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR

Stars: ✭ 128 (-15.79%)

Mutual labels: hadoop

Marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

Stars: ✭ 414 (+172.37%)

Mutual labels: hadoop

Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Stars: ✭ 74 (-51.32%)

Mutual labels: hadoop

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (+163.16%)

Mutual labels: hadoop

Parquet Go

Go package to read and write Parquet files. Parquet is a file format for storing nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

Stars: ✭ 114 (-25%)

Mutual labels: hadoop

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+158.55%)

Mutual labels: hadoop

Atsd

Axibase Time Series Database Documentation

Stars: ✭ 68 (-55.26%)

Mutual labels: hadoop

Ignite

Apache Ignite

Stars: ✭ 4,027 (+2549.34%)

Mutual labels: hadoop

Hadoop

Apache Hadoop

Stars: ✭ 12,177 (+7911.18%)

Mutual labels: hadoop

Hive

Apache Hive

Stars: ✭ 4,031 (+2551.97%)

Mutual labels: hadoop

Jumbune

Jumbune is an open source BigData APM and Data Quality Management Platform for Data Clouds. An enterprise feature offering is available at http://jumbune.com. More details of the open source offering are at,

Stars: ✭ 64 (-57.89%)

Mutual labels: hadoop

Ytk Learn

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (+121.71%)

Mutual labels: hadoop

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (-27.63%)

Mutual labels: hadoop

Gather Deployment

Gathers scalable TensorFlow and infrastructure deployment

Stars: ✭ 326 (+114.47%)

Mutual labels: hadoop

Likelike

An implementation of locality sensitive hashing with Hadoop

Stars: ✭ 58 (-61.84%)

Mutual labels: hadoop

Tez

Apache Tez

Stars: ✭ 313 (+105.92%)

Mutual labels: hadoop

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (-15.79%)

Mutual labels: hadoop

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (+101.32%)

Mutual labels: hadoop

Docker Hadoop

A Docker container with a full Hadoop cluster setup with Spark and Zeppelin

Stars: ✭ 54 (-64.47%)

Mutual labels: hadoop

Elasticluster

Create clusters of VMs on the cloud and configure them with Ansible.

Stars: ✭ 298 (+96.05%)

Mutual labels: hadoop

Waterdrop

Production-ready data integration product, documentation:

Stars: ✭ 1,856 (+1121.05%)

Mutual labels: hadoop

Android Nosql

Lightweight, simple structured NoSQL database for Android

Stars: ✭ 284 (+86.84%)

Mutual labels: hadoop

Base

https://www.researchgate.net/profile/Rajah_Iyer

Stars: ✭ 48 (-68.42%)

Mutual labels: hadoop

Hadoop Mini Clusters

hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE

Stars: ✭ 265 (+74.34%)

Mutual labels: hadoop

Hbaseclient

HBase client data management software

Stars: ✭ 135 (-11.18%)

Mutual labels: hadoop

pulse

phData Pulse application log aggregation and monitoring

Stars: ✭ 13 (-91.45%)

Mutual labels: hadoop

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Stars: ✭ 1,000 (+557.89%)

Mutual labels: hadoop

knit

Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead

Stars: ✭ 53 (-65.13%)

Mutual labels: hadoop

Bigdata Notebook

Stars: ✭ 100 (-34.21%)

Mutual labels: hadoop

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-90.79%)