This repository contains machine-learning MapReduce code for Hadoop, written from scratch (without using any package or library), e.g. prediction (linear and logistic regression), clustering (K-Means), classification (KNN), etc.

Stars: ✭ 50 (-35.9%)

Mutual labels: hadoop

prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (-30.77%)

Mutual labels: spark

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-69.23%)

Mutual labels: hadoop

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (+438.46%)

Mutual labels: spark

Spark Website

Apache Spark Website

Stars: ✭ 75 (-3.85%)

Mutual labels: spark

Hive Funnel Udf

Hive UDFs for funnel analysis

Stars: ✭ 72 (-7.69%)

Mutual labels: hadoop

Spark Doc Zh

Chinese translation of the official Apache Spark documentation

Stars: ✭ 1,126 (+1343.59%)

Mutual labels: spark

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Stars: ✭ 1,000 (+1182.05%)

Mutual labels: hadoop

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+806.41%)

Mutual labels: spark

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-79.49%)

Mutual labels: spark

UBA

UEBA Solution for Insider Security. This repo is archived. Thanks!

Stars: ✭ 36 (-53.85%)

Mutual labels: hadoop

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+429.49%)

Mutual labels: spark

implyr

SQL backend to dplyr for Impala

Stars: ✭ 74 (-5.13%)

Mutual labels: hadoop

Tedsds

Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark

Stars: ✭ 14 (-82.05%)

Mutual labels: spark

datasqueeze

Hadoop utility to compact small files

Stars: ✭ 18 (-76.92%)

Mutual labels: hadoop

presto

Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data

Stars: ✭ 91 (+16.67%)

Mutual labels: hadoop

Awesome Pulsar

A curated list of Pulsar tools, integrations and resources.

Stars: ✭ 57 (-26.92%)

Mutual labels: spark

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (-70.51%)

Mutual labels: hadoop

memex-gate

General Architecture for Text Engineering

Stars: ✭ 47 (-39.74%)

Mutual labels: hadoop

Cdc Kafka Hadoop

MySQL to NoSQL real time dataflow

Stars: ✭ 13 (-83.33%)

Mutual labels: hadoop

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (-28.21%)

Mutual labels: hadoop

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (+412.82%)

Mutual labels: hadoop

Kontextfrei

Writing application logic for Spark jobs that can be unit-tested without a SparkContext

Stars: ✭ 67 (-14.1%)

Mutual labels: spark

jmx exporter-cloudera-hadoop

Prometheus jmx_exporter configurations for Cloudera Hadoop

Stars: ✭ 33 (-57.69%)

Mutual labels: hadoop

xxhadoop

Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !

Stars: ✭ 37 (-52.56%)

Mutual labels: hadoop

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-84.62%)

Mutual labels: spark

corc

An ORC File Scheme for the Cascading data processing platform.

Stars: ✭ 14 (-82.05%)

Mutual labels: hadoop

Ignite

Apache Ignite

Stars: ✭ 4,027 (+5062.82%)

Mutual labels: hadoop

pyspark-ML-in-Colab

Pyspark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (-58.97%)

Mutual labels: hadoop

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-29.49%)

Mutual labels: spark

big-data-exploration

[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product

Stars: ✭ 43 (-44.87%)

Mutual labels: hadoop

Docker practice

Learn and understand Docker technologies, with real DevOps practice!

Stars: ✭ 19,768 (+25243.59%)

Mutual labels: spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-85.9%)

Mutual labels: spark

confluent-spark-avro

Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.

Stars: ✭ 18 (-76.92%)

Mutual labels: spark

dockerfiles

Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )

Stars: ✭ 29 (-62.82%)

Mutual labels: hadoop

Tensorflowonspark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

Stars: ✭ 3,748 (+4705.13%)

Mutual labels: spark

Lpa Detector

Optimize and improve the Label propagation algorithm

Stars: ✭ 75 (-3.85%)

Mutual labels: spark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+792.31%)

Mutual labels: spark

Covid19Tracker

A Robinhood style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.

Stars: ✭ 65 (-16.67%)

Mutual labels: spark

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-79.49%)

Mutual labels: hadoop

HDFS-Netdisc

Hadoop-based distributed cloud storage system 🌴

Stars: ✭ 56 (-28.21%)

Mutual labels: hadoop

knit

Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead

Stars: ✭ 53 (-32.05%)

Mutual labels: hadoop

learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+87.18%)

Mutual labels: hadoop

Sparkmeasure

This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

Stars: ✭ 368 (+371.79%)

Mutual labels: spark

Utils4s

A collection of test cases and related materials gathered while using Scala and Spark

Stars: ✭ 1,070 (+1271.79%)

Mutual labels: spark

Pixiedust

Python Helper library for Jupyter Notebooks

Stars: ✭ 998 (+1179.49%)

Mutual labels: spark

Winutils

winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows