This repository contains machine-learning MapReduce code for Hadoop, written from scratch (without using any package or library). Examples include prediction (linear and logistic regression), clustering (K-means), and classification (KNN).

Stars: ✭ 50 (-28.57%)

Mutual labels: hadoop

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-55.71%)

Mutual labels: hadoop

corc

An ORC File Scheme for the Cascading data processing platform.

Stars: ✭ 14 (-80%)

Mutual labels: hadoop

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-72.86%)

Mutual labels: hadoop

UEFI MULTI

UEFI_MULTI - Make Multi-Boot USB-Drive

Stars: ✭ 33 (-52.86%)

Mutual labels: install

UWP-Package-Installer

A UWP installer for appx/appxbundle packages

Stars: ✭ 85 (+21.43%)

Mutual labels: install

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-62.86%)

Mutual labels: hadoop

hadoop-ecosystem

Visualizations of the Hadoop Ecosystem

Stars: ✭ 20 (-71.43%)

Mutual labels: hadoop

PackageProject.cmake

🏛️ Help other developers use your project. A CMake script for packaging C/C++ projects for simple project installation while employing best practices for maximum compatibility.

Stars: ✭ 48 (-31.43%)

Mutual labels: install

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

Stars: ✭ 17 (-75.71%)

Mutual labels: hadoop

Installomator

Installation script to deploy standard software on Macs

Stars: ✭ 472 (+574.29%)

Mutual labels: install

sparkucx

A high-performance, scalable, and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer

Stars: ✭ 32 (-54.29%)

Mutual labels: hadoop

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-65.71%)

Mutual labels: hadoop

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-78.57%)

Mutual labels: hadoop

fsbrowser

Fast desktop client for Hadoop Distributed File System

Stars: ✭ 27 (-61.43%)

Mutual labels: hadoop

sixarm mac osx installation help

SixArm.com » Mac OSX installation help, notes, and guides

Stars: ✭ 22 (-68.57%)

Mutual labels: install

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-32.86%)

Mutual labels: hadoop

pyspark-ML-in-Colab

PySpark in Google Colab: A simple machine learning (linear regression) model

Stars: ✭ 32 (-54.29%)

Mutual labels: hadoop

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (-45.71%)

Mutual labels: hadoop

big-data-exploration

[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product

Stars: ✭ 43 (-38.57%)

Mutual labels: hadoop

aaocp

A big data project that analyzes user behavior logs

Stars: ✭ 53 (-24.29%)

Mutual labels: hadoop

skein

A tool and library for easily deploying applications on Apache YARN

Stars: ✭ 128 (+82.86%)

Mutual labels: hadoop

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-80%)

Mutual labels: hadoop

nvim

❤️ A neovim config repo.

Stars: ✭ 33 (-52.86%)

Mutual labels: install

UBA

UEBA Solution for Insider Security. This repo is archived. Thanks!

Stars: ✭ 36 (-48.57%)

Mutual labels: hadoop

disq

A library for manipulating bioinformatics sequencing formats in Apache Spark

Stars: ✭ 29 (-58.57%)

Mutual labels: hadoop

docker-hadoop-3

Docker file for Hadoop 3

Stars: ✭ 19 (-72.86%)

Mutual labels: hadoop

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-70%)

Mutual labels: hadoop

implyr

SQL backend to dplyr for Impala

Stars: ✭ 74 (+5.71%)

Mutual labels: hadoop

disk

Distributed network disk system built on Hadoop, HBase, and Spring Boot

Stars: ✭ 53 (-24.29%)

Mutual labels: hadoop

clickhouse hadoop

Import data from ClickHouse to Hadoop with pure SQL

Stars: ✭ 26 (-62.86%)

Mutual labels: hadoop

lsst

Configures environment for LSST software (newinstall.sh)

Stars: ✭ 14 (-80%)

Mutual labels: install

first-steps-and-hardening-in-ubuntu-server-and-docker

First Steps in Ubuntu (Server) / Hardening and Config With Docker

Stars: ✭ 28 (-60%)

Mutual labels: install

platys-modern-data-platform

Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....

Stars: ✭ 35 (-50%)

Mutual labels: hadoop

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

Stars: ✭ 16 (-77.14%)

Mutual labels: hadoop

gtni

Install all your npm dependencies recursively with gtni while you run git clone, fetch, or pull

Stars: ✭ 17 (-75.71%)

Mutual labels: install

datasqueeze

Hadoop utility to compact small files

Stars: ✭ 18 (-74.29%)

Mutual labels: hadoop

1-60 of 322 similar projects
