Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from one place to another.

Stars: ✭ 615 (+115.03%)

Mutual labels: hadoop

corc

An ORC File Scheme for the Cascading data processing platform.

Stars: ✭ 14 (-95.1%)

Mutual labels: hadoop

CDH-Install-Manual

CDH installation manual

Stars: ✭ 70 (-75.52%)

Mutual labels: hadoop

pyspark-ML-in-Colab

PySpark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (-88.81%)

Mutual labels: hadoop

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-88.11%)

Mutual labels: mapreduce

big-data-exploration

[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product

Stars: ✭ 43 (-84.97%)

Mutual labels: hadoop

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (-81.82%)

Mutual labels: hadoop

yuzhouwan

Code Library for My Blog

Stars: ✭ 39 (-86.36%)

Mutual labels: hadoop

ros hadoop

Hadoop splittable InputFormat for ROS. Process rosbag files with Hadoop, Spark, and other HDFS-compatible systems.

Stars: ✭ 92 (-67.83%)

Mutual labels: hadoop

webhdfs

Node.js WebHDFS REST API client

Stars: ✭ 88 (-69.23%)

Mutual labels: hadoop

the-apache-ignite-book

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above.

Stars: ✭ 65 (-77.27%)

Mutual labels: hadoop

Android Nosql

Lightweight, simple structured NoSQL database for Android

Stars: ✭ 284 (-0.7%)

Mutual labels: hadoop

rail

Scalable RNA-seq analysis

Stars: ✭ 74 (-74.13%)

Mutual labels: mapreduce

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

Stars: ✭ 16 (-94.41%)

Mutual labels: hadoop

interview-refresh-java-bigdata

A one-stop repo for looking up code snippets on core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real interviews.

Stars: ✭ 25 (-91.26%)

Mutual labels: mapreduce

spark-util

low-level helpers for Apache Spark libraries and tests

Stars: ✭ 16 (-94.41%)

Mutual labels: hadoop

HDFS-Netdisc

A distributed cloud storage system based on Hadoop 🌴

Stars: ✭ 56 (-80.42%)

Mutual labels: hadoop

durablefunctions-mapreduce-dotnet

An implementation of MapReduce on top of C# Durable Functions over the NYC 2017 Taxi dataset to compute the average ride time per day

Stars: ✭ 20 (-93.01%)

Mutual labels: mapreduce

HadoopDedup

🍉 Large-scale deduplication of massive data sets based on Hadoop and HBase

Stars: ✭ 27 (-90.56%)

Mutual labels: mapreduce

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-95.1%)

Mutual labels: hadoop

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (-72.38%)

Mutual labels: hadoop

fsbrowser

Fast desktop client for Hadoop Distributed File System

Stars: ✭ 27 (-90.56%)

Mutual labels: hadoop

Guitar

A Simple and Efficient Distributed Multidimensional BI Analysis Engine.

Stars: ✭ 86 (-69.93%)

Mutual labels: mapreduce

mapreduce-examples

A collection of MapReduce problems and solutions

Stars: ✭ 23 (-91.96%)

Mutual labels: mapreduce

MLHadoop

This repository contains machine learning MapReduce code for Hadoop, written from scratch (without using any package or library), e.g. prediction (Linear and Logistic Regression), clustering (K-Means), classification (KNN), etc.

Stars: ✭ 50 (-82.52%)

Mutual labels: hadoop

TonY

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 687 (+140.21%)

Mutual labels: hadoop

openPDC

Open Source Phasor Data Concentrator

Stars: ✭ 109 (-61.89%)

Mutual labels: hadoop

dpkb

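A roundup of big data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse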

Stars: ✭ 123 (-56.99%)

Mutual labels: hadoop

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-91.26%)

Mutual labels: hadoop

yarn-prometheus-exporter

Export Hadoop YARN (ResourceManager) metrics in Prometheus format

Stars: ✭ 44 (-84.62%)

Mutual labels: hadoop

clickhouse hadoop

Import data from ClickHouse to Hadoop with pure SQL

Stars: ✭ 26 (-90.91%)

Mutual labels: hadoop

mit-6.824-distributed-systems

Template repository to work on the labs from MIT 6.824 Distributed Systems course.

Stars: ✭ 48 (-83.22%)

Mutual labels: mapreduce

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-93.01%)

Mutual labels: hadoop

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-91.61%)

Mutual labels: hadoop

teraslice

Scalable data processing pipelines in JavaScript

Stars: ✭ 48 (-83.22%)

Mutual labels: hadoop

JavaFramework

A simple Java framework designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.

Stars: ✭ 16 (-94.41%)

Mutual labels: hadoop

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-93.01%)

Mutual labels: mapreduce

beanszoo

Distributed Java micro-services using ZooKeeper

Stars: ✭ 12 (-95.8%)

Mutual labels: hadoop

orion

Management and automation platform for Stateful Distributed Systems