dtailDTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Stars: ✭ 112 (-60.84%)
oosoJava library for running Serverless MapReduce jobs
Stars: ✭ 25 (-91.26%)
swordfishOpen-source distribute workflow schedule tools, also support streaming task.
Stars: ✭ 35 (-87.76%)
hadoop-deployment-bashCode for the deployment of Hadoop clusters, written in Bourne or Bourne Again shell.
Stars: ✭ 31 (-89.16%)
hadoop-docker-liteDocker build project to setup a lightweight hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager
Stars: ✭ 24 (-91.61%)
hadoop-etl-udfsThe Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-94.06%)
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-88.81%)
rastercuberastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-94.76%)
oci-clouderaTerraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)
Stars: ✭ 20 (-93.01%)
Tdigestt-Digest data structure in Python. Useful for percentiles and quantiles, including distributed enviroments like PySpark
Stars: ✭ 274 (-4.2%)
skeinA tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (-55.24%)
cloud云计算之hadoop、hive、hue、oozie、sqoop、hbase、zookeeper环境搭建及配置文件
Stars: ✭ 48 (-83.22%)
AddaxAddax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+115.03%)
corcAn ORC File Scheme for the Cascading data processing platform.
Stars: ✭ 14 (-95.1%)
pyspark-ML-in-ColabPyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-88.81%)
big-data-exploration[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
Stars: ✭ 43 (-84.97%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (-86.36%)
ros hadoopHadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Stars: ✭ 92 (-67.83%)
webhdfsNode.js WebHDFS REST API client
Stars: ✭ 88 (-69.23%)
the-apache-ignite-bookAll code samples, scripts and more in-depth examples for The Apache Ignite Book. Include Apache Ignite 2.6 or above
Stars: ✭ 65 (-77.27%)
Android NosqlLightweight, simple structured NoSQL database for Android
Stars: ✭ 284 (-0.7%)
railScalable RNA-seq analysis
Stars: ✭ 74 (-74.13%)
cobra-policytoolManage Apache Atlas and Ranger configuration for your Hadoop environment.
Stars: ✭ 16 (-94.41%)
interview-refresh-java-bigdataa one-stop repo to lookup for code snippets of core java concepts, sql, data structures as well as big data. It also consists of interview questions asked in real-life.
Stars: ✭ 25 (-91.26%)
spark-utillow-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-94.41%)
durablefunctions-mapreduce-dotnetAn implementation of MapReduce on top of C# Durable Functions over the NYC 2017 Taxi dataset to compute average ride time per-day
Stars: ✭ 20 (-93.01%)
HadoopDedup🍉基于Hadoop和HBase的大规模海量数据去重
Stars: ✭ 27 (-90.56%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-95.1%)
smart-data-lakeSmart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (-72.38%)
fsbrowserFast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (-90.56%)
GuitarA Simple and Efficient Distributed Multidimensional BI Analysis Engine.
Stars: ✭ 86 (-69.93%)
MLHadoopThis repository contains Machine-Learning MapReduce codes for Hadoop which are written from scratch (without using any package or library). E.g. Prediction (Linear and Logistic Regression), Clustering (K-Means), Classification (KNN) etc.
Stars: ✭ 50 (-82.52%)
TonYTonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 687 (+140.21%)
openPDCOpen Source Phasor Data Concentrator
Stars: ✭ 109 (-61.89%)
dpkb大数据相关内容汇总,包括分布式存储引擎、分布式计算引擎、数仓建设等。关键词:Hadoop、HBase、ES、Kudu、Hive、Presto、Spark、Flink、Kylin、ClickHouse
Stars: ✭ 123 (-56.99%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-91.26%)
clickhouse hadoopImport data from clickhouse to hadoop with pure SQL
Stars: ✭ 26 (-90.91%)
fastdata-clusterFast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-93.01%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-91.61%)
terasliceScalable data processing pipelines in JavaScript
Stars: ✭ 48 (-83.22%)
JavaFrameworkSimple Java Framework,designed for easily develop Spring based java program.Support Bigdata And metadata management.A common elasticsearch comm query tool and so on.
Stars: ✭ 16 (-94.41%)
MLBDMaterials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-93.01%)
beanszooDistributed Java micro-services using ZooKeeper
Stars: ✭ 12 (-95.8%)
orionManagement and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (-73.08%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-61.19%)
jumbo🐘 A local Hadoop cluster bootstrapper using Vagrant, Ansible, and Ambari.
Stars: ✭ 17 (-94.06%)
hadoop-ansibleInstall hadoop cluster with ansible
Stars: ✭ 35 (-87.76%)