oci-cloudera: Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)
Stars: ✭ 20 (-35.48%)
phoenix: Apache Phoenix / HBase Spring Boot Microservices
Stars: ✭ 23 (-25.81%)
sparkucx: A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (+3.23%)
hadoop-crypto: Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (+22.58%)
openPDC: Open Source Phasor Data Concentrator
Stars: ✭ 109 (+251.61%)
LogAnalyzeHelper: Data-cleaning program for a forum log analysis system (includes an IP rule library, UDF development, MapReduce programs, and log data)
Stars: ✭ 33 (+6.45%)
hadoop-etl-udfs: The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-45.16%)
hive-jdbc-driver: An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (+0%)
iis: Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-48.39%)
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-51.61%)
learning-hadoop-and-spark: Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+370.97%)
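dpkb: A collection of big data material, covering distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse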
Stars: ✭ 123 (+296.77%)
skein: A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+312.9%)
wasp: WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-38.71%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (+54.84%)
disq: A library for manipulating bioinformatics sequencing formats in Apache Spark
Stars: ✭ 29 (-6.45%)
beanszoo: Distributed Java micro-services using ZooKeeper
Stars: ✭ 12 (-61.29%)
hadoop-ansible: Install a Hadoop cluster with Ansible
Stars: ✭ 35 (+12.9%)
big-data-exploration: [Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
Stars: ✭ 43 (+38.71%)
liquibase-impala: Liquibase extension to add Impala Database support
Stars: ✭ 23 (-25.81%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+25.81%)
UBA: UEBA Solution for Insider Security. This repo is archived. Thanks!
Stars: ✭ 36 (+16.13%)
dockerfiles: Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
Stars: ✭ 29 (-6.45%)
memex-gate: General Architecture for Text Engineering
Stars: ✭ 47 (+51.61%)
the-apache-ignite-book: All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (+109.68%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-22.58%)
hadoopoffice: HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (+80.65%)
hive to es: A small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (-32.26%)
implyr: SQL backend to dplyr for Impala
Stars: ✭ 74 (+138.71%)
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+154.84%)
learning-spark: Tidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-9.68%)
MLHadoop: This repository contains machine-learning MapReduce code for Hadoop, written from scratch (without using any package or library), e.g. prediction (linear and logistic regression), clustering (K-Means), classification (KNN), etc.
Stars: ✭ 50 (+61.29%)
ambari-hdp-docker: Dockerfiles and Docker Compose for HDP 2.6 with Blueprints
Stars: ✭ 23 (-25.81%)
webhdfs: Node.js WebHDFS REST API client
Stars: ✭ 88 (+183.87%)
TonY: TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 687 (+2116.13%)
datasqueeze: Hadoop utility to compact small files
Stars: ✭ 18 (-41.94%)
gomrjob: gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (+25.81%)
xxhadoop: Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes/code/demos. Don't fork, just star!
Stars: ✭ 37 (+19.35%)
JavaFramework: Simple Java framework designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.
Stars: ✭ 16 (-48.39%)
aaocp: A big data project that analyzes user behavior logs
Stars: ✭ 53 (+70.97%)
orion: Management and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (+148.39%)
corc: An ORC File Scheme for the Cascading data processing platform.
Stars: ✭ 14 (-54.84%)
RecommendationEngine: Source code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"
Stars: ✭ 43 (+38.71%)
presto: Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data
Stars: ✭ 91 (+193.55%)
pyspark-ML-in-Colab: PySpark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+3.23%)
clickhouse hadoop: Import data from ClickHouse to Hadoop with pure SQL
Stars: ✭ 26 (-16.13%)
darwin: Avro Schema Evolution made easy
Stars: ✭ 26 (-16.13%)