
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, along with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-87.54%)

Mutual labels: big-data, hadoop

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-95.21%)

Mutual labels: big-data, hadoop

couchdb-pkg

Apache CouchDB Packaging support files

Stars: ✭ 24 (-92.33%)

Mutual labels: big-data, apache

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-91.69%)

Mutual labels: big-data, hadoop

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-90.1%)

Mutual labels: hadoop, apache

big-data-lite

Samples for the Oracle Big Data Lite VM

Stars: ✭ 41 (-86.9%)

Mutual labels: big-data, hadoop

leaflet heatmap

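A simple visualization of phone-call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. The data is first computed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark as well, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. With the current Apache Spark implementation of the rendering, parallel computation is slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .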

Stars: ✭ 13 (-95.85%)

Mutual labels: big-data, hadoop

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-64.54%)

Mutual labels: big-data, hadoop

nifi

Deploy a secured, clustered, auto-scaling NiFi service in AWS.

Stars: ✭ 37 (-88.18%)

Mutual labels: big-data, apache

hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

Stars: ✭ 16 (-94.89%)

Mutual labels: hadoop, apache

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer

Stars: ✭ 32 (-89.78%)

Mutual labels: big-data, hadoop

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-94.89%)

Mutual labels: big-data, hadoop

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-84.98%)

Mutual labels: big-data, hadoop

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+1363.58%)

Mutual labels: big-data, hadoop

masc

Microsoft's contributions for Spark with Apache Accumulo

Stars: ✭ 20 (-93.61%)

Mutual labels: big-data, apache

yarn-prometheus-exporter

Export Hadoop YARN (resource-manager) metrics in prometheus format

Stars: ✭ 44 (-85.94%)

Mutual labels: hadoop, apache


Apache Tez

Apache Tez is a generic data-processing pipeline engine envisioned as a low-level engine for higher abstractions such as Apache Hadoop MapReduce, Apache Pig, Apache Hive, etc.

At its heart, Tez is very simple and has just two components:

The data-processing pipeline engine, into which one can plug input, processing and output implementations to perform arbitrary data processing. Every 'task' in Tez has the following:

Input to consume key/value pairs from.
Processor to process them.
Output to collect the processed key/value pairs.

A master for the data-processing application, with which one can put together the arbitrary data-processing 'tasks' described above into a task-DAG to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
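
The two components above can be illustrated with a toy sketch in Python. This is not the Tez API (Tez itself is a Java framework running on YARN); the names `Task`, `run_dag`, `tokenize` and `count` are hypothetical, and everything runs in a single process. It only shows the shape of the idea: each task consumes key/value pairs, processes them, and collects output, and a small "master" wires tasks into a DAG and runs them in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Task:
    """Hypothetical toy task: input pairs -> processor -> output pairs."""

    def __init__(self, name, processor):
        self.name = name
        self.processor = processor  # callable: iterable of (k, v) -> iterable of (k, v)

    def run(self, input_pairs):
        # "Output" here is just a list that collects the processed pairs.
        return list(self.processor(input_pairs))


def run_dag(tasks, edges, source):
    """Toy stand-in for the master: run tasks in topological order.

    edges maps a task name to the list of upstream task names it reads from.
    Tasks with no upstream read from `source`.
    """
    results = {}
    for name in TopologicalSorter(edges).static_order():
        upstream = edges.get(name, [])
        if upstream:
            pairs = [pair for u in upstream for pair in results[u]]
        else:
            pairs = source
        results[name] = tasks[name].run(pairs)
    return results


def tokenize(pairs):
    # Emit (word, 1) for every word in every input line.
    for _, line in pairs:
        for word in line.split():
            yield word, 1


def count(pairs):
    # Sum the counts per word and return them sorted by word.
    totals = {}
    for word, n in pairs:
        totals[word] = totals.get(word, 0) + n
    return sorted(totals.items())


tasks = {"tokenize": Task("tokenize", tokenize), "count": Task("count", count)}
edges = {"tokenize": [], "count": ["tokenize"]}
source = [(0, "a b a"), (1, "b c")]

results = run_dag(tasks, edges, source)
print(results["count"])
```

In this sketch the DAG has a single edge, from `tokenize` to `count`; adding more tasks only requires extending `tasks` and `edges`.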


Stars: ✭ 313
