apache / Tez

Licence: apache-2.0
Apache Tez

Programming Languages

java

Projects that are alternatives to or similar to Tez

hadoop-data-ingestion-tool
OLAP and ETL of Big Data
Stars: ✭ 17 (-94.57%)
Mutual labels:  big-data, hadoop, apache
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-52.08%)
Mutual labels:  big-data, hadoop, apache
Hive
Apache Hive
Stars: ✭ 4,031 (+1187.86%)
Mutual labels:  big-data, hadoop, apache
implyr
SQL backend to dplyr for Impala
Stars: ✭ 74 (-76.36%)
Mutual labels:  hadoop, apache
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-87.54%)
Mutual labels:  big-data, hadoop
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-95.21%)
Mutual labels:  big-data, hadoop
couchdb-pkg
Apache CouchDB Packaging support files
Stars: ✭ 24 (-92.33%)
Mutual labels:  big-data, apache
clusterdock
clusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (-91.69%)
Mutual labels:  big-data, hadoop
hive-jdbc-driver
An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-90.1%)
Mutual labels:  hadoop, apache
big-data-lite
Samples to the Oracle Big Data Lite VM
Stars: ✭ 41 (-86.9%)
Mutual labels:  big-data, hadoop
leaflet heatmap
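A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap-drawing step is moved to offline computation and analysis. The data is computed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. Drawing is currently implemented with Apache Spark; perhaps because Apache Spark is not suited to this kind of computation, or because my algorithm is poorly designed, the parallel computation is slower than single-machine computation. The Apache Spark heatmap-drawing and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .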
Stars: ✭ 13 (-95.85%)
Mutual labels:  big-data, hadoop
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-64.54%)
Mutual labels:  big-data, hadoop
nifi
Deploy a secured, clustered, auto-scaling NiFi service in AWS.
Stars: ✭ 37 (-88.18%)
Mutual labels:  big-data, apache
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-94.89%)
Mutual labels:  hadoop, apache
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-89.78%)
Mutual labels:  big-data, hadoop
iis
Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-94.89%)
Mutual labels:  big-data, hadoop
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-84.98%)
Mutual labels:  big-data, hadoop
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+1363.58%)
Mutual labels:  big-data, hadoop
masc
Microsoft's contributions for Spark with Apache Accumulo
Stars: ✭ 20 (-93.61%)
Mutual labels:  big-data, apache
yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in prometheus format
Stars: ✭ 44 (-85.94%)
Mutual labels:  hadoop, apache

Apache Tez

Apache Tez is a generic data-processing pipeline engine envisioned as a low-level engine for higher abstractions such as Apache Hadoop MapReduce, Apache Pig, and Apache Hive.

At its heart, Tez is very simple and has just two components:

  • The data-processing pipeline engine, into which one can plug input, processing, and output implementations to perform arbitrary data processing. Every 'task' in Tez has the following:
      • An Input to consume key/value pairs from.
      • A Processor to process them.
      • An Output to collect the processed key/value pairs.
  • A master for the data-processing application, whereby one can assemble the arbitrary data-processing 'tasks' described above into a task DAG to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
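The two components can be illustrated with a toy, in-memory model. This is a hypothetical sketch, not Tez's API: `TezModelSketch`, `KV`, `Input`, `Output`, `Processor`, `Vertex`, `runDag`, and `exampleRun` are illustrative names only (real Tez applications build their DAG with classes in the `org.apache.tez.dag.api` package and run on YARN). Each task pairs an Input, a Processor, and an Output, and a toy master runs the tasks in dependency order. The ordering uses Kahn's topological sort purely for simplicity; it says nothing about how the real Tez ApplicationMaster schedules tasks.

```java
import java.util.*;

/**
 * Hypothetical model of the two Tez components (NOT the real Tez API): an
 * Input -> Processor -> Output task contract, plus a master that runs tasks as a DAG.
 * Assumes vertex names are unique.
 */
public class TezModelSketch {

    /** A key/value pair flowing between tasks. */
    record KV(String key, long value) {}

    /** Input: the source of key/value pairs a task consumes. */
    interface Input { List<KV> read(); }

    /** Output: the sink that collects the pairs a task produces. */
    interface Output { void write(KV kv); }

    /** Processor: the pluggable logic that turns Input pairs into Output pairs. */
    interface Processor { void run(Input in, Output out); }

    /** A DAG vertex: a named processor plus the names of its upstream vertices. */
    record Vertex(String name, Processor processor, List<String> upstream) {}

    /**
     * Toy "master": orders vertices topologically (Kahn's algorithm) and runs each one.
     * Root vertices read the external source; every other vertex reads the
     * concatenated outputs of its upstream vertices. Returns each vertex's output.
     */
    static Map<String, List<KV>> runDag(List<Vertex> vertices, List<KV> source) {
        Map<String, Vertex> byName = new HashMap<>();
        Map<String, Integer> pending = new HashMap<>();   // unfinished upstream count
        Map<String, List<String>> downstream = new HashMap<>();
        for (Vertex v : vertices) {
            byName.put(v.name(), v);
            pending.put(v.name(), v.upstream().size());
            downstream.put(v.name(), new ArrayList<>());
        }
        for (Vertex v : vertices) {
            for (String up : v.upstream()) {
                if (!byName.containsKey(up)) {
                    throw new IllegalArgumentException("unknown upstream vertex: " + up);
                }
                downstream.get(up).add(v.name());
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Vertex v : vertices) {
            if (v.upstream().isEmpty()) ready.add(v.name());
        }

        Map<String, List<KV>> results = new LinkedHashMap<>();   // in execution order
        while (!ready.isEmpty()) {
            Vertex v = byName.get(ready.poll());
            List<KV> inputData = new ArrayList<>();
            if (v.upstream().isEmpty()) inputData.addAll(source);
            for (String up : v.upstream()) inputData.addAll(results.get(up));

            // Wire the task: Input reads inputData, Output appends to collected.
            List<KV> collected = new ArrayList<>();
            v.processor().run(() -> inputData, collected::add);
            results.put(v.name(), collected);

            // A downstream vertex becomes runnable once all its upstreams have finished.
            for (String next : downstream.get(v.name())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (results.size() != vertices.size()) {
            throw new IllegalStateException("task graph contains a cycle");
        }
        return results;
    }

    /** Two-vertex example: "double" feeds "sum", like a map step feeding a reduce step. */
    static Map<String, List<KV>> exampleRun() {
        List<KV> source = List.of(new KV("a", 1), new KV("b", 1), new KV("a", 1));
        Processor doubler = (in, out) ->
                in.read().forEach(kv -> out.write(new KV(kv.key(), kv.value() * 2)));
        Processor summer = (in, out) -> {
            Map<String, Long> sums = new TreeMap<>();   // sorted keys for stable output
            for (KV kv : in.read()) sums.merge(kv.key(), kv.value(), Long::sum);
            sums.forEach((key, total) -> out.write(new KV(key, total)));
        };
        // Listed out of order on purpose: the master orders by dependencies, not list order.
        return runDag(List.of(
                new Vertex("sum", summer, List.of("double")),
                new Vertex("double", doubler, List.of())), source);
    }

    public static void main(String[] args) {
        System.out.println(exampleRun().get("sum"));
        // prints [KV[key=a, value=4], KV[key=b, value=2]]
    }
}
```

Because the master only moves data along edges, swapping in a different `Processor` changes what a vertex computes without touching the DAG wiring, which mirrors the pluggability the description above emphasizes.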