Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+311.67%)

Mutual labels: big-data

Clickhouse

ClickHouse® is a free analytics DBMS for big data

Stars: ✭ 21,089 (+35048.33%)

Mutual labels: big-data

Lite Virtual List

Vue-based virtual list component library with support for waterfall flow layouts

Stars: ✭ 223 (+271.67%)

Mutual labels: big-data

Social-Network-Analysis-in-Python

Social Network Facebook Analysis (Python, Networkx)

Stars: ✭ 26 (-56.67%)

Mutual labels: big-data

Cboard

An easy to use, self-service open BI reporting and BI dashboard platform.

Stars: ✭ 2,795 (+4558.33%)

Mutual labels: big-data

acousticbrainz-server

The server components for the AcousticBrainz project

Stars: ✭ 128 (+113.33%)

Mutual labels: big-data

Trafodion

Apache Trafodion

Stars: ✭ 242 (+303.33%)

Mutual labels: big-data

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+310%)

Mutual labels: big-data

predictionio-template-recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 80 (+33.33%)

Mutual labels: big-data

Selinon

Advanced distributed task flow management on top of Celery

Stars: ✭ 237 (+295%)

Mutual labels: big-data

predictionio-sdk-ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (+220%)

Mutual labels: big-data

Books

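A curated collection of books covering C and C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.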

Stars: ✭ 222 (+270%)

Mutual labels: big-data

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+4973.33%)

Mutual labels: big-data

bagri

XML/Document DB on top of distributed cache

Stars: ✭ 40 (-33.33%)

Mutual labels: big-data

TT Tech Space

TT Tech Research Notes

Stars: ✭ 21 (-65%)

Mutual labels: big-data

Detecting-Malicious-URL-Machine-Learning

No description or website provided.

Stars: ✭ 47 (-21.67%)

Mutual labels: big-data


Apache Tez

Apache Tez is a generic data-processing pipeline engine envisioned as a low-level engine for higher abstractions such as Apache Hadoop MapReduce, Apache Pig, and Apache Hive.

At its heart, Tez is very simple and has just two components:

The data-processing pipeline engine, into which one can plug input, processing, and output implementations to perform arbitrary data processing. Every 'task' in Tez has the following:

Input to consume key/value pairs from.
Processor to process them.
Output to collect the processed key/value pairs.

A master for the data-processing application, with which one can assemble the arbitrary data-processing 'tasks' described above into a task-DAG to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
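The two components described above can be illustrated with a small toy model. This is only a conceptual sketch in Python and not the Tez API: the names Task, run_dag, tokenize, and count are hypothetical, and a plain in-process loop stands in for the YARN-based master. Each task consumes key/value pairs, runs a processor over them, and collects the resulting pairs; the "master" runs the tasks in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Toy model only: every name here is hypothetical and is NOT part of Tez.

class Task:
    """A task = input pairs -> processor -> collected output pairs."""
    def __init__(self, processor):
        self.processor = processor

    def run(self, input_pairs):
        # Output: collect whatever key/value pairs the processor emits.
        return list(self.processor(input_pairs))

def tokenize(pairs):
    # Processor: turn (offset, line) pairs into (word, 1) pairs.
    for _, line in pairs:
        for word in line.split():
            yield (word, 1)

def count(pairs):
    # Processor: sum the values per key and emit the totals sorted by key.
    totals = {}
    for word, n in pairs:
        totals[word] = totals.get(word, 0) + n
    yield from sorted(totals.items())

def run_dag(tasks, upstream, source_pairs):
    """Stand-in for the master: run tasks in topological order.

    tasks maps a task name to a Task; upstream maps a task name to the
    set of task names whose output it consumes.
    """
    results = {}
    for name in TopologicalSorter(upstream).static_order():
        parents = upstream.get(name, set())
        if parents:
            pairs = [p for parent in sorted(parents) for p in results[parent]]
        else:
            pairs = source_pairs  # root tasks read the external input
        results[name] = tasks[name].run(pairs)
    return results

tasks = {"tokenize": Task(tokenize), "count": Task(count)}
upstream = {"tokenize": set(), "count": {"tokenize"}}
results = run_dag(tasks, upstream, [(0, "a b a"), (1, "b c")])
print(results["count"])
```

In this sketch the DAG has two vertices, with the output of tokenize feeding the input of count. A real engine would distribute the tasks across a cluster and move data between them, which the toy model deliberately leaves out.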



apache / incubator-tez
