apache / Oozie

Licence: apache-2.0
Mirror of Apache Oozie

Apache Oozie

What is Oozie

Oozie is an extensible, scalable and reliable system to define, manage, schedule, and execute complex Hadoop workloads via web services. More specifically, this includes:

  • XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
  • Support for different types of jobs such as Hadoop Map-Reduce, Pipe, Streaming, Pig, Hive and custom Java applications.
  • Workflow scheduling based on frequency and/or data availability.
  • Monitoring capability, automatic retry and failure handling of jobs.
  • Extensible and pluggable architecture to allow arbitrary grid programming paradigms.
  • Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.

Oozie Overview

Oozie is a server-based Workflow Engine specialized in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.

Oozie is a Java Web-Application that runs in a Java servlet-container.

For the purposes of Oozie, a workflow is a collection of actions (e.g. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph). A "control dependency" from one action to another means that the second action can't run until the first action has completed.
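
The control dependency rule can be illustrated with a small sketch. This is not Oozie code: the action names are hypothetical, and Python's standard `graphlib` module stands in for the workflow engine. It only shows that an action becomes runnable once every action it depends on has completed.

```python
from graphlib import TopologicalSorter

# Hypothetical action names; each key maps to the set of actions it depends on.
dependencies = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"clean"},
    "publish": {"aggregate", "report"},
}


def run_order(deps):
    """Return batches of actions; every action in a batch has all of its
    predecessors in earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch as completed
    return batches


batches = run_order(dependencies)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

In this sketch "aggregate" and "report" land in the same batch because neither depends on the other, while "publish" has to wait for both.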

Oozie workflow definitions are written in hPDL (an XML Process Definition Language similar to JBoss jBPM jPDL).

Oozie workflow actions start jobs in remote systems (e.g. Hadoop, Pig). Upon action completion, the remote system calls back Oozie to notify it that the action has completed; at this point Oozie proceeds to the next action in the workflow.
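
The callback-driven progression can be pictured with a toy model. This is a simplified sketch, not Oozie's implementation: the class `WorkflowRun`, its method names, and the linear list of actions are all assumptions made for illustration.

```python
class WorkflowRun:
    """Toy model of a linear workflow that advances only on completion callbacks."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.position = 0   # index of the action currently running
        self.started = []   # record of actions handed to the remote system
        self._start_current()

    @property
    def done(self):
        return self.position >= len(self.actions)

    def _start_current(self):
        # A real engine would submit a job to the remote system here.
        if not self.done:
            self.started.append(self.actions[self.position])

    def on_callback(self, action_name):
        """Handle a completion notification; return True if the workflow advanced."""
        if self.done or action_name != self.actions[self.position]:
            return False  # ignore stale or unexpected notifications
        self.position += 1
        self._start_current()
        return True


run = WorkflowRun(["mr-job", "pig-job"])
print(run.started)                # only the first action has been started
print(run.on_callback("mr-job"))  # the callback moves the workflow forward
print(run.started, run.done)
```

The second action is started only after the callback for the first one arrives, which is the behaviour the paragraph above describes.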

Oozie workflows contain control flow nodes and action nodes.

Control flow nodes define the beginning and the end of a workflow (start, end and fail nodes) and provide a mechanism to control the workflow execution path (decision, fork and join nodes).

Action nodes are the mechanism by which a workflow triggers the execution of a computation/processing task. Oozie provides support for different types of actions: Hadoop map-reduce, Hadoop file system, Pig, SSH, HTTP, email and Oozie sub-workflow. Oozie can be extended to support additional types of actions.
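
As a rough illustration of the two node kinds, the sketch below embeds a minimal workflow definition in a Python string and sorts its top-level nodes into control flow nodes and action nodes. The workflow name, node names, path and schema version are assumptions, and the XML has not been validated against the Oozie schema. To the best of my understanding, the failure node is written as a `kill` element in workflow definitions, here named "fail".

```python
import xml.etree.ElementTree as ET

# Illustrative workflow definition; names, path and schema version are assumptions.
WORKFLOW_XML = """
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="first-step"/>
    <action name="first-step">
        <fs>
            <mkdir path="${outputDir}"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""

CONTROL_NODES = {"start", "end", "kill", "decision", "fork", "join"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def classify(xml_text):
    """Return (control node kinds, action node names) in document order."""
    root = ET.fromstring(xml_text.strip())
    control, actions = [], []
    for child in root:
        kind = local_name(child.tag)
        if kind in CONTROL_NODES:
            control.append(kind)
        elif kind == "action":
            actions.append(child.get("name"))
    return control, actions


control, actions = classify(WORKFLOW_XML)
print("control flow nodes:", control)
print("action nodes:", actions)
```

The `ok` and `error` transitions inside the action are what connect an action node to the next node in the graph.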

Oozie workflows can be parameterized (using variables like ${inputDir} within the workflow definition). When submitting a workflow job, values for the parameters must be provided. If properly parameterized (e.g. using different output directories), several identical workflow jobs can run concurrently.
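
A minimal sketch of the idea, assuming a plain `${name}` substitution: the `resolve` helper, the template string and the directory paths below are hypothetical and do not reproduce Oozie's own expression handling. It shows why a missing value has to be rejected at submission time and how distinct output directories keep two otherwise identical jobs apart.

```python
import re

# Matches ${name} placeholders; this simple pattern is an assumption of the sketch.
_PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(text, params):
    """Replace every ${name} in text with params[name]; fail if a value is missing."""
    missing = sorted({name for name in _PARAM.findall(text) if name not in params})
    if missing:
        raise KeyError("missing workflow parameters: " + ", ".join(missing))
    return _PARAM.sub(lambda match: params[match.group(1)], text)


template = "${inputDir}/part-* -> ${outputDir}"

# Two submissions of the same definition, separated only by their output directory.
job_a = resolve(template, {"inputDir": "/data/in", "outputDir": "/data/out/run-a"})
job_b = resolve(template, {"inputDir": "/data/in", "outputDir": "/data/out/run-b"})
print(job_a)
print(job_b)
```

Because the two jobs write to different directories, they can run at the same time without overwriting each other's results.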

Documentation:

The Oozie web service is bundled with detailed built-in documentation.

More information can be found at: http://oozie.apache.org/

Oozie Quick Start: http://oozie.apache.org/docs/5.0.0/DG_QuickStart.html

Supported Hadoop Versions:

This version of Oozie was primarily tested against Hadoop 2.4.x and 2.6.x.

If you have any questions/issues, please send an email to:

[email protected]

Subscribe using the link:

http://oozie.apache.org/mail-lists.html
