- Apache Oozie Essentials
- Jagat Jasjit Singh
- 213 words
- 2021-07-30 09:58:20
Chapter 1. Setting up Oozie
Oozie is a workflow scheduler system for running Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. More information on DAGs can be found at https://en.wikipedia.org/wiki/Directed_acyclic_graph. Actions define what the job does. Oozie supports actions of various types, such as Java, MapReduce, Pig, Hive, Sqoop, Spark, and DistCp. The output of one action can be consumed by the next action, so actions can be chained into a sequence.
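To make the DAG idea concrete, the following is a minimal Python sketch, not Oozie code. Real Oozie workflows are defined in XML, and the action names used here are hypothetical. The sketch only shows how a set of actions with dependencies yields a valid execution order, and why a cycle would make the workflow impossible to run. It assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow (names are illustrative only): each key is an
# action, each value is the set of actions that must finish before it.
workflow = {
    "import-with-sqoop": set(),
    "clean-with-pig": {"import-with-sqoop"},
    "aggregate-with-hive": {"clean-with-pig"},
    "copy-with-distcp": {"aggregate-with-hive"},
}


def execution_order(dag):
    """Return the actions in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if the dependency graph contains no cycle."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


order = execution_order(workflow)
print(" -> ".join(order))
```

Because each action in this sketch depends only on the one before it, there is exactly one valid order. A graph such as `{"a": {"b"}, "b": {"a"}}` has no valid order at all, which is why a workflow must be acyclic.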
Oozie has a client-server architecture: we install the server, which stores the jobs, and we use the client to submit our jobs to the server.
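As a small illustration of the client side of this architecture, here is a hedged Python sketch that asks an Oozie server for its status over the Oozie web services API (`/v1/admin/status`). The server URL `http://localhost:11000/oozie` is an assumption and depends on your installation. The helper names are hypothetical and are not part of any Oozie client library.

```python
import json
from urllib.request import urlopen


def admin_status_url(server_url):
    """Build the admin status URL from the Oozie server base URL."""
    return server_url.rstrip("/") + "/v1/admin/status"


def parse_system_mode(body):
    """Extract the systemMode field from the JSON status response."""
    return json.loads(body)["systemMode"]


def fetch_system_mode(server_url, timeout=5):
    """Query a running Oozie server; requires network access to it."""
    with urlopen(admin_status_url(server_url), timeout=timeout) as response:
        return parse_system_mode(response.read().decode("utf-8"))


# Assumed base URL; adjust host and port to match your installation.
# print(fetch_system_mode("http://localhost:11000/oozie"))
```

The call to `fetch_system_mode` is left commented out because it only succeeds once a server is installed and running, which is the subject of the rest of this chapter.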
In this chapter, we will learn how to install Oozie both for learning purposes and for production. For learning purposes, we will build Oozie from the source code, and for production we will use the Hadoop distribution by Hortonworks. Throughout the book, we will use the Hortonworks single-node virtual machine. If you are using a different Hadoop distribution, there is no need to worry: every distribution packages the same Oozie software, which is developed by the Apache community (http://oozie.apache.org).
After reading this chapter, we will be able to:
- Configure Oozie in the Hortonworks distribution using Ambari
- Install Oozie from the source code provided as a tarball on the Apache Oozie website