- Apache Oozie Essentials
- Jagat Jasjit Singh
- 2021-07-30 09:58:21
Installing Oozie using tar ball
In this section, we will learn how to build and install Oozie from the source code. Since the Hortonworks virtual machine already had Oozie installed, we did not need to do anything there.
To learn how to install Oozie from a tarball, we will use a Vagrant-based machine in which we will configure and install the Oozie server.
The summary of the steps we will perform is as follows:
- Create a test build machine.
- Download and build the Oozie code to make a WAR file.
- Download the Oozie third-party dependency jars and libraries.
- Package the Oozie distribution and its dependencies into a single WAR file.
- Configure the MySQL database for the Oozie server.
- Configure the shared library.
- Start and test the Oozie server.
Note
Just as a heads-up, the Vagrant machine needs a lot of resources to build the code. So, if you do not have a powerful machine, you can build it directly on your host operating system rather than in the virtual machine. I am working on a MacBook Pro with 16 GB of RAM, and I gave 8 GB to the virtual machine to show how to install Oozie from source.
Creating a test virtual machine
The following are the steps to create a test virtual machine:
- Download the latest Oozie distribution from the Apache Oozie website. Go to the downloads section and download the latest version (4.2.0 at the time of writing) onto the machine where you want to install it.
- Download and install Vagrant depending upon your operating system:
The Vagrant download
- After this, go to the VirtualBox website. Depending on your operating system, download and install VirtualBox.
- If you already have a test machine that has a Linux-based operating system, then you can skip the Vagrant-based setup and follow the steps for building Oozie from scripts.
- Clone the source code for the book from https://github.com/jagatsingh/apache_oozie_essentials.git.
- Create a folder in your system called dev, or any suitable location where we can clone code. We will call the dev/apache_oozie_essentials location as <BOOK_CODE_HOME> in this book. The following are the commands to do this:
$ git clone https://github.com/jagatsingh/apache_oozie_essentials.git
$ cd <BOOK_CODE_HOME>
$ cd learn_oozie/ch01
$ # Let's start the virtual machine
$ vagrant up
- Wait for some time till our new test machine comes up.
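Before running vagrant up, it can help to confirm that the tools used in the steps above are on the PATH. The following is a minimal sketch, not part of the book's scripts; the function name missing_tools is hypothetical, and VBoxManage is assumed to be the command-line client that a VirtualBox installation provides.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# Prints nothing when all tools are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example call for this chapter (left commented out so the block has no
# side effects when sourced):
# missing_tools git vagrant VBoxManage
```

If the call prints any names, install those tools first and then start the virtual machine.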
Here is what Vagrant does behind the scene:
- Gets the image of the CentOS 6.5 operating system
- Installs JDK, MySQL, Git, and Maven
- All the preceding steps are done by the provisioning script, which is shown as follows:
$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
$ sudo yum install -y java-1.7.0-openjdk mysql-server git unzip zip apache-maven telnet
$ cp /vagrant/files/maven/settings.xml /etc/maven/
$ sudo service mysqld start
- When the machine has started completely, you will see something like the following figure:
Vagrant up finish
If you need a quick tutorial on how Vagrant works, then read the documentation on Vagrant at https://docs.vagrantup.com/v2/.
- Now we can log in to the virtual machine by using the command vagrant ssh. This command should be executed from the folder ch01.
- Inside the Vagrant virtual machine, the mount /vagrant is the same as the ch01 folder, placed at <BOOK_CODE_HOME>/learn_oozie/, from where we started Vagrant.
$ cd /vagrant
$ ls
Building Oozie source code
Let's build Oozie from the source code. We will download the latest Oozie distribution and build it. All of these steps are present in the script build_oozie.sh placed at /vagrant/scripts/.
The contents of the script that we will run are as follows:
# Download and make Oozie distribution
$ cd ~/
$ mkdir {oozie_build,oozie_install,hadoop_install}
$ cd oozie_build
$ wget http://apache.mirror.digitalpacific.com.au/oozie/4.2.0/oozie-4.2.0.tar.gz
$ tar -xvf oozie-4.2.0.tar.gz
$ cd oozie-4.2.0
$ bin/mkdistro.sh -DskipTests -P hadoop-2
Summary of the build script
In the oozie_build directory, we will build Oozie. In the oozie_install directory, we will install Oozie. In the hadoop_install directory, we will download the Hadoop distribution and copy a few jars needed for Oozie to run. You can also download the jars from your own Hadoop cluster.
Let's run the command to start the Oozie build. It will take some time to download all the dependencies and build the source code:
$ /vagrant/scripts/build_oozie.sh
Tip
If you already have a Maven repository on your host machine and want to avoid downloading Maven artifacts again, then look at the Maven settings file. I have configured (and commented out) a setting that uses the Maven repository in my MacBook home directory, as I already had all the artifacts there. You can uncomment it if you want to do something similar.
Codehaus Maven move
Codehaus no longer serves Maven repositories, so we need to configure Maven to download those dependencies from a different location. If you look at /etc/maven/settings.xml, which came with this machine, it has already been modified. You can see the details about it on the Codehaus website at http://www.codehaus.org/mechanics/maven/.
On a successful build, you should see something like the following screenshot:

Oozie build success
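Besides reading the Maven output, one way to confirm the build is to look for the distro tarball that the later packaging step copies from distro/target. The following is a minimal sketch; the function name distro_tarball is hypothetical, and the path layout is taken from the copy command used later in this section.

```shell
# Print the path of the distro tarball if the build produced it.
# $1 = directory where the Oozie source was extracted
# $2 = Oozie version, for example 4.2.0
# Returns non-zero if the tarball is missing or empty.
distro_tarball() {
  tarball="$1/distro/target/oozie-$2-distro.tar.gz"
  if [ -s "$tarball" ]; then
    echo "$tarball"
  else
    echo "missing or empty: $tarball" >&2
    return 1
  fi
}

# Example call (commented out so the block has no side effects):
# distro_tarball ~/oozie_build/oozie-4.2.0 4.2.0
```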
Download dependency jars
To run Oozie properly, the Oozie WAR file needs to have some dependencies packaged with it. Some of them are Hadoop, MySQL JDBC driver, Ext-js, and so on. The MySQL JDBC driver is used by the server database, and Ext-js is used by the Oozie web console.
We will copy all of them into one folder, libext, and then use the oozie-setup.sh command to build the WAR file.
Let's download the Hadoop jars from your cluster or by executing the following steps:
$ cd ~/hadoop_install
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz
$ tar -xvf hadoop-2.4.0.tar.gz
Now we should have Hadoop extracted to the folder ~/hadoop_install.
The preceding steps can be executed in one go using the following command:
/vagrant/scripts/download_hadoop_jars.sh
Preparing to create a WAR file
To create the WAR file, we need to copy the Oozie distro built earlier and combine it with the jars for Hadoop, the MySQL JDBC driver, and the Ext-js library.
If you remember from the previous Ambari Oozie configuration, we used MySQL as our database and configured it using the ambari-setup command. We will take a similar approach for the MySQL JDBC driver jar, which we are providing by merging it with the Oozie WAR file.
Let's prepare the Oozie distro using the following commands:
# Prepare to make Oozie war file
$ cd ~/oozie_install
$ cp ~/oozie_build/oozie-4.2.0/distro/target/oozie-4.2.0-distro.tar.gz ~/oozie_install
$ tar -xvf oozie-4.2.0-distro.tar.gz
$ cd oozie-4.2.0
$ # Removing hsql jar as they cause class conflict
$ rm lib/hsqldb-1.8.0.10.jar
Download the MySQL jar using the following commands:
# Collect all external jar files
$ mkdir libext
$ wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.36.tar.gz --no-check-certificate
$ tar -xvf mysql-connector-java-5.1.36.tar.gz
$ # Copy MySQL JDBC Driver
$ cp mysql-connector-java-5.1.36/*.jar libext/
Merge the Hadoop jars and the ext-js library using the following commands:
$ cd libext
$ wget http://dev.sencha.com/deploy/ext-2.2.zip
$ # Collect hadoop related jars
$ shopt -s globstar
$ /bin/cp -rf ~/hadoop_install/hadoop-2.4.0/share/**/*.jar ~/oozie_install/oozie-4.2.0/libext
$ # Removing source jars to reduce size
$ rm -rf *sources*
$ rm -rf *jasper*
All of the preceding steps can be executed in one go using the following command:
/vagrant/scripts/war_file_preparation.sh
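The jar collection step relies on the globstar option, which only exists in bash 4 and later. Where that is not available, a find-based variant may do the same job. The following is a sketch of that swapped-in technique, not the contents of the book's script; the function name collect_jars is hypothetical. It skips the source and jasper jars while copying instead of deleting them afterwards.

```shell
# Copy every jar found below directory $1 into directory $2,
# skipping jars whose names contain "sources" or "jasper".
collect_jars() {
  find "$1" -type f -name '*.jar' \
    ! -name '*sources*' ! -name '*jasper*' \
    -exec cp -f {} "$2"/ \;
}

# Example call (commented out so the block has no side effects):
# collect_jars ~/hadoop_install/hadoop-2.4.0/share ~/oozie_install/oozie-4.2.0/libext
```

Filtering during the copy leaves any other files already in libext untouched, whereas the rm commands remove everything in the folder that matches the patterns.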
After successful execution, go to /home/vagrant/oozie_install/oozie-4.2.0/libext and see that we now have jars placed in the folder.
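Instead of inspecting the folder by eye, a small check can report which of the expected artifacts are absent. This is a minimal sketch; the function name check_libext is hypothetical, and the choice of hadoop-common as a representative Hadoop jar is an assumption made for illustration.

```shell
# Print one "missing: PATTERN" line for each expected artifact
# that is not present in directory $1. Prints nothing if all are found.
check_libext() {
  for pattern in 'mysql-connector-java-*.jar' 'ext-2.2.zip' 'hadoop-common-*.jar'; do
    # $pattern is left unquoted on purpose so the shell expands the wildcard;
    # when nothing matches, ls receives the literal pattern and fails.
    if ! ls "$1"/$pattern >/dev/null 2>&1; then
      echo "missing: $pattern"
    fi
  done
}

# Example call (commented out so the block has no side effects):
# check_libext /home/vagrant/oozie_install/oozie-4.2.0/libext
```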
Create a WAR file
Now we need to package the oozie-distro and the jars that we copied into the libext folder as a single WAR file. This WAR file will be deployed in Tomcat. Go to the folder /home/vagrant/oozie_install/oozie-4.2.0 and execute the following command:
bin/oozie-setup.sh prepare-war
The command completes with a WAR file being created in the folder, as shown in the following screenshot:

Prepare a WAR file
Note
Exercise: Execute bin/oozie-setup.sh help and read all the commands possible with the setup command.
Configure Oozie MySQL database
If you remember, we configured Ambari Oozie to use MySQL database for Oozie. We will do the same for this instance of the Oozie server.
Open the MySQL prompt as root and execute the following statements:
$ mysql -u root
CREATE USER 'oozie'@'%' IDENTIFIED BY 'hadoop';
CREATE DATABASE oozie;
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%' WITH GRANT OPTION;
This will create the Oozie database, which will be used by the server.
Go to /home/vagrant/oozie_install/oozie-4.2.0/conf and open the oozie-site.xml file. In this file, all the Oozie settings are declared. All the Oozie configuration properties and their default values are defined in the oozie-default.xml file.
Oozie resolves configuration property values in the following order: if a Java System property is defined, it uses its value; else if the Oozie configuration file (oozie-site.xml) contains the property, it uses its value; else it uses the default value documented in the oozie-default.xml file.
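The lookup order described above can be modeled as a simple fallback chain. The following is an illustrative sketch only, not Oozie's actual code; the function name resolve_property is hypothetical, and as a simplification it treats an empty string as "not defined".

```shell
# Return the first non-empty value among the three sources.
# $1 = value of the Java System property (empty if not defined)
# $2 = value from oozie-site.xml (empty if not present)
# $3 = default value from oozie-default.xml
resolve_property() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "$2" ]; then
    echo "$2"
  else
    echo "$3"
  fi
}
```

For example, resolve_property "" "from-site" "from-default" prints from-site, because no system property is set and the site file takes priority over the default.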
Note
Oozie does not use the oozie-default.xml file found in the conf/ directory. It is there for reference purposes only.
Let's edit oozie-site.xml and configure the database details. You can use the vi editor or copy the settings from the already created file using the following command:
$ cp /vagrant/files/oozie/oozie-site.xml /home/vagrant/oozie_install/oozie-4.2.0/conf/
If you want to edit it manually, then add the following code:
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>JDBC driver class</description>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/${oozie.db.schema.name}?createDatabaseIfNotExist=true</value>
  <description>JDBC URL</description>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
  <description>DB user name</description>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>hadoop</value>
  <description>DB user password</description>
</property>
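After a manual edit, it is easy to leave out one of the four database properties. The following is a minimal sketch of a check for that; the function name missing_jpa_properties is hypothetical. It is a plain text search rather than a real XML parser, so it assumes each name element is written without extra whitespace inside the tags.

```shell
# Print each JPAService JDBC property name that does not appear
# as a <name> element in the file $1. Prints nothing if all four exist.
missing_jpa_properties() {
  for key in driver url username password; do
    name="oozie.service.JPAService.jdbc.$key"
    grep -qF "<name>$name</name>" "$1" || echo "$name"
  done
}

# Example call (commented out so the block has no side effects):
# missing_jpa_properties /home/vagrant/oozie_install/oozie-4.2.0/conf/oozie-site.xml
```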
Note
Exercise: Execute bin/ooziedb.sh help and read all the commands it supports.
Let's create the database tables in our newly created database using the following command:
bin/ooziedb.sh create -sqlfile oozie.sql -run
The following screenshot shows the output generated:

Database creation success
Configure the shared library
We just need to tell Oozie about the shared libraries before starting the Oozie server. The Oozie sharelib .tar.gz file bundled with the distribution contains the necessary files to run Oozie MapReduce streaming, Pig, Hive, Sqoop, HCatalog, and DistCp actions.
Let's execute the following command:
bin/oozie-setup.sh sharelib create -fs oozie-sharelib-4.2.0.tar.gz
The following screenshot shows the output generated:

Create a shared library
Start server testing and verification
The following command is used to start the server:
bin/oozied.sh start
Note
Exercise: Execute bin/oozied.sh help and read all the commands it supports.
The command, on successful completion, will not print any error message. We can check the status of Oozie server using the following command:
bin/oozie admin -oozie http://localhost:11000/oozie -status
The output should be:
System mode: NORMAL
We can also check the Oozie web console by opening the URL http://localhost:11000/oozie.
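The server can take a little while to come up after the start command returns, so a status check run immediately may fail. The following is a minimal sketch of a retry loop around the status command; the function name wait_for_normal is hypothetical, and the retry count and pause are arbitrary values chosen for illustration.

```shell
# Run a status command repeatedly until its output reports NORMAL mode.
# $1 = maximum number of attempts
# $2 = seconds to pause between attempts
# remaining arguments = the status command to run
# Returns 0 as soon as NORMAL is seen, 1 if every attempt fails.
wait_for_normal() {
  tries=$1
  pause=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    # The match is case-insensitive, so the capitalization of the
    # status line does not matter.
    if "$@" 2>/dev/null | grep -qi 'system mode: NORMAL'; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  return 1
}

# Example call (commented out so the block has no side effects):
# wait_for_normal 30 2 bin/oozie admin -oozie http://localhost:11000/oozie -status
```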