Using Ambari to Simplify Management of a Hadoop Cluster

Apache Ambari was developed within the Hadoop ecosystem to provide a simple way of managing Hadoop clusters through a web-based interface. It covers three areas: provisioning, management and monitoring of clusters. For provisioning, an installation wizard installs Hadoop on the desired number of hosts and lets you configure the cluster. For management, Ambari provides a way to start, stop and reconfigure all services installed in a cluster. For monitoring, a dashboard displays the status of the cluster, and Ambari also collects metrics and raises system alerts. This tutorial demonstrates how to use these capabilities to simplify cluster management. Developers interested in integrating cluster management into their applications can rely on the Ambari REST APIs, although we will not look at them in this tutorial.

You can install Ambari as a user by downloading it from a repository, or as a developer by compiling the source code. Both options are described at https://cwiki.apache.org/confluence/display/AMBARI/Installation+Guide+for+Ambari+2.2.2 . To install Ambari on Ubuntu as a user, first select an appropriate repository by running the commands below at a terminal. If you are using a platform other than Ubuntu, use the repository that matches it.
cd /etc/apt/sources.list.d
sudo wget http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/
[Screenshot: select repo]

Once the repository has been selected, the next step is to install the server. The commands below do that.

sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
sudo apt-get update
sudo apt-get install ambari-server

[Screenshot: install ambari]

Once installation is complete, we need to configure the Ambari server, its database, the JDK and the other options required for the server to run properly. The command below starts the configuration process.
sudo ambari-server setup
[Screenshot: server setup]

[Screenshot: configuring ambari database]

During the setup process you will be prompted to enter an account for the Ambari server daemon, select a JDK, accept the Oracle license agreement and set up the database that the Ambari server will use. An embedded PostgreSQL database is available and can be configured here. The other database choices are MySQL, Oracle, Microsoft SQL Server and SQL Anywhere. Once you have specified the database name, schema, username and password, the configuration process is complete. Check that you get an 'Ambari server setup completed successfully' message to confirm that setup succeeded. The screenshots show the configuration details.
Once the setup process is complete, you are ready to start the Ambari server using the command below.

sudo ambari-server start
[Screenshot: start ambari]

The server will start and show you where its logs are saved. Make sure you get an 'Ambari server start completed successfully' message to confirm that startup was okay. Once the server is running, open your web browser and navigate to the address where the server is installed. For a local install, go to http://localhost:8080 to access the web-based interface. Log in with username admin and password admin, which can be changed later.
[Screenshot: ambari login]
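
Before opening the browser, it can help to confirm from the terminal that the web interface is answering. The sketch below polls a URL with curl until it responds or the attempts run out. The function name wait_for_ui and its default attempt count and delay are illustrative choices, not part of Ambari.

```shell
#!/bin/sh
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_ui URL [ATTEMPTS] [DELAY_SECONDS]
# wait_for_ui is a hypothetical helper name used only for this sketch.
wait_for_ui() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s hides progress output, -f turns HTTP errors into a non-zero
        # exit status, -o /dev/null discards the page body.
        if curl -sf -o /dev/null "$url"; then
            echo "reachable: $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not reachable after $attempts attempts: $url" >&2
    return 1
}

# Example, assuming the local install and port used in this tutorial:
# wait_for_ui http://localhost:8080
```

If the function reports that the address is not reachable, the log location printed by the start command is the first place to look.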

Once you log in, you are taken to a welcome screen from which you can manage users and create a cluster. Click Launch Install Wizard to create a cluster.
[Screenshot: ambari welcome]

The first step is to give your cluster a name. For this tutorial we will use eduonix as the cluster name. Type eduonix and click Next.
After clicking Next, you can select the HDP stack that you would like to install. HDP is the Hadoop distribution offered by Hortonworks.

[Screenshot: select stack]

After selecting the stack, specify all the target hosts where Hadoop will be installed, using a fully qualified domain name (FQDN) for each. An FQDN gives the exact location of a host in the domain name hierarchy, so no two hosts share one. When installing Hadoop on a cluster, it is advisable to work with someone knowledgeable in networking who can help you identify the FQDNs of your hosts. If you only want to install on the local machine, specify localhost as the FQDN. You also need to provide an SSH private key, because the Ambari server uses SSH to reach the hosts. This enables it to install agents on each host automatically. You can opt to install the agents manually, but this has to be done before the Ambari server is started.
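
To see what name a host reports for itself, a short check like the one below can be run on each machine. It is only a sketch: looks_like_fqdn is a hypothetical helper that tests for a host part and a domain part separated by a dot, which is a rough heuristic and not a full validation. On a single-machine setup the name localhost will not pass this check, which is expected.

```shell
#!/bin/sh
# Return success when the name has at least one dot with text on both
# sides, e.g. node1.example.com. This is a heuristic, not a DNS lookup.
# looks_like_fqdn is a hypothetical helper name used only for this sketch.
looks_like_fqdn() {
    case "$1" in
        ?*.?*) return 0 ;;
        *)     return 1 ;;
    esac
}

# hostname -f asks the system for this machine's fully qualified name;
# fall back to the plain host name if that fails.
name=$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo unknown)

if looks_like_fqdn "$name"; then
    echo "$name looks fully qualified"
else
    echo "$name is a short name"
fi
```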
The process of installing, configuring and generating SSH keys is somewhat involved. For a complete discussion, please refer to the setting up Hadoop tutorial. To generate a key pair, use the ssh-keygen command at the terminal. You specify the file in which to store the key and a passphrase to protect it. The private key is saved as id_rsa and the public key as id_rsa.pub in your .ssh directory. Copy the public key to all hosts in the cluster and keep the private key on the machine running the Ambari server. For example, on an Ubuntu machine with user eduonix the keys are saved in the /home/eduonix/.ssh/ directory. You need to specify the private key as shown in the dialog below.
[Screenshot: host names]
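
The key steps described above can be sketched as two small shell functions. This is an illustration under stated assumptions: the function names are made up for this example, the host names in the usage lines are hypothetical, and the empty passphrase (-N "") is an assumption made so the key can be used without a prompt. Leave out -N if you prefer to be asked for a passphrase.

```shell
#!/bin/sh
# Create an RSA key pair in the given directory unless one already exists.
# -N "" sets an empty passphrase (an assumption of this sketch),
# -f names the private key file; the public key gets a .pub suffix.
make_keypair() {
    key_dir=$1
    mkdir -p "$key_dir" && chmod 700 "$key_dir"
    if [ ! -f "$key_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$key_dir/id_rsa"
    fi
}

# Copy the public key to each host given after the key directory.
# ssh-copy-id adds the key to the remote account's authorized_keys file.
push_public_key() {
    key_dir=$1
    shift
    for host in "$@"; do
        ssh-copy-id -i "$key_dir/id_rsa.pub" "$host"
    done
}

# Example with hypothetical host names:
# make_keypair "$HOME/.ssh"
# push_public_key "$HOME/.ssh" eduonix@node1.example.com eduonix@node2.example.com
```

The private key at id_rsa in the chosen directory is the file to supply in the host registration dialog.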

You can specify the path to the private key or paste in its contents. After that, click Register and Confirm. You will be taken to a screen where you can confirm the hosts that will make up the cluster.
[Screenshot: confirm hosts]

The confirm hosts page gives a summary of hosts that have been successfully installed and those that have failed. From here you can also remove any hosts not needed in the cluster.
This tutorial introduced Ambari, a project developed to simplify the administration of Hadoop clusters. It gave a brief overview of what Ambari can do, then covered updating the repository and installing Ambari, configuring the JDK and PostgreSQL, and generating SSH keys for use during installation. This was the first part of the tutorial. The second part will demonstrate how cluster management is achieved using Ambari.

