2017/03/07

Cloudera: CDH5 - Manual Installation

After trying out the Cloudera QuickStart VM, I wanted to set up a CDH ecosystem by installing all the required components manually, to understand the installation process. Below is the procedure I followed:

I have a modest PC with the following configuration:
1 TB hard disk, 16 GB RAM, Windows 10 Pro (64-bit)

I downloaded and installed VirtualBox on my desktop and set up a VM with RHEL 6.4, 8 GB RAM and a 30 GB hard disk.

Cloudera: CDH5 Installation

Most of the instructions for installing CDH5 come straight from the Cloudera documentation:
  • Download and install the CDH5 repository package (CDH version 5.10)
           [root@ODIGettingStarted ~]# sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
  • Install ZooKeeper
           # install the ZooKeeper base package
           [root@ODIGettingStarted ~]# sudo yum install zookeeper
           # install the ZooKeeper server
           [root@ODIGettingStarted ~]# sudo yum install zookeeper-server
  • Install the YARN ResourceManager
           [root@ODIGettingStarted ~]# sudo yum install hadoop-yarn-resourcemanager
  • Install the NameNode
           [root@ODIGettingStarted ~]# sudo yum install hadoop-hdfs-namenode
  • Install the Secondary NameNode
           [root@ODIGettingStarted ~]# sudo yum install hadoop-hdfs-secondarynamenode
  • Install the DataNode and MapReduce
           [root@ODIGettingStarted ~]# sudo yum install hadoop-hdfs-datanode
           [root@ODIGettingStarted ~]# sudo yum install hadoop-mapreduce
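The install steps above can also be run as a single function. This is a minimal sketch, assuming the downloaded repository RPM sits in the current directory; `install_cdh_core` and `CDH_CORE_PACKAGES` are hypothetical names, and the `-y` flag (answer yes to yum's prompts) is my addition rather than part of the original steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the core install steps above as one function.
# The package list mirrors the yum commands above, in the same order.
CDH_CORE_PACKAGES=(
  zookeeper
  zookeeper-server
  hadoop-yarn-resourcemanager
  hadoop-hdfs-namenode
  hadoop-hdfs-secondarynamenode
  hadoop-hdfs-datanode
  hadoop-mapreduce
)

install_cdh_core() {
  # Assumption: the repository RPM is in the current directory.
  local repo_rpm="${1:-cloudera-cdh-5-0.x86_64.rpm}"
  # Install the CDH5 repository package first ...
  sudo yum --nogpgcheck -y localinstall "$repo_rpm" || return 1
  # ... then every core package in a single yum transaction.
  sudo yum -y install "${CDH_CORE_PACKAGES[@]}"
}
```

Running `install_cdh_core` as root installs the same packages as the individual commands; one `yum install` call simply lets yum resolve the shared dependencies once.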

Starting CDH5 Services

With everything installed, let's start the services one by one and check whether the installation succeeded.
  • Starting the ZooKeeper server
        First, start the ZooKeeper server:
           [root@ODIGettingStarted ~]# sudo service zookeeper-server start

        This did not go well: zookeeper-server failed to start.

        It turns out that after a fresh installation ZooKeeper must be initialized first, specifying "--myid=n", where "n" is the server's ID: an integer (1 to 255) that is unique within the ensemble. On a single node, 1 is fine.

           [root@ODIGettingStarted ~]# sudo service zookeeper-server init --myid=1
           [root@ODIGettingStarted ~]# sudo service zookeeper-server start

        After initialization, ZooKeeper started successfully.
  • Starting the NameNode
        Next, I attempted to start the NameNode:
           [root@ODIGettingStarted ~]# sudo service hadoop-hdfs-namenode start

        Again I had a few issues starting the NameNode. I eventually figured out that I needed to explicitly create directories for the NameNode and DataNode and grant them sufficient permissions (777 in my case).

        So I created the directories below for the NameNode (nn) and DataNode (dn) and updated the configuration to point to these paths.

           [root@ODIGettingStarted data]# ls -ltr
           total 12
           drwxr-xr-x. 2 root root 4096 Mar  4 17:47 current
           drwxrwxrwx. 2 root root 4096 Mar  4 18:09 dn
           drwxrwxrwx. 3 root root 4096 Mar  4 18:12 nn
           [root@ODIGettingStarted data]# 

        I also updated the configuration file hdfs-site.xml with the following properties:

         <property>
           <name>dfs.namenode.name.dir</name>
           <value>/usr/lib/hadoop/data/nn</value>
         </property>

         <property>
           <name>dfs.datanode.data.dir</name>
           <value>/usr/lib/hadoop/data/dn</value>
         </property>

         After these changes, the NameNode, Secondary NameNode and DataNode services started successfully:
          [root@ODIGettingStarted ~]# sudo service hadoop-hdfs-namenode start
          [root@ODIGettingStarted ~]# sudo service hadoop-hdfs-secondarynamenode start
          [root@ODIGettingStarted ~]# sudo service hadoop-hdfs-datanode start
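The ZooKeeper and HDFS steps above can be collected into a few shell functions. This is a minimal sketch under my own assumptions, not Cloudera tooling: all helper names are hypothetical, `/var/lib/zookeeper` is assumed to be ZooKeeper's `dataDir` (check `/etc/zookeeper/conf/zoo.cfg`), and the HDFS directories get mode 700 with `hdfs:hdfs` ownership, which is what Cloudera's CDH5 guide suggests instead of 777. It writes `dfs.datanode.data.dir`, the property the DataNode reads its storage directories from, using `file://` URIs as in Cloudera's examples.

```shell
#!/usr/bin/env bash
# Sketch of the ZooKeeper/HDFS bring-up steps as reusable functions.
# All helper names are hypothetical; default paths are assumptions.

# 1. ZooKeeper: confirm 'zookeeper-server init --myid=N' wrote a valid
#    myid file (an integer from 1 to 255) into the data directory.
check_zk_myid() {
  local myid_file="${1:-/var/lib/zookeeper}/myid"
  local id
  [[ -f "$myid_file" ]] || { echo "missing $myid_file" >&2; return 1; }
  id=$(tr -d '[:space:]' < "$myid_file")
  if [[ ! "$id" =~ ^[0-9]+$ ]] || (( 10#$id < 1 || 10#$id > 255 )); then
    echo "invalid id '$id' in $myid_file" >&2
    return 1
  fi
  echo "$((10#$id))"
}

# 2. HDFS directories: create nn/ and dn/ under a base path with a
#    restrictive mode (700 by default) instead of 777.
prepare_hdfs_dirs() {
  local base="${1:-/usr/lib/hadoop/data}" mode="${2:-700}" dir
  for dir in "$base/nn" "$base/dn"; do
    mkdir -p "$dir" || return 1
    # Hand the directory to hdfs only when root and the user exists.
    if (( EUID == 0 )) && id hdfs >/dev/null 2>&1; then
      chown hdfs:hdfs "$dir" || return 1
    fi
    chmod "$mode" "$dir" || return 1
  done
}

# 3. hdfs-site.xml: write the two directory properties. This replaces
#    the whole file, so merge by hand if it already holds other settings.
write_hdfs_site() {
  local out="$1" nn="${2:-/usr/lib/hadoop/data/nn}" dn="${3:-/usr/lib/hadoop/data/dn}"
  cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$dn</value>
  </property>
</configuration>
EOF
}

# 4. Start the HDFS daemons in order, stopping at the first failure.
HDFS_SERVICES=(hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode hadoop-hdfs-datanode)

start_hdfs() {
  local svc
  for svc in "${HDFS_SERVICES[@]}"; do
    sudo service "$svc" start || { echo "failed to start $svc" >&2; return 1; }
  done
}
```

Run as root, the intended order is `check_zk_myid` (after the `init --myid=1` step), `prepare_hdfs_dirs`, `write_hdfs_site /etc/hadoop/conf/hdfs-site.xml` (assuming the default CDH configuration directory), then `start_hdfs`. On a brand-new cluster Cloudera's docs also format the NameNode once, as the hdfs user, before its first start (`sudo -u hdfs hdfs namenode -format`); formatting erases HDFS metadata, so it is not something to repeat on a NameNode that holds data. Afterwards, `sudo -u hdfs hdfs dfsadmin -report` should list the live DataNode.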

CDH5 Components Installation

After successfully installing and configuring core CDH5, I also installed the following components:
  • MySQL
  • Sqoop 1
  • Pig
  • Hive
  • HBase
  • Impala
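To confirm which of these components are actually present on the VM, `rpm -q` can be looped over a package list. This is a minimal sketch; `missing_components` is a hypothetical helper, and the package names are my assumptions based on CDH5 and RHEL 6 naming (for example `mysql-server` for MySQL and `impala-shell` for the Impala client), so confirm them with `yum search` against your repositories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list component packages that rpm reports as not
# installed. Package names are assumptions (CDH5 / RHEL 6 naming).
COMPONENT_PACKAGES=(
  mysql-server
  sqoop
  pig
  hive
  hbase
  impala
  impala-shell
)

missing_components() {
  local pkg
  local -a missing=()
  for pkg in "${COMPONENT_PACKAGES[@]}"; do
    # rpm -q exits non-zero when the package is not installed.
    rpm -q "$pkg" >/dev/null 2>&1 || missing+=("$pkg")
  done
  # One package per line; no output at all when everything is installed.
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}"
  fi
}
```

If `missing_components` prints anything, those names can be passed to `sudo yum install`; empty output means every listed package is installed.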



