Hortonworks HDP 2.5 multinode Hadoop cluster installation/setup guide for CentOS/Red Hat
Hadoop Stack:
Download the HDP 2.5 repository for setting up the cluster:
wget -nv http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.0.0/HDP-2.5.0.0-centos7-rpm.tar.gz
We will use YUM (Yellowdog Updater, Modified), the package management tool developed by Red Hat, to install, update, remove, and search for packages and to manage repositories on Linux systems.
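For example, a few common yum invocations (httpd here is only an illustrative package name):
yum repolist                 # list the configured repositories
yum search httpd             # search for a package by name
yum install -y httpd         # install a package without prompting
yum remove httpd             # remove a package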
HDP 2.5 prerequisites
1. Check the OS of each machine and confirm it is 64-bit.
To check 32-bit vs 64-bit:
$ uname -m
x86_64 ==> 64-bit kernel
2. To check the Linux flavour: cat /etc/*-release
3. Check whether the PostgreSQL package exists:
yum search postgres
Steps to follow for the installation:
1. Log in as root to install the Hadoop cluster.
2. Check the commands below.
1. cat /etc/redhat-release
2. Check that the required packages are installed (if the commands below run without error, the packages are installed):
yum --help
rpm --help
curl --help
tar --help
wget --help
python -V
If not, install the missing packages using yum install <<package name>>.
3. Make a java directory under /usr/, then download Oracle JDK 8u51 (jdk-8u51-linux-x64.tar.gz) from the link below and put it in /usr/java/:
http://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html#jdk-8u51-oth-JPR
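For example, a minimal sketch for staging the archive (Oracle requires accepting the license in a browser, so the file is usually downloaded to a workstation first; the hostname below is the master node from this guide):
mkdir -p /usr/java                                                             # on the node: create the target directory
scp jdk-8u51-linux-x64.tar.gz root@hadoopmaster.techtalks.local:/usr/java/    # from your workstation: copy the archive to the node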
4. Extract the Java tar file using the commands below:
cd /usr/java/
tar -xzvf jdk-8u51-linux-x64.tar.gz
5. Create a symbolic link (symlink) to the JDK:
ln -s /usr/java/jdk1.8.0_51 /usr/java/default
Set the JAVA_HOME and PATH environment variables.
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
Add the entries below to the end of the profile file using the command:
vi /etc/profile
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
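To apply these variables in the current shell without logging out and back in, you can source the profile:
source /etc/profile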
Verify that Java is installed in your environment by running the
following command:
java -version
You should see output similar to the following:
java version "1.8.0_51"
6. Make sure to repeat steps 3, 4, and 5 on all the other nodes.
7. Make an entry for all the hostnames on every machine, like below:
vi /etc/hosts
<<IP address>> <<tabspace>> fully qualified hostname <<tabspace>> short hostname (alias)
e.g.
192.16.1.1 hadoopmaster.techtalks.local hdpm
192.16.1.2 hadoopslave1.techtalks.local hdps1
192.16.1.3 hadoopslave2.techtalks.local hdps2
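After updating /etc/hosts, you can quickly confirm name resolution from each node, for example:
ping -c 1 hadoopslave1.techtalks.local    # should resolve to 192.16.1.2 and get a reply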
8. Update the FQDN (Fully Qualified Domain Name) on all the hosts.
Log in to the host you intend to configure as the master, then enter:
vi /etc/hostname
hadoopmaster.techtalks.local    -> this is for the master host
Log in to each slave host, enter vi /etc/hostname, and add an entry like below:
hadoopslave1.techtalks.local    -> this is for the slave host
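On CentOS 7 / RHEL 7 you can also set the hostname with hostnamectl instead of editing the file directly, for example:
hostnamectl set-hostname hadoopmaster.techtalks.local    # run on the master
hostnamectl set-hostname hadoopslave1.techtalks.local    # run on the first slave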
9. Restart the network service after adding the hosts and hostnames, using the command below:
service network restart
[This is not required every time. Only do a network restart when you see issues resolving host names.]
10. Create an SSH key on the primary node and then copy that key to all the other slave machines, so that no password is needed when connecting from the master node. This is required for setting up the Ambari agents.
From the master node, enter the command below:
ssh-keygen
After entering the above command, press ENTER at every prompt and the SSH keys will be created automatically.
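If you prefer a non-interactive variant, the sketch below generates an RSA key with an empty passphrase at the default location:
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa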
11. Create the authorized_keys file like below:
cd /root/.ssh/
cat id_rsa.pub >> authorized_keys
12. Check whether the /root/.ssh/ folder exists on all the other slave nodes; if not, create the .ssh folder.
Then copy the authorized_keys file created above to all the other slave nodes, like below:
scp /root/.ssh/authorized_keys root@hadoopslave1.techtalks.local:/root/.ssh/
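You can then verify passwordless SSH from the master, for example:
ssh root@hadoopslave1.techtalks.local hostname    # should print the slave's hostname without asking for a password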
13. Check the httpd service on all the machines:
[root@hadoop1 ~]# yum search httpd
[root@hadoop1 ~]# yum install httpd
[root@hadoop1 ~]# service httpd start
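To make sure httpd also comes up after a reboot, and to check that it is running:
systemctl enable httpd
systemctl status httpd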
14. Configuring iptables
For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The easiest way to do this is to temporarily disable the firewall, as follows.
Disable the firewall on all the nodes using the commands below:
systemctl stop firewalld
service firewalld stop
systemctl disable firewalld
Note: We should re-enable firewalld once the entire cluster setup is done.
15. Enable NTP on all the machines (master + all slave nodes)
Commands:
yum install ntp
Enable the service:
systemctl enable ntpd
Start NTPD:
systemctl start ntpd
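To confirm that ntpd is actually synchronizing, you can list its peers, for example:
ntpq -p    # shows the time servers ntpd is talking to; the current sync peer is marked with *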
16. Enter the master node IP address in a browser: http://xxx.xx.x.xx/ .
You should see the Apache "Testing 123" page.
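The same check can be done from the command line, for example:
curl -I http://<<masternodeIP>>/    # the default CentOS test page typically answers with HTTP 403, which still confirms httpd is serving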
17. Extract the Ambari, HDP-UTILS, and HDP tar files:
tar -zvxf ambari-2.4.1.0-centos7.tar.gz -C /var/www/html/
tar -zvxf HDP-UTILS-1.1.0.21-centos7.tar.gz -C /var/www/html/
tar -zvxf HDP-2.5.0.0-centos7-rpm.tar.gz -C /var/www/html/
18. Create local repository
Create the ambari.repo file:
[root@hadoop1]# cd /etc/yum.repos.d
[root@hadoop1 yum.repos.d]# vi ambari.repo
Make the entries below in the ambari.repo file:
[Ambari]
name=Ambari
enabled=1
baseurl=http://[Master node IP]/AMBARI-2.4.1.0/centos7/2.4.1.0-22/
gpgcheck=0
Check that Ambari shows up in the repo list:
[root@hadoop1 yum.repos.d]# yum repolist
Copy the hdp.repo file:
[root@hadoop1] cd /var/www/html/
[root@hadoop1 html]# cp ./HDP/centos7/hdp.repo /etc/yum.repos.d/
Update the gpgcheck value from 1 to 0:
[root@hadoop1 html]# vi /etc/yum.repos.d/hdp.repo
It should look like below:
#VERSION_NUMBER=2.5.0.0-1245
[HDP-2.5.0.0]
name=HDP Version - HDP-2.5.0.0
baseurl=http://[Masternode ip]/HDP/centos7/
gpgcheck=0
gpgkey=http://[Masternode ip]/HDP/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/
gpgcheck=0
gpgkey=http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
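After both repo files are in place, you can verify that yum sees the local repositories, for example:
yum clean all    # clear any cached repository metadata
yum repolist     # the Ambari, HDP, and HDP-UTILS repos should now be listed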
19. Start installing the Ambari server
1. Install PostgreSQL:
[root@hadoop1 yum.repos.d]# yum install postgresql-server
2. Install ambari server
[root@hadoop1 ~]# yum install ambari-server
While installing, if you see any y/n prompt, enter y.
3. Before running ambari-server setup, first disable SELinux on all the nodes:
[root@hadoop1 ~]# vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
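SELINUX=disabled only takes effect after a reboot; to stop SELinux enforcement immediately in the running session you can also switch it to permissive mode, for example:
setenforce 0    # permissive until the next reboot
getenforce      # should now report Permissive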
AMBARI Server Setup
[root@hadoop1 java]# ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar
[root@hadoop1 java]# ambari-server setup -j /usr/java/jdk1.8.0_51/
While doing ambari-server setup, you may sometimes see an error like "ambari-server.py: error: Invalid number of arguments. Entered: 3, required: 1".
In this case, run the command below directly:
> ambari-server setup
Then select Custom JDK and give the JAVA_HOME path. If it asks about customizing user accounts, answer 'n' and proceed further.
[root@hadoop1 java]# ambari-server start
Create the Hive and Oozie users and databases
[root@hdpmaster etc]# su - postgres
Then enter 'psql' to get the postgres shell prompt.
postgres=# create database hivemeta;
CREATE DATABASE
postgres=# create user hive with password 'hive';
CREATE ROLE
postgres=# grant all privileges on database hivemeta to hive;
GRANT
postgres=# create database ooziemeta;
CREATE DATABASE
postgres=# create user oozie with password 'oozie';
CREATE ROLE
postgres=# grant all privileges on database ooziemeta to oozie;
GRANT
Then enter \q to quit psql, followed by exit to return to the root shell.
After that, edit the pg_hba.conf file to allow access to the metadata databases:
[root@hdpmaster data]# cd /var/lib/pgsql/data
[root@hdpmaster data]# ls -l
[root@hdpmaster data]# vi pg_hba.conf
Add below entries at the end of file
host all all 0.0.0.0/0 trust
Then restart PostgreSQL: service postgresql restart
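You can confirm that the hive user can reach its metastore database after the restart, for example:
psql -h 127.0.0.1 -U hive -d hivemeta -c '\conninfo'    # should report a connection to database "hivemeta" as user "hive"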
20. Open the Ambari UI to install the required Hadoop services
1. Go to http://<<masternodeIP>>:8080/ and enter the username and password as admin
2. Enter the desired cluster name
3. Enter the list of hosts and then click Next on the Install Options page
4. Confirm the hosts
5. Choose the required services
6. Assign the masters, slaves, and clients
7. Customize the services and then start the installation. This will begin the Hadoop deployment.
Thanks for reading this article. If you need any help setting up the cluster, please reach me by email: techvasuit@gmail.com
Reader comment: Thanks for this nice article explaining how to configure and install Hortonworks Hadoop.
(Reference: http://www.techtalkspro.com/2017/02/hortonworks-hdp-2-5-multinode-hadoop-cluster-setup-installation-on-Centos7-redhat.html)
I am currently facing an issue in step 18, "Create local repository": I could not resolve the HDP-UTILS path below.
http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/
There is no folder named HDP-UTILS-1.1.0.21 in /var/www/html/. I repeated the command "tar -zvxf HDP-UTILS-1.1.0.21-centos7.tar.gz -C /var/www/html/" thinking that it didn't extract the files properly, but no luck. Please help me resolve this issue.
Reply: It should be like the following. The HDP-UTILS tarball does not create an HDP-UTILS-1.1.0.21 subfolder under /var/www/html/, so its baseurl and gpgkey point at the web root; 192.168.1.165 here is the master node's IP address.
#VERSION_NUMBER=2.5.0.0-1245
[HDP-2.5.0.0]
name=HDP Version - HDP-2.5.0.0
baseurl=http://192.168.1.165/HDP/centos7/
gpgcheck=0
gpgkey=http://192.168.1.165/HDP/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://192.168.1.165/
gpgcheck=0
gpgkey=http://192.168.1.165/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1