Hortonworks HDP 2.5 multinode Hadoop cluster installation/setup guide for CentOS/Red Hat
Hadoop Stack:
Download the HDP 2.5 repository for setting up the cluster:
wget -nv http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.0.0/HDP-2.5.0.0-centos7-rpm.tar.gz
We will use YUM (Yellowdog Updater, Modified), the package management tool developed by Red Hat, to install, update, remove, and search for packages and to manage repositories on Linux systems.
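For example, a few common yum invocations (httpd here is only an illustrative package name):
yum repolist                 # list the configured repositories
yum search httpd             # search for a package by name
yum install -y httpd         # install a package without prompting
yum remove httpd             # remove a package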
HDP 2.5 prerequisites
1. Check the OS of each machine and confirm it is 64-bit.
To check 32-bit vs 64-bit:
$ uname -m
x86_64 ==> 64-bit kernel
2. To check the Linux flavour: cat /etc/*-release
3. Check whether the PostgreSQL package exists:
yum search postgres
Steps to follow for the installation:
1. Log in as root to install the Hadoop cluster.
2. Check the commands below.
1. cat /etc/redhat-release
2. Check that the required packages are installed (if the commands below run without error, the packages are installed):
yum --help
rpm --help
curl --help
tar --help
wget --help
python -V
If not, install the missing packages using yum install <<package name>>.
3. Make a java directory under /usr/, then download Oracle JDK 8u51 (jdk-8u51-linux-x64.tar.gz) from the link below and put it in /usr/java/:
http://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html#jdk-8u51-oth-JPR
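For example, a minimal sketch for staging the archive (Oracle requires accepting the license in a browser, so the file is usually downloaded to a workstation first; the hostname below is the master node from this guide):
mkdir -p /usr/java                                                             # on the node: create the target directory
scp jdk-8u51-linux-x64.tar.gz root@hadoopmaster.techtalks.local:/usr/java/    # from your workstation: copy the archive to the node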
4. Extract the Java tar file using the commands below:
cd /usr/java/
tar -xzvf jdk-8u51-linux-x64.tar.gz
5. Create a symbolic link (symlink) to the JDK:
ln -s /usr/java/jdk1.8.0_51 /usr/java/default
Set the JAVA_HOME and PATH environment variables.
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
Add the entries below to the end of the profile file using the command:
vi /etc/profile
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
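To apply these variables in the current shell without logging out and back in, you can source the profile:
source /etc/profile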
Verify that Java is installed in your environment by running the
following command:
java -version
You should see output similar to the following:
java version "1.8.0_51"
6. Make sure to repeat steps 3, 4, and 5 on all the other nodes.
7. Make an entry for all the hostnames on every machine, like below:
vi /etc/hosts
<<IP address>> <<tabspace>> fully qualified hostname <<tabspace>> short hostname (alias)
e.g.
192.16.1.1 hadoopmaster.techtalks.local hdpm
192.16.1.2 hadoopslave1.techtalks.local hdps1
192.16.1.3 hadoopslave2.techtalks.local hdps2
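After updating /etc/hosts, you can quickly confirm name resolution from each node, for example:
ping -c 1 hadoopslave1.techtalks.local    # should resolve to 192.16.1.2 and get a reply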
8. Update the FQDN (Fully Qualified Domain Name) on all the hosts.
Log in to the host you intend to configure as the master, then enter:
vi /etc/hostname
hadoopmaster.techtalks.local    -> this is for the master host
Log in to each slave host, enter vi /etc/hostname, and add an entry like below:
hadoopslave1.techtalks.local    -> this is for the slave host
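On CentOS 7 / RHEL 7 you can also set the hostname with hostnamectl instead of editing the file directly, for example:
hostnamectl set-hostname hadoopmaster.techtalks.local    # run on the master
hostnamectl set-hostname hadoopslave1.techtalks.local    # run on the first slave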
9. Restart the network service after adding the hosts and hostnames, using the command below:
service network restart
[This is not required every time. Only do a network restart when you see issues resolving host names.]
10. Create an SSH key on the primary node and then copy that key to all the other slave machines, so that no password is needed when connecting from the master node. This is required for setting up the Ambari agents.
From the master node, enter the command below:
ssh-keygen
After entering the above command, press ENTER at every prompt and the SSH keys will be created automatically.
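If you prefer a non-interactive variant, the sketch below generates an RSA key with an empty passphrase at the default location:
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa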
11. Create the authorized_keys file like below:
cd /root/.ssh/
cat id_rsa.pub >> authorized_keys
12. Check whether the /root/.ssh/ folder exists on all the other slave nodes; if not, create the .ssh folder.
Then copy the authorized_keys file created above to all the other slave nodes, like below:
scp /root/.ssh/authorized_keys root@hadoopslave1.techtalks.local:/root/.ssh/
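You can then verify passwordless SSH from the master, for example:
ssh root@hadoopslave1.techtalks.local hostname    # should print the slave's hostname without asking for a password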
13. Check the httpd service on all the machines:
[root@hadoop1 ~]# yum search httpd
[root@hadoop1 ~]# yum install httpd
[root@hadoop1 ~]# service httpd start
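To make sure httpd also comes up after a reboot, and to check that it is running:
systemctl enable httpd
systemctl status httpd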
14. Configuring iptables
For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The easiest way to do this is to temporarily disable the firewall, as follows.
Disable the firewall on all the nodes using the commands below:
systemctl stop firewalld
service firewalld stop
systemctl disable firewalld
Note: We should re-enable firewalld once the entire cluster setup is done.
15. Enable NTP on all the machines (master + all slave nodes)
Commands:
yum install ntp
Enable the service:
systemctl enable ntpd
Start NTPD:
systemctl start ntpd
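To confirm that ntpd is actually synchronizing, you can list its peers, for example:
ntpq -p    # shows the time servers ntpd is talking to; the current sync peer is marked with *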
16. Enter the master node IP address in a browser: http://xxx.xx.x.xx/ .
You should see the Apache "Testing 123" page.
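The same check can be done from the command line, for example:
curl -I http://<<masternodeIP>>/    # the default CentOS test page typically answers with HTTP 403, which still confirms httpd is serving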
17. Extract the Ambari, HDP-UTILS, and HDP tar files:
tar -zvxf ambari-2.4.1.0-centos7.tar.gz -C /var/www/html/
tar -zvxf HDP-UTILS-1.1.0.21-centos7.tar.gz -C /var/www/html/
tar -zvxf HDP-2.5.0.0-centos7-rpm.tar.gz -C /var/www/html/
18. Create local repository
Create the ambari.repo file:
[root@hadoop1]# cd /etc/yum.repos.d
[root@hadoop1 yum.repos.d]# vi ambari.repo
Make the entries below in the ambari.repo file:
[Ambari]
name=Ambari
enabled=1
baseurl=http://[Master node IP]/AMBARI-2.4.1.0/centos7/2.4.1.0-22/
gpgcheck=0
Check that Ambari shows up in the repo list:
[root@hadoop1 yum.repos.d]# yum repolist
Copy the hdp.repo file:
[root@hadoop1] cd /var/www/html/
[root@hadoop1 html]# cp ./HDP/centos7/hdp.repo /etc/yum.repos.d/
Update the gpgcheck value from 1 to 0:
[root@hadoop1 html]# vi /etc/yum.repos.d/hdp.repo
It should look like below:
#VERSION_NUMBER=2.5.0.0-1245
[HDP-2.5.0.0]
name=HDP Version - HDP-2.5.0.0
baseurl=http://[Masternode ip]/HDP/centos7/
gpgcheck=0
gpgkey=http://[Masternode ip]/HDP/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/
gpgcheck=0
gpgkey=http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
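After both repo files are in place, you can verify that yum sees the local repositories, for example:
yum clean all    # clear any cached repository metadata
yum repolist     # the Ambari, HDP, and HDP-UTILS repos should now be listed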
19. Start installing the Ambari server
1. Install PostgreSQL:
[root@hadoop1 yum.repos.d]# yum install postgresql-server
2. Install ambari server
[root@hadoop1 ~]# yum install ambari-server
While installing, if you see any y/n prompt, enter y.
3. Before running ambari-server setup, first disable SELinux on all the nodes:
[root@hadoop1 ~]# vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
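SELINUX=disabled only takes effect after a reboot; to stop SELinux enforcement immediately in the running session you can also switch it to permissive mode, for example:
setenforce 0    # permissive until the next reboot
getenforce      # should now report Permissive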
AMBARI Server Setup
[root@hadoop1 java]# ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar
[root@hadoop1 java]# ambari-server setup -j /usr/java/jdk1.8.0_51/
While doing ambari-server setup, you may sometimes see an error like "ambari-server.py: error: Invalid number of arguments. Entered: 3, required: 1".
In this case, run the command below directly:
> ambari-server setup
Then select Custom JDK and give the JAVA_HOME path. If it asks about customizing user accounts, answer 'n' and proceed further.
[root@hadoop1 java]# ambari-server start
Create the Hive and Oozie users and databases
[root@hdpmaster etc]# su - postgres
Then enter 'psql' to get the postgres shell prompt.
postgres=# create database hivemeta;
CREATE DATABASE
postgres=# create user hive with password 'hive';
CREATE ROLE
postgres=# grant all privileges on database hivemeta to hive;
GRANT
postgres=# create database ooziemeta;
CREATE DATABASE
postgres=# create user oozie with password 'oozie';
CREATE ROLE
postgres=# grant all privileges on database ooziemeta to oozie;
GRANT
Then enter \q to quit psql, followed by exit to return to the root shell.
After that, edit the pg_hba.conf file to allow access to the metadata databases:
[root@hdpmaster data]# cd /var/lib/pgsql/data
[root@hdpmaster data]# ls -l
[root@hdpmaster data]# vi pg_hba.conf
Add below entries at the end of file
host all all 0.0.0.0/0 trust
Then restart PostgreSQL: service postgresql restart
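You can confirm that the hive user can reach its metastore database after the restart, for example:
psql -h 127.0.0.1 -U hive -d hivemeta -c '\conninfo'    # should report a connection to database "hivemeta" as user "hive"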
20. Open the Ambari UI to install the required Hadoop services
1. Go to http://<<masternodeIP>>:8080/ and enter the username and password as admin
2. Enter the desired cluster name
3. Enter the list of hosts and then click Next on the Install Options page
4. Confirm the hosts
5. Choose the required services
6. Assign the masters, slaves, and clients
7. Customize the services and then start the installation. This will begin the Hadoop deployment.
Thanks for reading this article. If you need any help setting up the cluster, please reach me by email: techvasuit@gmail.com
Reader comment: Thanks for this nice article explaining how to configure and install Hortonworks Hadoop.
(Reference: http://www.techtalkspro.com/2017/02/hortonworks-hdp-2-5-multinode-hadoop-cluster-setup-installation-on-Centos7-redhat.html)
I am currently facing an issue in step 18, "Create local repository": I could not resolve the HDP-UTILS path below.
http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/
There is no folder named HDP-UTILS-1.1.0.21 in /var/www/html/. I repeated the command "tar -zvxf HDP-UTILS-1.1.0.21-centos7.tar.gz -C /var/www/html/" thinking that it didn't extract the files properly, but no luck. Please help me resolve this issue.
Reply: It should be like the following. The HDP-UTILS tarball does not create an HDP-UTILS-1.1.0.21 subfolder under /var/www/html/, so its baseurl and gpgkey point at the web root; 192.168.1.165 here is the master node's IP address.
#VERSION_NUMBER=2.5.0.0-1245
[HDP-2.5.0.0]
name=HDP Version - HDP-2.5.0.0
baseurl=http://192.168.1.165/HDP/centos7/
gpgcheck=0
gpgkey=http://192.168.1.165/HDP/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://192.168.1.165/
gpgcheck=0
gpgkey=http://192.168.1.165/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1