Saturday, February 25, 2017

HortonWorks HDP 2.5 Multinode Hadoop Cluster Installation Using Ambari on CentOS 7/RedHat

Chitchatiq     2/25/2017 11:45:00 PM

This guide walks through installing and setting up a multinode HDP 2.5 Hadoop cluster with Ambari on CentOS 7/RedHat.


Hadoop stack repositories:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/hdp_25_repositories.html

Download the HDP 2.5, HDP-UTILS, and Ambari repository tarballs for setting up the cluster:


wget -nv http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.0.0/HDP-2.5.0.0-centos7-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7/HDP-UTILS-1.1.0.21-centos7.tar.gz

wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.1.0/ambari-2.4.1.0-centos7.tar.gz


We will use YUM (Yellowdog Updater, Modified), a package-management tool developed by Red Hat, to install, update, remove, and search for packages and to manage repositories on Linux systems.
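For reference, the handful of yum subcommands used throughout this guide look like this (a quick sketch; the package names are only examples):

yum search postgres     # find packages matching a keyword
yum install -y wget     # install a package, answering yes to prompts
yum repolist            # list the repositories yum currently knows about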

HDP 2.5 prerequisites

1. Check the OS of your machines and confirm it is 64-bit:
   $ uname -m
   x86_64 ==> 64-bit kernel
2. Check the flavour of Linux: cat /etc/*-release
3. Check whether the PostgreSQL package is available:
   yum search postgres


Steps to follow for the installation:

1. Log in as root to install the Hadoop cluster.

2. Run the checks below:
 1. cat /etc/redhat-release
 2. Check that the required packages are installed (if the following commands run without errors, the packages are already installed):

   yum --help
   rpm --help
   curl --help
   tar --help
   wget --help
   python -V

If any are missing, install them with yum install <package name>.
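For example, if curl, wget, or tar is missing, they can all be installed in one go (a sketch; adjust the list to whatever is actually missing on your hosts):

yum install -y curl wget tar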

3. Create a java directory under /usr/ and download the Oracle JDK 8u51 from this link:
http://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html#jdk-8u51-oth-JPR (download jdk-8u51-linux-x64.tar.gz)
and place it in /usr/java/.

4. Extract the Java tar file using the commands below:

        cd /usr/java/
        tar -xzvf jdk-8u51-linux-x64.tar.gz
  
5. Create a symbolic link (symlink) to the JDK:

ln -s /usr/java/jdk1.8.0_51  /usr/java/default

Set the JAVA_HOME and PATH environment variables.

export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH

Add the entries below at the end of the profile file, editing it with: vi /etc/profile

export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH


Verify that Java is installed in your environment by running the following command:
java -version
You should see output similar to the following:
java version "1.8.0_51"
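
To pick up the new variables in the current shell without logging out again (a quick check, assuming the two export lines were appended to /etc/profile as above):

source /etc/profile
echo $JAVA_HOME     # should print /usr/java/default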

6. Make sure to repeat steps 3, 4, and 5 on all the other nodes.

7. Make an entry for all the hostnames in /etc/hosts on every machine, like below:

vi /etc/hosts
IPaddress <tab> fully qualified hostname <tab> short alias
e.g.
192.16.1.1   hadoopmaster.techtalks.local   hdpm
192.16.1.2   hadoopslave1.techtalks.local   hdps1
192.16.1.3   hadoopslave2.techtalks.local   hdps2
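
A quick way to confirm the entries resolve as expected on each host (a sketch using the example names above):

getent hosts hadoopmaster.techtalks.local
ping -c 1 hadoopslave1.techtalks.local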

8. Update the FQDN (Fully Qualified Domain Name) on all the hosts.
Log in to the host you intend to configure as the master and enter:

vi /etc/hostname

 hadoopmaster.techtalks.local  -> this is for the master host

Log in to each slave host, enter vi /etc/hostname, and add an entry like below:
hadoopslave1.techtalks.local  -> this is for the slave host
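
On CentOS 7 / RHEL 7 you can also set the hostname with hostnamectl instead of editing the file by hand (this assumes systemd's hostnamectl is available, which it normally is on these releases):

hostnamectl set-hostname hadoopmaster.techtalks.local   # run the matching command on each slave
hostname -f                                             # verify the FQDN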

9. Restart the network service after adding the hosts and hostnames, using the command below:

service network restart
[This is not required every time; only restart the network if you see issues resolving the hostnames.]

10. Create an SSH key on the primary node and copy it to all the other slave machines, so that no password is needed when connecting from the master node. This is required for setting up the Ambari agents.

From the master node, enter the command below:
 ssh-keygen

Press ENTER at every prompt and the SSH keys will be created automatically.
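
Equivalently, the key pair can be generated non-interactively in one line (a sketch; it assumes the default key path /root/.ssh/id_rsa and an empty passphrase, matching the prompts above):

ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa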

11. Create the authorized_keys file like below:

cd /root/.ssh/

cat id_rsa.pub >> authorized_keys

12. Check whether the /root/.ssh/ folder exists on all the other slave nodes; if not, create the .ssh folder.

Then copy the authorized_keys file created above to all the other slave nodes, like below:

scp /root/.ssh/authorized_keys  root@hadoopslave1.techtalks.local:/root/.ssh/
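
To confirm passwordless SSH is working from the master (a quick check; the chmod lines are an extra precaution worth applying if sshd complains about permissions):

chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys   # on each slave
ssh root@hadoopslave1.techtalks.local hostname                 # should print the slave hostname without asking for a password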


13. Check the httpd service on all the machines:
       [root@hadoop1 ~]# yum search httpd
       [root@hadoop1 ~]# yum install httpd
       [root@hadoop1 ~]# service httpd start
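
It is also worth enabling httpd so it comes back after a reboot (an extra step; standard systemd commands on CentOS 7):

systemctl enable httpd
systemctl status httpd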


14. Configuring the firewall (iptables/firewalld)

For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The easiest way to do this is to temporarily disable the firewall, as follows.

Disable the firewall on all the nodes using the commands below:

systemctl stop firewalld
service firewalld stop
systemctl disable firewalld

Note: Re-enable and start firewalld once the entire cluster setup is done.
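
When the cluster is up, the firewall can be turned back on like this (a sketch; you will still need to open the ports your Hadoop services use):

systemctl enable firewalld
systemctl start firewalld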

15. Enable NTP on all the machines (master + all slave nodes).

Install the package:
 yum install ntp

Enable the service:
systemctl enable ntpd

Start NTPD:
systemctl start ntpd
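
Once ntpd has been running for a minute or two, you can confirm it is syncing (a quick check; ntpq ships with the ntp package):

ntpq -p                 # lists the peers ntpd is syncing with
systemctl status ntpd   # confirms the service is active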

16. Enter the master node IP address in a browser: http://xxx.xx.x.xx/ . You should see the Apache "Testing 123.." test page.

17. Extract the Ambari, HDP-UTILS, and HDP tar files into the web server root:

tar -zvxf ambari-2.4.1.0-centos7.tar.gz -C /var/www/html/
tar -zvxf HDP-UTILS-1.1.0.21-centos7.tar.gz  -C /var/www/html/
tar -zvxf HDP-2.5.0.0-centos7-rpm.tar.gz -C /var/www/html/

18. Create a local repository

Create the ambari.repo file:

[root@hadoop1]# cd /etc/yum.repos.d
[root@hadoop1 yum.repos.d]# vi ambari.repo

Make the entries below in the ambari.repo file:

[Ambari]
name=Ambari
enabled=1
baseurl=http://[Master node IP]/AMBARI-2.4.1.0/centos7/2.4.1.0-22/
gpgcheck=0


Check that Ambari shows up in the repo list:
[root@hadoop1 yum.repos.d]# yum repolist

Copy the hdp.repo file:
[root@hadoop1]# cd /var/www/html/
[root@hadoop1 html]# cp ./HDP/centos7/hdp.repo  /etc/yum.repos.d/

Update the gpgcheck value from 1 to 0:
[root@hadoop1 html]# vi /etc/yum.repos.d/hdp.repo

It should look like the following:

#VERSION_NUMBER=2.5.0.0-1245
[HDP-2.5.0.0]
name=HDP Version - HDP-2.5.0.0
baseurl=http://[Masternode ip]/HDP/centos7/
gpgcheck=0
gpgkey=http://[Masternode ip]/HDP/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1


[HDP-UTILS-1.1.0.21]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
baseurl=http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/
gpgcheck=0
gpgkey=http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
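
Before moving on, it is worth confirming that each baseurl is actually reachable over HTTP from every node (a sketch; replace [Masternode ip] with the real address, and adjust the HDP-UTILS path to wherever that tarball actually extracted under /var/www/html):

curl -I http://[Masternode ip]/AMBARI-2.4.1.0/centos7/2.4.1.0-22/
curl -I http://[Masternode ip]/HDP/centos7/
curl -I http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/
yum clean all && yum repolist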

19. Start installing the Ambari server


1. Install PostgreSQL:
[root@hadoop1 yum.repos.d]# yum install postgresql-server


2. Install the Ambari server:
[root@hadoop1 ~]# yum install ambari-server
If you are prompted with Y/N during the installation, answer Y.


3. Before running ambari-server setup, first disable SELinux on all the nodes:
[root@hadoop1 ~]# vi /etc/selinux/config


# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
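
Editing /etc/selinux/config only takes effect after a reboot; to switch SELinux to permissive immediately on a running host as well (an extra step, not strictly part of the original walkthrough):

setenforce 0
getenforce     # should now report Permissive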



AMBARI Server Setup

[root@hadoop1 java]# ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar

[root@hadoop1 java]# ambari-server setup -j /usr/java/jdk1.8.0_51/

While running ambari-server setup you may sometimes see an error like "ambari-server.py: error: Invalid number of arguments. Entered: 3, required: 1".

In that case, run the command below directly:
>ambari-server setup
Then select Custom JDK and give the JAVA_HOME path. If it asks about customizing user accounts, answer 'n' and proceed further.

[root@hadoop1 java]# ambari-server start
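
You can confirm the server came up before opening the UI (a quick check; the log path shown is Ambari's default):

ambari-server status
tail -n 20 /var/log/ambari-server/ambari-server.log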

Create the Hive and Oozie metastore databases and users:

[root@hdpmaster etc]# su - postgres
Then enter 'psql' to get the postgres shell prompt.
postgres=# create database hivemeta;
CREATE DATABASE
postgres=# create user hive with password 'hive';
CREATE ROLE
postgres=# grant all privileges on database hivemeta to hive;
GRANT
postgres=# create database ooziemeta;
CREATE DATABASE
postgres=# create user oozie with password 'oozie';
CREATE ROLE
postgres=# grant all privileges on database ooziemeta to oozie;
GRANT

Then enter \q to quit psql, and then exit to return to the root shell.


After that, edit the pg_hba.conf file so that the metastore databases can be accessed:

[root@hdpmaster data]# cd /var/lib/pgsql/data
[root@hdpmaster data]# ls -l
[root@hdpmaster data]# vi pg_hba.conf

Add the entry below at the end of the file:
 host  all  all  0.0.0.0/0  trust

Then restart PostgreSQL: service postgresql restart
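
A quick way to confirm the new users can reach their databases after the restart (a sketch; it checks over TCP on the master itself and relies on the trust entry above, so no password is requested; connections from other nodes may also need listen_addresses set in postgresql.conf):

psql -h 127.0.0.1 -U hive -d hivemeta -c '\conninfo'
psql -h 127.0.0.1 -U oozie -d ooziemeta -c '\conninfo'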

20. Open the Ambari UI to install the required Hadoop services:

1. Go to http://<<masternodeIP>>:8080/ and log in with admin as both the username and password.
2. Enter the required cluster name.
3. Enter the list of hosts and click Next on the Install Options page.
4. Confirm the hosts.
5. Choose the required services.
6. Assign the master and slave nodes and the clients.
7. Customise the services and then start the installation. This will kick off the Hadoop installation.

Thanks for reading this article. If you need any help setting up a cluster, please reach me by email: techvasuit@gmail.com


2 comments:

  1. GOLDY.M, November 10, 2017 at 11:41 PM


    Hi, thanks for this nice article explaining how to configure and install HortonWorks Hadoop.
    Reference: http://www.techtalkspro.com/2017/02/hortonworks-hdp-2-5-multinode-hadoop-cluster-setup-installation-on-Centos7-redhat.html

    I am currently facing an issue in step 18, "Create local repository": I could not resolve the HDP-UTILS path below.
    http://[Masternode ip]/HDP-UTILS-1.1.0.21/repos/centos7/

    There is no folder named HDP-UTILS-1.1.0.21 in /var/www/html/. I repeated the command "tar -zvxf HDP-UTILS-1.1.0.21-centos7.tar.gz -C /var/www/html/" thinking that it didn't extract the files properly, but no luck. Please help me to resolve this issue.

  2. Anonymous, December 31, 2017 at 3:45 AM

    Should be like following

    #VERSION_NUMBER=2.5.0.0-1245
    [HDP-2.5.0.0]
    name=HDP Version - HDP-2.5.0.0
    baseurl=http://192.168.1.165/HDP/centos7/
    gpgcheck=0
    gpgkey=http://192.168.1.165/HDP/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
    enabled=1
    priority=1


    [HDP-UTILS-1.1.0.21]
    name=HDP-UTILS Version - HDP-UTILS-1.1.0.21
    baseurl=http://192.168.1.165/
    gpgcheck=0
    gpgkey=http://192.168.1.165/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
    enabled=1
    priority=1
    [root@namenode repodata]#

