Saturday, October 17, 2015

2 Clustering of Hadoop Environment

Preparation: Install a single Hadoop system on VMware Player/Workstation and clone it to create the second node.

We will create a two-node cluster with one NameNode and two DataNodes.

Open the master VM and go to the hadoop/conf directory.
$ cd hadoop/conf

List the contents of the directory to view various configuration files.
$ ls
capacity-scheduler.xml
configuration.xsl
core-site.xml
hadoop-env.sh
hadoop-metrics.properties
hadoop-policy.xml
hdfs-site.xml
log4j.properties
mapred-site.xml
masters
slaves
slaves.multi
ssl-client.xml.example
ssl-server.xml.example

Open core-site.xml for editing in vi:
Set fs.default.name to the NameNode's IP address with port 9000.
$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.197.128:9000</value>
    </property>
</configuration>

Save and Exit
:wq!

Open hdfs-site.xml for editing in vi:
Set dfs.replication to 2 so that each block is stored on both DataNodes.
$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

Save and Exit
:wq!
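
The two edits above follow the same pattern: one property name and one value inside a fixed XML wrapper. As a rough sketch, the files could also be generated from the shell instead of typed in vi. The helper name write_site_xml and the CONF_DIR variable are assumptions made for this example and are not part of Hadoop; CONF_DIR defaults to a scratch directory so that nothing is overwritten by accident.

```shell
#!/bin/sh
# write_site_xml FILE NAME VALUE
# Hypothetical helper (not shipped with Hadoop): writes a site file that
# contains a single property, using the same layout as the listings above.
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file -->
<configuration>
    <property>
        <name>$2</name>
        <value>$3</value>
    </property>
</configuration>
EOF
}

# Scratch directory by default; set CONF_DIR=hadoop/conf to write the real files.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

write_site_xml "$CONF_DIR/core-site.xml" fs.default.name "hdfs://192.168.197.128:9000"
write_site_xml "$CONF_DIR/hdfs-site.xml" dfs.replication 2
echo "wrote configuration files to $CONF_DIR"
```

The same helper could be reused for mapred-site.xml. Note that it replaces the whole file, so it only suits files that hold a single property, as in this walkthrough.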

Open mapred-site.xml for editing in vi:
Set mapred.job.tracker to the JobTracker's IP address and port 9001.
$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.197.128:9001</value>
    </property>
</configuration>

Save and Exit
:wq!

Open masters file for editing in vi:
Enter the IP address of the master.
$ vi masters
192.168.197.128

Save and Exit
:wq!

Open slaves file for editing in vi:
Enter the IP addresses of the master and the slave, since both will run a DataNode.
$ vi slaves
192.168.197.128
192.168.197.134

Save and Exit
:wq!
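
For reference, here is a minimal sketch of what the two files end up containing on the master. In the Hadoop 1.x start scripts, the masters file names the host that runs the SecondaryNameNode, and the slaves file names the hosts that run a DataNode and TaskTracker. The variable names MASTER_IP, SLAVE_IP and CONF_DIR are assumptions for this example; CONF_DIR defaults to a scratch directory.

```shell
#!/bin/sh
# Illustrative variables, not Hadoop settings.
MASTER_IP="192.168.197.128"
SLAVE_IP="192.168.197.134"

# Scratch directory by default; set CONF_DIR=hadoop/conf to write the real files.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# masters: one host per line (here only the master).
printf '%s\n' "$MASTER_IP" > "$CONF_DIR/masters"

# slaves: one host per line; the master is listed too because it also
# runs a DataNode in this two-DataNode setup.
printf '%s\n' "$MASTER_IP" "$SLAVE_IP" > "$CONF_DIR/slaves"

echo "wrote masters and slaves to $CONF_DIR"
```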

Switch to the slave terminal now:
Check the IP address of the slave
$ ifconfig
...
inet addr:192.168.197.134 Bcast:192.168.197.255 Mask:255.255.255.0
...
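
If you would rather not pick the address out by eye, something like the following sketch can extract it. The helper name extract_inet_addr is made up for this example, and it assumes the older "inet addr:" output format shown above; newer ifconfig versions print the address differently.

```shell
#!/bin/sh
# extract_inet_addr: hypothetical helper. Reads ifconfig-style text on stdin
# and prints the address that follows "inet addr:" on each matching line.
extract_inet_addr() {
    sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'
}

# Demo on the line shown above. On a live system: ifconfig | extract_inet_addr
echo 'inet addr:192.168.197.134 Bcast:192.168.197.255 Mask:255.255.255.0' | extract_inet_addr
```

On a real machine the loopback address 127.0.0.1 will normally be printed as well, so expect more than one line.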

Note down the IP address and clear the screen.
$ clear

Go to the configuration directory: hadoop/conf
$ cd hadoop/conf

List the contents to see various configuration files.
$ ls
capacity-scheduler.xml
configuration.xsl
core-site.xml
hadoop-env.sh
hadoop-metrics.properties
hadoop-policy.xml
hdfs-site.xml
log4j.properties
mapred-site.xml
masters
slaves
slaves.multi
ssl-client.xml.example
ssl-server.xml.example

Clear the screen
$ clear

Open core-site.xml for editing in vi:
Set fs.default.name to the NameNode's IP address with port 9000.
$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.197.128:9000</value>
    </property>
</configuration>

Save and Exit
:wq!

Open hdfs-site.xml for editing in vi:
Set dfs.replication to 2, the same value as on the master.
$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

Save and Exit
:wq!

Open mapred-site.xml for editing in vi:
Set mapred.job.tracker to the JobTracker's IP address and port 9001.
$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.197.128:9001</value>
    </property>
</configuration>

Save and Exit
:wq!
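
The slave's site files have to point at the same NameNode and JobTracker as the master's, so a mistyped address here is easy to miss. As a rough sketch, a small helper can print the value of a property for a quick comparison between the two machines. The name get_prop is an assumption for this example; it relies on the one-tag-per-line layout used in the listings above and is not a general XML parser.

```shell
#!/bin/sh
# get_prop FILE NAME
# Hypothetical helper (not a Hadoop tool): prints the <value> that follows
# <name>NAME</name>, assuming each tag sits on its own line.
get_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Demo on a scratch file; on a real node pass hadoop/conf/core-site.xml instead.
demo=$(mktemp)
cat > "$demo" <<EOF
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.197.128:9000</value>
    </property>
</configuration>
EOF
get_prop "$demo" fs.default.name
```

Running the same call on the master and on the slave should print identical values.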

Open masters file for editing in vi:
Enter the IP address of the master.
$ vi masters
192.168.197.128

Save and Exit
:wq!

Open slaves file for editing in vi:
Enter the IP address of the slave.
$ vi slaves
192.168.197.134

Save and Exit
:wq!

Switch to the master's terminal.
Generate SSH keys for passwordless communication between the nodes.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop-user/.ssh/id_rsa):
/home/hadoop-user/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop-user/.ssh/id_rsa.
Your public key has been saved in /home/hadoop-user/.ssh/id_rsa.pub.
The key fingerprint is:
00:ba:38:a2:a9:da:33:71:37:8d:56:e3:31:7c:ea:6f hadoop-user@hadoop-desk
[ master@domain ]: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
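
Only the public key (id_rsa.pub) belongs in authorized_keys; the private key id_rsa stays where it is. Running the append more than once leaves duplicate lines behind, so here is a minimal sketch of an idempotent version. The helper name authorize_key is made up for this example and is not an OpenSSH command.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTH_FILE
# Hypothetical helper: appends the public key to AUTH_FILE only if that
# exact line is not already present, then tightens the file permissions.
authorize_key() {
    pub="$1"
    auth="$2"
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -F: fixed string, -x: whole-line match, -f: read the pattern from the key file
    grep -qxF -f "$pub" "$auth" || cat "$pub" >> "$auth"
    # sshd commonly refuses an authorized_keys file that others can write to
    chmod 600 "$auth"
}

# Typical call on the master (commented out so the sketch changes nothing by itself):
# authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```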

Copy the SSH public key from the master to the slave node.
[ master@domain ]: ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop-user@192.168.197.134

Now try logging into the machine, with "ssh 'hadoop-user@192.168.197.134'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[ master@domain ]: ssh hadoop-user@192.168.197.134
...
[ slave@domain ]:

Exit after login.
On the master node, start all the Hadoop daemons.
[ master@domain ]: start-all.sh
......

Hadoop starts successfully on the master.
On the slave node, start all the Hadoop daemons.
[ slave@domain ]: start-all.sh
......

Hadoop starts successfully on the slave.


On the master, list the contents of HDFS.
[ master@domain ]: hadoop fs -ls /
......

On the slave, list the contents of HDFS.
[ slave@domain ]: hadoop fs -ls /
......

You will observe the same content on both master and slave.
Create a sample file on the master.
[ master@domain ]: vi sample.txt
...
:wq!
Copy the file to HDFS.
[ master@domain ]: hadoop fs -copyFromLocal sample.txt /

If the copy fails because the NameNode is in safe mode, turn safe mode off.
[ master@domain ]: hadoop dfsadmin -safemode leave
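
Rather than waiting for the copy to fail, you can ask the NameNode for its state first with "hadoop dfsadmin -safemode get", which reports "Safe mode is ON" or "Safe mode is OFF". The sketch below shows one way to act on that text. The helper name safemode_is_on is an assumption for this example, and a stand-in string is used so that it runs without a cluster.

```shell
#!/bin/sh
# safemode_is_on STATUS_TEXT
# Hypothetical helper: succeeds when the text reported by
# "hadoop dfsadmin -safemode get" says that safe mode is ON.
safemode_is_on() {
    case "$1" in
        *"Safe mode is ON"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Stand-in value so the sketch runs anywhere; on the master use instead:
#   status=$(hadoop dfsadmin -safemode get)
status="Safe mode is ON"

if safemode_is_on "$status"; then
    echo "safe mode is on: run 'hadoop dfsadmin -safemode leave' before copying"
else
    echo "safe mode is off: the copy can proceed"
fi
```

A gentler alternative is "hadoop dfsadmin -safemode wait", which blocks until the NameNode leaves safe mode by itself.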

Then copy the file to HDFS again.
[ master@domain ]: hadoop fs -copyFromLocal sample.txt /

List the HDFS contents to verify the copy.
[ master@domain ]: hadoop fs -ls /
......

You will see the new file. Print its contents with cat.
[ master@domain ]: hadoop fs -cat /sample.txt
......

Switch to the slave node.
List the HDFS contents to verify that the file is visible there too.
[ slave@domain ]: hadoop fs -ls /
......

You will see the new file. Print its contents with cat.
[ slave@domain ]: hadoop fs -cat /sample.txt
......

This confirms that the distributed file system is working properly across the two nodes!
