Hadoop Quick Notes: 2 Clustering of Hadoop Environment

Clustering of Hadoop Environment

Prerequisite: install a single-node Hadoop system on VMware Player/Workstation, then clone the VM to create the second node.

We will create a two-node cluster with one NameNode and two DataNodes; the master runs the NameNode and also acts as one of the DataNodes.

Open the master VM and go to the hadoop/conf directory:
$ cd hadoop/conf

List the contents of the directory to see the configuration files:
$ ls
capacity-scheduler.xml
configuration.xsl
core-site.xml
hadoop-env.sh
hadoop-metrics.properties
hadoop-policy.xml
hdfs-site.xml
log4j.properties
mapred-site.xml
masters
slaves
slaves.multi
ssl-client.xml.example
ssl-server.xml.example

Open core-site.xml for editing in vi.
Set fs.default.name to the NameNode's address: the master's IP address with port 9000.

$ vi core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.197.128:9000</value>
</property>

</configuration>

Save and Exit

:wq!
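Editing the file in vi works, but when the same core-site.xml has to go onto both nodes it can be less error-prone to generate it. The following is a minimal sketch, not part of the original walkthrough. It assumes the Hadoop 1.x property name `fs.default.name`, the master's address 192.168.197.128 used throughout these notes, port 9000, and two hypothetical variables, `MASTER_IP` and `CONF_DIR`. `CONF_DIR` defaults to a scratch directory, so trying the script cannot overwrite a real configuration.

```shell
#!/bin/sh
# Sketch: generate core-site.xml without opening vi.
# Assumptions (not from the original notes): MASTER_IP and CONF_DIR are
# hypothetical variables. CONF_DIR defaults to a fresh scratch directory so
# that trying the script cannot overwrite a real configuration.
set -eu

MASTER_IP="${MASTER_IP:-192.168.197.128}"   # NameNode host (the master)
NAMENODE_PORT=9000                          # port used in these notes
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Unquoted EOF: ${MASTER_IP} and ${NAMENODE_PORT} are expanded.
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${MASTER_IP}:${NAMENODE_PORT}</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/core-site.xml"
```

To write the real file, point `CONF_DIR` at the configuration directory, for example `CONF_DIR=$HOME/hadoop/conf sh write-core-site.sh` (the script name is arbitrary). The content is identical on master and slave, so the same command serves both nodes.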

Open hdfs-site.xml for editing in vi:
Set dfs.replication to 2, so that every block is stored on both DataNodes.
$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>

Save and Exit
:wq!
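The replication factor is tied to the cluster size: HDFS keeps at most one replica of a block per DataNode, so with two DataNodes a value of 2 places every block on both machines. The sketch below derives the value from a host list instead of hard-coding it. `DATANODES` and `CONF_DIR` are hypothetical variables, the two addresses are the ones used in these notes, and `CONF_DIR` defaults to a scratch directory.

```shell
#!/bin/sh
# Sketch: write hdfs-site.xml with dfs.replication set to the number of
# DataNodes, capped at 3 (the HDFS default). DATANODES and CONF_DIR are
# hypothetical variables; CONF_DIR defaults to a scratch directory.
set -eu

DATANODES="192.168.197.128 192.168.197.134"   # master and slave both store blocks
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Count the hosts: each DataNode holds at most one replica of a block.
set -- $DATANODES
REPLICATION=$#
if [ "$REPLICATION" -gt 3 ]; then
  REPLICATION=3
fi

cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>${REPLICATION}</value>
  </property>
</configuration>
EOF

echo "dfs.replication=$REPLICATION written to $CONF_DIR/hdfs-site.xml"
```

Capping the value at 3 is a design choice of this sketch: on a larger cluster you would normally not copy every block to every node.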

Open mapred-site.xml for editing in vi:
Set mapred.job.tracker to the master's IP address with port 9001.
$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<configuration>
<property>

<name>mapred.job.tracker</name>
<value>192.168.197.128:9001</value>
</property>
</configuration>

Save and Exit
:wq!
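core-site.xml and mapred-site.xml must both name the master. If one file is edited and the other is forgotten, the two settings disagree. The following is a rough consistency check. It assumes the one-element-per-line layout of the files above. `get_conf_value` and `check_master_host` are hypothetical helpers that use plain awk text matching, not a real XML parser.

```shell
#!/bin/sh
# Sketch: read a property value from a Hadoop *-site.xml file and check that
# core-site.xml and mapred-site.xml name the same master host.
# get_conf_value and check_master_host are hypothetical helpers.
# Assumption: <name> and <value> sit on separate lines, as in the files
# above; this is a quick text check, not a real XML parser.

get_conf_value() {
  # $1 = file, $2 = property name; prints the first <value> after the <name>.
  awk -v want="<name>$2</name>" '
    index($0, want) { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }
  ' "$1"
}

check_master_host() {
  local conf_dir="$1" fs tracker fs_host tracker_host
  fs=$(get_conf_value "$conf_dir/core-site.xml" fs.default.name)
  tracker=$(get_conf_value "$conf_dir/mapred-site.xml" mapred.job.tracker)
  fs_host=${fs#hdfs://}        # hdfs://192.168.197.128:9000 -> 192.168.197.128:9000
  fs_host=${fs_host%%:*}       # -> 192.168.197.128
  tracker_host=${tracker%%:*}  # 192.168.197.128:9001 -> 192.168.197.128
  if [ -n "$fs_host" ] && [ "$fs_host" = "$tracker_host" ]; then
    echo "OK: HDFS and JobTracker both use $fs_host"
  else
    echo "Mismatch: fs.default.name=$fs mapred.job.tracker=$tracker" >&2
    return 1
  fi
}
```

Running `check_master_host $HOME/hadoop/conf` on each node after editing prints the shared host, or reports the mismatch and returns a non-zero status.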

Open the masters file for editing in vi.
Enter the IP address of the master. (In Hadoop 1.x this file names the host that runs the SecondaryNameNode.)
$ vi masters
192.168.197.128

Save and Exit
:wq!

Open the slaves file for editing in vi.
Enter the IP addresses of all DataNode hosts, one per line: the master (which also stores data) and the slave.
$ vi slaves
192.168.197.128
192.168.197.134

Save and Exit
:wq!
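The masters and slaves files are plain host lists with one address per line, so they can also be written from a script. The following is a minimal sketch that writes the master's copies. `write_host_files` and `CONF_DIR` are hypothetical names, and `CONF_DIR` defaults to a scratch directory.

```shell
#!/bin/sh
# Sketch: write conf/masters and conf/slaves for the master node.
# write_host_files and CONF_DIR are hypothetical; CONF_DIR defaults to a
# scratch directory. In Hadoop 1.x, "masters" names the SecondaryNameNode
# host and "slaves" names every DataNode/TaskTracker host, one per line.
write_host_files() {
  local conf_dir="$1" master="$2"
  shift 2                                   # the rest are DataNode hosts
  printf '%s\n' "$master" > "$conf_dir/masters"
  printf '%s\n' "$@" > "$conf_dir/slaves"
}

CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
# Master's copy: the master is listed in slaves too, because it also
# runs a DataNode in this cluster.
write_host_files "$CONF_DIR" 192.168.197.128 192.168.197.128 192.168.197.134
cat "$CONF_DIR/masters" "$CONF_DIR/slaves"
```

On the slave, these notes list only the slave itself in slaves, which corresponds to `write_host_files "$CONF_DIR" 192.168.197.128 192.168.197.134`.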

Switch to the slave terminal.
Check the slave's IP address:
$ ifconfig
...
inet addr:192.168.197.134 Bcast:192.168.197.255 Mask:255.255.255.0
...

Note the address down, then clear the screen.
$ clear

Go to the configuration directory: hadoop/conf
$ cd hadoop/conf

List the contents to see the configuration files:
$ ls
capacity-scheduler.xml
configuration.xsl
core-site.xml
hadoop-env.sh
hadoop-metrics.properties
hadoop-policy.xml
hdfs-site.xml
log4j.properties
mapred-site.xml
masters
slaves
slaves.multi
ssl-client.xml.example
ssl-server.xml.example

Clear the screen
$ clear

Open core-site.xml for editing in vi.
Set fs.default.name to the NameNode's address: the master's IP address with port 9000.

$ vi core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.197.128:9000</value>
</property>

</configuration>

Save and Exit

:wq!

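Open mapred-site.xml for editing in vi.
Set mapred.job.tracker to the master's IP address with port 9001.
$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.197.128:9001</value>
</property>
</configuration>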

Save and Exit
:wq!

Open the masters file for editing in vi.
Enter the IP address of the master.
$ vi masters
192.168.197.128

Save and Exit
:wq!

Open the slaves file for editing in vi.
Enter the IP address of the slave.
$ vi slaves
192.168.197.134

Save and Exit
:wq!

Switch to the master's terminal.
Generate SSH keys for passwordless communication between the nodes. Leave the passphrase empty so that the Hadoop start scripts can log in without any prompt.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop-user/.ssh/id_rsa):
/home/hadoop-user/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop-user/.ssh/id_rsa.
Your public key has been saved in /home/hadoop-user/.ssh/id_rsa.pub.
The key fingerprint is:

00:ba:38:a2:a9:da:33:71:37:8d:56:e3:31:7c:ea:6f hadoop-user@hadoop-desk

Authorize the public key on the master itself, since the start scripts also log in to the master over SSH:
[ master@domain ]: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Copy the public key from the master to the slave node:
[ master@domain ]: ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop-user@192.168.197.134

Now try logging into the machine, with "ssh 'hadoop-user@192.168.197.134'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

As the message suggests, log in from the master to the slave; no password prompt should appear:
[ master@domain ]: ssh hadoop-user@192.168.197.134
...
[ slave@domain ]:

Exit after login.
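start-all.sh logs in to every host in the slaves file. Before starting the daemons, it helps to confirm that none of those logins asks for a password. The following is a sketch that relies on OpenSSH's `BatchMode` and `ConnectTimeout` options. `check_passwordless_ssh` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: confirm that this node can log in to every host in a slaves file
# without a password. check_passwordless_ssh is a hypothetical helper.
# BatchMode=yes makes ssh fail instead of prompting, so a missing key shows
# up as FAILED rather than as a password prompt.
check_passwordless_ssh() {
  local hosts_file="$1" hosts host failed=0
  # Drop '#' comments and blank lines before looping over the hosts.
  hosts=$(sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$hosts_file") || return 1
  for host in $hosts; do
    # </dev/null keeps ssh from consuming the caller's stdin.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "ok      $host"
    else
      echo "FAILED  $host" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

On the master, run `check_passwordless_ssh $HOME/hadoop/conf/slaves`. A FAILED line means that host still needs the public key, or still needs its host key accepted once interactively, before start-all.sh can reach it.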
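On the master node, start all the Hadoop daemons:

[ master@domain ]: start-all.sh
......

Hadoop starts successfully on the master.
On the slave node, start all the Hadoop daemons:
[ slave@domain ]: start-all.sh
......

Hadoop starts successfully on the slave. (In Hadoop 1.x, start-all.sh on the master already starts the DataNode and TaskTracker on every host listed in conf/slaves over SSH, so running jps on the slave first shows whether this second run is needed at all.)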
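start-all.sh prints little about which daemons actually came up. The JDK's `jps` tool lists running JVMs by main class, so a short helper can compare that list against the daemons each node is expected to run. This is a sketch; `missing_daemons` is a hypothetical helper, and the expected daemon lists below follow from the masters and slaves files in these notes.

```shell
#!/bin/sh
# Sketch: report which expected Hadoop daemons are not running on this node.
# Uses the JDK's jps tool, which prints one "<pid> <MainClass>" line per JVM.
# missing_daemons is a hypothetical helper; pass the daemon names you expect.
missing_daemons() {
  local running missing="" d
  running=$(jps 2>/dev/null | awk '{print $2}')
  for d in "$@"; do
    if ! printf '%s\n' "$running" | grep -qxF "$d"; then
      missing="$missing $d"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all running: $*"
}
```

On the master: `missing_daemons NameNode SecondaryNameNode JobTracker DataNode TaskTracker`. On the slave: `missing_daemons DataNode TaskTracker`.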

On the master, list the contents of HDFS:

[ master@domain ]: hadoop fs -ls /
......

On the slave, list the contents of HDFS:
[ slave@domain ]: hadoop fs -ls /
......

Both nodes show the same listing, because both read the namespace from the same NameNode.
Create a sample file on the master:
[ master@domain ]: vi sample.txt
...
:wq!

Copy the file to HDFS:

[ master@domain ]: hadoop fs -copyFromLocal sample.txt /

If the copy fails because the NameNode is still in safe mode, turn safe mode off:

[ master@domain ]: hadoop dfsadmin -safemode leave
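Forcing safe mode off is fine on a fresh test cluster. On a cluster that already holds data, the NameNode normally leaves safe mode on its own once enough block reports have arrived. The following sketch checks the state first with `hadoop dfsadmin -safemode get`, which prints `Safe mode is ON` or `Safe mode is OFF` in Hadoop 1.x. `leave_safemode_if_on` is a hypothetical helper. If you prefer to wait rather than force, `hadoop dfsadmin -safemode wait` blocks until the NameNode leaves safe mode by itself.

```shell
#!/bin/sh
# Sketch: leave safe mode only if the NameNode reports that it is in it.
# leave_safemode_if_on is a hypothetical helper; the grep relies on the
# "Safe mode is ON" message printed by `hadoop dfsadmin -safemode get`.
leave_safemode_if_on() {
  if hadoop dfsadmin -safemode get | grep -q 'Safe mode is ON'; then
    echo "NameNode is in safe mode; leaving it"
    hadoop dfsadmin -safemode leave
  else
    echo "NameNode is not in safe mode"
  fi
}
```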

Then retry the copy:

[ master@domain ]: hadoop fs -copyFromLocal sample.txt /

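List the HDFS root to verify the upload:

[ master@domain ]: hadoop fs -ls /
......

The new file appears in the listing. Print its contents, using the absolute path because the file was copied to / (a relative path would resolve to the user's HDFS home directory instead):

[ master@domain ]: hadoop fs -cat /sample.txt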
......

Switch to the slave node and list the HDFS root:

[ slave@domain ]: hadoop fs -ls /
......

The new file appears here too. Print its contents:

[ slave@domain ]: hadoop fs -cat /sample.txt
......

The slave sees the file that was written from the master, which shows that HDFS is working across the two nodes.
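Seeing the file listed on both nodes shows that they share one namespace. Comparing bytes also confirms that the content survived the round trip. The following sketch uses `hadoop fs -cat` as above and POSIX `cmp`. `verify_hdfs_copy` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: check that a file read back from HDFS matches the local original.
# verify_hdfs_copy is a hypothetical helper built on `hadoop fs -cat`.
verify_hdfs_copy() {
  local src="$1" path="$2" tmp status
  tmp=$(mktemp) || return 1
  if ! hadoop fs -cat "$path" > "$tmp"; then
    echo "could not read $path from HDFS" >&2
    rm -f "$tmp"
    return 1
  fi
  if cmp -s "$src" "$tmp"; then
    echo "match: $path"
    status=0
  else
    echo "differs: $path" >&2
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Run it on the master, where sample.txt exists locally: `verify_hdfs_copy sample.txt /sample.txt`.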

Saturday, October 17, 2015