Fully Distributed Node Hadoop Cluster
Cluster nodes (FQDN and short hostname):
nn.cluster.com    nn
jt.cluster.com    jt
snn.cluster.com   snn
dn1.cluster.com   dn1
dn2.cluster.com   dn2
dn3.cluster.com   dn3
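These map to the 192.168.0.51-57 addresses used throughout the logs below. A sketch of the /etc/hosts entries that would be replicated on every node (dn4 at .57 shows up in the startup logs even though it is missing from the list above):

192.168.0.51 nn.cluster.com  nn
192.168.0.52 jt.cluster.com  jt
192.168.0.53 snn.cluster.com snn
192.168.0.54 dn1.cluster.com dn1
192.168.0.55 dn2.cluster.com dn2
192.168.0.56 dn3.cluster.com dn3
192.168.0.57 dn4.cluster.com dn4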
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.0.51:8020</value>
</property>
</configuration>
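fs.default.name points every HDFS client at the NameNode's RPC endpoint on 192.168.0.51:8020, so a bare path like / resolves against this filesystem. A quick check once the daemons are up (the explicit URI form is equivalent to the bare path):

[hadoop@nn ~]$ hadoop fs -ls /
[hadoop@nn ~]$ hadoop fs -ls hdfs://192.168.0.51:8020/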
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoop/data/snn</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
</configuration>
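dfs.block.size of 134217728 bytes is 128 MB (128 x 1024 x 1024), and dfs.replication of 3 keeps three copies of every block. The dfs.name.dir, fs.checkpoint.dir and dfs.data.dir paths live on the namenode, secondary namenode and datanodes respectively; a sketch of preparing them from the namenode, assuming passwordless ssh is already in place (start-dfs.sh requires it anyway):

[hadoop@nn ~]$ mkdir -p /home/hadoop/data/nn
[hadoop@nn ~]$ ssh snn "mkdir -p /home/hadoop/data/snn"
[hadoop@nn ~]$ for i in {4..7};do ssh 192.168.0.5$i "mkdir -p /home/hadoop/data/dn";done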
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.52:8021</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>3</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>3</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/data/mapred/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/data/mapred/system</value>
</property>
</configuration>
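With 3 map and 3 reduce slots per tasktracker and four datanodes doubling as tasktrackers (see the startup logs below), the cluster can run up to 12 map and 12 reduce tasks concurrently. The edited *-site.xml files must be identical on every node; a hedged sketch of pushing them out from the namenode, assuming the same ~/hadoop layout everywhere:

[hadoop@nn ~]$ for i in {2..7};do scp hadoop/conf/*-site.xml 192.168.0.5$i:hadoop/conf/;done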
[hadoop@nn ~]$ vim hadoop/conf/masters
192.168.0.53
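In Hadoop 1.x, conf/masters tells start-dfs.sh where to launch the secondary namenode (192.168.0.53 = snn). The companion conf/slaves file, not shown in these notes, would list the datanode addresses seen in the startup logs below, and the namenode needs a one-time format before the first start; a sketch:

[hadoop@nn ~]$ vim hadoop/conf/slaves
192.168.0.54
192.168.0.55
192.168.0.56
192.168.0.57
[hadoop@nn ~]$ hadoop namenode -format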
ON NAMENODE:
------------
[hadoop@nn ~]$ start-dfs.sh
starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-nn.cluster.com.out
192.168.0.57: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn4.cluster.com.out
192.168.0.55: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn2.cluster.com.out
192.168.0.56: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn3.cluster.com.out
192.168.0.54: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn1.cluster.com.out
192.168.0.53: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-snn.cluster.com.out
[hadoop@nn ~]$
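Once HDFS is up, the Hadoop 1.x admin report gives a quick health check; with the setup above it should list four live datanodes:

[hadoop@nn ~]$ hadoop dfsadmin -report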
ON JOBTRACKER:
--------------
[hadoop@nn ~]$ ssh jt
Last login: Sun Jul 12 02:16:12 2015 from 192.168.0.51
[hadoop@jt ~]$ start-mapred.sh
starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-jt.cluster.com.out
192.168.0.55: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn2.cluster.com.out
192.168.0.54: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn1.cluster.com.out
192.168.0.57: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn4.cluster.com.out
192.168.0.56: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn3.cluster.com.out
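With both layers up, the Hadoop 1.x web UIs give a quick visual check (the default ports, assuming they were not overridden in the configs above):

NameNode:   http://192.168.0.51:50070
JobTracker: http://192.168.0.52:50030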
ON NAMENODE:
------------
[hadoop@nn ~]$ for i in {1..7};do ssh 192.168.0.5$i "hostname;jdk/bin/jps;echo -e '\n'";done
nn.cluster.com
2207 Jps
1632 NameNode
jt.cluster.com
1547 JobTracker
1918 Jps
snn.cluster.com
1658 Jps
1546 SecondaryNameNode
dn1.cluster.com
1406 DataNode
1493 TaskTracker
2545 Jps
dn2.cluster.com
2117 Jps
1614 TaskTracker
1532 DataNode
dn3.cluster.com
1509 DataNode
2141 Jps
1584 TaskTracker
dn4.cluster.com
1593 TaskTracker
2085 Jps
1512 DataNode
[hadoop@nn ~]$
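All 11 daemons are up: NameNode, JobTracker, SecondaryNameNode, plus a DataNode and a TaskTracker on each of the four worker nodes. A rough end-to-end smoke test, assuming the stock examples jar that ships in the 1.2.1 tarball:

[hadoop@nn ~]$ hadoop jar hadoop-1.2.1/hadoop-examples-1.2.1.jar pi 10 100

The job should schedule map tasks across the four tasktrackers and print an estimate of pi.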
==============================================================
HADOOP ADMINISTRATOR - ROLES AND RESPONSIBILITIES
1. Plan hadoop cluster
   - small cluster
   - medium cluster
   - large cluster
   - jobs are IO bound
   - jobs are CPU bound
   - build the cluster based upon the storage capacity and how your data is growing
   a. Role assignments - which node will be datanode/namenode/hbase master/hive metastore server/hue server/client nodes/standby namenode/resourcemanager
   b. Default Tuning
2.
   a.
   b.
   c.
3.
   a.
   b.
   c.
   d.
4. Monitoring (see the sketch after this list)
   a. CPUs
   b. Memory
   c. Processes
   d. Hadoop and its ecosystem components
5.
6.
7.
8.
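For item 4, a rough per-node health check in the same ssh-loop style used above (a sketch only; a production cluster would pair this with a real monitoring stack such as Ganglia or Nagios):

[hadoop@nn ~]$ for i in {1..7};do ssh 192.168.0.5$i "hostname; uptime; free -m | head -2; jdk/bin/jps";done

Here uptime covers CPU load, free -m covers memory, and jps lists the Hadoop daemon processes on each node.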