Fully Distributed Node Hadoop Cluster
Cluster nodes (FQDN and short hostname):
nn.cluster.com    nn
jt.cluster.com    jt
snn.cluster.com   snn
dn1.cluster.com   dn1
dn2.cluster.com   dn2
dn3.cluster.com   dn3
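These map to the 192.168.0.51-57 addresses used throughout the logs below. A sketch of the /etc/hosts entries that would be replicated on every node (dn4 at .57 shows up in the startup logs even though it is missing from the list above):

192.168.0.51 nn.cluster.com  nn
192.168.0.52 jt.cluster.com  jt
192.168.0.53 snn.cluster.com snn
192.168.0.54 dn1.cluster.com dn1
192.168.0.55 dn2.cluster.com dn2
192.168.0.56 dn3.cluster.com dn3
192.168.0.57 dn4.cluster.com dn4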
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.0.51:8020</value>
</property>
</configuration>
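fs.default.name points every HDFS client at the NameNode's RPC endpoint on 192.168.0.51:8020, so a bare path like / resolves against this filesystem. A quick check once the daemons are up (the explicit URI form is equivalent to the bare path):

[hadoop@nn ~]$ hadoop fs -ls /
[hadoop@nn ~]$ hadoop fs -ls hdfs://192.168.0.51:8020/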
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoop/data/snn</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
</configuration>
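dfs.block.size of 134217728 bytes is 128 MB (128 x 1024 x 1024), and dfs.replication of 3 keeps three copies of every block. The dfs.name.dir, fs.checkpoint.dir and dfs.data.dir paths live on the namenode, secondary namenode and datanodes respectively; a sketch of preparing them from the namenode, assuming passwordless ssh is already in place (start-dfs.sh requires it anyway):

[hadoop@nn ~]$ mkdir -p /home/hadoop/data/nn
[hadoop@nn ~]$ ssh snn "mkdir -p /home/hadoop/data/snn"
[hadoop@nn ~]$ for i in {4..7};do ssh 192.168.0.5$i "mkdir -p /home/hadoop/data/dn";done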
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.52:8021</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>3</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>3</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/data/mapred/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/data/mapred/system</value>
</property>
</configuration>
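With 3 map and 3 reduce slots per tasktracker and four datanodes doubling as tasktrackers (see the startup logs below), the cluster can run up to 12 map and 12 reduce tasks concurrently. The edited *-site.xml files must be identical on every node; a hedged sketch of pushing them out from the namenode, assuming the same ~/hadoop layout everywhere:

[hadoop@nn ~]$ for i in {2..7};do scp hadoop/conf/*-site.xml 192.168.0.5$i:hadoop/conf/;done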
[hadoop@nn ~]$ vim hadoop/conf/masters
192.168.0.53
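In Hadoop 1.x, conf/masters tells start-dfs.sh where to launch the secondary namenode (192.168.0.53 = snn). The companion conf/slaves file, not shown in these notes, would list the datanode addresses seen in the startup logs below, and the namenode needs a one-time format before the first start; a sketch:

[hadoop@nn ~]$ vim hadoop/conf/slaves
192.168.0.54
192.168.0.55
192.168.0.56
192.168.0.57
[hadoop@nn ~]$ hadoop namenode -format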
ON NAMENODE:
------------
[hadoop@nn ~]$ start-dfs.sh
starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-nn.cluster.com.out
192.168.0.57: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn4.cluster.com.out
192.168.0.55: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn2.cluster.com.out
192.168.0.56: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn3.cluster.com.out
192.168.0.54: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-dn1.cluster.com.out
192.168.0.53: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-snn.cluster.com.out
[hadoop@nn ~]$
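Once HDFS is up, the Hadoop 1.x admin report gives a quick health check; with the setup above it should list four live datanodes:

[hadoop@nn ~]$ hadoop dfsadmin -report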
ON JOBTRACKER:
--------------
[hadoop@nn ~]$ ssh jt
Last login: Sun Jul 12 02:16:12 2015 from 192.168.0.51
[hadoop@jt ~]$ start-mapred.sh
starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-jt.cluster.com.out
192.168.0.55: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn2.cluster.com.out
192.168.0.54: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn1.cluster.com.out
192.168.0.57: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn4.cluster.com.out
192.168.0.56: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-dn3.cluster.com.out
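With both layers up, the Hadoop 1.x web UIs give a quick visual check (the default ports, assuming they were not overridden in the configs above):

NameNode:   http://192.168.0.51:50070
JobTracker: http://192.168.0.52:50030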
ON NAMENODE:
------------
[hadoop@nn ~]$ for i in {1..7};do ssh 192.168.0.5$i "hostname;jdk/bin/jps;echo -e '\n'";done
nn.cluster.com
2207 Jps
1632 NameNode
jt.cluster.com
1547 JobTracker
1918 Jps
snn.cluster.com
1658 Jps
1546 SecondaryNameNode
dn1.cluster.com
1406 DataNode
1493 TaskTracker
2545 Jps
dn2.cluster.com
2117 Jps
1614 TaskTracker
1532 DataNode
dn3.cluster.com
1509 DataNode
2141 Jps
1584 TaskTracker
dn4.cluster.com
1593 TaskTracker
2085 Jps
1512 DataNode
[hadoop@nn ~]$
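All 11 daemons are up: NameNode, JobTracker, SecondaryNameNode, plus a DataNode and a TaskTracker on each of the four worker nodes. A rough end-to-end smoke test, assuming the stock examples jar that ships in the 1.2.1 tarball:

[hadoop@nn ~]$ hadoop jar hadoop-1.2.1/hadoop-examples-1.2.1.jar pi 10 100

The job should schedule map tasks across the four tasktrackers and print an estimate of pi.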
==============================================================
HADOOP ADMINISTRATOR - ROLES AND RESPONSIBILITIES
1. Plan hadoop cluster
   - small cluster
   - medium cluster
   - large cluster
   - jobs are IO bound
   - jobs are CPU bound
   - build the cluster based upon the storage capacity and how your data is growing
   a. Role assignments - which node will be datanode/namenode/hbase master/hive metastore server/hue server/client nodes/standby namenode/resourcemanager
   b. Default Tuning
2.
   a.
   b.
   c.
3.
   a.
   b.
   c.
   d.
4. Monitoring (see the sketch after this list)
   a. CPUs
   b. Memory
   c. Processes
   d. Hadoop and its ecosystem components
5.
6.
7.
8.
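For item 4, a rough per-node health check in the same ssh-loop style used above (a sketch only; a production cluster would pair this with a real monitoring stack such as Ganglia or Nagios):

[hadoop@nn ~]$ for i in {1..7};do ssh 192.168.0.5$i "hostname; uptime; free -m | head -2; jdk/bin/jps";done

Here uptime covers CPU load, free -m covers memory, and jps lists the Hadoop daemon processes on each node.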