Mastering Hadoop 2.x (Next Gen / YARN) Step by Step: Installing a Multi-Node Cluster


0. Information
Hadoop version used: 2.2.0
http://mirrors.digipower.vn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
Operating system: Ubuntu 13.04
Java Development Kit:
OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.13.04.2)
$ sudo apt-get install openjdk-7-jdk
OpenSSH server:
$ sudo apt-get install openssh-server
Note: the user account must have the same name on every machine in the cluster, and the path to the hadoop directory must be identical on all machines.

1. Create a user account
Create a new user account named hduser with Administrator privileges:
System Settings... > User Accounts
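If you prefer the terminal, a sketch of the equivalent commands on Ubuntu (membership in the sudo group is what grants administrator rights):
$ sudo adduser hduser
$ sudo usermod -aG sudo hduser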

2. Create SSH keys
From the primary account on each machine (not the hduser account we just created), run the following terminal commands to create SSH keys for hduser:
user@TenMay:~$ su hduser
The next command generates an RSA key pair with an empty password. This may sound insecure, but it means you will not have to enter a passphrase every time Hadoop interacts with this node.
hduser@TenMay:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@TenMay:~$
Next, create the authorized_keys file so that the other machines on the network can use it to access this machine:
hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
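Depending on how sshd is configured, you may also need to restrict the file's permissions, or key-based logins will be refused:
hduser@ubuntu:~$ chmod 0600 $HOME/.ssh/authorized_keys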

3. Declare IP addresses
In this step we add entries so that the machines in the same cluster can identify one another.
Collect the IP addresses of all the machines and enter them into the file /etc/hosts following this template:
192.168.32.102 TenMayChinh
192.168.32.103 TenMayTram01
192.168.32.104 TenMayTram02
192.168.32.105 TenMayTram03
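To look up each machine's IP address for this table, you can run, for example:
$ hostname -I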

4. Connect the machines in the cluster
Set things up so that hduser on the master can log in to and control the slave machines in the cluster. You will have to add the master's public key (stored in the file $HOME/.ssh/id_rsa.pub) to the file $HOME/.ssh/authorized_keys on each slave machine. Do this with the following command:
hduser@TenMayChinh:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@TenMayTram01
Finally, from the master, try connecting to the master itself and to each of the slaves. This step saves the fingerprints of the slave machines into hduser's known_hosts file on the master.
hduser@TenMayChinh:~$ ssh TenMayChinh
The authenticity of host 'TenMayChinh (192.168.32.102)' can't be
established.
RSA key fingerprint is
3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'TenMayChinh' (RSA) to the list of
known hosts.
Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686
...
hduser@TenMayChinh:~$ exit

hduser@TenMayChinh:~$ ssh TenMayTram01
The authenticity of host 'TenMayTram01 (192.168.32.103)' can't be
established.
RSA key fingerprint is
74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'TenMayTram01' (RSA) to the list of
known hosts.
Ubuntu 10.04
...
hduser@TenMayTram01:~$ exit

Log in as hduser and carry out the remaining steps.

5. Extract Hadoop
Extract the downloaded Hadoop archive and create a directory named yarn at /home/hduser/, then copy the Hadoop directory into yarn, for example as sketched below.
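A possible sequence, assuming the archive was downloaded to hduser's home directory:
$ mkdir -p /home/hduser/yarn
$ tar -xzf /home/hduser/hadoop-2.2.0.tar.gz -C /home/hduser/yarn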
Then take ownership of the extracted directory:
$ cd /home/hduser/yarn
$ sudo chown -R hduser hadoop-2.2.0

6. Add the following to the ~/.bashrc file
export HADOOP_HOME=/home/hduser/yarn/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
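Reload the file so the variables take effect in the current shell:
$ source ~/.bashrc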

7. Edit Hadoop's environment setup files
Determine JAVA_HOME with the following terminal command:
$ readlink -f /usr/bin/java | sed "s:bin/java::"
Add the JAVA_HOME variable to the following files.
Add the following line at the start of the script section in the file libexec/hadoop-config.sh:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Add the following lines at the start of the script section in the file etc/hadoop/yarn-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export HADOOP_HOME=/home/hduser/yarn/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Adjust the paths to match the environment on your own machines.

8. Add properties to the configuration files
These files can be edited on one machine and then copied unchanged to the other machines.
$HADOOP_CONF_DIR/core-site.xml :
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://TenMayChinh:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/yarn/hadoop-2.2.0/tmp</value>
  </property>
</configuration>
$HADOOP_CONF_DIR/hdfs-site.xml :
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
$HADOOP_CONF_DIR/mapred-site.xml :
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
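In the 2.2.0 distribution this file may not exist yet; it ships only as a template, which you can copy first:
$ cp $HADOOP_CONF_DIR/mapred-site.xml.template $HADOOP_CONF_DIR/mapred-site.xml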
$HADOOP_CONF_DIR/yarn-site.xml :
Note: the name of this auxiliary service changed in Hadoop 2.2.0, from mapreduce.shuffle in earlier versions to mapreduce_shuffle. The resourcemanager addresses below use the master's hostname TenMayChinh, as declared in /etc/hosts.
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>TenMayChinh:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>TenMayChinh:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>TenMayChinh:8032</value>
  </property>
</configuration>
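One way to propagate the edited configuration to the slaves (a sketch; it relies on the identical directory layout required in section 0 and the passwordless SSH set up in section 4):
$ for host in TenMayTram01 TenMayTram02 TenMayTram03; do
>   scp $HADOOP_CONF_DIR/*.xml hduser@$host:$HADOOP_CONF_DIR/
> done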

9. Create the HDFS directory in HADOOP_HOME
Create the directory declared in the hadoop.tmp.dir property of core-site.xml:
$ mkdir -p $HADOOP_HOME/tmp

Steps 10 through 12 are performed on the master machine only.
10. Add the slave machine names to the slaves file
Add the names of the slave machines to the file $HADOOP_CONF_DIR/slaves on the master:
TenMayTram01
TenMayTram02
TenMayTram03
If you want the master to act as a datanode as well, add its name too.
When the Hadoop processes are started, only the machines named in the slaves file will be asked to run the datanode and nodemanager processes.

11. Format the HDFS directory through the namenode
$ bin/hadoop namenode -format

12. Start Hadoop Daemons
$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemons.sh start datanode
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemons.sh start nodemanager
$ sbin/mr-jobhistory-daemon.sh start historyserver
Note: for the datanode and nodemanager we call the *-daemons.sh scripts, not *-daemon.sh. The *-daemon.sh variants do not read the slaves file and therefore only start processes on the master machine.
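Equivalently, the distribution ships convenience scripts in sbin/ that start roughly the same set of daemons in one go (the history server still has to be started separately):
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh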
13. Check the installation
Check the jps output on both the master and the slave machines.
On the master (in the case where the master also acts as a datanode):
$ jps
6539 ResourceManager
6451 DataNode
8701 Jps
6895 JobHistoryServer
6234 NameNode
6765 NodeManager
If the master is not acting as a datanode, the DataNode and NodeManager processes will not appear.
On the slave machines:
$ jps
8014 NodeManager
7858 DataNode
9868 Jps
If the processes are not all running as shown above, examine the log files in the $HADOOP_HOME/logs directory to diagnose the problem.
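Each daemon writes its own log file whose name encodes the user, the daemon, and the host. For example (the exact file name on your machines will differ):
$ tail -n 50 $HADOOP_HOME/logs/hadoop-hduser-datanode-TenMayTram01.log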
14. Run a demo application
Create a small input file (end the cat input with Ctrl+D):
$ mkdir in
$ cat > in/file
This is one line
This is another one
Add this directory to HDFS:
$ bin/hadoop dfs -copyFromLocal in /in
Run the bundled wordcount example:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /in /out
Check the output:
$ bin/hadoop dfs -cat /out/*
This 2
another 1
is 2
line 1
one 2
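Note that MapReduce refuses to overwrite an existing output directory, so if you want to rerun the example, remove /out first:
$ bin/hadoop dfs -rm -r /out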

15. Web interface
1. http://TenMayChinh:50070/dfshealth.jsp (NameNode)
2. http://TenMayChinh:8088/cluster (ResourceManager)
3. http://TenMayChinh:19888/jobhistory (Job History Server)

16. Stop the Hadoop daemons
$ sbin/mr-jobhistory-daemon.sh stop historyserver
$ sbin/yarn-daemons.sh stop nodemanager
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/hadoop-daemons.sh stop datanode
$ sbin/hadoop-daemon.sh stop namenode

17. Common errors
If one of the processes fails to start, run that process on its own like this:
$ cd $HADOOP_HOME
$ bin/hadoop namenode (likewise datanode; for the YARN processes use bin/yarn resourcemanager or bin/yarn nodemanager)
Watch the errors reported in the terminal. If the error says a port is already in use, list the ports currently occupied with this command:
$ netstat -a -t --numeric-ports -p
Then correct the offending ports in the configuration files above.
If the datanode fails to start and the reported error is:
FATAL datanode.DataNode: Initialization failed for block pool
Block pool BP-1649797416-127.0.1.1-1388995286952 (storage id DS-
2067046525-127.0.1.1-50010-1388988847634) service to
Ptnhttt07/127.0.1.1:9000
java.io.IOException: Incompatible clusterIDs in
/home/hduser/yarn/hadoop/tmp/dfs/data: namenode clusterID = CID-
7c23fc21-06ac-47eb-9982-949124d3b49d; datanode clusterID = CID-
8ff98657-f0bd-493d-aaec-c34474e492dd
In this case the clusterIDs of the namenode and datanode do not match. You can either stop all processes and reformat the entire HDFS, or edit the VERSION file in the datanode's data directory (here /home/hduser/yarn/hadoop/tmp/dfs/data/current) so that the clusterIDs agree, then restart the process.
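A sketch of the reformat-everything option (warning: this permanently deletes all data stored in HDFS):
$ sbin/stop-yarn.sh && sbin/stop-dfs.sh
$ rm -rf $HADOOP_HOME/tmp/*   # repeat this on every node
$ bin/hadoop namenode -format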
If the error is that a datanode or nodemanager cannot connect to the server TenMayChinh/host:port, edit the /etc/hosts file on the master as follows: comment out the line "127.0.1.1 TenMayChinh" with a leading #, then restart the Hadoop processes.

18. Hadoop YARN defaults
Visit the address below for the properties that are new in YARN and their default values:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
