CDH3 Pseudo Installation On Ubuntu


1) Do not create a user named hadoop; it conflicts with the hadoop group and users that the CDH packages manage, and you will run into issues during installation.


2) Install Java
Copy the Java JDK installer onto the desktop, then run the following from the Desktop directory:
$ sudo cp jdk-6u30-linux-x** /usr/local
$ cd /usr/local
$ sudo sh jdk-6u30-linux-x**
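Once the installer finishes, it is worth confirming that the JDK unpacked under the directory name the rest of this guide assumes (jdk1.6.0_30):
$ ls /usr/local/jdk1.6.0_30
$ /usr/local/jdk1.6.0_30/bin/java -version - should report version 1.6.0_30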
3) Install CDH3 package
Go to - http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH3/CDH3u6/CDH3Installation-Guide/CDH3-Installation-Guide.html
Click on "Installing CDH3 on Ubuntu and Debian Systems"
Click on "this link for a Maverick system" (choose the link that matches your Ubuntu release)
Install the package using the GDebi package installer, or save it and run:
$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb
$ sudo apt-get update
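After the update you can confirm that the Cloudera repository was registered; the exact file name under sources.list.d may vary:
$ ls /etc/apt/sources.list.d/ - a cloudera entry should be present
$ apt-cache policy hadoop-0.20 - the candidate version should come from the Cloudera repository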
4) Install Hadoop
$ apt-cache search hadoop - must list all available Hadoop packages
$ sudo apt-get install hadoop-0.20 hadoop-0.20-native
$ sudo apt-get install hadoop-0.20-<daemon type> - installs each individual daemon (see below)
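For a pseudo-distributed setup you need all five daemons. Using the CDH3 package names, the explicit form of the command above is:
$ sudo apt-get install hadoop-0.20-namenode hadoop-0.20-secondarynamenode hadoop-0.20-datanode hadoop-0.20-jobtracker hadoop-0.20-tasktracker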
5) Set Java and Hadoop Home
Open ~/.bashrc:

$ gedit ~/.bashrc

Add the following lines:

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:/usr/lib/hadoop/bin

# Set JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.6.0_30
export PATH=$PATH:/usr/local/jdk1.6.0_30/bin

Close all terminals, open a new one, and test:
$ echo $JAVA_HOME
$ echo $HADOOP_HOME
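If both variables print correctly, the hadoop command should also resolve from the PATH; a quick check:
$ which hadoop - should print /usr/lib/hadoop/bin/hadoop
$ hadoop version - prints the installed Hadoop/CDH version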
6) Add the dedicated users hdfs and mapred (created by the CDH packages) to the hadoop group

$ sudo gpasswd -a hdfs hadoop
$ sudo gpasswd -a mapred hadoop
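To confirm both users are now in the hadoop group:
$ groups hdfs - should list hadoop among the groups
$ groups mapred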
7) Configuration
$ cd /usr/lib/hadoop/conf
Set JAVA_HOME in hadoop-env.sh:
$ sudo gedit hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.6.0_30
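If you are working on a machine without a GUI, the same line can be appended from the shell instead of using gedit:
$ echo 'export JAVA_HOME=/usr/local/jdk1.6.0_30' | sudo tee -a /usr/lib/hadoop/conf/hadoop-env.sh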

8) core-site.xml
Add the following properties inside the <configuration> element:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/lib/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
$ sudo mkdir /usr/lib/hadoop/tmp
$ sudo chmod 750 /usr/lib/hadoop/tmp/
$ sudo chown hdfs:hadoop /usr/lib/hadoop/tmp/
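A quick sanity check that the temp directory got the intended owner and mode (750 shows as rwxr-x---):
$ ls -ld /usr/lib/hadoop/tmp - should show drwxr-x--- hdfs hadoop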
9) hdfs-site.xml
Add the following properties inside the <configuration> element (dfs.permissions is set to false here only to simplify a single-user test setup):
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/storage/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/storage/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
$ sudo mkdir /storage
$ sudo chmod 775 /storage/
$ sudo chown hdfs:hadoop /storage/
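Verify the ownership, since the HDFS daemons (running as hdfs) must be able to create /storage/name and /storage/data here:
$ ls -ld /storage - should show drwxrwxr-x hdfs hadoop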

10) mapred-site.xml
Add the following properties inside the <configuration> element:
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:8021</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/mapred/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/mapred/local</value>
</property>
<property>
<name>mapred.temp.dir</name>
<value>/mapred/temp</value>
</property>
$ sudo mkdir /mapred
$ sudo chmod 775 /mapred
$ sudo chown mapred:hadoop /mapred
11) User Assignment
Add the following to hadoop-env.sh so each daemon runs as its dedicated user:

export HADOOP_NAMENODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_JOBTRACKER_USER=mapred
export HADOOP_TASKTRACKER_USER=mapred

12) Format the namenode

$ cd /usr/lib/hadoop/bin/
$ sudo -u hdfs hadoop namenode -format

You should see a "successfully formatted" message; otherwise, check the reported error and correct it before proceeding.
13) Start Daemons
$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start

Check for any errors in /var/log/hadoop-0.20 for each daemon
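For example, to list the log directory and inspect the latest namenode log (log file names include the local hostname, hence the wildcard):
$ ls /var/log/hadoop-0.20/
$ tail -n 50 /var/log/hadoop-0.20/*-namenode-*.log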


Check that all expected ports are listening:
$ sudo netstat -ptlen
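Another quick check is jps, which lists running Java processes; with all five daemons up you should see NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker. Since the daemons run as other users, run it via sudo (jps ships with the JDK, so it may not be on root's default PATH):
$ sudo $JAVA_HOME/bin/jps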
14) Check the Web UIs
http://localhost:50070 - HDFS NameNode status
http://localhost:50030 - MapReduce JobTracker status
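As a final smoke test, write a directory into HDFS and list it back (the path /test is just an arbitrary example):
$ sudo -u hdfs hadoop fs -mkdir /test
$ sudo -u hdfs hadoop fs -ls /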
