RHadoop2.0.2u2 Installation Configuration For RedHat
Revolution R, Revolution R Enterprise, and Revolution Analytics are trademarks of Revolution Analytics.
All other trademarks are the property of their respective owners.
Copyright 2012-2013 Revolution Analytics, Inc. All Rights Reserved.
Contents
Overview
System Requirements
Software Dependencies
Typical Configurations for RHadoop
    Basic Hadoop Configuration
    Basic Hadoop Configuration + Revolution R Enterprise + rmr2 package
    Basic Hadoop Configuration + Revolution R Enterprise + rhdfs package
    Basic Hadoop Configuration + Revolution R Enterprise + rhbase package
Installation
    Installing as Root vs. Non-Root
    Installation Revolution R Enterprise and rmr2 on all nodes
    Using a script to Install Revolution R Enterprise and rmr2 on all nodes
    Installation of rhdfs
    Installation of rhbase
    Using a script to Install rhdfs and rhbase
Testing to be sure the packages are configured and working
Overview
RHadoop is a collection of three R packages that allow users to manage and analyze data
with Hadoop.
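Package    Description
rhdfs      Connects R to the Hadoop Distributed File System (HDFS), so that files in HDFS can be listed, read, and written from R.
rhbase     Connects R to the HBase distributed database through the Thrift server.
rmr2       Lets R users write and run Hadoop MapReduce jobs from R.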
System Requirements
Before installing, verify that the machine on which you will install has the following:
Operating System. Red Hat Enterprise Linux 5.4, 5.5, 5.6, 5.7, 5.8, 6.0,
6.1, 6.2, or 6.3 (64-bit processors).
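The requirement above can be checked from a script before installing. The following is a minimal sketch, not part of the RHadoop bundle: the helper names are illustrative, the release list is copied from this section, and treating "64-bit" as the `x86_64` machine string is an assumption.

```shell
#!/bin/sh
# Sketch: check an OS release string and architecture against the supported list.
# SUPPORTED_RELEASES mirrors the list in this section; function names are illustrative.
SUPPORTED_RELEASES="5.4 5.5 5.6 5.7 5.8 6.0 6.1 6.2 6.3"

# Extract the first "N.M" version from a release line such as
# "Red Hat Enterprise Linux Server release 6.3 (Santiago)".
release_version() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeeds if the given version (e.g. "6.2") is in the supported list.
is_supported_release() {
    for v in $SUPPORTED_RELEASES; do
        [ "$v" = "$1" ] && return 0
    done
    return 1
}

# Succeeds if the machine string (as printed by `uname -m`) is 64-bit x86.
# Assumption: "64-bit processors" means x86_64 here.
is_64bit() {
    [ "$1" = "x86_64" ]
}
```

On a Red Hat system this could be used as `is_supported_release "$(release_version "$(cat /etc/redhat-release)")" && is_64bit "$(uname -m)"`.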
Software Dependencies
To install and configure RHadoop properly, a set of supported software
dependencies must be installed first.
Dependency and Version
Revolution R Enterprise 6.2 (and all of its dependencies)
HBase**
[Figures: Configuration A and Configuration B diagrams for each typical configuration. Labels recoverable from the rhbase diagrams: rhbase, Apache Thrift, HBase Master, Thrift Server, Edge Node.]
Installation
The RHadoop packages can be installed either manually or via a shell script. Both
methods are described in this section. The commands listed in the shell script
are for guidance only, and should be adapted to the standards of your IT
department.
Revo-Ent-6.2.0-RHEL5.tar.gz or Revo-Ent-6.2.0-RHEL6.tar.gz
RHadoop-2.0.2u2.tar.gz
2. Unpack the contents of the Revolution R Enterprise installation bundle. At the
prompt, type:
tar -xzf Revo-Ent-6.2.0-RHEL6.tar.gz
Note: If installing on Red Hat Enterprise Linux 5.x, replace RHEL6 with RHEL5
in the previous tar command.
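The choice between the two bundles can be made in a script rather than by editing the command by hand. This is a minimal sketch under stated assumptions: `revo_tarball` is an illustrative helper, not part of the installer, and it only knows the two file names listed in this section.

```shell
# Sketch: choose the Revolution R Enterprise bundle for a RHEL major version.
# The file names are the two listed in this section; revo_tarball is illustrative.
revo_tarball() {
    case "$1" in
        5|6) printf 'Revo-Ent-6.2.0-RHEL%s.tar.gz\n' "$1" ;;
        *)   echo "unsupported RHEL major version: $1" >&2
             return 1 ;;
    esac
}
```

For example, `tar -xzf "$(revo_tarball 5)"` unpacks the RHEL 5 bundle.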
3. Change directory to the versioned Revolution directory. At the prompt, type:
cd RevolutionR_6.2.0
4.
Because rmr2 uses the Rscript executable, and MapReduce jobs typically
run as their own user, you'll need to create a symbolic link from Rscript
to a location that is in the PATH. Example of a symbolic link:
ln -s /home/users/<user>/local/bin/Rscript /usr/bin
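When the link is created from a script that may run more than once, it helps to skip the step if the link already exists. The following is a minimal sketch; `link_rscript` and its two arguments are illustrative names, not part of the installer.

```shell
# Sketch: link Rscript into a directory on the PATH, skipping the step
# if a file or link with that name is already present.
# link_rscript and its arguments are illustrative, not part of the installer.
link_rscript() {
    src=$1        # full path to the installed Rscript executable
    dest_dir=$2   # directory on the PATH, e.g. /usr/bin
    if [ ! -x "$src" ]; then
        echo "Rscript not found or not executable: $src" >&2
        return 1
    fi
    if [ -e "$dest_dir/Rscript" ] || [ -L "$dest_dir/Rscript" ]; then
        echo "already present: $dest_dir/Rscript"
        return 0
    fi
    ln -s "$src" "$dest_dir/Rscript"
}
```

With the example path above, the call would be `link_rscript /home/users/<user>/local/bin/Rscript /usr/bin`, with `<user>` replaced by the installing user.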
5. Unpack the contents of the RHadoop installation bundle. At the prompt, type:
cd ..
8. Update the environment variables needed by rmr2. The values for the
Important!
These environment variables only need to be set on the nodes that
invoke the rmr2 MapReduce jobs (i.e., an Edge node as described earlier in this
document). If you don't know which nodes will be used, set these variables
on each node. It is also recommended to add these environment variables to the
file /etc/profile so that they are available to all users.
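Adding a variable to /etc/profile can be done so that repeated runs do not duplicate the line. This is a minimal sketch: `add_profile_export` is an illustrative helper, and the variable name and path in the usage note are assumptions to check against your own installation.

```shell
# Sketch: append an export line to a profile file only if it is not already there.
# add_profile_export is an illustrative helper, not part of RHadoop.
add_profile_export() {
    file=$1
    name=$2
    value=$3
    line="export $name=$value"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

A call might look like `add_profile_export /etc/profile HADOOP_CMD /usr/bin/hadoop`. rmr2 is commonly configured through `HADOOP_CMD` and `HADOOP_STREAMING`, but both the name and the path here are placeholders, not values taken from this guide.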
Installation of rhdfs
Important!
This environment variable only needs to be set on the nodes that are
using the rhdfs package (i.e. an Edge node as described earlier in this document).
Also, it is recommended to add this environment variable to the file /etc/profile
so that it will be available to all users.
3. Install rhdfs. At the prompt, type:
R CMD INSTALL rhdfs_1.0.5.tar.gz
Installation of rhbase
1. Install Apache Thrift
Important!
rhbase requires the Apache Thrift server. If you do not already have Thrift
installed and configured, you will need to build and install Apache Thrift.
Reference web site: http://thrift.apache.org/
2. Install the dependencies for Thrift. At the prompt, type:
yum -y install automake libtool flex bison pkgconfig gcc-c++ \
    boost-devel libevent-devel zlib-devel python-devel ruby-devel \
    openssl-devel
5. Build the Thrift library. We only need the C++ interface of Thrift, so we build
without Ruby or Python. At the prompt, type the following two commands:
./configure --without-ruby --without-python
make
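Because the build is only needed when Thrift is not already installed, a script can test for it first. This is a minimal sketch under a stated assumption: `needs_thrift_build` is an illustrative helper that only looks for a `thrift` executable on the PATH, which is a rough proxy and says nothing about the installed version or the C++ library.

```shell
# Sketch: decide whether Thrift still needs to be built.
# needs_thrift_build is an illustrative helper; it only checks that a
# `thrift` executable is on the PATH, not which version it is.
needs_thrift_build() {
    if command -v thrift >/dev/null 2>&1; then
        return 1    # thrift found, no build needed
    fi
    return 0        # thrift missing, build it
}
```

The build step then becomes `if needs_thrift_build; then ./configure --without-ruby --without-python && make; fi`, run from the Thrift source directory.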
#install rhbase
cd ..
R CMD INSTALL rhbase_1.1.tar.gz
2. Load and initialize the rmr2 package, and execute some simple commands.
At the R prompt, type the following commands: (Note: the > symbol in the
following code is the R prompt and should not be typed.)
> library(rmr2)
> from.dfs(to.dfs(1:100))
> from.dfs(mapreduce(to.dfs(1:100)))
At the R prompt, type the following commands: (Note: the > symbol in the
following code is the R prompt and should not be typed.)
> library(rhdfs)
> hdfs.init()
> hdfs.ls("/")
At the R prompt, type the following commands: (Note: the > symbol in the
following code is the R prompt and should not be typed.)
> library(rhbase)
> hb.init()
> hb.list.tables()
5. Using the standard R mechanism for checking packages, you can verify that your
Be aware that running the tests for the rmr2 package may take a
significant time (hours) to complete.
R CMD check rmr2_2.0.2.tar.gz
R CMD check rhdfs_1.0.5.tar.gz
R CMD check rhbase_1.1.tar.gz
If any error occurs, refer to the troubleshooting information in the previous
sections.
Note: Errors referring to the missing package pdflatex can be ignored:
Error in texi2dvi("Rd2.tex", pdf = (out_ext == "pdf"), quiet =
FALSE, :
pdflatex is not available
Error in running tools::texi2dvi
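When the check output is saved to a file, the ignorable pdflatex errors can be filtered out so that only other errors remain. This is a minimal sketch: `real_check_errors` is an illustrative helper, and treating every line that starts with "Error" as an error report is a simplifying assumption about the output format.

```shell
# Sketch: list errors in saved R CMD check output, ignoring the
# pdflatex/texi2dvi errors described in the note above.
# real_check_errors is an illustrative helper; matching on lines that
# start with "Error" is a simplifying assumption.
real_check_errors() {
    grep '^Error' "$1" | grep -v -e 'texi2dvi' -e 'pdflatex'
}
```

For example, after `R CMD check rhdfs_1.0.5.tar.gz > rhdfs-check.out 2>&1`, running `real_check_errors rhdfs-check.out` prints any remaining error lines and returns a non-zero status when there are none. The output file name is a placeholder.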