
Q 11. Write a program in Eclipse to find the max temperature of cities in HDFS.

1. Open the Eclipse IDE present on Cloudera.

2. Create an input file with a list of cities and their corresponding temperatures.
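
For example, a hypothetical TempData file could hold one tab-separated "city<TAB>temperature" record per line, which is the format the mapper below expects:

Delhi	43
Mumbai	38
Delhi	45
Chennai	41
Mumbai	36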

3. Create a new Java MapReduce project.
File > New > Project > Java Project > name the project "MaxTemp" and click Finish.

4. Add the Hadoop libraries to the project. Right-click on project
MaxTemp > select "Properties" > click on "Java Build Path".
Click "Add External JARs" > Filesystem > usr > lib > hadoop.
Select all JAR files and click OK.

5. More external libraries are needed. Click "Add External JARs" again,
select all JAR files in the "client" directory, and click OK.

6. Create the Java mapper and reducer programs. Right-click on the "src" folder of
MaxTemp. Click New > Class > in the Name textbox write "MaxTemp"
and click Finish.

7. In the MaxTemp.java file, write the code below.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemp {

    public static void main(String[] args) throws Exception {
        // Create a new job
        @SuppressWarnings("deprecation")
        Job job = new Job();
        // Set the jar and job name so the job can be located in the distributed environment
        job.setJarByClass(MaxTemp.class);
        job.setJobName("Max Temperature");
        // Set input and output paths; note that we use the default input format,
        // which is TextInputFormat (each record is one line of input)
        FileInputFormat.addInputPath(job, new Path("/home/cloudera/Desktop/TempData"));
        FileOutputFormat.setOutputPath(job, new Path("/home/cloudera/Desktop/TempData1"));
        // Set mapper, combiner, and reducer classes
        job.setMapperClass(MaxTempMapper.class);
        job.setCombinerClass(MaxTempReducer.class);
        job.setReducerClass(MaxTempReducer.class);
        // Set output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
MaxTempMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private IntWritable max = new IntWritable();
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is "city<TAB>temperature"
        StringTokenizer line = new StringTokenizer(value.toString(), "\t");
        word.set(line.nextToken());
        max.set(Integer.parseInt(line.nextToken()));
        context.write(word, max);
    }
}
MaxTempReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Reset the maximum for every key (city)
        int max_temp = Integer.MIN_VALUE;
        Iterator<IntWritable> itr = values.iterator();
        while (itr.hasNext()) {
            int temp = itr.next().get();
            if (temp > max_temp) {
                max_temp = temp;
            }
        }
        context.write(key, new IntWritable(max_temp));
    }
}

8. Run the program by clicking the Run button.

9. An output file with the maximum temperature per city will be created.
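
Assuming the sample TempData file from step 2, the output file (part-r-00000 in the output directory) would contain the per-city maxima:

Chennai	41
Delhi	45
Mumbai	38
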
Q12. Unix Commands

1. Make a directory (multiple directories can be created at once)
hadoop fs -mkdir /user/input/directory1 /user/input/directory2

2. List the contents of a directory
hadoop fs -ls /user/input/directory1

3. Upload a file from the local file system to the Hadoop file system
hadoop fs -put <local_source> <hdfs_destination>
hadoop fs -put /home/input/sample1.txt /user/input/

4. Download files to the local file system
hadoop fs -get <hdfs_source> <local_destination>
hadoop fs -get /user/input/directory1/sample.txt /home/

5. View the contents of a file
hadoop fs -cat /user/input/directory/sample.txt
6. Copy a file from source to destination
hadoop fs -cp /user/input/directory/sample.txt /user/input/directory2

7. Copy a file from/to the local file system: copyFromLocal and copyToLocal
copyFromLocal is similar to the put command, except that the source is restricted to a local file reference.
Syntax: hadoop fs -copyFromLocal <local_source> <hdfs_destination>
hadoop fs -copyFromLocal /home/input/sample.txt /user/input/sample.txt
copyToLocal is similar to the get command, except that the destination is restricted to a local file reference.
Syntax: hadoop fs -copyToLocal <hdfs_source> <local_destination>

8. Remove a file or directory in HDFS
Removes the files specified as arguments. Deletes a directory only when it is empty.
Syntax: hadoop fs -rm <path>
hadoop fs -rm /user/input/directory/sample.txt
Recursive version of delete (deprecated in newer releases in favour of hadoop fs -rm -r):
hadoop fs -rmr /user/input/
9. Display the last few lines of a file.
hadoop fs -tail /user/input/directory/sample.txt
10. Display the aggregate length of a file
hadoop fs -du /user/input/directory/sample.txt
Q 13. Write a Matrix Multiplication program in Eclipse in HDFS.

1. Open the Eclipse IDE present on Cloudera.

2. Create a new Java MapReduce project.
File > New > Project > Java Project > name the project "Matrix"
and click Finish.

3. Add the Hadoop libraries to the project. Right-click on project
Matrix > select "Properties" > click on "Java Build Path".
Click "Add External JARs" > Filesystem > usr > lib > hadoop.
Select all JAR files and click OK.

4. More external libraries are needed. Click "Add External JARs" again,
select all JAR files, and click OK.

5. Create the Java mapper and reducer programs. Right-click on the "src" folder of
Matrix. Click New > Class > in the Name textbox write "MatrixDriver",
"MatrixMapper", and "MatrixReducer" (one class at a time) and click Finish.

6. In the MatrixDriver.java file, write the code below.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MatrixDriver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // M is an m-by-n matrix; N is an n-by-p matrix.
        conf.set("m", "2");
        conf.set("n", "2");
        conf.set("p", "2");
        Job job = Job.getInstance(conf, "MatrixMultiplication");
        job.setJarByClass(MatrixDriver.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Block until the job finishes so the exit code reflects success or failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
MatrixMapper.java
import java.io.IOException;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text>
{
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        Configuration conf = context.getConfiguration();
        int m = Integer.parseInt(conf.get("m"));
        int p = Integer.parseInt(conf.get("p"));
        // Each input line has the form "matrixName,row,column,value"
        String line = value.toString();
        String[] indicesAndValue = line.split(",");
        Text outputKey = new Text();
        Text outputValue = new Text();
        if (indicesAndValue[0].equals("M"))
        {
            // M[i][j] contributes to every output cell (i, k), k = 0..p-1
            for (int k = 0; k < p; k++)
            {
                outputKey.set(indicesAndValue[1] + "," + k);
                outputValue.set("M," + indicesAndValue[2] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
        else
        {
            // N[j][k] contributes to every output cell (i, k), i = 0..m-1
            for (int i = 0; i < m; i++)
            {
                outputKey.set(i + "," + indicesAndValue[2]);
                outputValue.set("N," + indicesAndValue[1] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
    }
}
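
To make the shuffle concrete, here is a short trace of what this mapper emits, assuming m = p = 2 as set in the driver. The input line M,0,1,2 (M[0][1] = 2) produces ("0,0", "M,1,2") and ("0,1", "M,1,2"); the input line N,1,0,7 (N[1][0] = 7) produces ("0,0", "N,1,7") and ("1,0", "N,1,7"). Every output cell (i,k) thus receives all the entries of row i of M and column k of N that it needs.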
MatrixReducer.java
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;

public class MatrixReducer extends Reducer<Text, Text, Text, Text>
{
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException
    {
        String[] value;
        // hashA holds the row entries of M, hashB the column entries of N,
        // both indexed by the shared dimension j
        HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
        HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
        for (Text val : values)
        {
            value = val.toString().split(",");
            if (value[0].equals("M"))
            {
                hashA.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
            else
            {
                hashB.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
        }
        int n = Integer.parseInt(context.getConfiguration().get("n"));
        float result = 0.0f;
        float a_ij;
        float b_jk;
        // Dot product over the shared dimension: sum of M[i][j] * N[j][k]
        for (int j = 0; j < n; j++)
        {
            a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
            b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
            result += a_ij * b_jk;
        }
        if (result != 0.0f)
        {
            // Write "i,k,value"; the null key makes TextOutputFormat emit only the value
            context.write(null, new Text(key.toString() + "," + Float.toString(result)));
        }
    }
}
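
Continuing the trace: suppose the key "0,0" receives the values M,0,1, M,1,2, N,0,5 and N,1,7 (i.e. row 0 of M is [1, 2] and column 0 of N is [5, 7]). The reducer builds hashA = {0: 1.0, 1: 2.0} and hashB = {0: 5.0, 1: 7.0}, computes result = 1*5 + 2*7 = 19, and writes the line 0,0,19.0.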
7. Export the project as a JAR. Right-click on Matrix > select "Export" > Java
> JAR file > Next. In the JAR file textbox write
/home/cloudera/Matrix.jar. Click Finish and then OK.

8. View the exported JAR file. Click Applications > System Tools > Terminal.
Run cd /home/cloudera and then ls in the terminal.

9. Create an input file for the MapReduce program. In the terminal write vi
M.txt. This opens M.txt in an editor. Press Insert to write to the
file. To save, press Esc, then Shift+: to open the command line, type wq and press
Enter. To view the file, write cat M.txt.
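
A minimal sample M.txt, assuming both matrices are stored in the single input file using the "matrixName,row,column,value" format that MatrixMapper parses, would be:

M,0,0,1
M,0,1,2
M,1,0,3
M,1,1,4
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8

This encodes M = [[1, 2], [3, 4]] and N = [[5, 6], [7, 8]], matching the 2x2 dimensions set in MatrixDriver.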
10. To look at the Hadoop file system, write hdfs dfs -ls /

11. To create an input directory, run hdfs dfs -mkdir /input

12. Move the input file to the Hadoop file system: hdfs dfs -put
/home/cloudera/M.txt /input/

13. Run the MapReduce program with hadoop jar /home/cloudera/Matrix.jar
MatrixDriver /input/M.txt /outputfile (the driver takes the input path
and the output path as arguments).

14. To view the output of the executed job, run hdfs dfs -ls
/outputfile. To view the output file, run hdfs dfs -cat /outputfile/part-r-00000
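
With the sample M.txt above, the product is [[19, 22], [43, 50]], so part-r-00000 should contain:

0,0,19.0
0,1,22.0
1,0,43.0
1,1,50.0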
