Solutions to Homework Problems in Chapter 6
Hwang, Fox and Dongarra: Distributed and Cloud Computing,
Morgan Kaufmann Publishers, copyrighted 2012
Note: The solutions to the Chapter 6 problems were prepared with the assistance of graduate students from Indiana University under the supervision of Dr. Judy Qiu.

Problem 6.1:
Get the source code from: http://dl.dropbox.com/u/12951553/bookanswers/answer6.1.zip

(a). We implemented a demo system that is quite simple in its functionality: there is a search
box used to find contacts, and once a contact has been found, we list recent emails and
attachments associated with that contact. To do this, the application offers three URLs that are
called by the JavaScript running in the browser to obtain the data: search.json,
messages.json and files.json.
The system responds to a request for the message history of a given contact by calling
/messages.json, which accepts an email address as a GET parameter. Note that this
functionality requires an authentication step not shown here. The code behind that call is as
follows:

class MessagesHandler(webapp.RequestHandler):
    def get(self):
        current_user = users.get_current_user()
        current_email = current_user.email()

        emailAddr = self.request.get('email')
        contextIO = ContextIO(api_key=settings.CONTEXTIO_OAUTH_KEY,
                              api_secret=settings.CONTEXTIO_OAUTH_SECRET,
                              api_url=settings.CONTEXTIO_API_URL)
        response = contextIO.contactmessages(emailAddr, account=current_email)
        self.response.out.write(simplejson.dumps(response.get_data()))

The code simply uses the contactmessages.json API call of Context.IO and returns all the messages,
including the subject, other recipients, thread ID, and even attachments, in JSON format.
The complete code for this demo application has been made available by the Context.IO team
on their GitHub account (https://github.com/contextio/AppEngineDemo).
This answer is based on the Google App Engine Blog Post at
http://googleappengine.blogspot.com/2011/05/accessing-gmail-accounts-from-app.html.

(b). The dashboard of Google App Engine provides measurements on useful aspects of the
deployed application, for example execution logs, version control, quota details, the datastore
viewer, and administration tools. It also provides detailed resource usage information.
Critical measurements can easily be retrieved from this powerful dashboard.

(c). Automatic scaling is built into App Engine and is not visible to users.
http://code.google.com/appengine/whyappengine.html#scale

Problem 6.2:
Get the source code: http://dl.dropbox.com/u/12951553/bookanswers/answer6.2.zip

Here we design a very simple data storage system using the Blobstore service to
illustrate how Google App Engine handles data. The Blobstore API allows your application to
serve data objects, called blobs, that are much larger than the size allowed for objects in the
Datastore service. Blobs are useful for serving large files, such as video or image files, and for
allowing users to upload large data files. Blobs are created by uploading a file through an HTTP
request.

Typically, your applications will do this by presenting a form with a file upload field to the
user. When the form is submitted, the Blobstore creates a blob from the file's contents and
returns an opaque reference to the blob, called a blob key, which you can later use to serve the
blob. The application can serve the complete blob value in response to a user request, or it can
read the value directly using a streaming file-like interface. This system includes the following
functions: user login, data listing, data upload/download. Gzip compression is used when
possible to decrease the cost.

User login: This function is implemented using the User Service provided in GAE. If the user
is already signed in to your application, get_current_user() returns the User object for the user.
Otherwise, it returns None. If the user has signed in, display a personalized message, using the
nickname associated with the user's account. If the user has not signed in, tell webapp to
redirect the user's browser to the Google account sign-in screen. The redirect includes the URL
to this page (self.request.uri) so the Google account sign-in mechanism will send the user back
here after the user has signed in or registered for a new account.

user = users.get_current_user()
if user:
    self.response.headers['Content-Encoding'] = 'gzip'
    self.response.headers['Content-Type'] = 'text/plain'
    self.response.out.write('Hello, ' + user.nickname())
    self.response.out.write('<a href=' + users.create_logout_url("/") + '>sign out</a><br/>')
else:
    self.redirect(users.create_login_url(self.request.uri))

The content is gzip compressed when sent back from the server. Also, a log out link is provided.

Data listing: To list the data uploaded by a specific user, GQL is used to guarantee that users
can only see and access data that belongs to them.

class Blob(db.Model):
    """Models a data entry with a user, content, name, size, and date."""
    user = db.UserProperty()
    name = db.StringProperty(multiline=True)
    content = blobstore.BlobReferenceProperty(blobstore.BlobKey)
    date = db.DateTimeProperty(auto_now_add=True)
    size = db.IntegerProperty()

This defines a data blob class with five properties: user, whose value is a User object; name,
whose value is a String; content, whose value is a BlobKey pointing to this blob; date, whose
value is a datetime.datetime; and size, whose value is an Integer. GQL, a SQL-like query
language, provides access to the App Engine datastore query engine's features using a familiar
syntax. The query happens here:

blobs = db.GqlQuery("SELECT * "
                    "FROM Blob "
                    "WHERE user = :1", user)

This returns all blobs uploaded by the current user.

Data upload: To create and upload a blob, follow this procedure:

• Call blobstore.create_upload_url() to create an upload URL for the form that the user will fill
out, passing the application path to load when the POST of the form is completed:

upload_url = blobstore.create_upload_url('/upload')
There is an asynchronous version, create_upload_url_async(), which allows your application
code to continue running while Blobstore generates the upload URL.
The form must include a file upload field, and the form's enctype must be set to
multipart/form-data. When the user submits the form, the POST is handled by the Blobstore API,
which creates the blob. The API creates an info record for the blob, stores the record in the
datastore, and passes the rewritten request to your application on a given path as a blob key:

self.response.out.write('<html><body>')
self.response.out.write('<form action="%s" method="POST" enctype="multipart/form-data">' % upload_url)
self.response.out.write("""Upload File: <input type="file" name="file"><br>
    <input type="submit" name="submit" value="Submit"> </form></body></html>""")

• In this handler, you can store the blob key with the rest of your application's data model.
The blob key itself remains accessible from the blob info entity in the datastore. Note that
after the user submits the form and your handler is called, the blob has already been
saved and the blob info added to the datastore. If your application doesn't want to keep
the blob, you should delete the blob immediately to prevent it from becoming orphaned:

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        try:
            upload_files = self.get_uploads('file')  # 'file' is file upload field in the form
            blob_info = upload_files[0]
            myblob = Blob()
            myblob.name = blob_info.filename
            myblob.size = blob_info.size
            myblob.user = users.get_current_user()
            myblob.content = blob_info.key()
            myblob.put()
            self.redirect('/')
        except:
            self.redirect('/')

• The webapp framework provides the blobstore_handlers.BlobstoreUploadHandler upload
handler class to help you parse the form data. For more information, see the reference for
BlobstoreUploadHandler.
• When the Blobstore rewrites the user's request, the MIME parts of the uploaded files
have their bodies emptied, and the blob key is added as a MIME part header. All other
form fields and parts are preserved and passed to the upload handler. If you don't specify
a content type, the Blobstore will try to infer it from the file extension. If no content type
can be determined, the newly created blob is assigned content type application/octet-
stream.

Data download: To serve blobs, you must include a blob download handler as a path in your
application. The application serves a blob by setting a header on the outgoing response. The
following sample uses the webapp framework. When using webapp, the handler should pass
the blob key for the desired blob to self.send_blob(). In this example, the blob key is passed to
the download handler as part of the URL. The download handler can get the blob key by any
means you choose, such as through another method or user action.

class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.send_blob(blob_info)
The webapp framework provides the download handler class
blobstore_handlers.BlobstoreDownloadHandler to help you serve blobs. For more information,
see the reference for BlobstoreDownloadHandler. Blobs can be served from any application URL.
To serve a blob in your application, you put a special header in the response containing the blob
key; App Engine replaces the body of the response with the content of the blob.

Problem 6.3:
Source code: http://dl.dropbox.com/u/12951553/bookanswers/answer6.3.zip

For this question, we provide a Java SimpleDB application with all the critical functions: domain
creation, data insertion, data editing, data deletion, and domain deletion. These functions
demonstrate how to make basic requests to Amazon SimpleDB using the AWS SDK for Java
(a brief sketch is given after the prerequisites below). The reader can easily scale this application
up to meet the requirements of the question.

Prerequisites: You must have a valid Amazon Web Services developer account, and be signed
up to use Amazon SimpleDB. For more information on Amazon SimpleDB, please refer to

http://aws.amazon.com/simpledb

http://aws.amazon.com/security-credentials
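
For reference, here is a minimal sketch of the kinds of requests such an application makes, written against the AWS SDK for Java. The domain name, item name, and attribute values are illustrative assumptions, not the names used in the provided source code.

import java.util.Arrays;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
import com.amazonaws.services.simpledb.model.*;

public class SimpleDBDemo {
    public static void main(String[] args) {
        // The access key and secret key are taken from the command line only for brevity.
        AmazonSimpleDB sdb = new AmazonSimpleDBClient(new BasicAWSCredentials(args[0], args[1]));

        // Domain creation
        sdb.createDomain(new CreateDomainRequest("BookDemo"));

        // Data insertion: put one item with two attributes
        sdb.putAttributes(new PutAttributesRequest("BookDemo", "item001", Arrays.asList(
                new ReplaceableAttribute("title", "Distributed and Cloud Computing", true),
                new ReplaceableAttribute("year", "2012", true))));

        // Data editing: replace the value of an existing attribute
        sdb.putAttributes(new PutAttributesRequest("BookDemo", "item001", Arrays.asList(
                new ReplaceableAttribute("year", "2011", true))));

        // Query the domain with a SELECT expression
        for (Item item : sdb.select(new SelectRequest("select * from `BookDemo`")).getItems()) {
            System.out.println(item.getName() + " -> " + item.getAttributes());
        }

        // Data deletion and domain deletion
        sdb.deleteAttributes(new DeleteAttributesRequest("BookDemo", "item001"));
        sdb.deleteDomain(new DeleteDomainRequest("BookDemo"));
    }
}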

Problem 6.4:

Now, design and request an EC2 configuration on the AWS platform for parallel
multiplication of two very large matrices with an order exceeding 50,000.

Source code : http://156.56.93.128/PBMS/doc/answer6.4.zip

The parallel matrix multiplication is implemented using Hadoop 0.20.205, and experiments are
performed on the Amazon EC2 platform with sample matrices of orders between 20,000 and
50,000. The steps to implement parallel matrix multiplication using Hadoop are as follows:

1) Split Matrix A and Matrix B each into an n*n grid of blocked sub-matrices. There will be
2*n*n Map tasks and n*n Reduce tasks.
2) Each Map task holds either A[p][q] or B[p][q] and sends it to the 'n' Reduce tasks
r[p][1<i<n] or r[1<j<n][q], respectively.
3) Each Reduce task r[p][q] receives 2*n sub-matrices, A[p][1<i<n] and B[1<j<n][q], from the
Map tasks; it then multiplies each matching pair A[p][k] and B[k][q] and sums the products
(see the sketch below).
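
The serial Java sketch below illustrates the routing scheme in steps 1-3 (it is not the actual Hadoop code): each block of A is replicated to the reduce keys of its row, each block of B to the reduce keys of its column, and each reduce key (p, q) sums the products of its matching block pairs. The result is checked against a direct multiplication on small random matrices.

public class BlockMatMulSketch {
    static final int N = 2;   // n x n grid of blocks
    static final int B = 3;   // block size; the full matrices are (N*B) x (N*B)

    public static void main(String[] args) {
        int dim = N * B;
        double[][] a = random(dim), b = random(dim);

        // "Map" phase: block A[p][k] is sent to every reduce key (p, q), tagged with k;
        // block B[k][q] is sent to every reduce key (p, q), tagged with k.
        double[][][][][] aAt = new double[N][N][N][][];   // [p][q][k] -> block of A
        double[][][][][] bAt = new double[N][N][N][][];   // [p][q][k] -> block of B
        for (int p = 0; p < N; p++)
            for (int k = 0; k < N; k++) {
                double[][] aBlk = block(a, p, k);
                for (int q = 0; q < N; q++) aAt[p][q][k] = aBlk;
            }
        for (int k = 0; k < N; k++)
            for (int q = 0; q < N; q++) {
                double[][] bBlk = block(b, k, q);
                for (int p = 0; p < N; p++) bAt[p][q][k] = bBlk;
            }

        // "Reduce" phase: reduce key (p, q) multiplies matching pairs and sums them up.
        double[][] c = new double[dim][dim];
        for (int p = 0; p < N; p++)
            for (int q = 0; q < N; q++) {
                double[][] sum = new double[B][B];
                for (int k = 0; k < N; k++) add(sum, multiply(aAt[p][q][k], bAt[p][q][k]));
                place(c, sum, p, q);
            }

        System.out.println("max abs error vs. direct multiply = " + maxError(c, multiply(a, b)));
    }

    static double[][] random(int n) {
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) m[i][j] = Math.random();
        return m;
    }
    static double[][] block(double[][] m, int p, int q) {
        double[][] blk = new double[B][B];
        for (int i = 0; i < B; i++) for (int j = 0; j < B; j++) blk[i][j] = m[p * B + i][q * B + j];
        return blk;
    }
    static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++) for (int k = 0; k < n; k++) for (int j = 0; j < n; j++)
            r[i][j] += x[i][k] * y[k][j];
        return r;
    }
    static void add(double[][] acc, double[][] m) {
        for (int i = 0; i < acc.length; i++) for (int j = 0; j < acc.length; j++) acc[i][j] += m[i][j];
    }
    static void place(double[][] c, double[][] blk, int p, int q) {
        for (int i = 0; i < B; i++) for (int j = 0; j < B; j++) c[p * B + i][q * B + j] = blk[i][j];
    }
    static double maxError(double[][] x, double[][] y) {
        double e = 0;
        for (int i = 0; i < x.length; i++) for (int j = 0; j < x.length; j++)
            e = Math.max(e, Math.abs(x[i][j] - y[i][j]));
        return e;
    }
}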

The advantages of this algorithm are: 1) splitting the large matrices into small sub-matrices means
the working set of each sub-matrix fits in the memory of a small EC2 instance; 2) the many small
tasks increase the application's parallelism. The disadvantage is the parallel overhead, in terms of
scheduling, communication, and sorting, caused by the large number of tasks.

EC2 configuration
In the experiments, we use the instance type EMR M1.small (1.7GB memory, 1 core per node). We
created instance groups with 1, 2, 4, 8, and 16 nodes respectively. One should note that the
Hadoop jobtracker and namenode take one dedicated node in the 2-, 4-, 8-, and 16-node cases.
Steps:
a. ./elastic-mapreduce --create --instance-count 16 --alive (apply for resources)
b. ./elastic-mapreduce --jobflow j-22ZM5UUKIK69O --ssh (ssh to the master node)
c. ./s3cmd get s3://wc-jar/matrix-multiply-hadoop.jar (download the program jar file)
d. ./s3cmd get s3://wc-input/matrix-50k-5k ./50k-5k (download the input data)
e. hadoop dfs -put 50k-5k/* 50k-5k (upload the data to HDFS)
f. hadoop jar matrix-multiply-hadoop.jar 50k-5k output 50000 5000 10 (run the program)

Analysis
Figures 1-4 show that our parallel matrix multiply implementation scales well on EC2, especially
for large matrices. For example, the relative speed-ups for processing the 20k, 30k, 40k, and 50k
data are 4.43, 7.75, 9.67, and 11.58 respectively when using 16 nodes. The larger the matrices,
the better the parallel efficiency of the application. (The reason the two-node case is only a little
faster than the one-node case is that the jobtracker and tasktracker were run on separate nodes.)
Other issues in the experiments:
Storage utilization: the data sizes are 16GB + 36GB + 64GB + 100GB for the 20k, 30k, 40k, and
50k data sets respectively, 216GB of data in total. The total costs for the experiments are: input
data transfer in, $0.1*216GB = $21.6; EC2 instances (M1.small), 290 hours*$0.08/hour = $23.2;
about $44.8 in total.
System metrics such as resource utilization can be monitored using "CloudWatch" in the AWS
Management Console.
For fault tolerance, see the answer to Problem 4.10.

Experimental results

Figure 1: Parallel Matrix Multiply for 20K    Figure 2: Parallel Matrix Multiply for 30K
Figure 3: Parallel Matrix Multiply for 40K    Figure 4: Parallel Matrix Multiply for 50K

Problem 6.5:
We implemented the parallel matrix multiply application using EMR and S3 on the AWS
platform. The basic algorithm and configuration are the same as in Problem 6.4. The only
difference is that here Hadoop retrieves the input data from S3 rather than from HDFS as in
Problem 6.4.

Analysis
Figures 1-4 show that the parallel matrix multiply scales well in the EMR/S3 environment,
especially for large matrices. The relative speed-ups for processing the 20k, 30k, 40k, and 50k
data are 7.24, 12.3, 16.4, and 19.39 respectively when using 16 nodes. The super-linear speedup
results were mainly caused by serious network contention when using a single node to retrieve
the input data from S3. Compared to the results using HDFS in Problem 6.4, the results for the
20k, 30k, 40k, and 50k data sets using S3 on 16 nodes are 1.3, 1.65, 1.67, and 1.66 times slower
in job turnaround time respectively. The results using fewer nodes are even slower; for example,
the result for the 50k data using S3 on 2 nodes is 2.19 times slower than the HDFS case. These
results indicate the large overhead incurred when Hadoop retrieves input data from S3. Figure 5
shows that the average speed of transferring data from S3 to an EC2 instance is 8.54 MB/sec.
For the detailed algorithm, configuration, and analysis of other issues such as speedup and cost
efficiency, see the answer to Problem 6.4.

Performance Results:

Figure 1: Parallel Matrix Multiply for 20K    Figure 2: Parallel Matrix Multiply for 30K
Figure 3: Parallel Matrix Multiply for 40K    Figure 4: Parallel Matrix Multiply for 50K

Figure 5: S3 data transfer speed

Problem 6.6:
Outline of Eli Lilly cloud usage

Eli Lilly uses cloud computing in the research arm of the company. In silico analyses are a
large part of the research process in the pharmaceutical industry, and Eli Lilly is no exception.
Cloud computing gives Lilly burst capacity when its internal compute environment is fully
utilized. Additionally, Eli Lilly relies on cloud computing for analyses of public datasets, where
there is little to no concern about intellectual property or security. By running these analyses
outside of its primary data centers, the company can free up internal resources for
high-performance computing and high-throughput computing workflows that either may not fit
well in the cloud or involve analyses that are considered more proprietary or regulated.

As of 2009, Eli Lilly was mainly using the Amazon Web Services cloud, but had plans to use
many more cloud vendors in the future, requiring an orchestration layer between Eli Lilly
and the various cloud services. According to Eli Lilly, a new server in AWS can be up and
running in three minutes, compared to the seven and a half weeks it takes to deploy a server
internally. A 64-node AWS Linux cluster can be online in five minutes, compared with the three
months it takes to set up such a cluster internally.

One of the main drivers for Lilly to use the cloud is to move development efforts through
the drug pipeline more quickly. If analyses can be done in a fraction of the time because of the
scale of the cloud, then thousands of dollars spent on utility computing to speed up the pipeline
can generate millions of dollars of revenue in a quicker timeframe.

Sources:
http://www.informationweek.com/news/hardware/data_centers/228200755
http://www.informationweek.com/news/healthcare/clinical-systems/227400374
http://www.informationweek.com/cloud-computing/blog/archives/2009/01/whats_next_in_t.html

Problem 6.7:
The source code of this application can be obtained from the following link:
http://dl.dropbox.com/u/27392330/forCloudBook/AzureTableDemo-gaoxm.zip
Using the Azure SDK for Microsoft Visual Studio, we developed a simple web application as
shown in Figure 4 below. This application is extended from the Azure Table demo made by
Nancy Strickland (http://www.itmentors.com/code/2011/03/AzureUpdates/Tables.zip).
It can be used to demonstrate the use of Windows Azure Table and to run some simple
performance tests of Windows Azure Table. A Web role is created for this application,
which accesses the Windows Azure Table service from the Web server side. When the "Add
Customer" button is clicked, a new entity is created and inserted into an Azure table.
When the "Query Customer" button is clicked, the table is queried with the customer code and
the customer's name is shown after "Name". When proper values are set in the "number of
rows", "batch size", and "start rowkey" boxes, users can click the different "test" buttons to run
different performance tests against Windows Azure Table.
Besides the local version, we also deployed the application on a virtual machine in the
Azure cloud. Some experiences we gained from writing and deploying this application are:
1. The concepts of and separation between "Web role", "VM role" and "Worker role" during
development are not straightforward to understand, and it takes some time to learn how to
develop Azure applications.

2. Users cannot remotely log in to the VMs by default; it takes some special configuration.
Besides, the security restrictions on the VMs make them hard to operate. For example, almost
all websites are marked as "untrusted" by IE in the VMs, which makes it very hard to even
download something using the browser.

3. The SDK for Microsoft Visual Studio is powerful. The integration of the debugging and
deployment stages in Visual Studio is very convenient and easy to use. However, the
deployment process takes a long time, and it is hard to diagnose what is wrong if the
deployment fails.

4. Overall, we think the Amazon EC2 model and Amazon Web Services are easier to
understand and closer to developers' current experience.

Figure 4. A simple Windows Azure Web application using Azure Table

Figure 5. Read and write speed for Windows Azure Table

Problem 6.8:
In the MapReduce programming model, there is a special case that implements only the
map phase, also known as a "map-only" job. This approach allows an existing application or
binary to achieve high throughput by running many instances in parallel; in other words, it
helps a standalone program utilize large-scale computing capability. The goal of this exercise
is to write a Hadoop "map-only" program with the bioinformatics application BLAST
(NCBI BLAST+: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.23/) under a Linux/Unix
environment.

Source code: http://dl.dropbox.com/u/12951553/bookanswers/feiteng_blast.zip

For detailed usage of the source code, please refer to
http://salsahpc.indiana.edu/tutorial/hadoopblast.html.
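
As a rough illustration of the map-only pattern (a sketch under assumed paths and command-line options, not the HadoopBlast code referenced above), a Hadoop job becomes map-only simply by setting the number of reduce tasks to zero, and each map task can then shell out to the standalone BLAST binary:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MapOnlyBlast {

    // Each input line is assumed to hold the name of one query file that is already
    // available on the local file system of the worker node.
    public static class BlastMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String queryFile = value.toString().trim();
            // Invoke the standalone BLAST+ binary; the paths and options are assumptions.
            ProcessBuilder pb = new ProcessBuilder("/opt/blast/bin/blastx",
                    "-query", queryFile, "-db", "/opt/blast/db/nr", "-out", queryFile + ".out");
            pb.redirectErrorStream(true);
            int exitCode = pb.start().waitFor();
            context.write(new Text(queryFile), new Text("exit=" + exitCode));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "map-only blast");
        job.setJarByClass(MapOnlyBlast.class);
        job.setMapperClass(BlastMapper.class);
        job.setNumReduceTasks(0);                 // map-only: no reduce phase at all
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}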

Problem 6.9:
This problem is research-oriented. Visit the posted Manjrasoft Aneka software web site
for details and example solutions.

Problem 6.10:
Repeat the applications in Problems 6.1 to 6.7 using the academic/open-source packages
described in Section 6.6, namely Eucalyptus, Nimbus, OpenStack, OpenNebula, and Sector/Sphere.
This software is all available on FutureGrid (http://www.futureGrid.org) with a number of tutorials.

FutureGrid Tutorials - https://portal.futuregrid.org/tutorials


Using Eucalyptus on FutureGrid - https://portal.futuregrid.org/tutorials/eucalyptus
Using Nimbus on FutureGrid - https://portal.futuregrid.org/tutorials/nimbus
Using OpenStack on FutureGrid - https://portal.futuregrid.org/tutorials/openstack

The answer to Problem 6.14 also provides an overview of using Hadoop on FutureGrid cloud
environments.

Problem 6.11:
Test run the large-scale matrix multiplication program on two or three cloud platforms (GAE,
AWS, and Azure). You can also choose another data-intensive application, such as a large-scale
search or business processing application involving the masses from the general public.
Implement the application on at least two or all three cloud platforms, separately. The major
objective is to minimize the execution time of the application. The minor objective is to minimize
the user service costs.

(a) Run the service on the Google GAE platform

(b) Run the service on the Amazon AWS platform

(c) Run the service on the Windows Azure platform

(d) Compare your compute and storage costs, design experiences, and experimental
results on all three cloud platforms. Report their relative performance and QoS results
measured.

Implementations:
The implementations of the large-scale matrix multiplication program on AWS and Azure using
Hadoop and MPI are given in this chapter. The solution using Hadoop on the Amazon AWS
platform was discussed in Problems 6.4 and 6.5. Here we discuss the solution using MPI on the
Azure HPC Scheduler. A parallel matrix multiply algorithm, the Fox algorithm, was implemented
using MS-MPI. We then created the hosted service and deployed the Windows HPC cluster on
Azure using the Azure HPC Scheduler SDK tools. After that, we log on to the HPC cluster head
node and submit the large-scale matrix multiplication job there.
Source code : http://156.56.93.128/PBMS/doc/answer6.14.zip

Steps:
1) Setup Azure HPC SDK environment:
http://msdn.microsoft.com/en-us/library/windowsazure/hh545593.aspx

2) Configure and deploy HPC Cluster on Azure.
http://msdn.microsoft.com/en-us/library/hh560239(v=vs.85).aspx
3) Logon to head node of HPC cluster and copy executable binary on head node
4) Setup execution environment and configure firewall exception:
clusrun /nodegroup:computenode xcopy /E /Y \\HEADNODE1\approot\*.* F:\approot\
clusrun /nodegroup:computenode hpcfwutil register FoxMatrix.exe
F:\approot\FoxMatrix.exe
http://msdn.microsoft.com/en-us/library/hh560242(v=vs.85).aspx.
5) Submit MPI job to HPC scheduler:
job submit /nodegroup:computenodes /numnodes:16 mpiexec -n 16 -wdir F:\approot\
F:\approot\FoxMatrix.exe 16000
Comparison:
Compared with Amazon AWS, both platforms provide a graphical interface for users to deploy a
Hadoop or HPC cluster respectively. Developers can submit HPC jobs and Hadoop jobs to the
dynamically deployed cluster either on the head node or from a client PC through the job
submission API. With regard to performance, applications run on both Azure and EC2 show
performance fluctuation. Figures 1 and 2 show that the maximum performance fluctuations of
Hadoop using S3, Hadoop using HDFS, MPIAzure, and MPICluster are 8.1%, 1.9%, 5.3%, and
1.2% respectively. Network bandwidth fluctuation is the main cause of the performance
fluctuation of the Hadoop S3 implementation. The performance fluctuation of the MPIAzure
implementation is due to the aggregated delay of MPI communication primitives caused by
system noise in the guest OS in the cloud environment.

Figure 1: Performance fluctuation of Hadoop using HDFS and S3 for different problem sizes.
Figure 2: Performance fluctuation of MPIAzure and MPIHPC for different problem sizes.

Performance analysis:
The performance analysis of parallel matrix multiplication on Amazon EC2 was discussed in
Problem 6.4. This section analyzes the performance of the MPIAzure implementation. Figure 3
shows that the speedup of the MPICluster implementation is 8.6%, 37.1%, and 19.3% faster than
that of the MPIAzure implementation when using 4, 9, and 16 nodes respectively. Again, the
performance degradation of the MPIAzure implementation is due to the poor network performance
in the cloud environment. Figure 4 shows the performance of the Fox algorithm for the three
implementations using 16 compute nodes. As expected, MPIAzure is slower than MPICluster, but
faster than DryadCluster. Figures 5 and 6 show the parallel overhead versus 1/Sqrt(n), where n
refers to the number of matrix elements per node. In Figure 6, the parallel overhead for the 5x5,
4x4 and 3x3 node cases is linear in 1/Sqrt(n), which indicates that the Fox MS-MPI implementation
scales well on our HPC cluster with its InfiniBand network. In Figure 5, the parallel overhead for
the 3x3 and 4x4 node cases does not converge to the X axis for large matrix sizes; the reason is
the serious network contention that occurs in the cloud environment when running with large
matrices.

Figure 3: Speedup versus number of nodes for MPIAzure and MPICluster.
Figure 4: Job time of different runtimes on Azure and the HPC cluster for different problem sizes.

Figure 5: Parallel overhead vs. 1/Sqrt(n) for Fox/MPIAzure/MKL on 3x3 and 4x4 nodes.
Figure 6: Parallel overhead vs. 1/Sqrt(n) for Fox/MPICluster/MKL on 3x3 and 4x4 nodes.

Problem 6.12:
Comparison of the Google, Apache Hadoop, and Microsoft (Dryad) runtimes:

Programming environment: Google - MapReduce; Apache Hadoop - MapReduce; Microsoft - Dryad.
Coding language and programming model used: Google - C++ with the MapReduce model; Hadoop - Java with the MapReduce model; Microsoft - C#/.NET with DryadLINQ.
Mechanisms for data handling: Google - GFS (Google File System); Hadoop - HDFS (Hadoop Distributed File System); Microsoft - shared directories and local disks.
Failure handling methods: Google - re-execution of failed tasks and duplicated execution of slow tasks; Hadoop - re-execution of failed tasks and duplicate execution of slow tasks; Microsoft - re-execution of failed tasks and duplicate execution of slow tasks.
High-level language for data analysis: Google - Sawzall; Hadoop - Pig Latin, Hive; Microsoft - DryadLINQ.
OS and cluster environment: Google - Linux clusters; Hadoop - Linux clusters, Amazon Elastic MapReduce on EC2; Microsoft - Windows HPCS cluster.
Intermediate data transfer method: Google - by file transfer or using http links; Hadoop - by file transfer or using http links; Microsoft - files, TCP pipes, shared-memory FIFOs.

Problem 6.13:
The following program illustrates a sample application for image filtering using Aneka’s
MapReduce Programming Model. Note that the actual image filtering is dependent on the
problem domain and you may use any algorithm you see fit.

class Program
{
/// Reference to the configuration object.
static Configuration configuration = null;
/// Location of the configuration file.
static string configurationFileLocation = "conf.xml";
/// Processes the arguments given to the application and according
/// to the parameters read runs the application or shows the help.
/// <param name="args">program arguments</param>
static void Main(string[] args)
{
try
{
//Process the arguments
Program.ProcessArgs(args);
Program.SetupWorkspace();
//configure MapReduceApplication
MapReduceApplication<ImageFilterMapper, ImageFilterReducer>
application = new MapReduceApplication<ImageFilterMapper,
ImageFilterReducer>("ImageFilter", configuration);
//invoke and wait for result
application.InvokeAndWait(new EventHandler<Aneka.Entity.
ApplicationEventArgs>
(OnApplicationFinished));
}
catch (Exception ex)
{
Console.WriteLine(" Message: {0}", ex.Message);
Console.WriteLine("Application terminated unexpectedly.");
}
}
/// Hooks the ApplicationFinished event and processes the results
/// if the application has been successful.
/// <param name="sender">event source</param>
/// <param name="e">event information</param>
static void OnApplicationFinished(object sender,
Aneka.Entity.ApplicationEventArgs e)
{
if (e.Exception != null)
{
Console.WriteLine(e.Exception.Message);
}

Console.WriteLine("Press enter to finish!");


Console.ReadLine();
}

/// Processes the arguments given to the application and according
/// to the parameters read runs the application or shows the help.
/// <param name="args">program arguments</param>

static void ProcessArgs(string[] args)
{
for (int i = 0; i < args.Length; i++)
{
switch (args[i])
{
case "-c":
i++;
configurationFileLocation = args[i];
break;
default:
break;
}
}
}

/// Initializes the workspace
static void SetupWorkspace()
{
Configuration conf = Configuration. GetConfiguration(
Program.configurationFileLocation);
Program.configuration = conf;
}
}

/// Class ImageFilterMapper. Mapper implementation for the ImageFilter
/// application. The Map method reads the source images and performs the
/// required filtering. The output of the Map function is the filtered image.
public class ImageFilterMapper : Mapper<string, BytesWriteable>
{
/// The Map function receives as input the name of the image and its
/// contents. The filtering is then performed on the contents before
/// writing the results back to the storage.
/// <param name="input">A key-value pair representing the name of the
/// file and its contents.</param>

protected override void Map(IMapInput<string, BytesWriteable> input)
{
byte[] image = input.Value.GetBytes();

// Put your image filtering algorithm here


// ...
// ...

Emit(input.Key, image);
}
}
/// Class ImageFilterReducer. Reducer implementation for the ImageFilter
/// application. The Reducer is an identity function which does no processing.
public class ImageFilterReducer : Reducer<string, BytesWriteable>
{
/// The Reduce function is an identity function which does no further
/// processing on the contents.
protected override void Reduce(IReduceInputEnumerator<BytesWriteable> input)
{
// This is an identity function. No additional processing is required.
}
}

Once you have written and compiled your code, run your application by varying first the
input size and then the number of nodes (for example: 2, 4, 8, 16, ...). Plot a single graph of
execution time (y-axis) versus input size (x-axis) for the different sets of nodes used, so that
your final graph shows the difference in execution time for each of the sets of nodes. Next, plot a
graph of speed-up (y-axis) versus input size (x-axis) for the different sets of nodes used.

Problem 6.14:

Developing a platform service such as Hadoop on various cloud infrastructures can be an
arduous task. Below we break this task down into three categories: building the VM, instantiating
VMs, and setting up Hadoop.

Building a VM:

a) Eucalyptus: The ideal way to build a Hadoop VM on Eucalyptus 2.0 is to start with a pre-
prepared base image and package it into your own EMI. You can find a starter image at
http://open.eucalyptus.com/wiki/starter-emis. Once a starter image is selected, it is
unzipped, mounted as a filesystem, and the Hadoop installation packages are unzipped
into a desired installation path (/opt is recommended). After the image is properly
prepared, it is bundled, uploaded, and registered using the euca-bundle, euca-upload and
euca-register commands described at
http://open.eucalyptus.com/wiki/EucalyptusImageManagement_v2.0.
b) Nimbus: Select the precompiled Hadoop cluster image available at the Nimbus Marketplace
(http://scienceclouds.org/marketplace/) and add it to the Nimbus cloud being used, if it is
not already available.
c) OpenStack: Similar to Eucalyptus, select a base image, either from the Eucalyptus
precompiled images or from the Ubuntu UEC images (http://uec-images.ubuntu.com/releases/).
Once a starter image is selected, it is unzipped, mounted as a filesystem, and the
Hadoop installation packages are unzipped into a desired installation path
(/opt is recommended). After the image is properly prepared, it is bundled,
uploaded, and registered using the euca-bundle, euca-upload and euca-register
commands described at
http://open.eucalyptus.com/wiki/EucalyptusImageManagement_v2.0.

Instantiate VMs:

a) Eucalyptus: Using the euca2ools commands, and assuming the user has the
appropriate credentials and keypairs created, call euca-run-instances with the EMI
number registered in the previous step. Alternatively, use the boto library in Python to
create your own startup script.
b) Nimbus: Assuming the necessary credentials are in place, start the Hadoop image by
using the bin/cloud-client.sh --run command and specifying the image name in the
--name attribute.
c) OpenStack: Using the euca2ools commands, and assuming the user has the appropriate
credentials and keypairs created, call euca-run-instances with the EMI number registered
in the previous step. Alternatively, use the boto library in Python to create your own
startup script.

Setup Hadoop:

Once a number of VMs have been instantiated and are in the "running" state, select one as the
master Hadoop node and designate the others as slave nodes. For each node, set the proper
configuration in /etc/hosts and make the changes to Hadoop's configuration files described at
https://portal.futuregrid.org/salsahadoop-futuregrid-cloud-eucalyptus#Configuration. Once
ready, you can start Hadoop on each VM with the bin/start-all.sh command and test it using
Lynx by connecting to the master node's MapReduce and HDFS services (lynx 10.0.2.131:9001
and lynx 10.0.2.131:9003).

Run WordCount:

Once the Hadoop HDFS and MapReduce services are running properly, run the WordCount
program described at
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3A+WordCoun
t+v1.0.

Problem 6.15:
Examine the tutorials at http://www.salsahpc.org and
http://www.iterativemapreduce.org/samples.html. Compare Hadoop and Twister on the cases
specified by the instructor from the examples given there, and discuss their relative strengths
and weaknesses. We select the KMeansClustering application to compare Hadoop and Twister.

KMeans Clustering

Twister strengths

a) Data caching: Twister supports in-memory caching of loop-invariant input data
(the KMeans input data points) across iterations, eliminating the overhead of retrieving and
parsing the data in each iteration. Hadoop does not support caching of input data and
has to read and parse the data from disk (or from another node in the case of a non-data-local
map task) in each iteration, adding a significant overhead to the computation.
b) Iterative extensions: The Twister programming model contains a combiner step (after the
reduce step) to merge the reduce outputs (the new centroids) and supports data
broadcasting at the beginning of an iteration. Hadoop does not support data
broadcasting or providing broadcast data (the KMeans centroids) as an input to the map
tasks; users have to use an auxiliary mechanism (e.g., the distributed cache) to
broadcast and receive the centroid data, and they also have to manually merge the new
centroids in the driver program.
c) Intermediate data communication: Twister streams intermediate data directly to the
reducers using messaging or TCP. Hadoop first writes the intermediate data to disk before
transferring it, adding a significant performance overhead, since KMeansClustering performs
a significant amount of intermediate data transfer.

Hadoop strengths

d) Fault tolerance: Hadoop supports fine-grained, task-level fault tolerance, where it
re-executes failed tasks to recover the computation. Hadoop also supports duplicate
execution of slow tasks to avoid the tail of slow tasks. Twister supports fault tolerance
only at the iteration level: if a task fails, the whole iteration needs to be re-executed.
e) Load balancing: Hadoop performs global-queue-based dynamic scheduling, resulting in
natural load balancing of the computation. Hadoop also supports multiple waves
of map tasks per iteration, resulting in better load balancing and offsetting some of the
intermediate data communication costs (overlapping communication with computation).
Twister only supports static scheduling and does not support multiple waves of map tasks.
f) Monitoring – Hadoop provides a web based monitoring UI, where the user can monitor
the progress of the computations. Twister only provides a command line monitoring
output.

Points (a) and (b) apply only to iterative MapReduce applications like KMeansClustering and
PageRank; the others apply to classic MapReduce and pleasingly parallel applications as well.
A serial sketch of the K-means computation that both runtimes parallelize is given below for
reference.
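
The following plain Java sketch (toy data, naive seeding; an illustration only, not Twister or Hadoop code) shows the computation being discussed: in MapReduce terms the assignment loop is the map stage and the centroid update is the reduce/combine stage, which is why caching the input points and broadcasting the centroids matter so much for iterative runtimes.

import java.util.Random;

public class KMeansSketch {
    public static void main(String[] args) {
        int numPoints = 10000, dim = 2, k = 3, iterations = 10;
        Random rnd = new Random(42);
        double[][] points = new double[numPoints][dim];
        for (double[] p : points)
            for (int d = 0; d < dim; d++) p[d] = rnd.nextDouble();

        double[][] centroids = new double[k][dim];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();   // naive seeding

        for (int iter = 0; iter < iterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];

            // "Map" stage: assign every point to its nearest centroid. The points are
            // loop-invariant data that Twister caches in memory, while Hadoop re-reads
            // and re-parses them from disk in every iteration.
            for (double[] p : points) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0;
                    for (int d = 0; d < dim; d++)
                        dist += (p[d] - centroids[c][d]) * (p[d] - centroids[c][d]);
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                counts[best]++;
                for (int d = 0; d < dim; d++) sums[best][d] += p[d];
            }

            // "Reduce/combine" stage: recompute the centroids, which must then be
            // broadcast to (or merged in the driver for) the next iteration.
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
        }
        System.out.println("final centroid 0: (" + centroids[0][0] + ", " + centroids[0][1] + ")");
    }
}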

Problem 6.16:
The given program, WebVisCounter, is written in Hadoop. Readers are encouraged to trace
through the program or test run it on a cloud platform to which they have access. Analyze the
programming tasks performed by this Hadoop program and learn from its use of the Hadoop
library.

Refer to the tutorials at http://hadoop.apache.org/common/docs/r1.0.1/ for how to set up and run
Hadoop. Refer to the answer to Problem 6.14 for instructions on running Hadoop in cloud
environments.

Problem 6.17:
Twister K-means extends the MapReduce programming model iteratively. Many data
analysis techniques require iterative computations; for example, K-means clustering is an
application where multiple iterations of MapReduce computation are necessary for the overall
result. Twister is an enhanced MapReduce runtime that supports iterative MapReduce
computations efficiently. In this assignment you will learn the iterative MapReduce programming
model and how to implement the K-means algorithm with Twister.

Please learn how to use Twister from the Twister webpage; a helpful link is:
http://salsahpc.indiana.edu/ICPAD/twister_kmeans_user_guide.htm

Problem 6.18:

PageRank is a well-known link analysis algorithm. It assigns a numerical value
to each element of a hyperlinked set of web pages, which reflects the probability that a
random surfer will access that page. Implementing PageRank with MapReduce involves some
difficulty in both efficiency and programmability due to the random access model over a
large-scale web graph. DryadLINQ provides a SQL-like query API that helps the programmer
implement PageRank without much effort. Besides, the Dryad infrastructure helps to scale the
application out in an easy way. This assignment will help you learn how to implement a simple
PageRank application with DryadLINQ.

We provide scratch code for DryadLINQ PageRank that can be compiled successfully. You
will learn the PageRank algorithm and how to implement PageRank with the DryadLINQ API.

PageRank algorithm
PageRank is a well-known link analysis algorithm. It assigns a numerical value to each element
of a hyperlinked set of web pages, which reflects the probability that a random surfer will
access that page. Mathematically, the PageRank process can be understood as a
Markov chain which requires recursive calculation to converge. The formula for the PageRank
algorithm is given below:
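
PR(A) = (1 - d)/N + d * [ PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) ]

where T1, ..., Tn are the pages that link to page A, C(Ti) is the number of outbound links on page Ti, d is the damping factor, and N is the total number of web pages in the collection.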

This equation calculates the PageRank value for any page A. The updated rank value of page A
is the sum of each adjacent page's own rank value divided by the number of outbound links of
that page. The damping factor (d) represents the probability that a person will continue searching
the web by following the links in the current web page. The damping factor is subtracted from 1
and the result is divided by the number of web pages (N) in the collection; this term (1-d)/N is
then added to the updated rank value of page A. The damping factor is set to 0.85 in this
assignment.

DryadLINQ Implementation
DryadLINQ is a compiler which translates LINQ programs into distributed computations.
LINQ is an extension to .NET, launched with Visual Studio 2008, which provides declarative
programming for data manipulation. With DryadLINQ, the programmer does not need much
knowledge about parallel or distributed computation; thus any LINQ programmer turns
instantly into a cluster computing programmer.
The PageRank algorithm requires multiple iterations during the overall computation. One
iteration of the PageRank computation consists of two job steps: 1) join the rank values table
and the linkage table to generate the partial rank values; 2) aggregate the partial rank values for
each unique web page. A driver program keeps looping over the join job and the aggregate job
until a stop condition is reached, e.g., the number of rounds has exceeded a threshold, or the
total difference of all rank values between two iterations is less than a predefined threshold.
In DryadLINQ PageRank we use “IQueryable<Page> pages” to store the linkage table,
and the “IQueryable<Vertex> rankValues” to store the rank values table. The linkage table is
built from the adjacency matrix of web graph. All the adjacency matrix input files are defined in
the partition table “cwpartition.pt”. The rank values are updated by using a Join of the current
“rankValues” with the “pages” object. The output of the Join is a list of <dest, value> pairs that
contain the partial rank values. We can aggregate those partial results by using a “GroupBy” on
the first element of the <dest, value> tuple. Then the partial rank values of each webpage are
accumulated, forming the new rank values for the next iteration.

Sample Code

Here is sample code for DryadLINQ PageRank. We use the formula shown above to
calculate the new rank values in each iteration.

public void RunPageRank()
{
    // string ptPath = @"file://\\MADRID-HEADNODE\DryadData\Hui\PageRank\cwpartition.pt";
    PartitionedTable<LineRecord> table = PartitionedTable.Get<LineRecord>(ptPath);
    IQueryable<Page> pages = table.Select(lr => buildPage(lr.line));

    Vertex[] ver = new Vertex[numPages];
    double initialRank = 1.0 / numPages;
    for (int i = 0; i < numPages; i++)
    {
        ver[i].source = i + 1;
        ver[i].value = initialRank;
    }
    IQueryable<Vertex> rankValues = ver.ToPartitionedTable("rankValues.pt");
    IQueryable<Vertex> newRankValues = null;

    for (int i = 0; i < 10; i++)
    {
        newRankValues = pages.Join(rankValues,
                page => page.source,
                vertex => vertex.source,
                (page, vertex) => page.links.Select(dest => new Vertex(dest, vertex.value / page.numLinks)))
            .SelectMany(list => list)
            .GroupBy(vertex => vertex.source)
            .Select(group => new Vertex(group.Key,
                group.Select(vertex => vertex.value).Sum() / numPages * 0.85 + 0.15 / numPages));
        rankValues = newRankValues;
        Console.WriteLine(" pagerank iteration no:" + i);
    }
    SaveResults(

Problem 6.19:
The following program illustrates the use of Aneka's Thread Programming Model for
matrix multiplication. The program takes as input two square matrices. Each AnekaThread
instance is a row-column multiplier, that is, a row from the first matrix is multiplied with the
corresponding column from the second matrix to produce the resulting cell of the final matrix.
Each of these row-column computations is performed independently on a Worker node. The
results of the computations are then put together by the client application.

/// Class <i><b>MatrixMultiplier</b></i>. Multiplies two square matrices, where
/// each element in the resulting matrix, C, is computed by multiplying the
/// corresponding row and column vectors of matrix A and B. Each multiplication is
/// carried out by a distinct instance of AnekaThread; multiplying two square
/// matrices of dimension n thus requires n*n AnekaThread instances.
public class MatrixMultiplier
{
/// The application configuration
private Configuration configuration;
/// Creates an instance of MatrixMultiplier
/// <param name="schedulerUri">The uri to the Aneka scheduler</param>
public MatrixMultiplier(Uri schedulerUri)
{
configuration = new Configuration();
configuration.SchedulerUri = schedulerUri;
}

/// Multiplies two matrices A and B and returns the resulting matrix C.
/// This method creates a list of AnekaThread instances to compute each of
/// the elements in matrix C. These threads are submitted to the Aneka
/// runtime for execution and the results of each of these executions are
/// used to compose the resulting matrix C.
/// <param name="matrixA">Matrix A</param>
/// <param name="matrixB">Matrix B</param>
/// <returns>The result, Matrix C</returns>
public Matrix Multiply(Matrix matrixA, Matrix matrixB)
{
// Create application and computation threads
AnekaApplication<AnekaThread, ThreadManager> application = new
AnekaApplication<AnekaThread, ThreadManager>(configuration);

IList<AnekaThread> threads = this.CreateComputeThreads(application, matrixA, matrixB);
// execute threads on Aneka
this.ExecuteThreads(threads);

// gather results
Matrix matrixC = this.ComposeResult(threads, matrixA.Size);

// stop application
application.StopExecution();
return matrixC;
}
/// Creates AnekaThread instances to compute each of the elements in the
/// resulting matrix C. These threads are initialized to execute the
/// RowColumnMultiplier.DoMultiply method on the remote node.
/// <param name="application">The AnekaApplication instance containing the
/// application configuration</param>
/// <param name="matrixA">Matrix A</param>
/// <param name="matrixB">Matrix B</param>
/// <returns>The list of AnekaThread instances</returns>
private IList<AnekaThread> CreateComputeThreads(AnekaApplication<AnekaThread,
    ThreadManager> application, Matrix matrixA, Matrix matrixB)
{
    IList<AnekaThread> threads = new List<AnekaThread>();
    int dimension = matrixA.Size;
    for (int row = 0; row < dimension; row++)
    {
        double[] rowData = this.ExtractRow(matrixA.Data, row, matrixA.Size);
        for (int column = 0; column < dimension; column++)
        {
            double[] columnData = this.ExtractColumn(matrixB.Data, column, matrixB.Size);
            RowColumnMultiplier rcMultiplier = new RowColumnMultiplier(rowData, columnData);
            AnekaThread anekaThread = new AnekaThread(new ThreadStart(rcMultiplier.DoMultiply), application);
            threads.Add(anekaThread);
        }
    }
    return threads;
}

/// Executes the list of AnekaThread instances on the Aneka runtime environment.
/// <param name="threads">The list of AnekaThread instances to execute</param>

private void ExecuteThreads(IList<AnekaThread> threads)
{
foreach (AnekaThread thread in threads)
{
thread.Start();
}
}
/// Composes the resulting matrix C.
/// <param name="threads">The list of AnekaThread instances that were submitted
/// for execution</param>
/// <param name="size">The size of the matrix</param>
/// <returns>The result, matrix C</returns>
private Matrix ComposeResult(IList<AnekaThread> threads, int size)
{
    // wait until all threads complete..
    foreach (AnekaThread thread in threads)
    {
        thread.Join();
    }
    // compose the resultant matrix C
    Matrix matrixC = new Matrix(size);
    for (int row = 0; row < size; row++)
    {
        for (int column = 0; column < size; column++)
        {
            AnekaThread thread = threads[(row * size) + column];
            RowColumnMultiplier rcMultiplier = (RowColumnMultiplier)thread.Target;
            matrixC.Data[row, column] = rcMultiplier.Result;
        }
    }
    return matrixC;
}

/// Extracts a row from a two-dimensional array.


/// <param name="array">The two-dimensional array</param>
/// <param name="rowIndex">The index of the row to extract</param>
/// <param name="length">The length of the row to extract</param>
/// <returns>A one-dimensional array</returns>

private double[] ExtractRow(double[,] array, int rowIndex, int length)
{
double[] row = new double[length];
for (int x = 0; x < length; x++)
{
row[x] = array[rowIndex, x];
}
return row;
}

/// Extracts a column from a two-dimensional array.


/// <param name="array">The two-dimensional array<</param>
/// <param name="columnIndex">The index of the column to extract</param>
/// <param name="length">The length of the column to extract</param>
/// <returns>A one-dimensional array</returns>

private double[] ExtractColumn(double[,] array, int columnIndex, int length)
{
double[] column = new double[length];
for (int x = 0; x < length; x++)
{
column[x] = array[x, columnIndex];
}
return column;
}
/// The main entry point to the application
/// <param name="args"></param>
static void Main(string[] args)
{
Matrix matrixA = new Matrix(10);
Matrix matrixB = new Matrix(10);
matrixA.InitRandom();
matrixB.InitRandom();
Uri schedulerUri = new Uri("tcp://localhost:9090/Aneka");
MatrixMultiplier multiplier = new MatrixMultiplier(schedulerUri);
Matrix matrixC = multiplier.Multiply(matrixA, matrixB);
matrixC.Print();
Console.ReadKey();
}
}
/// Class <i><b>Matrix</b></i>. Represents a square matrix where
/// each element occupies a slot in a two-dimensional array.
[Serializable]
public class Matrix
{
    /// Array of elements in the matrix.
    private double[,] data;
/// Gets or sets the 2D array containing the elements in the matrix.
/// </summary>
public double[,] Data
{
get { return this.data; }
set { this.data = value; }
}
/// <summary>
/// The size of the square matrix.
/// </summary>
private int size;
/// <summary>
/// Gets or sets the size of the square matrix.
/// </summary>
public int Size
{
get { return this.size; }
set { this.size = value; }
}

/// Creates a new square matrix of dimension <paramref name="size"/>
/// <param name="size">The dimension of the square matrix.</param>
public Matrix(int size)
{
data = new double[size, size];
this.size = size;
}
/// Initializes the matrix with random doubles between 0 to 10.
public void InitRandom()
{
Random rand = new Random();
for (int x = 0; x < size; x++)
{
for (int y = 0; y < size; y++)
{
data[x, y] = rand.NextDouble() * 10;
}
}
}

/// Prints the elements in the matrix to the console.
public void Print()
{
for (int x = 0; x < size; x++)
{
for (int y = 0; y < size; y++)
{
Console.Write(data[x, y].ToString("0.00"));

if (y < size - 1)
{
Console.Write(", ");
}
}
Console.WriteLine();
}
}
}
/// Class <i><b>RowColumnMultiplier</b></i>. Multiplies a row matrix with a
/// column matrix to produce a matrix with one element.
[Serializable]
public class RowColumnMultiplier
{
/// A row matrix to multiply
private double[] row;
/// A column matrix to multiply
private double[] column;
/// The result of row-column multiplication
private double result;
/// Gets the result of the row-column multiplication
public double Result
{
get { return this.result; }
}

/// Creates a new RowColumnMultiplier
/// <param name="row">The row to multiply</param>
/// <param name="column">The column to multiply</param>
public RowColumnMultiplier(double[] row, double[] column)
{
this.row = row;
this.column = column;
}

/// Multiplies the row and column matrices
public void DoMultiply()
{
// row and column are of the same dimension
for (int x = 0; x < row.Length; x++)
{
this.result += row[x] * column[x];
}
}
}

Once you have written and compiled your code, run your application by varying first the input
size of the matrix and then the number of nodes as specified in the question (for example: 10,
20, 30, ...). Plot a single graph of execution time (y-axis) versus input size (x-axis) for the
different sets of nodes used, so that your final graph shows the difference in execution time for
each of the sets of nodes. Next, plot a graph of speed-up (y-axis) versus input size (x-axis) for
the different sets of nodes used.

The Aneka installation comes with a number of examples which you will find in the
installation’s destination directory. After installation take a look at the following directory (Note
that the directory will vary depending on where you installed Aneka.):

C:\Program Files\Manjrasoft\Aneka.2.1\Examples\Tutorials

Here you will find sample applications, including source code, for each of the different
programming models: MapReduce, Task and Thread. Open the Visual Studio solution for each
of these examples, then compile and run them by varying the number of nodes as specified in
the question. Observe that as you increase the number of nodes, the execution time reduces
(given that you have a sufficiently large dataset to work with). Plot a graph showing the
execution time (y-axis) versus the input size (x-axis) for the different sets of nodes.

Problem 6.20:
This problem is research-oriented. Study Section 6.2.3 and Section 6.2.5 to conduct a
comparative study, then assess their advantages and shortcomings based on the results of the
data-intensive Pig and Hadoop applications you deploy and experiment with. The conclusion
may be application-dependent.
