Grid Topologies Unit I


INTRODUCTION & WORKING

 Grid Computing can be defined as a network of computers working
together to perform a task that would be difficult for a single machine.
 All machines on that network work under the same protocol to act like a
virtual supercomputer.
 The task that they work on may include analysing huge datasets or
simulating situations which require high computing power.
 Computers on the network contribute resources like processing power
and storage capacity to the network.
 Grid Computing is a subset of distributed computing, in which a virtual
supercomputer comprises machines on a network connected by some
bus, mostly Ethernet or sometimes the Internet.
 It can also be seen as a form of parallel computing where, instead of
many CPU cores on a single machine, the cores are spread across
machines in various locations.
 The concept of grid computing isn’t new, but it is not yet perfected, as
no standard rules and protocols have been established and widely
accepted.

Working:

A grid computing network mainly consists of three types of machines:

1. Control Node: A computer, usually a server or a group of servers, which
administers the whole network and keeps track of the resources in the
network pool.
2. Provider: A computer that contributes its resources to the network
resource pool.
3. User: A computer that uses the resources on the network.
 When a computer makes a request for resources to the control node, the
control node gives the user access to the resources available on the
network.
 When a computer is not in use, it should ideally contribute its resources
to the network. Hence a normal computer on the network can alternate
between being a user and a provider, based on its needs.
 The nodes may consist of machines with similar platforms running the
same OS (a homogeneous network), or machines with different platforms
running various operating systems (a heterogeneous network).
 This flexibility is what distinguishes grid computing from other
distributed computing architectures.
 For controlling the network and its resources, a software/networking
protocol generally known as middleware is used.
 The middleware is responsible for administering the network; the control
nodes are merely its executors.
 As a grid computing system should use only the unused resources of a
computer, it is the control node’s job to ensure that no provider is
overloaded with tasks.
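The control node / provider / user interaction above can be sketched as a toy resource pool. The class and method names below are illustrative inventions, not part of any real grid middleware:

```python
# Toy sketch of the control-node / provider / user roles described above.
# All names here are invented for illustration.

class ControlNode:
    def __init__(self):
        self.providers = {}  # provider name -> spare CPU cores on offer

    def register_provider(self, name, spare_cores):
        # A provider contributes its unused resources to the pool.
        self.providers[name] = spare_cores

    def request(self, cores_needed):
        # A user asks the control node for resources; the control node
        # grants them from the provider with the most free capacity,
        # so that no single provider is overloaded.
        name = max(self.providers, key=self.providers.get)
        if self.providers[name] < cores_needed:
            return None  # not enough free capacity anywhere
        self.providers[name] -= cores_needed
        return name

pool = ControlNode()
pool.register_provider("alice-pc", spare_cores=4)
pool.register_provider("bob-pc", spare_cores=8)
granted = pool.request(6)   # served by bob-pc, the least-loaded provider
```

A machine that finishes its own work could call `register_provider` again with a larger `spare_cores`, which is the "alternating between user and provider" behaviour described above.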
Applications of Grid Computing

1. General Applications of Grid Computing


 Distributed Supercomputing
 High-throughput Supercomputing
 On-demand Supercomputing
 Data-intensive Supercomputing
 Collaborative Supercomputing

2. Applications of Grid Computing Across Different Sectors


 Movie Industry
 Gaming Industry
 Life Sciences
 Engineering and Design
 Government

Distributed Supercomputing
Distributed supercomputing may sound like a fancy term, but it is easy to
understand: a grid computing network spread across different geographical areas,
even different countries, is distributed supercomputing.
High-throughput Supercomputing
As the name suggests, high-throughput supercomputing is characterized by tasks
that require a large amount of processing power for an extended period of time,
which can range from a few months to several years. High-throughput
supercomputing exists to accomplish exactly these tasks.
On-demand Supercomputing
One of the applications of grid computing is on-demand supercomputing, which
came into existence to overcome the problems that enterprises faced with
fluctuating demand. In this model, computing services are provided by a third
party. It can be characterized by three attributes: pay-per-use, self-service,
and scalability. On-demand supercomputing increases a business’s agility.
Data-intensive Supercomputing
This is a kind of parallel computing in which a massive volume of data is
divided into chunks that are then processed simultaneously. The data can be on
the scale of big data. This type of computing is very useful when there is a
time constraint on the task or project, since the data is divided and worked
upon at the same time.
Collaborative Supercomputing
 As the name suggests, this kind of computing happens when one organization
collaborates with another to make use of its supercomputing abilities.
 Whether you are a mid-size or a large enterprise, if you do not have the
required skills and resources, collaboration is often the only way out.
 Rolls-Royce, a British aerospace and aviation firm, was one of the first
companies to adopt collaborative supercomputing.
 These were a few general applications of grid computing. Now let us look at
how some industries are applying it.

Applications of Grid Computing Across Different Sectors

Grid Computing in the Movie Industry


 Nowadays, an IT position in the media industry is a glamorous position to be
in; large studios are trying hard to recruit the best IT specialists and
programmers out there.
 One of the major reasons for recruiting the best talent is the increased need
for realism in movies, or what we call special effects.
 Many films could not be made without grid computing, not only because of
special effects but also because grid computing enables faster production of
a film.
 Grid computing is an enhancement tool for the studios.

Grid Computing in the Gaming Industry

 Traditionally, game development was an in-house affair.
 But with the growing popularity of online gaming, networking is becoming
pivotal, and this is where grid computing comes in.
 There are a few areas where grid computing is gaining popularity:

• Internal creation of in-game art

• In-game cut scenes rendering

• Packaging game assets for multiple platforms

• Distribution of online programs

• Hosting of a massively multiplayer online game

Grid Computing in Life Sciences

 Advances in the life sciences sector have led to accelerated changes in the
ways drug discovery and drug treatment are conducted.
 With these rapid changes, new challenges are also surfacing, such as massive
amounts of data analysis, data caching, data mining, and data movement.
 With great power comes great responsibility; likewise, with these
complexities came surrounding requirements such as secure storage, privacy,
secure data access, and more.
 This requires a grid computing architecture that can manage all these
complexities while accurately analyzing the data.
 It provides top-notch information with faster responses and accurate
results.

Grid Computing in Engineering and Design

 The pressure on engineering firms and the industry as a whole is increasing
day by day, demanding shorter turnaround times.
 The industry is in grave need of capturing data and speeding up its analysis.

Grid computing can answer these complexities:

o Analysis of real-time data to find a particular pattern.
o Experiment modeling to create new designs.
o Verifying existing models for accuracy using simulation activities.
Grid Computing in Government

 We all can imagine the amount of data that the government has to process. Grid computing
can help in coordinating all the data, which is held across government agencies.
 This will make way for clear coordination, not only in case of emergencies but in normal
situations as well.
 Grid computing will enable virtual organizations which may include many government
agencies as participants.
 This is an essential step to provide the required data in real-time and simultaneously analyze
the data to detect any problem and its solution.
Grid Topologies
A grid topology refers to the structure and organization of resources
within a grid infrastructure. There are three basic topologies associated with
grid computing: intra-grids, extra-grids and inter-grids.

INTRA GRID

Intra-grids are the simplest of the three grid topologies. A typical intra-grid
topology exists within a single organization, providing a basic set of grid
services. The primary characteristics of an intra-grid are a single security
provider, high and always-available bandwidth on the private network, and a
single environment within a single network.

EXTRA GRID

Extra-grids are more complicated in that they involve a consolidation of
different intra-grids. An extra-grid is characterized by dispersed security,
multiple organizations and remote/wide area network (WAN) connectivity. Put
simply, an extra-grid typically involves more than one security provider, and
the level of management complexity increases. Security is an increased concern
because data passes beyond organizational boundaries. Resources are more
heterogeneous (due to multiple organizations being involved), more dynamic in
nature (organizations do not have control over each other’s resources) and
typically require policies in order to control utilization of the resources.
Data and resources in an extra-grid environment are confined to an organization
and the partners it wishes to share them with.


INTER GRID

Inter-grids are the most complicated of the three topologies. Inter-grids have
the same characteristics as extra-grids, except that data and resources within
the environment are global and are available to the public.

Regardless of the topology being used, the user is still presented with the
same view of the system. That is, grid data and grid resources are made to
appear as if they are part of the local machine.


GRID INFRASTRUCTURES

An infrastructure is an underlying foundation that provides the basic
facilities, services and installations needed for the functioning of an
organization or system. For example, the road system allows people to travel by
vehicle, the banking system allows people to transfer funds across borders, and
the Internet allows people to communicate with each other and with virtually
any electronic device.

Grid computing is an emerging infrastructure that provides scalable and secure
mechanisms for the access, sharing and discovery of resources amongst dynamic
collections of individuals and institutions. One of its goals is to provide
these services in a manner that hides the implementation details from the user.
The availability of high-speed networks, low-cost components and the ubiquity
of the Internet and Web technologies make it feasible to realize this vision.

Components and Layers


Components

There are several key software components that must be considered when
designing a grid infrastructure. These components are outlined below:

Management: All grid systems consist of some form of management software. At
the very least, a grid has a management component responsible for keeping track
of resource availability and the users of the grid. Some, if not most, grids
contain software components that deal with the management of distributed
resources, workload and other grid components.

Contribution: Machines shared as part of a grid require a mechanism which
allows them to “contribute” themselves as a grid resource. This is typically
achieved through some form of contribution software. Contribution software
registers a machine as part of the grid, enables grid management functions, and
can accept and execute applications submitted to it by grid management
software. In addition, contribution software usually monitors resource
utilization on the local machine (e.g., processor utilization, disk space,
etc.) and application execution (i.e., did a submitted application execute
successfully?) and reports it to higher-level management components. This
information is used to make high-level resource utilization and scheduling
decisions.
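A minimal sketch of what contribution software does, assuming an invented in-memory registry and report format rather than any real grid protocol:

```python
# Sketch of "contribution" software: a machine registers itself as a grid
# resource and reports local utilization and job outcomes upward. The
# registry and field names are illustrative assumptions.

registry = {}  # machine name -> latest report seen by management software

def contribute(machine, cpu_percent, free_disk_gb):
    """Register (or update) this machine as a grid resource."""
    registry[machine] = {"cpu": cpu_percent, "disk": free_disk_gb}

def report_result(machine, job_id, ok):
    """Report whether a submitted application executed successfully."""
    registry[machine].setdefault("jobs", {})[job_id] = "ok" if ok else "failed"

contribute("node-1", cpu_percent=12.5, free_disk_gb=250)
report_result("node-1", job_id=42, ok=True)
```

In a real grid, the registry would live on the management side and the reports would travel over the network; here both are collapsed into one process for brevity.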

Submission: Submission software enables a grid user to submit applications to
the grid and perform grid queries. For example, the user might want to find out
if certain resources are present on the grid, or query the status of a
submitted application. These actions typically require some form of
authentication on the part of the user, such as requiring a username and
password or obtaining some type of authentication “document” such as one
provided by a Certificate Authority (CA). Submission software can be bundled as
part of the contribution software or implemented as a separate entity.

Communication: Communication software is responsible for making sure that
applications running on the grid are capable of locating and communicating with
one another. For example, an application might consist of several sub-jobs that
perform a distributed computation. Once these sub-jobs have been scheduled and
deployed onto local machines on the grid, they need to be able to locate and
establish a communication link with each other in order to perform the
computation.

Scheduling: Grid systems that support the execution of grid applications
typically contain software components that handle application scheduling.
Scheduling software is responsible for locating the appropriate machines on
which to run applications. A simple scheduler queues requests to execute an
application and schedules these requests in order of arrival. A more advanced
scheduler would support application priority. An advanced scheduler might react
to grid workload, using information about machine utilization, by scheduling
applications to less utilized machines. Other advanced features include
detecting and re-submitting applications upon failure, and a reservation system
that guarantees that an application is able to execute at a specified date and
time.
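The simple (arrival-order) and priority-based schedulers described above can be sketched in a few lines; the `PriorityScheduler` name and interface are illustrative, not from any real grid system:

```python
import heapq

# Sketch of the two scheduler styles described above: with equal priorities
# this behaves as a simple FIFO scheduler; giving an application a lower
# priority number makes it run sooner.

class PriorityScheduler:
    def __init__(self):
        self._queue = []
        self._arrival = 0

    def submit(self, app, priority=0):
        # Lower number = higher priority; arrival order breaks ties, so
        # with default priorities this degrades to order-of-arrival.
        heapq.heappush(self._queue, (priority, self._arrival, app))
        self._arrival += 1

    def next_app(self):
        return heapq.heappop(self._queue)[2]

sched = PriorityScheduler()
sched.submit("batch-report")                    # default priority
sched.submit("urgent-simulation", priority=-1)  # jumps the queue
sched.submit("nightly-backup")
order = [sched.next_app() for _ in range(3)]
# urgent-simulation runs first; the rest run in arrival order
```

A workload-aware scheduler would additionally consult monitoring data (machine utilization) when picking where, not just when, each application runs.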

Monitoring: Monitoring software is responsible for monitoring the “health” of
the grid. It is sometimes referred to as “sensor” software. Grids contain tools
and components that monitor the load and activity on each of their machines.
Load information is used in order to discover usage patterns and to make
intelligent scheduling and resource utilization decisions. Information received
from monitoring software can also be used for accounting purposes, for
recording application profiles, and many other things. Application profiles are
used in order to predict the run time and usage patterns for certain types of
applications.
GRID ARCHITECTURE

Building a grid infrastructure requires the design and development of protocols
and services which address issues of security, resource aggregation, resource
discovery, resource selection, job scheduling, job execution management, and
more.

The key layers involved in the architecture are:

 Fabric: This consists of all the distributed resources, owned by different
individuals and organizations, shared on the grid. This includes
workstations, resource management systems, storage systems, specialized
devices, etc.
 Resource and Connectivity Protocols: This contains core communication
and authentication protocols that provide secure mechanisms for verifying
the identity of users and resources and allow data to be shared between
resources.
 Collective Services: This contains Application Programming Interfaces
(APIs) and services that implement interactions across collections of
resources. This includes directory and brokering services for resource
allocation and discovery, monitoring and diagnostic services, application
scheduling and execution, and more.
 User Applications: This contains programming tools and user applications
that depend on grid resources and services during their execution.
WEB SERVICES

Service-Oriented Architecture (SOA): A Service-Oriented Architecture (SOA) is
an architectural style involving a collection of services capable of
communicating with one another. This communication can involve either data
passing or two or more services coordinating some activity.

Web services are a form of SOA. Web service interfaces describe a collection of
operations that are network-accessible through the use of XML messaging. The
purpose of Web services is to facilitate application-to-application
communication. An example of this communication is messaging via the Simple
Object Access Protocol (SOAP).

Web Services define techniques for describing software components that allow
access to themselves, methods for accessing these components, and techniques
that allow for the identification and discovery of relevant service providers.

There are three roles that are assumed in the Web Services model. The requester
is an entity that requests the use of a particular service. Services are
created and deployed on a server that is capable of receiving messages in a
particular encoding over a transfer protocol (e.g., the Hypertext Transfer
Protocol (HTTP)). The server might support several encoding/protocol pairs,
each of which is called a binding because it binds a high-level service
definition to a low-level means of invoking the service. From the requester’s
point of view, a registry represents a collection of services which may provide
suitable implementations of the interfaces that it needs to use. The requester
can search for the interface it needs by filtering out interfaces based on
criteria associated with the binding.

Web services are based on three key technologies:

 Simple Object Access Protocol (SOAP)
 Web Services Description Language (WSDL)
 Web Service Inspection (WSI)

SOAP is an XML-based messaging protocol used for encoding Web Service request
and response messages before sending them over a network. SOAP defines a
convention for Remote Procedure Calls (RPC) and a convention for messaging,
independent of the underlying transport protocol. The messages can be
transported over a variety of protocols such as FTP, SMTP, MIME and HTTP.
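As a rough illustration, a simplified SOAP-style request for a hypothetical `getTemperature` operation can be assembled with Python's standard library. The `http://example.org/weather` namespace and the operation are invented for this example; real envelopes are produced by a SOAP toolkit:

```python
import xml.etree.ElementTree as ET

# Hand-built, simplified SOAP 1.1 request envelope. The service namespace
# and getTemperature operation are illustrative inventions.

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, "{http://example.org/weather}getTemperature")
ET.SubElement(call, "city").text = "Chennai"

# Serialize to the XML payload that would be carried by HTTP, SMTP, FTP,
# etc. -- SOAP itself does not care which transport is used.
message = ET.tostring(envelope, encoding="unicode")
```

The response would be another envelope of the same shape, with the result inside the `Body` element.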

WSDL is an XML-based language that defines the set of messages, the encodings
and the protocols that are used to communicate with a service. It allows
multiple bindings for a single interface.

WSI is an XML-based language designed to assist in locating service
descriptions; it involves rules that define how inspection-related information
should be made available. A WSI document is a collection of references to WSDL
documents.


TRADITIONAL PARADIGMS FOR DISTRIBUTED COMPUTING

1. Socket programming: Sockets provide a low-level API for writing distributed
client/server applications. Before a client communicates with a server, a
socket endpoint needs to be created. The transport protocol chosen for
communications can be either TCP or UDP in the TCP/IP protocol stack. The
client also needs to specify the hostname and port number that the server
process is listening on. The standard socket API is well-defined; however, the
implementation is language dependent. This means socket-based programs can be
written in any language, but the socket APIs will vary with each language used.
Typically, the socket client and server will be implemented in the same
language and use the same socket package, but they can run on different
operating systems (as in the Java case). Socket programming is a low-level
communication technique, but it has the advantage of being a low-latency,
high-bandwidth mechanism for transferring large amounts of data compared with
other paradigms. However, sockets are designed for the client/server paradigm,
and today many applications have multiple components interacting in complex
ways, which means that application development can be an onerous and
time-consuming task. This is due to the need for the developer to explicitly
create, maintain, manipulate and close multiple sockets.
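A minimal sketch of this workflow using Python's standard `socket` module: the server creates an endpoint, binds and listens on a port, and the client connects using the server's hostname and port (TCP here):

```python
import socket
import threading

# Toy echo server/client pair illustrating the socket workflow: create a
# socket endpoint, bind/listen on the server side, connect from the client.

def serve(server_sock):
    conn, _addr = server_sock.accept()   # wait for one client
    with conn:
        data = conn.recv(1024)           # read the request bytes
        conn.sendall(b"echo: " + data)   # write the reply bytes

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
server_sock.listen(1)
port = server_sock.getsockname()[1]
threading.Thread(target=serve, args=(server_sock,)).start()

# The client must know the server's hostname and port in advance.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello grid")
    reply = client.recv(1024)

server_sock.close()
```

Note how even this tiny example must explicitly create, connect and close several sockets; that bookkeeping is exactly the "onerous" part the paragraph above refers to.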

2. RPC: RPC is another mechanism that can be used to construct distributed
client/server applications. RPC can use either TCP or UDP for its transport
protocol. RPC relies heavily on an Interface Definition Language (IDL)
interface to describe the remote procedures executing on the server side. From
an RPC IDL interface, an RPC compiler can automatically generate a client-side
stub and a server-side skeleton. With the help of the stub and skeleton, RPC
hides the low-level communication and provides a high-level communication
abstraction for a client to directly call a remote procedure as if the
procedure were local. RPC itself is a specification, and implementations such
as Open Network Computing (ONC) RPC from Sun Microsystems and Distributed
Computing Environment (DCE) RPC from the Open Software Foundation (OSF) can be
used directly for implementing RPC-based client/server applications.


The steps to implement and run a client/server application with RPC are:

 Write an RPC interface in RPC IDL;
 Use an RPC compiler to compile the interface to generate a client-side
stub and a server-side skeleton;
 Implement the server;
 Implement the client;
 Compile all the code with an RPC library;
 Start the server;
 Start the client with the IP address of the server.
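The same client/server shape can be sketched with XML-RPC from Python's standard library. This is an analogy, not ONC or DCE RPC itself: there is no separate IDL/compiler step, and the dynamically generated proxy object plays the role of the client-side stub:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# XML-RPC analogue of the RPC steps above.

# "Implement the server": expose one remote procedure named "add".
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]

# "Start the server" (one request handled in a background thread here).
threading.Thread(target=server.handle_request).start()

# "Start the client with the IP address of the server": the proxy hides
# the network communication, like a generated client-side stub.
client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.add(2, 3)   # looks like a local call but runs remotely
server.server_close()
```

The call `client.add(2, 3)` is marshalled into an XML request, sent over HTTP, executed on the server, and the result is unmarshalled back, which is precisely the stub/skeleton division of labour described above.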

3. Java RMI: Java RMI is an object-oriented mechanism from Sun Microsystems
for building distributed client/server applications. Java RMI is an RPC
implementation in Java. Similar to RPC, Java RMI hides the low-level
communication between client and server by using a client-side stub and a
server-side skeleton (which is not needed in Java 1.2 or later) that are
automatically generated from a class that extends
java.rmi.server.UnicastRemoteObject and implements an RMI Remote interface. At
run time there are three interacting entities involved in an RMI application.
These are:

i. A client that invokes a method on a remote object.
ii. A server that runs the remote object, which is an ordinary object in the
address space of the server process.
iii. The object registry (rmiregistry), which is a name server that relates
objects with names. Remote objects need to be registered with the registry.
Once an object has been registered, the registry can be used to obtain access
to a remote object using the name of that object.

The steps to implement and run a Java RMI client/server application are:

• Write an RMI interface;
• Write an RMI object to implement the interface;
• Use the RMI compiler (rmic) to compile the RMI object to generate a
client-side stub and a server-side skeleton;
• Write an RMI server to register the RMI object;
• Write an RMI client;
• Use the Java compiler (javac) to compile all the Java source files;
• Start the RMI name server (rmiregistry);
• Start the RMI server;
• Start the RMI client.

4. DCOM: The Component Object Model (COM) is a binary standard for building
Microsoft-based component applications, which is independent of the
implementation language. DCOM is an extension to COM for distributed
client/server applications. Similar to RPC, DCOM hides the low-level
communication by automatically generating a client-side stub (called a proxy
in DCOM) and a server-side skeleton (called a stub in DCOM) using Microsoft’s
Interface Definition Language (MIDL) interfaces. DCOM uses a protocol called
the Object Remote Procedure Call (ORPC) to invoke remote COM components. DCOM
is language independent; clients and DCOM components can be implemented in
different languages. Although DCOM is available on non-Microsoft platforms, it
has only achieved broad popularity on Windows. Another drawback of DCOM is
that it only supports synchronous communications.

The steps to implement and run a DCOM client/server application are:

• Write an MIDL interface;
• Use an interface compiler (midl) to compile the interface to generate a
client-side stub and a server-side skeleton;
• Write the COM component to implement the interface;
• Write a DCOM client;
• Compile all the code;
• Register the COM component with a DCOM server;
• Start the DCOM server;
• Start the DCOM client.

5. CORBA: CORBA is an object-oriented middleware infrastructure from the
Object Management Group (OMG) for building distributed client/server
applications. Similar to Java RMI and DCOM, CORBA hides the low-level
communication between the client and server by automatically generating a
client-side stub and a server-side skeleton through an Interface Definition
Language (IDL) interface. CORBA uses the Internet Inter-ORB Protocol (IIOP) to
invoke remote CORBA objects. The Object Request Broker (ORB) is the core of
CORBA; it performs data marshalling and unmarshalling between CORBA clients
and objects. Compared with Java RMI and DCOM, CORBA is independent of
location, platform and programming language. CORBA supports both synchronous
and asynchronous communications. CORBA has an advanced directory service
called COSNaming, which provides the mechanisms to allow the transparent
location of objects. However, CORBA itself is only an OMG specification.

The steps to implement and run a CORBA client/server application are:

• Write a CORBA IDL interface;
• Use an IDL compiler to compile the interface to generate a client-side
stub and a server-side skeleton;
• Write a CORBA object to implement the interface;
• Write a CORBA server to register the CORBA object;
• Write a CORBA client;
• Compile all the source code;
• Start a CORBA name server;
• Start the CORBA server;
• Start the CORBA client.
