Grid Computing: 1. Abstract

You might also like

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 10

GRID COMPUTING

1. ABSTRACT organizational rather than the individual level. Thus,


expressing restrictive policies on a user-by-user
Grid computing is a method of harnessing basis often proves difficult. Also, frequently a
the power of many computers in a network to solve single transaction takes place across many grid
problems requiring a large number of processing nodes that are dynamic and unpredictable. Finally,
cyc1es and involving huge amounts of data. The unlike the Internet, a grid gives outsiders complete
grid computing helps in exploiting underutilized access to a resource, thus increasing the security
resources, achieving parallel CPU capacity; provide risk. Grid security is a multidimensional problem.
virtual resources for collaboration and reliability. Organizations participating in grids must use
Although commercial and research organizations appropriate policies, such as firewalls, to harden
might have collaborative or monetary reasons to their Infrastructures while enabling interaction with
share resources, they are unlikely to adopt such a outside resources.
distributed infrastructure until they can rely on the
confidentiality of the communication, the integrity In this paper, we briefly describe the reasons

of their data and resources, and the privacy of the for using grid computing and analyze the unique

user information. In other words, large-scale security requirements of large-scale grid computing.

deployment of grids will occur when users can We propose a security policy for grid systems that

count on their security. addresses requirements for single sign-on,


interoperability with local policies, and dynamically
Most organizations today deploy firewalls varying resource requirements. This policy focuses
around their computer networks to protect their on authentication of users, resources, and processes
sensitive proprietary data. But the central idea of and supports user-to resource, resource to user,
grid computing-to enable resource sharing makes process-to-resource, and Process to process
mechanisms such as firewalls difficult to use. On the authentication. We also describe security
grid, participants form virtual organizations architecture and associated protocols that
dynamically, and the trust established prior to such implement this policy.
collaborations often takes place at the
2. INTRODUCTION addresses requirements for single sign-on, inter

Grid Computing is a method of harnessing operability with local policies, and dynamically

the power of many computers in a network to solve varying resource requirements. This policy focuses

problems requiring a large number of processing on authentication of users, resources, and processes

cycles and involving huge amounts of data. Grid and supports user-to-resource, resource-to-user,

applications are distinguished from traditional client process-to-resource, and process-to-process

server applications by their simultaneous use of authentication.

large numbers of resources, dynamic resource


requirements, use of resources from multiple 2. Reasons for using Grid Computing
administrative domains, complex communication When you deploy a grid, it will be to meet a
structures and stringent performance requirements, set of customer requirements. To better match grid
among others. computing capabilities to those requirements, it is
While scalability, performance and useful to keep in mind the reasons for using grid
heterogeneity are desirable goals for any distributed computing.
system, the characteristics of computational grids Exploiting underutilized resources
lead to security problems that are not addressed by The easiest use of grid computing is to run
existing security technologies for distributed an existing application on a different machine. The
systems. For example parallel computations that machine on which the application is normally run
acquire multiple computational resources introduce might be unusually busy due to an unusual peak in
the need to establish security relationships not activity. The job in question could be run on an idle
simply between a client and a server, but among machine elsewhere on the grid. There are at
potentially hundreds of processes that collectively least two prerequisites for this scenario. First, the
span many administrative domains. Further more, application must be executable remotely and
the dynamic nature of grid can make it impossible without undue overhead. Second, the remote
to establish trust relationships between sites prior to machine must meet any special hardware, software,
application execution. Finally, by inter domain or resource requirements imposed by the
security solutions used for grids must be able to application.
inter operate with, rather than replace, the diverse For example, a batch job that spends a
intra domain access control technologies inevitable significant amount of time processing a set of input
encountered in individual domains. data to produce an output set is perhaps the most
In this paper, we describe new techniques ideal and simple use for a grid. If the quantities of
that overcome many of the cited difficulties. We input and output are large, more thought and
propose a security policy for grid systems that planning might be required to efficiently use the
grid for such a job. It would usually not make sense grid. In addition to pure scientific needs, such
to use a word processor remotely on a grid because computing power is driving a new evolution in
there would probably be greater delays and more industries such as the bio-medical field, financial
potential points of failure. modeling, oil exploration, motion picture animation,
In most organizations, there are large and many others.
amounts of underutilized computing resources. The common attribute among such uses is
Most desktop machines are busy less than 5 percent that the applications have been written to use
of the time. In some organizations, even the server algorithms that can be partitioned into
machines can often be relatively idle. Grid independently running parts. A CPU intensive grid
computing provides a framework for exploiting application can be thought of as many smaller “sub
these underutilized resources and thus has the jobs,” each executing on a different machine in the
possibility of substantially increasing the efficiency grid. To the extent that these sub jobs do not need to
of resource usage. communicate with each other, the more “scalable”
The processing resources are not the only the application becomes. A perfectly scalable
ones that may be underutilized. Often, machines application will, for example, finish 10 times faster
may have enormous unused disk drive capacity. if it uses 10 times the number of processors.
Grid Computing, more specifically, a “data grid”, Barriers often exist to perfect scalability.
can be used to aggregate this unused storage into a The first barrier depends on the algorithms used for
much larger virtual data store, possibly configured splitting the application among many CPUs. If the
to achieve improved performance and reliability algorithm can only be split into a limited number of
over that of any single machine. independently running parts, then that forms a
If a batch job needs to read a large amount scalability barrier. The second barrier appears if the
of data, this data could be automatically replicated parts are not completely independent; this can cause
at various strategic points in the grid. Thus, if the contention, which can limit scalability.
job must be executed on a remote machine in the For example, if all of the sub jobs need to
grid, the data is already there and does not need to read and write from one common file or database,
be moved to that remote point. This offers clear the access limits of that file or database will become
performance benefits. Also, such copies of data can the limiting factor in the application’s scalability.
be used as backups when the primary copies are Other sources of inter-job contention in a parallel
damaged or unavailable. grid application include message communications
Parallel CPU capacity latencies among the jobs, network communication

The potential for massive parallel CPU capacities, synchronization protocols, input-output

capacity is one of the most attractive features of a bandwidth to devices and storage devices, and
latencies interfering with real-time requirements. software, services, licenses, and others. These
Other sources of inter-job content in parallel grid resources are “virtualized” to give them a more
application include message communications uniform interoperability among heterogeneous grid
latencies among the jobs. participants.
Virtual resources and Virtual Organizations Reliability
for Collaborations High-end conventional computing systems
Another important grid computing use expensive hardware to increase reliability. They
contribution is to enable and simplify collaboration are built using chips with redundant circuits that
among a wider audience. In the past, distributed vote on results, and contain much logic to achieve
computing promised this collaboration and achieved graceful recovery from an assortment of hardware
it to some extent. Grid computing takes these failures. The machines also use duplicate processors
capabilities to an even wider audience, while with hot plug ability so that when they fail, one can
offering important standards that enable very be replaced without turning the other off. Power
heterogeneous systems to work together to form the supplies and cooling systems are duplicated. The
image of a large virtual computing system offering systems are operated on special power sources that
a variety of virtual resources. The users of the grid can start generators if utility power is interrupted.
can be organized dynamically into a number of All of this builds a reliable system, but at a great
virtual organizations, each with different policy cost, due to the duplication of high-reliability
requirements. These virtual organizations can share components.
their resources collectively as a larger grid. In the future, we will see a complementary
Sharing starts with data in the form of files approach to reliability that relies on software and
or databases. A “data grid” can expand data hardware. A grid is just the beginning of such
capabilities in several ways. First, files or databases technology. The systems in a grid can be relatively
can seamlessly span many systems and thus have Inexpensive and geographically dispersed. Thus, if
larger capacities than on any single system. Such there is a power or other kind of failure at one
spanning can improve data transfer rates through location, the other parts of the grid are not likely to
the use of striping techniques. Data can be be affected. Grid management software can
duplicated throughout the grid to serve as a backup automatically resubmit jobs to other machines on
and can be hosted on or near the machines most the grid when a failure is detected. In critical, real-
likely to need the data, in conjunction with time situations, multiple copies of the important
advanced scheduling techniques. jobs can be run on different machines throughout
Sharing is not limited to files, but also the grid. Their results can be checked for any kind
includes many other resources, such as equipment, of inconsistency, such as computer failures, data
corruption, or tampering. Such grid systems will importance with a specific deadline. A grid cannot

utilize “autonomic computing.” This is a type of perform a miracle and achieve a deadline when it is

software that automatically heals problems in the already too close. However, if the size of the job is

grid, perhaps even before an operator or manager is known, if it is a kind of job that can be sufficiently

aware of them. In principle, most of the reliability split into sub jobs, and if enough resources are

attributes achieved using hardware in today’s high available after preempting lower priority work, a

availability systems can be achieved using software grid can bring a very large amount of processing

in a grid setting in the future. power to solve the problem. In such situations, a
grid can, with some planning, succeed in meeting a
Resource balancing
surprise deadline.
A grid federates a large number of resources
3. Security in Grid Computing
contributed by individual machines into a greater
total virtual resource. For applications that are grid- a. The Grid Security Problem
enabled, the grid can offer a resource balancing We introduce of grid security problem with

effect by scheduling grid jobs on machines with low and example illustrated in figure1. We imagine a

utilization. This feature can prove invaluable for scientist, a member of a multi-institutional scientific

handling occasional peak loads of activity in parts collaboration, who receives e-mail from a colleague

of a larger organization. This can happen in two regarding a new data set. He starts an analysis

ways: An unexpected peak can be routed to program, which dispatches code to the remote

relatively idle machines in the grid and if the grid is location where the data is stored (site C). Once

already fully utilized, the lowest priority work being started, the analysis program determines that it

performed on the grid can be temporarily needs to run a simulation in order to compare the

suspended or even cancelled and performed again experimental results with predictions. Hence, it

later to make room for the higher priority work. contacts a resource broker service maintained by the

Without a grid infrastructure, such balancing collaboration (at site D), in order to locate the

decisions are difficult to prioritize and execute. resources that can be used for the simulation. The

Occasionally, a project may suddenly rise in resource broker in turn initiates


Computation on computers at two sites (E and
G).These computers access parameter values store
on a File system at another site (F) and also
communicate among themselves and with broker,
the original site, And the user.
We imagine a scientist, a member of a during its execution. Even in our simple
multi-institutional scientific collaboration, who example, the computation required
receives e-mail from a colleague regarding a new resources at five sites. In other words,
data set. He starts an analysis program, which throughout its lifetime, a computation is
dispatches code to the remote location where the composed of a dynamic group of processes
data is stored (site C). Once started, the analysis running on different resources and sites.
program determines that it needs to run a simulation 4. The processes constituting a computation
in order to compare the experimental results with may communicate by using a variety of
predictions. Hence, it contacts a resource broker mechanisms, including unicast and
service maintained by the collaboration (at site D), multicast. While these processes form a
in order to locate idle resources that can be used for single logical entity, low-level
the simulation. The resource broker in turn initiates communication connection may be created
computation on computers at two sites (E and G). and destroyed dynamically during program
These computers access parameter values stored execution.
on a file system at yet another site (F) and also 5. Resources may require different
communicate among themselves (perhaps using authentication and authorization
specified protocols, such as multicast) and with the mechanisms and policies, which we will
broker, the original site, and the user. have limited ability to change. In figure 1,
This example illustrates many of the we indicate this situation by showing the
distinctive characteristics of the grid computing local access control policies that apply at
environment: the different sites.
1. The user population is large and dynamic. 6. An individual user will be associated with
Participants in such virtual organizations as different local name spaces, credentials, or
this scientific collaboration will include accounts, at different sites, for the purposes
members of many institutions and will of accounting and access control.
change frequently. 7. Resources and users may be located in
2. The resource pool is large and dynamic. different countries.
Because individual institutions and users To summarize, the problem we face is providing
decide whether and when to contribute security solutions that can allow computations, such
resources, the quantity and location of as the one just described, to coordinate diverse
available resources can change rapidly. access control policies and to operate securely in
heterogeneous environments.
3. A computation may require, start processes
on, and release resources dynamically
Single sign-on: A user should be able to that will need to coordinate their activities as a
authenticate once (e.g., when starting a group.
computation) and initiate computations that acquire Support for multiple implementations: The
resources, use resources, release resources, and security policy should not dictate a specific
communicate internally, without further implementation technology. Further, it should be
authentication of the user. possible to implement the security policy with a
Protection of credentials: User credentials range of security technologies, based on both public
(passwords, private keys, etc.) must be protected. and shared key cryptography.
Interoperability with local security 1. Authentications the process by which a
solutions: While our security solutions may provide subject proves its identity to a requester,
inter domain access mechanisms, access to local typically through the use of a credential.
resources} will typically be determined by a loc~ Authentication in which both parties (i.e.,
security policy that is enforced by a local security the requester and the requester) authenticate
mechanism. It is impractical to modify every local themselves to one another simultaneously is
resource to accommodate inter domain access; referred to as mutual authentication.
instead, one or more entities in a domain (e.g., inter 2. An object is a resource that is being
domain security servers) must act as agents of protected by the security policy.
remote clients/users for local resources. 3. A trust domain is a logical, administrative
Exportability: We require that the code be (a) structure within which a single, consistent
exportable and (b) executable in multinational test local security policy holds.
beds. In short, the exportability issues mean that our With these terms in mind, we define our
security policy cannot directly or indirectly require security policy as follows:
the use of bulk encryption. 1. The grid
Uniform credentials/certification
infrastructure: Inter domain access requires, at a
minimum, a common way of expressing the identity
environment consists of multiple trust
of a security principal such as an actual user or a
domains.
resource. Hence, it is imperative to employ a
2. Operations that are confined to single trust
standard (such as X.509v3) for encoding credentials
domain are subject to local security policy
for security principals.
only.
Support for secure group communication. A
3. Both global and local subjects exist. For
computation can comprise a number of processes
each trust domain, there exists a partial
mapping from global to local subjects.
4. Operations between entities located in
different trust domains require mutual
authentication.
5. An authentication global subject mapped We are interested in computational
into a local subject is assumed to be environments so the objects in our architecture
equivalent to being locally authenticated as would be computers, data repositories, networks
the local subject. and so forth. The subjects are users and processes.
6. All access control decisions are made locally Grid computations may grow and shrink
on the basis of the local subject. dynamically acquiring resources when required to
7. A program or process is allowed to act on solve a problem and releasing them when they are
behalf of a user and be delegated a subset of no longer needed. Each time a computation obtains
the user’s rights. a resource, it does so on behalf of a particular user.
8. Processes running on behalf of the same However, it is frequently impractical for that “user”
subject with in the same trust domain may to interact directly with each such resource for the
share a single set of credentials. purposes of authentication: the number of resources
d. Grid Security Architecture involved may be large, or, because some

The security policy defined in section c applications may run for extended period of time,

provides a context with in which we can construct the user may wish to allow a computation to operate

specific security architecture. In doing so, we without intervention. Hence, we introduce the

specify the set of subjects and objects that will be concept of a user proxy that can act on a user’s

under the jurisdiction of the security policy and behalf without requiring user intervention.

define the protocols that will govern interactions Definition: A user proxy is a session

between these subjects and objects. Fig 2 shows an manager process given permission to act on behalf

overview of out security architecture. of a user for limited period of time.

Figure 1: Protocol 1: User proxy creation


With in the architecture, we also define an
entity that represents a resource, serving as the
interface between the grid security architecture and
the local security architecture.
Definition: A resource proxy is an agent used to
translate between interdomain security operations
and local intradomain mechanisms.
Notations used in the rest of the paper are
shown in Table1.Within our architecture; we meet
the above requirement by allowing a user to “logon” hides the complexity from the user who will see the
to the grid system, creating a user proxy using grid as a huge computing and storage device.
Protocol 1. The user proxy can then allocate We have described security architecture for
resources using protocol 2. Using protocol 3, a large-scale distributed computations. The
process created can allocate additional resources architecture is immediately useful and, in addition,
directly. Finally, Protocol 4 can be used to define a provides a firm foundation for investigations of
mapping from a global to local subject. more sophisticated mechanisms.
We now describe each of these protocols in more Our architecture addresses most of the
detail. requirements introduced in Section 4.b. The
introduction of a user proxy addresses the single
sign-on requirement and also avoids the need to
communicate user credentials. The resource proxy
enables interoperated with local security solutions,
as the resource proxy can translate between
interdomain and intradomain security solutions.
Because encryption is not used within the
associated protocol, export control issues and hence
international use are simplified.
The security design presented addresses a
number of scalability issues. The sharing of
credentials by processes created by a single
resource allocation request means that the
. establishment of process credentials will not, we
4. Conclusion expect, be a bottleneck. The fact that resource

Traditional computing environments don’t allocation requests must pass via the user proxy is a

provide flexibility for sharing resources to form potential bottleneck this must be evaluated in

“virtual organizations”. Grid computing provides a realistic applications and, if required, addressed in

promising and efficient way of using computing and future work. One major scalability issue that is not

storage resources. It serves as “Computing on addressed is the number of users and resources.

Demand” model similar to the way electrical power Clearly, other approaches to the establishment of

is used. Ideal for collaborative environments global to local mappings ~ be required when the

because it provides dynamic resource sharing number of users and/or resources are large on

among different geographic locations and also it example is the use-condition approaches to
authorization. However, we believe the current
approach can deal with this.

5. References

 Foster and C. Kasehnan, editors.


Computational Grids: The Future of High
Performance Distributed Computing.
Morgan Kaubann, 1998.
 I. Foster and C. Kessehnan. The Globus
project: A progress report. In
Heterogeneous Computing Workshop,
March 1998.
 I. Foster, C. Kessehnan, and S. Tuecke. The
Nexus approach to integrating,
multithreading and communication. Journal
of Parallel and Distributed Computing.

You might also like