
GRID COMPUTING

TECHNICAL PAPER ON ICT

JOHN THOMAS

School of Management Studies


CUSAT Kochi-22
Email: johnjojoin@hotmail.com

Abstract: Grid computing is a form of distributed computing whereby a "super and virtual computer" is
composed of a cluster of networked, loosely-coupled computers, acting in concert to perform very large
tasks. This technology has been applied to computationally-intensive scientific, mathematical, and academic
problems through volunteer computing, and it is used in commercial enterprises for such diverse
applications as drug discovery, economic forecasting, seismic analysis, and back-office data processing in
support of e-commerce and web services.

What distinguishes grid computing from typical cluster computing systems is that grids tend to be more
loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing grid may be
dedicated to a specialized application, it is often constructed with the aid of general-purpose grid software
libraries and middleware.

Keywords: Grid computing, Internet


INTRODUCTION

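According to a 2008 paper published in IEEE Internet Computing, "Cloud Computing is a paradigm in which
information is permanently stored in servers on the Internet and cached temporarily on clients that include
desktops, entertainment centers, table computers, notebooks, wall computers, handhelds, sensors,
monitors, etc." Grid computing, as discussed in this paper, rests on the same idea of software and
information that live on networked servers rather than on the user's own machine.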

The term grid computing originated in the early 1990s as a metaphor for making computer power as easy to
access as an electric power grid. The metaphor became canonical with Ian Foster and Carl Kesselman's seminal
work, "The Grid: Blueprint for a New Computing Infrastructure".

CPU scavenging and volunteer computing were popularized beginning in 1997 by distributed.net and later in
1999 by SETI@home to harness the power of networked PCs worldwide, in order to solve CPU-intensive
research problems.

The ideas of the grid (including those from distributed computing, object-oriented programming, web
services and others) were brought together by Ian Foster, Carl Kesselman and Steve Tuecke, widely
regarded as the "fathers of the grid"[1]. They led the effort to create the Globus Toolkit, which incorporates not
just computation management but also storage management, security provisioning, data movement, monitoring
and a toolkit for developing additional services based on the same infrastructure, including agreement
negotiation, notification mechanisms, trigger services and information aggregation. While the Globus Toolkit
remains the de facto standard for building grid solutions, a number of other tools have been built that provide
some subset of the services needed to create an enterprise or global grid.

During 2007, the term cloud computing came into popular use. It is conceptually similar to the canonical
Foster definition of grid computing (computing resources are consumed as electricity is consumed from
the power grid). Indeed, grid computing is often (but not always) associated with the delivery of cloud
computing systems.

Understanding Grid Computing

Grid computing describes both a platform and a type of application. A Grid computing platform dynamically
provisions, configures, reconfigures, and deprovisions servers as needed. Grid applications are those that
are extended to be accessible through the Internet. These Grid applications use large data centers and
powerful servers that host Web applications and Web services.
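Dynamic provisioning is usually driven by a control loop that compares demand with installed capacity. The following is a minimal sketch, assuming that each server handles a fixed number of requests per second and that the pool is kept between a minimum and maximum size with some spare headroom. The names `servers_needed` and `reconcile` and all the numbers are illustrative assumptions, not any platform's actual policy.

```python
import math


def servers_needed(load_rps: float, per_server_rps: float = 100.0,
                   min_servers: int = 1, max_servers: int = 20,
                   headroom: float = 0.2) -> int:
    """Servers required to keep utilization below (1 - headroom), clamped to the pool limits."""
    raw = math.ceil(load_rps / (per_server_rps * (1 - headroom)))
    return max(min_servers, min(max_servers, raw))


def reconcile(current: int, load_rps: float) -> tuple[str, int]:
    """Decide whether to provision or deprovision servers, and how many."""
    target = servers_needed(load_rps)
    if target > current:
        return ("provision", target - current)
    if target < current:
        return ("deprovision", current - target)
    return ("hold", 0)


for load in (50, 750, 3000, 120):
    print(load, reconcile(current=4, load_rps=load))
```

With four servers running, loads of 50, 750, 3000 and 120 requests per second yield ('deprovision', 3), ('provision', 6), ('provision', 16) and ('deprovision', 2). In the 3000 case, `max_servers` caps the target.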

Shashi B Mal, Director, Systems & Technology Group, IBM India/South Asia explained, “Grid computing is
an emerging approach to shared infrastructure in which large pools of systems are linked together to provide
IT services. Grid Computing will allow corporate data centers to operate more like the Internet by enabling
computing across a distributed, globally accessible fabric of resources, rather than on local machines or
remote server systems. Organizations can use them as much as they want and as wireless broadband
connection options grow, wherever they need them.”

Grid computing describes how computer programs are hosted and operated over the Internet. The key
feature of Grid computing is that both the software and the information held in it live on centrally located
servers rather than on an end user’s computer. A Google spokesperson added, “This means people can
access the information that they need from any device with an Internet connection—including mobile and
handheld phones—rather than being chained to the desktop. It also means lower costs, since there is no
need to install software or hardware.”

Grids versus conventional supercomputers

"Distributed" or "grid" computing in general is a special type of parallel computing which relies on complete
computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network
(private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to
the traditional notion of a supercomputer, which has many processors connected by a local high-speed
computer bus.

The primary advantage of distributed computing is that each node can be purchased as commodity
hardware, which when combined can produce similar computing resources to a multiprocessor
supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware,
compared to the lower efficiency of designing and constructing a small number of custom supercomputers.
The primary performance disadvantage is that the various processors and local storage areas do not have
high-speed connections. This arrangement is thus well-suited to applications in which multiple parallel
computations can take place independently, without the need to communicate intermediate results between
processors.

The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for
connectivity between nodes relative to the capacity of the public Internet.

There are also some differences in programming and deployment. It can be costly and difficult to write
programs so that they can be run in the environment of a supercomputer, which may have a custom
operating system, or require the program to address concurrency issues. If a problem can be adequately
parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs to run on
multiple machines (but each given a different part of the same problem). This makes it possible to write and
debug on a single conventional machine, and eliminates complications due to multiple instances of the same
program running in the same shared memory and storage space at the same time.
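Here is a minimal sketch of this pattern. It assumes a problem that splits into independent pieces; counting primes below a limit serves only as a stand-in workload. The same standalone function runs on every chunk, and only the per-chunk counts are combined. A local process pool stands in for the grid's machines, whereas real grid middleware would ship each work unit to a different computer. All function names are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    """Trial division; deliberately simple so each work unit is pure CPU."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds: tuple[int, int]) -> int:
    """One standalone work unit: count primes in [lo, hi). Needs nothing from other units."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(limit: int, parts: int) -> list[tuple[int, int]]:
    """Cut [0, limit) into at most `parts` contiguous, non-overlapping chunks."""
    step = max(1, -(-limit // parts))  # ceiling division, never zero
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def run_distributed(limit: int, workers: int = 4) -> int:
    # Each chunk is computed independently; only the final counts travel back.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))


if __name__ == "__main__":
    print(run_distributed(100_000))  # 9592, same as count_primes((0, 100_000))
```

No chunk needs another chunk's intermediate results, so this is exactly the kind of workload that tolerates the slow links between grid nodes.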
Data Grid

A data grid is a grid computing system that deals with data — the controlled sharing and management of
large amounts of distributed data. These are often, but not always, combined with computational grid
computing systems.

Many scientific and engineering applications require access to large amounts of distributed data (terabytes
or petabytes). The size and number of these data collections have been growing rapidly in recent years and
will continue to grow as new experiments and sensors come online, as the costs of computation and data
storage decrease while performance increases, and as new computational science applications are developed.

Current large-scale data grid projects include the Biomedical Informatics Research Network (BIRN), the
Southern California Earthquake Center (SCEC), and the Real-time Observatories, Applications, and Data
management Network (ROADNet), all of which make use of the SDSC Storage Resource Broker as the
underlying data grid technology. These applications require widely distributed access to data by many
people in many places. The data grid creates virtual collaborative environments that support distributed but
coordinated scientific and engineering research.

An In-Memory Data Grid (IMDG) is a data grid, in the sense defined above, that keeps the shared data
primarily in the main memory of the participating computers.
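The sketch below shows one way an in-memory data grid can spread entries across the memory of several machines. It assumes simple hash partitioning, with one backup copy stored on the next node. `InMemoryGrid` and its methods are hypothetical; IMDG products use their own partitioning and replication schemes.

```python
import zlib
from typing import Any


class InMemoryGrid:
    """Hypothetical IMDG: each 'node' is a dict standing in for one machine's RAM."""

    def __init__(self, node_names: list[str]) -> None:
        if len(node_names) < 2:
            raise ValueError("need at least two nodes to keep a backup copy")
        self.names = list(node_names)
        self.nodes: dict[str, dict[str, Any]] = {n: {} for n in self.names}

    def owners(self, key: str) -> tuple[str, str]:
        """Primary node chosen by hashing the key; the backup is the next node along."""
        # crc32 is stable between runs, unlike Python's salted hash() for strings.
        i = zlib.crc32(key.encode()) % len(self.names)
        return self.names[i], self.names[(i + 1) % len(self.names)]

    def put(self, key: str, value: Any) -> None:
        for name in self.owners(key):
            self.nodes[name][key] = value

    def get(self, key: str) -> Any:
        for name in self.owners(key):  # try the primary first, then the backup
            if key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def fail(self, name: str) -> None:
        """Simulate a node crash: everything in its memory is gone."""
        self.nodes[name].clear()


grid = InMemoryGrid(["node-a", "node-b", "node-c"])
grid.put("order:1001", {"total": 99.5})
primary, backup = grid.owners("order:1001")
grid.fail(primary)
print(grid.get("order:1001"))  # {'total': 99.5}, now served by the backup copy
```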

Space-Based Architecture

Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of
stateful, high-performance applications using the tuple space paradigm. It follows many of the principles of
Representational State Transfer, Service-Oriented Architecture and Event-Driven Architecture, as well as
elements of grid computing. With a space-based architecture, applications are built out of a set of self-
sufficient units, known as processing-units (PU). These units are independent of each other, so that the
application can scale by adding more units.

The SBA model is closely related to other patterns that have proven successful in addressing the
application scalability challenge, such as Shared-Nothing Architecture, used by Google, Amazon.com and
other well-known companies. The model has also been applied by many firms in the securities industry for
implementing scalable electronic securities trading applications.
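The tuple space paradigm behind SBA can be sketched in a few lines. Processing units never call each other directly. Instead, they write tuples into a shared space and take tuples that match a template, where `None` acts as a wildcard. The `TupleSpace` class below is a hypothetical, single-process illustration of JavaSpaces-style `write`/`read`/`take` operations, not the API of any SBA product. In a real deployment the space itself is distributed across machines.

```python
import threading
from typing import Optional


class TupleSpace:
    """In-process tuple space offering write / read / take."""

    def __init__(self) -> None:
        self._tuples: list[tuple] = []
        self._cond = threading.Condition()

    def _find(self, template: tuple) -> Optional[int]:
        # A tuple matches if it has the template's length and every
        # non-None template field equals the corresponding field.
        for i, tup in enumerate(self._tuples):
            if len(tup) == len(template) and all(
                t is None or t == v for t, v in zip(template, tup)
            ):
                return i
        return None

    def write(self, tup: tuple) -> None:
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake any unit blocked in read/take

    def read(self, template: tuple, timeout: Optional[float] = None) -> Optional[tuple]:
        """Return a matching tuple, leaving it in the space (None on timeout)."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            return self._tuples[self._find(template)]

    def take(self, template: tuple, timeout: Optional[float] = None) -> Optional[tuple]:
        """Remove and return a matching tuple (None on timeout)."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            return self._tuples.pop(self._find(template))


space = TupleSpace()


def processing_unit() -> None:
    """A self-sufficient unit: takes a new order, 'processes' it, writes the result."""
    _, order_id, _ = space.take(("order", None, "new"))
    space.write(("order", order_id, "done"))


worker = threading.Thread(target=processing_unit)
worker.start()
space.write(("order", 7, "new"))
worker.join()
print(space.read(("order", 7, None)))  # ('order', 7, 'done')
```

Each unit depends only on the space, so more units can be started to consume the same kind of tuple. This is the scale-by-adding-units property described above.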
Grid File System

A Grid File System is a computer file system whose goal is improved reliability and availability by taking
advantage of many smaller file storage areas.

Comparisons

Because current file systems are designed to appear as a single disk that is managed entirely by a single
computer, many new challenges arise in a grid scenario, where any single disk within the grid should be
capable of handling requests for any data contained in the grid.

Features

Most file storage uses layers of redundancy to achieve a high level of data protection (resistance to data
loss). Current means of redundancy include replication and parity checks. Such redundancy can be
implemented via a RAID array (whereby multiple physical disks appear to a local computer as a single disk,
which may include data replication and/or striping of data across disks). Similarly, a Grid File System would
include some level of redundancy (either at the logical file level or at the block level, possibly including some
sort of parity check) across the various disks present in the "Grid".
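XOR parity illustrates the parity check mentioned above. RAID levels such as RAID 5 use this technique. The parity block is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt from the survivors. The sketch assumes equal-sized blocks held on different grid disks, and the function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and all the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """XOR of the parity block and all surviving data blocks yields the lost block."""
    return xor_blocks(surviving + [parity])


# Three data blocks, each imagined on a different grid disk, plus one parity block.
data = [b"GRID", b"FILE", b"SYST"]
parity = xor_blocks(data)
lost = data.pop(1)  # the disk holding b"FILE" fails
print(rebuild_missing(data, parity))  # b'FILE'
```

One parity block protects against any one failure at the cost of one extra block. Surviving more simultaneous failures needs more redundancy, such as additional replicas.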

Framework

First and foremost, a File Table mechanism is necessary, and it must include a means of locating the
(target/destination) file within the grid. Second, a mechanism for working with File Data must exist; it is
responsible for serving file data in response to requests.
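The two mechanisms can be separated in code. A `FileTable` records which nodes hold each file, and per-node `FileDataNode` stores serve the bytes. Every name here is hypothetical. The sketch keeps everything in one process so that it shows the division of responsibilities rather than the networking.

```python
from dataclasses import dataclass, field


@dataclass
class FileDataNode:
    """'File Data' role: stores file contents and serves read requests."""
    name: str
    blobs: dict[str, bytes] = field(default_factory=dict)
    online: bool = True

    def read(self, path: str) -> bytes:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.blobs[path]


@dataclass
class FileTable:
    """'File Table' role: maps a path to the nodes that hold a copy."""
    locations: dict[str, list[str]] = field(default_factory=dict)

    def add(self, path: str, node: str) -> None:
        self.locations.setdefault(path, []).append(node)

    def lookup(self, path: str) -> list[str]:
        return self.locations.get(path, [])


def grid_read(table: FileTable, nodes: dict[str, FileDataNode], path: str) -> bytes:
    """Locate the file via the table, then try each holder until one answers."""
    for node_name in table.lookup(path):
        try:
            return nodes[node_name].read(path)
        except ConnectionError:
            continue  # fall through to the next copy
    raise FileNotFoundError(path)


nodes = {n: FileDataNode(n) for n in ("n1", "n2")}
table = FileTable()
for n in ("n1", "n2"):
    nodes[n].blobs["/docs/report.txt"] = b"hello grid"
    table.add("/docs/report.txt", n)
nodes["n1"].online = False
print(grid_read(table, nodes, "/docs/report.txt"))  # b'hello grid'
```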

Implementation

With the recent advent of torrent technology, a parallel can be drawn to a Grid File System: a torrent
tracker (and search engine) would be the "File Table", and the torrent applications (transmitting the files)
would be the "File Data" component. File Table nodes could use an RSS-feed-like mechanism to announce
when new files are added to the table, triggering replication and similar processes.

A file system that incorporates torrent technology (distributed replication, distributed data
request/fulfillment) would likely be a good start for such a technology.
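A grid file system could borrow the way torrent systems check distributed data. A file is cut into fixed-size pieces, and the "File Table" records a SHA-1 hash for each piece, as a BitTorrent metainfo file does. Each piece fetched from an arbitrary peer is verified before it is accepted. The tiny piece size and the function names below are assumptions chosen for illustration.

```python
import hashlib

PIECE_SIZE = 4  # bytes; real torrents use much larger pieces


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[str]:
    """SHA-1 digest of each fixed-size piece, as recorded in a torrent's metainfo."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(index: int, piece: bytes, expected: list[str]) -> bool:
    """Accept a piece from any peer only if its hash matches the table entry."""
    return hashlib.sha1(piece).hexdigest() == expected[index]


original = b"grid file system"           # 16 bytes -> 4 pieces
table_entry = piece_hashes(original)
print(len(table_entry))                         # 4
print(verify_piece(1, b" fil", table_entry))    # True
print(verify_piece(1, b" fiX", table_entry))    # False (corrupted copy)
```

Because each piece is checked on its own, pieces can arrive from many different nodes in any order without trusting any one of them.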

If both such systems (file table and file data) could be addressed as a single entity (i.e., using virtual
nodes in a cluster), then the growth of such a system could be controlled simply by deciding which roles
each grid member is responsible for (File Table and file lookups, and/or File Data).
Availability

Assuming there exists some method of managing data replication autonomously within the grid (assigning
quotas, etc.), data could be configured for high availability despite the loss or outage of individual nodes.
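One way such autonomous replication management could work assumes a fixed target number of copies per file, which plays the role of the "quota". After a node is lost, the grid finds the files that fell below the target and copies them to the least-loaded surviving nodes. `REPLICATION_FACTOR` and `rebalance` are hypothetical names, and the actual copying is reduced to updating a placement map.

```python
REPLICATION_FACTOR = 3  # assumed quota: number of copies kept of every file


def rebalance(placement: dict[str, set[str]], live_nodes: set[str],
              factor: int = REPLICATION_FACTOR) -> dict[str, set[str]]:
    """Return a new placement giving each file `factor` copies on live nodes
    (or one per live node, if there are fewer live nodes than that)."""
    # Copies on unreachable nodes no longer count.
    new = {f: holders & live_nodes for f, holders in placement.items()}
    load = {n: sum(n in h for h in new.values()) for n in live_nodes}
    target_copies = min(factor, len(live_nodes))
    for f in sorted(new):
        holders = new[f]
        if not holders:
            raise RuntimeError(f"every copy of {f} was lost")
        while len(holders) < target_copies:
            # Least-loaded live node without a copy; the node name breaks ties.
            target = min(live_nodes - holders, key=lambda n: (load[n], n))
            holders.add(target)  # stands in for copying the file over the network
            load[target] += 1
    return new


placement = {"a.dat": {"n1", "n2", "n3"}, "b.dat": {"n2", "n3", "n4"}}
result = rebalance(placement, live_nodes={"n1", "n2", "n4", "n5"})  # n3 has failed
for name, holders in sorted(result.items()):
    print(name, sorted(holders))
# a.dat ['n1', 'n2', 'n5']
# b.dat ['n1', 'n2', 'n4']
```

Availability holds only while at least one copy of each file survives between rebalancing rounds. That is why the sketch raises an error instead of silently continuing when every copy is gone.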

Troubles

The largest problem currently revolves around distributing data updates. Torrents support minimal hierarchy
(currently implemented either as metadata in the torrent tracker, or strictly as UI and basic categorization).
Updating multiple nodes concurrently (assuming atomic transactions are required) introduces latency during
updates and additions, usually to the point of not being feasible. Additionally, a grid (network-based) file
system breaks traditional TCP/IP paradigms, in that file system operations (generally low-level, ring 0
operations) would require complicated TCP/IP implementations, introducing layers of abstraction and
complication to the process of creating such a grid file system.
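The latency comes from coordination. The classic way to make an update atomic across several nodes is two-phase commit, a standard technique that the text itself does not name. A coordinator first asks every replica to prepare. Only if all replicas vote yes does it tell them to commit; otherwise, all of them abort. Each phase costs a round trip to every node, and a single slow node stalls the rest, which is why the approach scales poorly for a grid file system. The classes below are a single-process sketch with hypothetical names.

```python
class Replica:
    def __init__(self, name: str, will_accept: bool = True) -> None:
        self.name = name
        self.will_accept = will_accept  # simulates a node that can or cannot apply the write
        self.data: dict[str, str] = {}
        self.staged: dict[str, str] = {}

    def prepare(self, key: str, value: str) -> bool:
        """Phase 1: stage the write and vote yes or no."""
        if not self.will_accept:
            return False
        self.staged[key] = value
        return True

    def commit(self, key: str) -> None:
        """Phase 2 (success): make the staged write visible."""
        self.data[key] = self.staged.pop(key)

    def abort(self, key: str) -> None:
        """Phase 2 (failure): discard any staged write."""
        self.staged.pop(key, None)


def two_phase_commit(replicas: list[Replica], key: str, value: str) -> bool:
    votes = [r.prepare(key, value) for r in replicas]  # first round trip to every node
    if all(votes):
        for r in replicas:
            r.commit(key)                              # second round trip
        return True
    for r in replicas:
        r.abort(key)
    return False


nodes = [Replica("n1"), Replica("n2"), Replica("n3", will_accept=False)]
print(two_phase_commit(nodes, "/a.txt", "v2"))  # False: n3 voted no
print([r.data for r in nodes])                  # [{}, {}, {}]: nobody applied it
```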

Examples

Current examples of high-availability data solutions include:

1. Network Load Balancing / CARP - splitting incoming requests among multiple computers, usually configured
identically or as one whole.
2. Shared Storage Clustering / SANs - a single disk (one or more physical disks acting as a single logical disk)
is presented to multiple computers, which split incoming requests. This is usually used when more computing
power is required than disk access.
3. Data Replication / Mirroring - multiple computers attempt to synchronize data (usually point-in-time or
snapshot based). This is used more often for either reporting (based on the last snapshot) or backup purposes.
4. Data Partitioning - splitting data among multiple computers. In databases, data is often partitioned based on
tables (certain tables exist on certain computers, or a table is split among multiple computers at certain
"break points"). General files tend to be partitioned either by category (category-based folders) or by location
(geographically separated).
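Range partitioning captures the "break points" form of data partitioning. Each computer owns the keys between two consecutive break points, and a binary search finds the owner of a given key. The break points and server names below are invented for illustration.

```python
import bisect

# Hypothetical break points: keys < "g" go to db1, "g" <= key < "p" to db2, the rest to db3.
BREAK_POINTS = ["g", "p"]
SERVERS = ["db1", "db2", "db3"]


def owner(key: str) -> str:
    """Return the server responsible for `key` under range partitioning."""
    return SERVERS[bisect.bisect_right(BREAK_POINTS, key)]


for customer in ["adams", "garcia", "nguyen", "patel", "zhou"]:
    print(customer, "->", owner(customer))
# adams -> db1
# garcia -> db2
# nguyen -> db2
# patel -> db3
# zhou -> db3
```

Moving a single break point changes which server owns a slice of keys. This is how a partitioned table can be rebalanced without rehashing every key.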

If widely adopted, grid computing would bring together the benefits of many such solutions.

Semantic Grid

The Semantic Grid refers to an approach to Grid computing in which information, computing resources and
services are described using the semantic data model. In this model the data and metadata are expressed
through facts (small sentences), which makes them directly understandable to humans. This makes it
easier for resources to be discovered and joined up automatically, which helps bring resources together to
create virtual organizations. The descriptions constitute metadata and are typically represented using the
technologies of the Semantic Web, such as the Resource Description Framework (RDF).
By analogy with the Semantic Web, the Semantic Grid can be defined as "an extension of the current Grid in
which information and services are given well-defined meaning, better enabling computers and people to
work in cooperation."
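Expressing resource descriptions as small facts makes automatic discovery possible. In RDF's terms, these facts are subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library, and the vocabulary (`grid:hasCPUs`, `grid:ComputeResource`, etc.) is invented for illustration. Real Semantic Grid work would use RDF with shared ontologies and a query language such as SPARQL.

```python
from typing import Iterable, Iterator, Optional

Triple = tuple[str, str, object]

# Each fact is a small (subject, predicate, object) sentence. Vocabulary is hypothetical.
FACTS: set[Triple] = {
    ("node:alpha", "rdf:type", "grid:ComputeResource"),
    ("node:alpha", "grid:hasCPUs", 64),
    ("node:alpha", "grid:runsSoftware", "app:solver"),
    ("node:beta", "rdf:type", "grid:ComputeResource"),
    ("node:beta", "grid:hasCPUs", 512),
    ("store:gamma", "rdf:type", "grid:StorageResource"),
}


def match(pattern: tuple[Optional[str], Optional[str], object],
          triples: Iterable[Triple]) -> Iterator[Triple]:
    """Yield triples that agree with the pattern; None is a wildcard."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def discover_compute(min_cpus: int) -> list[str]:
    """Join two patterns: typed as a compute resource AND having enough CPUs."""
    compute = {s for s, _, _ in match((None, "rdf:type", "grid:ComputeResource"), FACTS)}
    return sorted(
        s for s, _, cpus in match((None, "grid:hasCPUs", None), FACTS)
        if s in compute and cpus >= min_cpus
    )


print(discover_compute(100))  # ['node:beta']
print(discover_compute(32))   # ['node:alpha', 'node:beta']
```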

This notion of the Semantic Grid was first articulated in the context of e-Science, observing that such an
approach is necessary to achieve a high degree of easy-to-use and seamless automation enabling flexible
collaborations and computations on a global scale.

The use of Semantic Web and other knowledge technologies in Grid applications is sometimes described as
the Knowledge Grid. Semantic Grid extends this by also applying these technologies within the Grid
middleware.

Some Semantic Grid activities are coordinated through the Semantic Grid Research Group of the Global
Grid Forum.

Architecture

The majority of Grid computing infrastructure currently consists of reliable services delivered through next-
generation data centers that are built on compute and storage virtualization technologies. The services are
accessible anywhere in the world, with the Grid appearing as a single point of access for all the computing
needs of consumers. Commercial offerings need to meet the quality of service requirements of customers
and typically offer service level agreements. Open standards and open source software are also critical to
the growth of Grid computing.
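Service level agreements are usually stated as an availability percentage. Converting that percentage into permitted downtime shows what it means in practice. The 30-day month is a simplifying assumption, and the percentages are examples rather than figures from any provider.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float = 30 * 24) -> float:
    """Minutes of outage per period that still meet the stated availability."""
    return (1 - availability_pct / 100) * period_hours * 60


for sla in (99.0, 99.9, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} min per 30-day month")
# 99.0% -> 432.0 min per 30-day month
# 99.9% -> 43.2 min per 30-day month
# 99.99% -> 4.3 min per 30-day month
```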

Grid computing comes into focus only when you think about what IT always needs: a way to increase
capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or
licensing new software. Grid computing encompasses any subscription-based or pay-per-use service that, in
real time over the Internet, extends IT's existing capabilities.

Grid computing is at an early stage, with a motley crew of providers large and small delivering a slew of
Grid-based services, from full-blown applications to storage services to spam filtering. Yes, utility-style
infrastructure providers are part of the mix, but so are SaaS (software as a service) providers such as
Salesforce.com. Today, for the most part, IT must plug into Grid-based services individually, but Grid
computing aggregators and integrators are already emerging.

InfoWorld talked to dozens of vendors, analysts, and IT customers to tease out the various components of
Grid computing. Based on those discussions, here's a rough breakdown of what Grid computing is all about:

1. SaaS
This type of Grid computing delivers a single application through the browser to thousands of customers
using a multitenant architecture. On the customer side, it means no upfront investment in servers or
software licensing; on the provider side, with just one app to maintain, costs are low compared to
conventional hosting. Salesforce.com is by far the best-known example among enterprise applications, but
SaaS is also common for HR apps and has even worked its way up the food chain to ERP, with players
such as Workday. And who could have predicted the sudden rise of SaaS "desktop" applications, such as
Google Apps and Zoho Office?
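The multitenant idea means that one application instance serves many customers. In practice it often comes down to scoping every stored record, and every query, by a tenant identifier. This is one common approach among several; others give each tenant a separate schema or database. The `MultiTenantStore` class is hypothetical.

```python
from collections import defaultdict
from typing import Any


class MultiTenantStore:
    """One shared application store; every row is tagged with the tenant that owns it."""

    def __init__(self) -> None:
        self._rows: list[dict[str, Any]] = []
        self._next_id: defaultdict[str, int] = defaultdict(int)  # per-tenant id counters

    def insert(self, tenant: str, record: dict[str, Any]) -> int:
        self._next_id[tenant] += 1
        # Tenant and id are set last so a record cannot claim another tenant's space.
        row = {**record, "tenant": tenant, "id": self._next_id[tenant]}
        self._rows.append(row)
        return row["id"]

    def query(self, tenant: str, **filters: Any) -> list[dict[str, Any]]:
        """The store applies the tenant filter itself, so callers cannot forget it."""
        return [
            r for r in self._rows
            if r["tenant"] == tenant and all(r.get(k) == v for k, v in filters.items())
        ]


app = MultiTenantStore()
app.insert("acme", {"name": "Lead A", "stage": "open"})
app.insert("globex", {"name": "Lead B", "stage": "open"})
print(app.query("acme", stage="open"))
# [{'name': 'Lead A', 'stage': 'open', 'tenant': 'acme', 'id': 1}]
```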

2. Utility computing
The idea is not new, but this form of Grid computing is getting new life from Amazon.com, Sun, IBM, and
others who now offer storage and virtual servers that IT can access on demand. Early enterprise adopters
mainly use utility computing for supplemental, non-mission-critical needs, but one day, they may replace
parts of the datacenter. Other providers offer solutions that help IT create virtual datacenters from
commodity servers, such as 3Tera's AppLogic and Cohesive Flexible Technologies' Elastic Server on
Demand. Liquid Computing's LiquidQ offers similar capabilities, enabling IT to stitch together memory, I/O,
storage, and computational capacity as a virtualized resource pool available over the network.
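Utility computing bills only for what is consumed, much like the electricity metaphor from the introduction. A toy meter makes the arithmetic explicit. The rates below are invented placeholders, not any provider's actual prices.

```python
from decimal import Decimal

# Hypothetical unit prices (invented, not real provider pricing).
RATES = {
    "server_hours": Decimal("0.12"),      # per virtual-server hour
    "storage_gb_month": Decimal("0.20"),  # per GB stored for the month
    "transfer_gb": Decimal("0.10"),       # per GB transferred
}


def monthly_bill(usage: dict[str, Decimal]) -> Decimal:
    """Sum of quantity x rate for each metered resource, rounded to cents."""
    total = sum((RATES[item] * qty for item, qty in usage.items()), Decimal("0"))
    return total.quantize(Decimal("0.01"))


# A supplemental workload: 2 servers for 40 hours, 50 GB stored, 20 GB transferred.
usage = {
    "server_hours": Decimal(2 * 40),
    "storage_gb_month": Decimal(50),
    "transfer_gb": Decimal(20),
}
print(monthly_bill(usage))  # 21.60
```

`Decimal` is used instead of `float` so that the cents add up exactly, as billing arithmetic requires.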
3. Web services in the Grid
Closely related to SaaS, Web service providers offer APIs that enable developers to exploit functionality
over the Internet, rather than delivering full-blown applications. They range from providers offering discrete
business services, such as Strike Iron and Xignite, to the full range of APIs offered by Google Maps, ADP
payroll processing, the U.S. Postal Service, Bloomberg, and even conventional credit card processing
services.
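From the developer's side, consuming such a service means making an HTTP request that returns structured data. The sketch below plays both roles on one machine, using only Python's standard library. A tiny provider answers `GET /quote?symbol=...` with JSON, and a client function calls it. The endpoint, the JSON fields and the prices are all hypothetical and do not reproduce the API of any provider named above.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

# Hypothetical data exposed by the provider (not real market prices).
QUOTES = {"ACME": 12.5, "GLOBEX": 40.0}


class QuoteHandler(BaseHTTPRequestHandler):
    """Provider side: GET /quote?symbol=XYZ returns a small JSON document."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        symbol = parse_qs(url.query).get("symbol", [""])[0].upper()
        if url.path != "/quote" or symbol not in QUOTES:
            self.send_error(404, "unknown symbol")
            return
        body = json.dumps({"symbol": symbol, "price": QUOTES[symbol]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep the demo output quiet


def get_quote(base_url: str, symbol: str) -> float:
    """Consumer side: one HTTP call instead of hosting the functionality yourself."""
    with urlopen(f"{base_url}/quote?symbol={symbol}") as resp:
        return json.load(resp)["price"]


server = HTTPServer(("127.0.0.1", 0), QuoteHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"
print(get_quote(base, "acme"))  # 12.5
server.shutdown()
server.server_close()
```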

4. Platform as a service
Another SaaS variation, this form of Grid computing delivers development environments as a service. You
build your own applications that run on the provider's infrastructure and are delivered to your users via the
Internet from the provider's servers. Like Legos, these services are constrained by the vendor's design and
capabilities, so you don't get complete freedom, but you do get predictability and pre-integration. Prime
examples include Salesforce.com's Force.com, Coghead and the new Google App Engine. For extremely
lightweight development, Grid-based mashup platforms abound, such as Yahoo Pipes or Dapper.net.

5. MSP (managed service providers)
One of the oldest forms of Grid computing, a managed service is basically an application exposed to IT
rather than to end users, such as a virus scanning service for e-mail or an application monitoring service
(which Mercury, among others, provides). Managed security services delivered by SecureWorks, IBM, and
Verizon fall into this category, as do such Grid-based anti-spam services as Postini, recently acquired by
Google. Other offerings include desktop management services, such as those offered by CenterBeam or
Everdream.

6. Service commerce platforms
A hybrid of SaaS and MSP, this Grid computing service offers a service hub that users interact with. They're
most common in trading environments, such as expense management systems that allow users to order
travel or secretarial services from a common platform that then coordinates the service delivery and pricing
within the specifications set by the user. Think of it as an automated service bureau. Well-known examples
include Rearden Commerce and Ariba.

Today, with little interconnection between these Grid-based services in evidence, Grid computing might be
more accurately described as "sky computing," with many isolated clouds of services that IT customers must
plug into individually. On the other hand, as virtualization and SOA permeate the enterprise, the idea of
loosely coupled services running on an agile, scalable infrastructure should eventually make every enterprise
a node in the Grid. It's a long-running trend with a far-out horizon. But among big metatrends, Grid
computing is the hardest one to argue with in the long term.

Grid computing goes beyond convenience. A generation used to posting and sharing photos on Orkut,
instant messaging with friends and interacting online for a good chunk of its spare time is having an impact
on expectations in the workplace. By reducing the traditional costs and labor associated with deploying,
maintaining and upgrading business technology, IT departments are increasingly becoming free to devote
their limited resources to projects more strategic to the business. And since software lives in the Grid, it can
be improved as often as needed without tying up the IT department or inconveniencing users. This
“versionless” software eliminates upgrade projects and helps technology keep pace with the speed of
business, giving employees access to new technology early and often rather than forcing them to wait for a
final, packaged product to be shipped.

The architecture used at Salesforce.com consists of Force.com Development as a Service, a set of
development tools and APIs that enable enterprise developers to easily harness the promise of Grid
computing. Jeremy Cooper, Vice President Marketing Asia Pacific & Japan, Salesforce.com, said,
“Development-as-a-Service provides full access to the database, logic and user interface capabilities of the
Force.com Platform and unites the productivity of development and IT collaboration tools with the power of
Force.com Platform-as-a-Service. Force.com Development-as-a-Service includes the new Force.com Metadata
API, the Force.com Integrated Development Environment (IDE), the Force.com Sandbox, and
Force.com Code Share to provide developers with a comprehensive set of services to build enterprise
Software-as-a-Service applications.”
The Force.com Platform provides the necessary building blocks to enable business application creation and
delivery via the Internet, without the need for client-server software and hardware infrastructure. “By
replacing the cost and complexity of software platforms with a complete, scalable service, Force.com
provides developers the fastest path to turn ideas into business impact. The Force.com encompasses a
complete feature set for the creation of business applications, including the ability to create any database on
demand, a workflow engine for managing collaboration between users, the Apex Code programming
language for building complex logic, the Force.com Web Services API for programmatic access, mash-ups,
and integration with other applications and data, and Visualforce for a framework to build any User-Interface-
as-a-Service,” added Jeremy.

Advantages of the Grid

According to a Google spokesperson, Grid computing is particularly valuable to small and medium
businesses, where effective and affordable IT tools are critical to helping them become more productive
without spending lots of money on in-house resources and technical equipment. “But we are seeing large
businesses moving to the Grid as well, for a variety of reasons, such as cost savings, remote access, ease
of availability and real-time collaboration capabilities. The time is right for business users to embrace online
applications in ways that make sense for them. There will always be a need for desktop applications that
provide advanced functionality that’s best delivered from a client such as high powered computations in
spreadsheets; but less and less work is like that.”

Storing data in the Grid already has some distinct advantages over client-based access. We can leverage
the sheer processing power of the Grid to do things that traditional productivity applications cannot do. “For
instance, users can instantly search over 25 GB worth of e-mail online, which is nearly impossible to do on a
desktop. To take another example, each document created through Google Apps is easily turned into a
living information source, capable of pulling the latest data from external applications, databases and the
Web. This revolutionizes processes as simple as creating a Google spreadsheet to compare stock prices
from vendors over time, because the cells can be populated and updated as the prices change in real time,”
explained the Google spokesperson.

Grid computing offers almost unlimited computing power and collaboration at a massive scale for
enterprises of all sizes. “The Force.com Platform-as-Service provides the necessary building blocks to make
Grid computing real for the enterprise. Our customers have already developed applications in the Grid such
as Accounts Receivable, Bug Enhancement Tracking, Employee Compliance and Training, Emergency
Room Staffing, Expense Reporting, Food Ingredient Management, Recruiting, Time Management,” said
Cooper.

“As part of the Force.com Grid Computing Architecture, we have announced a new utility pricing model for
the Force.com Platform. The pricing gives customers the flexibility to deploy Platform-as-a-Service
applications to users throughout their enterprise based on their usage needs and patterns. CIOs and IT
managers now have the power to deploy Force.com for unlimited or per-login usage for regular or
occasional users depending on the specific needs of their enterprise,” he concluded.
Conclusion

Grid computing, by separating enterprises from their servers and offering universal (secured) access to
those servers, allows Grid providers or third parties to bundle the computing with value-added services,
starting with simple management and going all the way up to fully outsourced IT operations. This
combination of scalable computing and services is what truly lowers the technological and cost barriers to
entry in the web-facing application market. The biggest savings come in the form of reduced capital
investment (from utility billing) and reduced staffing investment (due to sharing services with other
customers) needed to reach the enterprise-class service delivery standards necessary to run a SaaS
business. This is truly the breakthrough that Grid computing offers, allowing anyone with an idea and a little
programming skill to operate a commercially viable website such as a SaaS service. It also means that there
will be a host of Grid providers other than the Big Three, providing both computing and services with unique
offerings tailored to meet a wide variety of needs.

References

1. http://www.thestandard.com/article/0,1902,5466,00.html

2. http://knowledge.wpcarey.asu.edu/article.cfm?articleid=1614

3. www.seekingalpha.com, July 29, 2008.

4. www.capgemini.com/ctoblog/2008/06/Grid_computing_the_invisible.php

5. emergic.org/2008/09/08/saas-Grid-computing/
