Report

Data Grid Management Portal
Done as part of Course Curriculum
Web Technology ITM112-05
Project Report
by
Nithya Sam
Pradheepa.C
Rashmi Nair
Vimi Soman
Indian Institute of Information Technology & Management – Kerala
Trivandrum, Kerala 695 581
November 22, 2006

Contents
1 Abstract
2 Introduction
3 Problem Analysis
4 Review of related work
5 Design and Implementation
5.1 Use Case Diagrams
5.2 Technologies Used:
5.2.1 Grid Computing
5.2.2 Globus Toolkit
5.2.3 OGSA-DAI WSRF 2.2
5.2.4 GridSphere
6 Results and Evaluation
6.1 Unit testing
6.2 Integration testing
6.3 System testing
7 Timeline
8 Conclusion
9 Screen Shots
List of Figures
1 Overall Architecture
2 Grid Admin USE CASE diagram
3 VO Admin USE CASE diagram
4 User USE CASE diagram
5 Grid overview
6 Grid Admin decides the policies
7 OGSA-DAI Architecture
8 Interaction between Data Resources
9 Timeline diagram
10 ScreenShot1
11 ScreenShot2
12 ScreenShot3
13 ScreenShot4
14 ScreenShot5
15 ScreenShot6
16 ScreenShot7
17 ScreenShot8
1 Abstract
The aim of this project is to develop a portal for scientists whose
computations depend on data from several data sources. In general, the
ability to securely access diverse data sources in a collaborative
environment, called a networked Virtual Organization (VO), is becoming an
essential requirement for optimizing the design and development phase of
the product lifecycle. Dynamic, secure and transparent data access, as
well as the integration of heterogeneous data sources, is a key issue in
many collaboration environments today.
To offer a framework that facilitates coordination and cooperation
between distributed data sources, our project has adopted the OGSA-DAI
(Open Grid Services Architecture - Data Access and Integration)
middleware framework. OGSA-DAI is used as a generic high-level grid
middleware that offers capabilities such as data federation and
distributed query processing (DQP). The GridSphere framework provides the
front-end interface for accessing data resources. It is a portal-based
framework with a high-level application programming interface (API) for
developing customized Grid-enabled portlets.
This project was done in partial fulfillment of the course ITM112-05
(Web Technology).
2 Introduction
In an increasing number of scientific disciplines, large data collections
are emerging as important resources. The volume of scientific data can
even be measured in terabytes. The communities of researchers that need
to access and analyze this data are often large and are almost always ge-
ographically distributed, as are the computing and storage resources that
these communities rely upon to store and analyze their data.
Data Grid Management Portal is designed for managing data sources on a
grid. It is intended for scientists whose work, such as computation,
depends on data that is distributed and held in different formats:
relational, XML and files. The existing system has several servers
accessing databases that host data from different domains. Data from
these different database servers needs to be brought together and then
processed on the servers.
This project addresses this requirement by providing a portal-based
framework for managing access to various databases on a grid through a
single sign-on facility. The portal enables scientists to do their work
more efficiently and securely, without having to worry about how the
underlying grid infrastructure implements the discovery, access and
management of the resources needed for that computation.
The Grid is an emerging infrastructure that supports the discovery,
access and use of distributed computational resources. Grid computing
reflects a conceptual framework rather than a physical resource. It
involves sharing heterogeneous resources, located in different places
and belonging to different administrative domains, over a network using
open standards. In short, it involves virtualizing computing resources.
The Grid ensures security through authorization and access control tools
that support decentralized control mechanisms. A central concept in GSI
(Grid Security Infrastructure) [7] authentication is the certificate. Every user
and service on the Grid is identified via a certificate, which contains infor-
mation vital to identifying and authenticating the user or service. Once
mutual authentication is performed, GSI gets out of the way so that com-
munication can occur without the overhead of constant encryption and
decryption.
To facilitate coordination and cooperation between distributed data
sources we use the OGSA-DAI (Open Grid Services Architecture - Data Access
and Integration) middleware framework. It provides a means for users to
Grid-enable their data resources, including relational, XML and file
resources. The framework provides a compact way of handling multiple
interactions with a service within a single request via an XML document
called a perform document, in which data is pipelined between a set of
activities that operate on a data stream coming out of, or going into, a
data resource. Whenever a client submits a perform document to a data
service, it receives a response document in return. A response document
describes the status of execution of the perform document and also
identifies the session that the request was joined to. With Data Access
and Integration (DAI), heterogeneous resources are seen as a single
virtual resource on a grid.
In future we will be able to execute queries in parallel over OGSA-DAI
data services and other services on the Grid using OGSA-DQP (Open Grid
Services Architecture Distributed Query Processor), which combines data
access with analysis. OGSA-DQP uses the Grid Data Services (GDSs) provided
by OGSA-DAI to hide data source heterogeneities and ensure consistent
access to data and metadata. OGSA-DQP will accept queries from a front-end
interface and then construct, optimize and execute the query.
In addition, we use the GridSphere portal framework to build the front
end of the portal. The GridSphere framework is packaged as a web
application that provides a portlet container for managing deployed
portlets. Portlets are "mini-applications" that can either display
informational content or provide access to other services. GridSphere has
a high-level application programming interface (API) for developing
customized Grid-enabled portlets.
3 Problem Analysis
The existing system has several servers accessing databases that host
data from different domains and in different formats. Data from these
different database servers needs to be brought together and then
processed on the servers.
The existing system faces the following problems:
• Data resources are distributed: Databases distributed at various
locations need to be connected through a single interface.
• Databases contain data of different formats: The databases hold
relational, XML and file data. Our portal should provide a uniform
interface for accessing the different databases.
• Integrating data from legacy systems: The system developed should be
able to interface with existing legacy data and computational systems
that are already running.
• Accessing data through the web is not secure: Web services must provide
authentication and authorization, as well as supporting functions for
managing user credentials.
• No single sign-on facility: Individual users of the system should have
a web-based interface where they can specify, launch and control jobs.
4 Review of related work
Biogrid project
Purpose: For biology and medical science.
The DataGrid group in the Biogrid project [1] has developed a system
for the federation of bio-related databases with Globus Toolkit
3.0.2/OGSA-DAI, aimed at an application to drug discovery. The system
currently bridges 11 databases in heterogeneous communities such as
biology, medical science and pharmaceutics. Under this project, Osaka
University and other relevant institutions are developing a computer
grid technology to meet IT needs specific to biology and medical
science. Apart from research activities, the project also includes peta
grid technology business development and educational training in
bioinformatics.
DataTAG project
Purpose: To create a large-scale intercontinental Grid testbed
The DataTAG project [2] creates a large-scale intercontinental Grid
testbed that focuses on advanced networking issues and interoperability
between these intercontinental Grid domains, hence extending the
capabilities of each and enhancing the worldwide program of Grid
development.
The project addresses the issues that arise in high performance
inter-Grid networking, including sustained and reliable high performance
data replication, end-to-end advanced network services, and novel
monitoring techniques. It also addresses the issues that arise in
interoperability between Grid middleware layers, such as information and
security services.
GARUDA
Purpose: Grid networked computing to research labs and industry
GARUDA [3] is a collaboration of science researchers and experimenters
on a nationwide grid of computational nodes, mass storage and scientific
instruments that aims to provide the technological advances required to
enable data- and compute-intensive science for the 21st century. One of
GARUDA's most important challenges is to strike the right balance
between research and the daunting task of deploying that innovation into
some of the most complex scientific and engineering endeavors being
undertaken today.
The Department of Information Technology (DIT), Government of India, has
funded the Centre for Development of Advanced Computing (C-DAC) to
deploy the nationwide computational grid GARUDA, which will connect 17
cities across the country in its Proof of Concept (PoC) phase with the
aim of bringing Grid networked computing to research labs and industry.
GARUDA will accelerate India's drive to turn its substantial research
investment into tangible economic benefits. This project
• Brings together all potential research, development and user groups
to develop a national initiative on Grid computing
• Creates a test bed for the research and engineering of technologies,
architectures, standards and applications in Grid computing
• Creates the foundation for next-generation grids by addressing
long-term research issues in Grid computing
5 Design and Implementation
The overall architecture of the system, showing the portal, the grid
data services and the databases, is shown below:
Figure 1: Overall Architecture
The databases shown in the figure are heterogeneous and are located
remotely. The aim of our project is to design a portal with a single
sign-on facility to access these databases and, ultimately, to join
data from these databases and present the output requested by the
user. The Grid Data Services include a software framework called Open
Grid Services Architecture (OGSA), which forms the interface between
the application and the data resources. The grid-based framework is
provided by a software toolkit called Globus. The front-end interface
is provided by GridSphere.
5.1 Use Case Diagrams

Use case diagrams depict the functions that the various actors of the
system can use. The following sections provide use case diagrams for
the Grid Admin, VO Admin and User.
Grid Admin USE CASE diagram
• Grid Administrator
• Log in
• Create a new virtual organization
• Modify an existing virtual organization
• View details of an existing virtual organization
• Delete an existing virtual organization
• Change own password
• Logout
Figure 2: Grid Admin USE CASE diagram
VO Admin USE CASE diagram

• VO (Virtual Organization) Administrator
• Log in
• Create a new user
• View details of an existing user
• Delete an existing user
• Add new resource
• Logout
Figure 3: VO Admin USE CASE diagram
User USE CASE diagram

• User
• Log in
• Query a database
• Navigate through database
• Logout
Figure 4: User USE CASE diagram
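The three use case lists above can be read as a role-to-action table. The following Python sketch is illustrative only: the role keys, the action strings and the `is_allowed` helper are hypothetical names chosen for this example and are not taken from the portal's code.

```python
# Illustrative role-to-action table derived from the use case lists.
# All identifiers here are hypothetical, not taken from the portal's code.
USE_CASES = {
    "grid_admin": {
        "log in", "create vo", "modify vo", "view vo", "delete vo",
        "change password", "log out",
    },
    "vo_admin": {
        "log in", "create user", "view user", "delete user",
        "add resource", "log out",
    },
    "user": {"log in", "query database", "navigate database", "log out"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role may perform the given action."""
    return action in USE_CASES.get(role, set())
```

A portlet could consult such a table before rendering a menu entry, so that, for example, only the Grid Admin is offered the option to create a virtual organization.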
5.2 Technologies Used:

The following software technologies are used in our project:
• Grid Computing
• Globus Toolkit
• OGSA-DAI
• GridSphere
5.2.1 Grid Computing

An environment that provides the ability to share and transparently
access resources across a distributed and heterogeneous environment
requires not only the technology to virtualize certain resources, but
also technologies and standards in the areas of scheduling, security,
accounting, systems management, and so on. Cross-organizational grids are
also being implemented and will be an important part of computing and
business optimization in the future.
The distinctions between intraorganizational grids and
interorganizational grids are not based on technological differences.
Instead, they are based on configuration choices: security domains,
degrees of isolation desired, types of policies and their scope, and
contractual obligations between users and providers of the
infrastructures. These issues are not fundamentally architectural in
nature.
Grid computing involves an evolving set of open standards for Web
services and interfaces that make services, or computing resources,
available over the Internet. One definition of grid computing is
distributed computing across virtualized resources.
The goal is to create the illusion of a simple yet large and powerful
virtual computer out of a collection of connected (and possibly
heterogeneous) systems sharing various combinations of resources.
Figure 5: Grid overview
Another capability enabled by grid computing is an environment for
collaboration among a wider audience.
VIRTUAL ORGANIZATION
The users of the grid can be organized dynamically into a number of
virtual organizations, each with different policy requirements. These
virtual organizations can share their resources collectively as a larger
grid.
The participants and users of the grid can be members of several real
and virtual organizations. The grid can help enforce security rules
among them and implement policies that resolve priorities for both
resources and users.
Administrators can change any number of policies that affect how the
different organizations share or compete for resources.
Figure 6: Grid Admin decides the policies
Grid computing enables organizations (real and virtual) to take
advantage of various computing resources in ways not previously
possible. They can use underutilized resources to meet business
requirements while minimizing additional costs. The nature of a
computing grid allows organizations to exploit parallel processing,
making many applications financially feasible as well as allowing them
to complete sooner. Grid computing makes more resources available to
more people and organizations.
5.2.2 Globus Toolkit

The open source Globus Toolkit is a fundamental enabling technology
for the "Grid", letting people share computing power, databases, and
other tools securely online across corporate, institutional, and
geographic boundaries without sacrificing local autonomy. The toolkit
includes software services and libraries for resource monitoring,
discovery, and management, plus security and file management.
More broadly, it covers security, information infrastructure, resource
management, data management, communication, fault detection, and
portability. It is packaged as a set of components that can be used
either independently or together to develop applications.
Globus Toolkit 4 installation and configuration
Globus Toolkit 4 is supported on a variety of operating systems. Binary
packages are available for Linux environments (SuSE Linux 9/8, Red Hat
Linux 9, Fedora Core Linux 2/3, and Debian 3.1) and Solaris 9.
Packages of Globus Toolkit 4
Globus Toolkit 4 is available in three ways.
• Download the full binary package from the Globus site.
• Download the full source package from the Globus site.
• Get the source code from a CVS server.
We used the binary package.
Installation
Globus Toolkit 4 installation requires the following software to be
installed:
• Java SDK 1.4.2 or later
• Apache Ant 1.5.1 or later
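A prerequisite check of this kind amounts to comparing dotted version numbers component by component rather than as strings. The Python sketch below illustrates the comparison under the assumption that versions are plain dotted integers such as 1.4.2 (build suffixes are not handled). The helper names are hypothetical, and the sketch does not query the installed tools.

```python
# Minimum versions stated in the text; the helper names are hypothetical.
MINIMUM_VERSIONS = {"java": "1.4.2", "ant": "1.5.1"}


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '1.4.2' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(tool: str, installed: str) -> bool:
    """Compare an installed version against the minimum for that tool."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[tool])
```

Comparing integer tuples avoids the string-comparison trap in which "1.10.0" would sort before "1.5.1".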
To install from a binary package:
1. Obtain the Globus Toolkit 4 binary package from the Globus site.
2. Extract the binary package as the globus user.
[globus@hosta]$ tar xvzf gt4.0.0-ia32-redhat9-binary-installer.tar.gz -C /tmp
3. Set the environment variable for the Globus location.
[globus@hosta]$ export GLOBUS_LOCATION=/usr/local/globus-4.0.0
4. As root, create the directory and change its ownership to user and
group globus.
[globus@hosta]$ su
Password:
[root@hosta]# mkdir $GLOBUS_LOCATION
[root@hosta]# chown globus:globus $GLOBUS_LOCATION
[root@hosta]# exit
exit
[globus@hosta]$
5. Configure and install Globus Toolkit 4.
[globus@hosta]$ cd /tmp/gt4.0.0-ia32-redhat9-binary-installer
[globus@hosta]$ ./configure --prefix=$GLOBUS_LOCATION
[globus@hosta]$ make 2>&1 | tee build.log
cd gpt-3.2autotools2004 && OBJECT_MODE=32 ./build_gpt
[globus@hosta]$ make install
5.2.3 OGSA-DAI WSRF 2.2

OGSA-DAI WSRF 2.2 [5] is a version of OGSA-DAI based on Globus Toolkit
4.0.3. There are two distributions, one source and one binary. The
distributions also contain the minimal set of Globus Toolkit 4.0.3 JARs
required to run OGSA-DAI WSRF clients. The releases are still compatible
with Globus Toolkit 4.0.1 and 4.0.2.
OGSA-DAI is a middleware product that allows data resources, such as
relational or XML databases, to be accessed via web services. An
OGSA-DAI web service allows data to be queried, updated, transformed and
delivered. OGSA-DAI can be used to provide web services that offer data
integration services to clients. OGSA-DAI provides a means for users to
Grid-enable their data resources.
OGSA-DAI supports the following:
• Different types of data resources, including relational, XML and
files, can be exposed via web services. A number of popular data
resource products are supported.
• Data within each of these types of resource can be queried and
updated.
• Data can be transformed (using XSLT), compressed and decompressed
(using ZIP and GZIP compression).
• Data can be delivered to clients, other OGSA-DAI web services, URLs,
FTP servers, GridFTP servers, or files.
• Requests to OGSA-DAI web services have a uniform format irrespective
of the data resource exposed by the service (though the actions
specified within each request may be data resource-specific).
• Information about the data resources exposed by an OGSA-DAI web
service, and about the functionality the service supports, is made
available to clients.
• OGSA-DAI web services may be extended by users to expose their own
data resources and to support application-specific functionality, in
addition to that already provided by the OGSA-DAI distribution.
OGSA-DAI provides web services compliant with two popular web services
specifications:
• Web Services Interoperability (WS-I).
• Web Services Resource Framework (WSRF).
Figure 7: OGSA-DAI Architecture
OGSA-DAI Architecture
Data Layer
The data layer consists of the data resources that can be exposed via
OGSA-DAI. Currently these include:
• Relational databases such as MySQL, SQL Server, DB2, Oracle and
PostgreSQL.
• XML databases such as eXist and Xindice.
• Files and directories in formats such as OMIM, SWISSPROT and EMBL.
Data Layer ←→ Business Logic Layer Interface
This interface allows information to be communicated between the data
layer and the business logic layer in both directions. It is realised by
components known as data resource accessors. Each data service resource
has its own data resource accessor, which controls access to an
underlying data resource. OGSA-DAI includes data resource accessors for
relational databases, XML databases and file systems. This component is
an extensibility point, so users can develop their own data resource
accessors in order to expose new kinds of data resource.
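The idea of an accessor as an extensibility point can be illustrated with a small sketch. OGSA-DAI itself is written in Java, and the Python code below does not reproduce its actual interfaces. The class names, the `execute` method and the in-memory "resource" are all hypothetical, chosen only to show how a new kind of data resource could be placed behind a common accessor contract.

```python
from abc import ABC, abstractmethod


class DataResourceAccessor(ABC):
    """Hypothetical contract: controls access to one underlying data resource."""

    @abstractmethod
    def execute(self, request: str) -> list:
        """Run a request against the underlying resource and return records."""


class InMemoryAccessor(DataResourceAccessor):
    """Toy accessor for a new kind of resource: a dict of named record lists."""

    def __init__(self, tables: dict):
        self._tables = tables

    def execute(self, request: str) -> list:
        # The request here is simply a table name, not SQL or XPath.
        return list(self._tables.get(request, []))
```

Code written against `DataResourceAccessor` does not need to know which kind of resource sits underneath, which is the property the business logic layer relies on.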
Business Logic Layer
This layer encapsulates the core functionality of OGSA-DAI. It consists
of components known as data service resources. Multiple data service
resources can be deployed to expose multiple data resources. There is a
one-to-one relationship between data service resources and data
resources.
The responsibilities of a data service resource include:
• Execution of perform documents - a perform document describes
the actions that a data service resource should take on behalf of the
client. Each action is known as an activity. OGSA-DAI already in-
cludes a large number of activities for performing common opera-
tions such as database queries, data transformations and data deliv-
ery.
• Generation of response documents - a response document describes
the status of execution of a perform document and may contain re-
sult data, such as the results from a database query.
• Data resource access - interactions with data resources take place via
the data resource accessor component.
• Data transport functionality - data can be streamed in and out of
data service resources to and from clients and other data service re-
sources.
• Session management - the creation, access and termination of ses-
sion objects allowing state to be stored across multiple requests to
the data service resource. All perform document requests are pro-
cessed within a session. Sessions are also used for storing the streams
used by the data transport functionality. These are known as session
streams.
• Property management - the creation, access and removal of proper-
ties associated with the data service resource. These are known as
data service resource properties and are generally used for exposing
metadata such as the status of a request or the schema of the under-
lying data resource.
Business Logic Layer ←→ Presentation Layer Interface
This interface allows information to be communicated between the
business logic layer and the presentation layer in both directions. It
supports the invocation of OGSA-DAI functionality within the business
logic layer in a way that is independent of a particular web
environment. In fact, the same interfaces could even be used by a
stand-alone client application outside of any web environment.
When SOAP requests arrive at OGSA-DAI WSRF or WS-I data services, this
interface is used to pass information and instructions to and from the
business logic layer. The flow of information between these two layers
is described below.
Presentation Layer → Business Logic Layer
• Data service resource names, data service resource property names and
session stream identifiers.
• Client proxy certificates and credentials in a web-independent
format.
• Perform documents and data from clients.
• Data service resource configuration information, including data
resource drivers, data resource URIs, database user names and
passwords, information on supported activities, and information on
data service resource session and concurrency support.
Business Logic Layer → Presentation Layer
• Response documents and result data.
• The status of the request processing within a particular session. This
is known as the session request status.
• Data service resource property values such as the schema of the
underlying data resource.
• Information about the activities that are supported by the data
service resource. These are the activities that can be requested by the
user within perform documents.
Presentation Layer
This layer encapsulates the functionality required to expose data
service resources using web service interfaces. OGSA-DAI includes two
realisations, one compliant with WSRF and the other compliant with WS-I.
Client Layer
A client can interact with a data service resource via a corresponding
data service. OGSA-DAI also includes a Java Client Toolkit, which
provides a higher-level API for interacting with data services. The
Client Toolkit simplifies the development of client applications by
providing convenient ways to construct and send requests and to
interpret the subsequent responses.
Interaction between Data Resources
OGSA-DAI supports interaction with data service resources via a
document-oriented interface. The client does not speak directly to the
data service resource but instead sends a perform document to a data
service. Perform documents are used by clients to instruct data service
resources to perform activities. These activities may include data
resource queries and updates, data transformations or data delivery
operations. Perform documents are expressed in XML. The data service
forwards the document to a data service resource, which represents the
actual data resource. The data service resource interprets the perform
document and performs the actions described within it. These actions may
involve interacting in some way with the underlying data resource, for
instance by executing an SQL query statement. The data service resource
then composes a response document describing the results of the request
and returns it to the client via the data service. A response document
describes the status of execution of a perform document and also
identifies the session that the request was joined to. Depending on the
activities specified in the perform document, the response document may
also contain result data such as the XML-encoded results of a database
query.
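To make the notion of a perform document more concrete, the following Python sketch assembles a small XML document containing a single query activity. The element and attribute names (`perform`, `sqlQueryStatement`, `expression`, `resultStream`) are assumptions modelled loosely on OGSA-DAI examples and should be checked against the schema shipped with the distribution. The namespace declaration is omitted, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_perform_document(sql: str) -> str:
    """Build a perform-style XML document holding one query activity."""
    perform = ET.Element("perform")
    # One activity: an SQL query whose output stream is named so that
    # later activities (transformation, delivery) could refer to it.
    activity = ET.SubElement(perform, "sqlQueryStatement", name="statement")
    ET.SubElement(activity, "expression").text = sql
    ET.SubElement(activity, "resultStream", name="statementOutput")
    return ET.tostring(perform, encoding="unicode")
```

In a real deployment the document would normally be produced through the Java Client Toolkit rather than built by hand. The sketch only shows the shape of the request that a data service receives.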
Figure 8: Interaction between Data Resources
Data service resources are able to process multiple requests
concurrently. When the concurrency limit has been reached, the data
service resource begins queuing further requests. The concurrency limit
and maximum queue size are specified when a data service is deployed and
can be adjusted later through properties of the web service deployment
descriptor.
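The queuing behaviour described above can be modelled in a few lines. The Python sketch below is a toy model, not OGSA-DAI code: the class and method names are hypothetical, and rejecting a request once the queue is full is an assumption, since the text does not say what happens in that case.

```python
from collections import deque


class RequestGate:
    """Toy model: run up to `limit` requests, queue up to `max_queue` more."""

    def __init__(self, limit: int, max_queue: int):
        self.limit = limit
        self.max_queue = max_queue
        self.running = set()
        self.queue = deque()

    def submit(self, request_id: str) -> str:
        """Return 'running', 'queued' or 'rejected' for a new request."""
        if len(self.running) < self.limit:
            self.running.add(request_id)
            return "running"
        if len(self.queue) < self.max_queue:
            self.queue.append(request_id)
            return "queued"
        # Assumption: a full queue causes the request to be refused.
        return "rejected"

    def finish(self, request_id: str) -> None:
        """Mark a request done and start the oldest queued one, if any."""
        self.running.discard(request_id)
        if self.queue and len(self.running) < self.limit:
            self.running.add(self.queue.popleft())
```

With a limit of 2 and a queue size of 1, a third request waits in the queue and starts as soon as one of the first two finishes.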
Installation of OGSA-DAI
Prerequisite Software
To use OGSA-DAI WSRF we need the following software:
• Java 1.4.x or above. OGSA-DAI WSRF has been tested on various
versions of Java 1.4 and also on Java 1.5. We used Java 1.5.0 for our
project.
• Apache Ant 1.5 or 1.6. We used Apache Ant 1.6.5 for our project.
• Globus Toolkit 4.0.3 Java Web Services Core
• Apache Tomcat
To use OGSA-DAI WSRF with Tomcat we have to deploy GT4 onto Tomcat as
shown below:
• Set a GLOBUS_LOCATION environment variable to point to the location
of your GT4 distribution.
$ export GLOBUS_LOCATION=/path/to/Globus/directory
• Set a CATALINA_HOME environment variable to point to the location of
Tomcat.
$ export CATALINA_HOME=/path/to/Tomcat/directory
• Run the following commands:
$ cd path/to/Globus/directory
$ ant -f share/globus_wsrf_common/tomcat/tomcat.xml deployTomcat -Dtomcat.dir=/path/to/Tomcat
$ cd path/to/OGSA-DAI/binary/directory
To Expose Data Services using OGSA-DAI
Exposing a data resource via an OGSA-DAI WSRF data service is a
three-step process:
1. Deploy an OGSA-DAI data service. This data service initially exposes
0 data service resources.
2. Deploy a data service resource. The data service resource contains
information about a data resource and the activities clients can
perform.
3. Request that the data service expose the data service resource. This
instructs the data service to expose the data service resource and so
allows clients to interact with the data service resource, thereby
interacting with a data resource.
1. Deploy a new data service
(a) Run the following command:
$ ant deployService -Ddai.container=/path/to/Web/services/container
-Ddai.service.name=service/name
• dai.container specifies the path to your Web services container.
• dai.service.name specifies the name of the service.
(b) The data service will be deployed onto the Web services container.
We need to shut down and restart the container before clients are
able to access the new data service.
(c) Test the data service using the ListResources client, ensuring
that the service is available and exposes 0 data service resources.
$ ant listResourcesClient -Ddai.url=http://localhost:8080/wsrf/services/ogsadai/DataService
2. Deploy a new Data Service Resource
(a) The characteristics of a data service resource are specified in a
data service resource file. This is a simple properties file
consisting of name=value pairs. You can reuse the same file to
deploy the same data service resource under different names, or
customise it to produce a different configuration. The OGSA-DAI
WSRF distribution directory contains a fully commented
data.service.resource.properties file.
The properties are as follows:
• dai.resource.id = - name for the data service resource.
• dai.data.resource.type = [Relational | XML | Files | MultiRe-
source] - the type of data resource to which the data service
resource provides access.
• dai.product.name = - data resource product name (optional
- ignored for files and multi resources).
• dai.product.vendor = - data resource product vendor (op-
tional - ignored for files and multi resources).
• dai.product.version = - data resource product version (op-
tional - ignored for files and multi resources).
• dai.data.resource.uri = - data resource URI. This must be
compatible with the driver class specified next (ignored for
multi resources).
• dai.driver.class = - data resource driver class name (ignored
for files and multi resources).
• dai.credential = - Grid certificate credentials of a user per-
mitted to access the data resource. If omitted then any user
will be allowed access (ignored for multi resources).
• dai.user.name = - data resource user name. Optional only if
there is no user name required for a database (ignored for
multi resources).
• dai.password = - corresponding data resource password.
Optional if there is no user name required, or if the pass-
word is null (ignored for multi resources).
• dai.data.service.resource.uri.one = - for multi resources only,
the URL of a service exposing the first of the aggregated re-
lational resources.
• dai.data.service.resource.id.one = - for multi resources only,
the ID of the first of the aggregated relational resources.
• dai.data.service.resource.description.one = - for multi resources
only, a description of the first of the aggregated relational
resources (optional).
• dai.data.service.resource.uri.two = - for multi resources only,
the URL of a service exposing the second of the aggregated
relational resources.
• dai.data.service.resource.id.two = - for multi resources only,
the ID of the second of the aggregated relational resources.
• dai.data.service.resource.description.two = - for multi re-
sources only, a description of the second of the aggregated
relational resources (optional).
(b) Run the following command from within the OGSA-DAI WSRF
binary distribution directory:
$ ant deployResource -Ddai.container=/path/to/Web/services/container \
  -Ddai.resource.file=DAI-SERVICE-RESOURCE-FILE
• dai.container specifies the path to your Web services container.
• dai.resource.file specifies the location of a data service resource
properties file.
(c) Exposing data services
• Run the following command from within the OGSA-DAI
WSRF binary distribution directory:
$ ant exposeResource -Ddai.container=/path/to/Web/services/container \
  -Ddai.service.name=service/name \
  -Ddai.resource.id=ResourceID
• dai.container specifies the path to your Web services container.
• dai.service.name specifies the name of the service.
• dai.resource.id is the ID of the data service resource that
the data service is to expose.
(d) Shut down and restart the Web services container before
clients can access the new data service resource via the
data service. Then use the listResourceClient to test whether the
new data service resource was successfully exposed.
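As an illustration of step (a), the sketch below renders the text of a data service resource properties file for a relational resource from a Python dictionary. The property keys and the four resource types are those listed above. The resource ID, JDBC URI, driver class and credentials are placeholder values assumed for a local MySQL test database, treating dai.resource.id and dai.data.resource.type as mandatory is also an assumption, and the helper name render_properties is hypothetical and not part of OGSA-DAI.

```python
# Sketch only: renders the text of a data service resource properties file.
# Keys come from the list above; all values are illustrative assumptions.

ALLOWED_TYPES = {"Relational", "XML", "Files", "MultiResource"}


def render_properties(props):
    """Return properties-file text, one 'key=value' pair per line."""
    # Assumption: the ID and the type are treated as mandatory here.
    for required in ("dai.resource.id", "dai.data.resource.type"):
        if not props.get(required):
            raise ValueError("missing required property: " + required)
    if props["dai.data.resource.type"] not in ALLOWED_TYPES:
        raise ValueError("unknown data resource type: "
                         + props["dai.data.resource.type"])
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Assumed values for a local MySQL test database (placeholders).
mysql_resource = {
    "dai.resource.id": "MySQLResource",
    "dai.data.resource.type": "Relational",
    "dai.product.name": "MySQL",
    "dai.data.resource.uri": "jdbc:mysql://localhost:3306/ogsadai",
    "dai.driver.class": "com.mysql.jdbc.Driver",
    "dai.user.name": "ogsadai",
    "dai.password": "ogsadai",
}

if __name__ == "__main__":
    print(render_properties(mysql_resource), end="")
```

The output could be saved to a file and passed to the deployResource target through dai.resource.file.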
To undeploy a data service resource:
• Run the following command from within the OGSA-DAI WSRF binary
distribution directory:
$ ant withdrawResource -Ddai.container=/path/to/Web/services/container \
  -Ddai.service.name=service/name \
  -Ddai.resource.id=ResourceID
– dai.container specifies the path to your Web services container
– dai.service.name specifies the name of the service
– dai.resource.id is the ID of the data service resource that the
data service is no longer to expose.
To uninstall OGSA-DAI WSRF:
• Run the following from within the OGSA-DAI binary distribution
directory:
$ ant uninstall -Ddai.container=/path/to/Web/services/container
– dai.container specifies the path to your Web services container.
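The deploy, expose, withdraw and uninstall commands above differ only in the Ant target and the -D properties passed to it. A minimal Python sketch of assembling these command lines follows, assuming only the targets and property names shown in this section. The helper name ant_command and the example values are hypothetical.

```python
import shlex

# Ant targets mentioned in this section and the -D properties each one takes.
TARGET_PROPERTIES = {
    "deployResource": ("dai.container", "dai.resource.file"),
    "exposeResource": ("dai.container", "dai.service.name", "dai.resource.id"),
    "withdrawResource": ("dai.container", "dai.service.name", "dai.resource.id"),
    "uninstall": ("dai.container",),
}


def ant_command(target, **values):
    """Build the argument list for one of the Ant targets above.

    Keyword names use underscores in place of dots, e.g. dai_container.
    """
    if target not in TARGET_PROPERTIES:
        raise ValueError("unknown target: " + target)
    args = ["ant", target]
    for prop in TARGET_PROPERTIES[target]:
        key = prop.replace(".", "_")
        if key not in values:
            raise ValueError("missing property: " + prop)
        # No spaces around '=': Ant expects -Dname=value as a single argument.
        args.append("-D{}={}".format(prop, values[key]))
    return args


if __name__ == "__main__":
    cmd = ant_command(
        "exposeResource",
        dai_container="/path/to/Web/services/container",
        dai_service_name="service/name",
        dai_resource_id="ResourceID",
    )
    # Print a shell-safe version; the list could also be passed to
    # subprocess.run with cwd set to the binary distribution directory.
    print(" ".join(shlex.quote(a) for a in cmd))
```

Keeping each -Dname=value pair as one list element avoids the problem of stray spaces around the equals sign, which would make Ant read the value as a separate target name.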
5.2.4 GridSphere
The GridSphere portal framework [4] provides an open-source, portlet-based
Web portal. GridSphere enables developers to quickly develop and
package third-party portlet web applications that can be run and
administered within the GridSphere portlet container.
GridSphere provides two portlet implementations: one is the JSR 168 de
facto portlet API standard and the other is based upon the IBM WebSphere
Portlet API. GridSphere supports the development of re-usable portlets
and portlet services. It includes a set of core portlets and portlet services
that provide the basic infrastructure required for developing and
administering Web portals. A key feature of the design of GridSphere is that it
builds upon the web application archive (WAR) deployment model to
support third-party portlet web applications. In this way, portlet
developers can easily distribute and share their work with other portal projects
that use GridSphere to support their portal development.
Although several other open-source portals are available, GridSphere
has made grid computing its niche. Integrating the GridSphere portal
framework with the provided collection of grid portlets forms a cohesive
"grid portal" end-user environment for managing users and groups,
supporting remote job execution and file staging, and providing access to
information services. It also offers a high-level application programming
interface (API) for developing customized Grid-enabled portlets.
Installing GridSphere using Eclipse IDE
Prerequisite Software
To install GridSphere using Eclipse we need the following software:
1. Java 1.4.x and above
OGSA-DAI WSRF has been tested on various versions of Java 1.4
and also Java 1.5. We used Java 1.5.0 for our project.
2. Apache ANT 1.5 or 1.6
We used Apache ANT 1.6.5 for our project.
3. Eclipse (optional)
Any recent version of the Eclipse IDE.
Installation using Eclipse:
• Set the environment variables ANT_HOME and CATALINA_HOME.
• Create a new Java project in Eclipse named gridsphere.
• Import the downloaded and unpacked GridSphere source contents
into the newly created project in Eclipse.
• Set the classpaths to the libraries that GridSphere requires in
order to be installed properly:
– Click the gridsphere module and go to its properties.
– Click on the Java build path and go to the libraries tab.
– Click on the add external jars button and choose the jars
needed for each library (gridsphere/lib, tomcat/common/lib,
ant/lib).
• Once Eclipse recognizes that all the jars needed to compile
GridSphere are present, the errors should disappear.
• Right-click on build.xml within the gridsphere module and
choose the second option to run Ant (the second option gives a
chance to edit the default build command target: unselect help,
choose install and run Ant).
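Before running the Ant build from Eclipse, it can help to confirm that the two environment variables from the first step are set. The small Python sketch below assumes only the variable names ANT_HOME and CATALINA_HOME from the steps above; the function name missing_homes and the printed messages are illustrative.

```python
import os


def missing_homes(env=None, names=("ANT_HOME", "CATALINA_HOME")):
    """Return the names that are unset or do not point to a directory."""
    env = os.environ if env is None else env
    return [name for name in names
            if not env.get(name) or not os.path.isdir(env[name])]


if __name__ == "__main__":
    problems = missing_homes()
    if problems:
        print("Set these variables before installing:", ", ".join(problems))
    else:
        print("ANT_HOME and CATALINA_HOME look usable.")
```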
6 Results and Evaluation
The overall testing effort of this project is as follows:
Functional Testing: The different types of functional tests that were carried
out are discussed briefly in the following sub-sections.
6.1 Unit testing

A unit is defined as the smallest testable piece of software. Unit testing
is the stage at which developers conduct an informal
test of the code. This type of testing focuses on individual software units,
or on groups of related units or modules. It is a type of white-box testing,
in which tests are based on knowledge of the source code, internal structure
and logic. These tests aim to check whether a unit satisfies
its functional specification.
6.2 Integration testing

Integration is the process of aggregating features to create modules and
aggregating modules to create larger modules. This second level of the
functional testing includes the testing of the integration of features and
modules. It focuses on combining modules to evaluate the interaction
among them. The defects that can occur due to the integration of the mod-
ules are identified.
These types of tests map to the design of the system and test the system's
conformance to the design. Hence integration tests are useful in validating
the design of the system.
6.3 System testing

After the successful completion of the above two tests, system testing
was performed on a built system. In system testing, the focus is on
ensuring the smooth overall working of the system, including a
successful run-through of the entire system.
System testing is a black-box type of testing in which the tests are
performed from the end user's view of the system. System testing is aimed
at revealing defects/bugs that cannot be attributed to:
• Any particular component
• Inconsistencies between components
• Planned interaction between components
Functional tests: These test the functionality of the system. The following
points may be noted in the case of functional tests:
• Installation shall be treated as a function of the system, and tests on
the installation kit are considered to be functional tests.
• While simple function tests are usually restricted to a particular screen
or a feature, more complex function tests should describe full-fledged
scenarios involving multiple screens and features.
• Functional tests shall be restricted to only the environment on which
the system shall be deployed.
Create Test Database: Sample databases in MySQL and PostgreSQL
were created with a username of ogsadai and a password of ogsadai. A
littleblackbook table is then created in these databases.
• The database drivers (JARs) for these databases are copied into the
OGSA-DAI/lib directory of the OGSA-DAI binary distribution.
• Then we need to set up the classpath by running the following script:
$ cd OGSA-DAI
$ source ./setenv.sh
Each client takes the following (optional) arguments:
• -driverclass - the database driver class, to connect to the database.
• -host - database management system host.
• -port - database management system port.
• -database - name of database within which table is to be created, for
example ogsadai.
• -username - database username.
• -password - database password.
• -tablename - name of table to be created, for example littleblackbook
(for relational DBMS only).
• -rows - number of rows of data to be inserted into the table (for rela-
tional DBMS only).
• -collectionname - name of collection to be created, for example
littleblackbook (for XML DBMS only).
• -documents - number of documents to be inserted into the collection
(for XML DBMS only).
• -help - show these options and the default settings.
To see a client’s options and default settings:

$ java uk.org.ogsadai.client.dbcreate.CreateTestMySQLDB -help
• For MySQL
MySQL CreateTestMySQLDB Client: The CreateTestMySQLDB client
supports the creation of the database itself, as well as the creation
and population of the table within it. This is invoked with the
following additional arguments (the values shown are those of our
local database):
-rootusername root
-rootpassword mysql
To create a database and a table use, for example:
$ java uk.org.ogsadai.client.dbcreate.CreateTestMySQLDB -host localhost \
  -port 3306 -database ogsadai -username ogsadai -password ogsadai \
  -tablename littleblackbook -rows 10000 -rootusername root \
  -rootpassword mysql
• Similarly for PostgreSQL:
PostgreSQL CreateTestPostgreSQLDB Client:
$ java uk.org.ogsadai.client.dbcreate.CreateTestPostgreSQLDB -host localhost \
  -port 5432 -database ogsadai -username ogsadai -password ogsadai \
  -tablename littleblackbook -rows 10000
• File containing comma-separated values: CreateTestCsvDB Client
This client creates a file containing comma-separated values.
$ java uk.org.ogsadai.client.dbcreate.CreateTestCsvDB -directory . \
  -filename littleblackbook.csv -delimiter "," -rows 10000
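To illustrate what the relational clients above do (create a table and fill it with a given number of rows), the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. It is not the OGSA-DAI client: the column layout (id, name, address, phone), the generated values and the function name create_test_table are assumptions made for illustration.

```python
import sqlite3


def create_test_table(conn, table="littleblackbook", rows=10):
    """Create `table` and insert `rows` rows of made-up contact data."""
    # Table names cannot be bound as parameters, so accept only identifiers.
    if not table.isidentifier():
        raise ValueError("invalid table name: " + table)
    conn.execute(
        "CREATE TABLE {} (id INTEGER PRIMARY KEY, name TEXT, "
        "address TEXT, phone TEXT)".format(table)
    )
    conn.executemany(
        "INSERT INTO {} (id, name, address, phone) "
        "VALUES (?, ?, ?, ?)".format(table),
        [(i, "Name {}".format(i), "{} Example Street".format(i),
          "555-{:04d}".format(i)) for i in range(1, rows + 1)],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in database
    create_test_table(conn, rows=5)
    count = conn.execute("SELECT COUNT(*) FROM littleblackbook").fetchone()[0]
    print(count, "rows inserted")
```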
7 Timeline
Figure 9: Timeline diagram
As the above timeline graph shows, we have completed phase 1 of our
implementation by configuring OGSA-DAI to connect to heterogeneous databases
and by configuring GridSphere. The second phase involves implementing a
distributed query processor (DQP) to execute queries in parallel over
OGSA-DAI data services and other services on the Grid. The third phase
involves writing a portlet application programming interface (API) for
accessing the data resources through a front-end interface. The second and
third phases of the project will be continued during the internship
period (Jan 06 - May 06).
The challenges we faced during phase 1 are described in our blog [6].
8 Conclusion
The project aimed to develop a portal for accessing distributed,
heterogeneous data resources through a general query mechanism in a secure
manner. This requirement was addressed by implementing OGSA-DAI on the
grid, through which data is accessed from distributed resources. The grid
ensures security through certificate-based authorization. To provide a
general query mechanism against the data sources, a distributed query
processing mechanism will be implemented in future. GridSphere has been
configured as the front-end portal framework; in future it will be tied
to the underlying infrastructure so that the grid infrastructure is
transparent to the user.
References
[1] Biogrid project. http://www.biogrid.jp/.
[2] DataTAG project. http://datatag.web.cern.ch/datatag/.
[3] Garuda project. http://www.garudaindia.in/index.asp.
[4] GridSphere portal framework. http://www.gridsphere.org.
[5] OGSA-DAI middleware framework. http://www.ogsadai.org.
[6] Project blog. http://www.dgmp.blogspot.com/.
[7] Overview of the grid security infrastructure.
http://www.globus.org/security/overview.html, 2005. Last accessed August 2006.
9 Screen Shots
The screenshots of our system:
Figure 10: ScreenShot1
The databrowser shown in Figure 10 is a GUI for adding or
deleting a resource.
To add a resource, the data resource URL is given as input.
Once resources are added, they are listed in this format.
To query a particular resource, select it by highlighting the
option and enter the corresponding query to retrieve the data from the
resource.
The data is retrieved and displayed in table format.
We can update the data in a resource by giving the corresponding
sqlupdate query to the resource.
The data is updated.
Now we can see the result of the updated details.
