Project Report
by
Nithya Sam
Pradheepa.C
Rashmi Nair
Vimi Soman
Contents
1 Abstract
2 Introduction
3 Problem Analysis
4 Review of related work
5 Design and Implementation
6 Results and Evaluation
7 Timeline
8 Conclusion
9 Screen Shots
List of Figures
1 Overall Architecture
2 Grid Admin USE CASE diagram
3 VO Admin USE CASE diagram
4 User USE CASE diagram
5 Grid overview
6 Grid Admin decides the policies
7 OGSA-DAI Architecture
8 Interaction between Data Resources
9 Timeline diagram
10 ScreenShot1
11 ScreenShot2
12 ScreenShot3
13 ScreenShot4
14 ScreenShot5
15 ScreenShot6
16 ScreenShot7
17 ScreenShot8
1 Abstract
The aim of this project is to develop a portal for scientists whose computations depend on data from several data sources. In general, the ability to securely access diverse data sources in a collaborative environment, called a networked Virtual Organization (VO), is becoming an essential requirement for optimizing the design and development phases of the product lifecycle. Dynamic, secure and transparent data access, as well as the integration of heterogeneous data sources, is a key issue in many collaborative environments today.
2 Introduction
In an increasing number of scientific disciplines, large data collections
are emerging as important resources, and their volume often runs to terabytes. The communities of researchers that need
to access and analyze this data are often large and are almost always ge-
ographically distributed, as are the computing and storage resources that
these communities rely upon to store and analyze their data.
The Grid provides security through authorization and access control tools that support decentralized control mechanisms. A central concept in GSI (Grid Security Infrastructure) [7] authentication is the certificate. Every user and service on the Grid is identified via a certificate, which contains the information needed to identify and authenticate that user or service. Once mutual authentication has been performed, GSI gets out of the way so that communication can occur without the overhead of constant encryption and decryption.
To facilitate coordination and cooperation between distributed data sources, we use the OGSA-DAI (Open Grid Services Architecture - Data Access and Integration) middleware framework. It provides a means for users to Grid-enable their data resources, including relational databases, XML databases and files. The framework offers a compact way of handling multiple interactions with a service within a single request via an XML document, called a perform document, in which data is pipelined between a set of activities that operate on a data stream coming out of a data resource or going into a data resource. Whenever a client submits a perform document to a data service, it receives a response document in return. The response document describes the status of execution of the perform document and also identifies the session to which the request was joined. With Data Access and Integration (DAI), heterogeneous resources are seen as a single virtual resource on a grid.
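The perform-document idea can be illustrated with a small Java sketch: a single request carries an ordered list of activities, each activity consumes the stream produced by the one before it, and the caller gets back a response recording the execution status and the session the request joined. Everything here (PerformSketch, Activity, Response, the COMPLETED/ERROR status strings and the sample rows) is hypothetical and is not the OGSA-DAI client API, which expresses the same pipeline as an XML perform document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.UUID;
import java.util.function.UnaryOperator;

// Hypothetical model of a perform document: an ordered list of named activities,
// where each activity consumes the data stream produced by the previous one.
final class PerformSketch {

    static final class Activity {
        final String name;
        final UnaryOperator<List<String>> op;

        Activity(String name, UnaryOperator<List<String>> op) {
            this.name = name;
            this.op = op;
        }
    }

    // Hypothetical response document: execution status plus the session the request joined.
    static final class Response {
        final String status;
        final String sessionId;
        final List<String> data;

        Response(String status, String sessionId, List<String> data) {
            this.status = status;
            this.sessionId = sessionId;
            this.data = data;
        }
    }

    // Runs all activities within one request, piping each activity's output into the next.
    // A null sessionId means "start a new session".
    static Response perform(List<String> source, List<Activity> activities, String sessionId) {
        String session = (sessionId != null) ? sessionId : UUID.randomUUID().toString();
        List<String> stream = new ArrayList<>(source);
        try {
            for (Activity a : activities) {
                stream = a.op.apply(stream);
            }
            return new Response("COMPLETED", session, stream);
        } catch (RuntimeException e) {
            return new Response("ERROR", session, List.of());
        }
    }

    public static void main(String[] args) {
        // Rows "coming out of" a data resource (a stand-in for an SQL query result).
        List<String> rows = List.of("1,alice,edinburgh", "2,bob,glasgow", "3,carol,edinburgh");
        List<Activity> pipeline = List.of(
                // Filtering activity: a stand-in for a query with a WHERE clause.
                new Activity("select", in -> {
                    List<String> out = new ArrayList<>();
                    for (String r : in) {
                        if (r.endsWith(",edinburgh")) out.add(r);
                    }
                    return out;
                }),
                // Transformation activity: upper-case every row.
                new Activity("transform", in -> {
                    List<String> out = new ArrayList<>();
                    for (String r : in) out.add(r.toUpperCase(Locale.ROOT));
                    return out;
                }));
        Response resp = perform(rows, pipeline, "session-42");
        System.out.println(resp.status + " " + resp.sessionId + " " + resp.data);
    }
}
```

Running main prints the status, the session ID and the two filtered, upper-cased rows. Treating each activity as a function from stream to stream is what lets query, transformation and delivery steps be chained within one request.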
3 Problem Analysis
The existing system has several servers accessing databases that host data from different domains and in different formats. Data from these database servers needs to be brought onto, and then processed on, the servers.
The following problems are faced by the existing system:
• Accessing data through the web is not secure. Web services must provide authentication and authorization, as well as supporting functions for managing user credentials.
4 Review of related work
Biogrid project
The DataGrid group in the Biogrid project [1] has developed a system
for the federation of bio-related databases with Globus Toolkit 3.0.2/OGSA-DAI, aimed at applications in drug discovery. The system currently bridges 11 databases from heterogeneous communities such as biology, medical science and pharmaceutics. Under this project, Osaka University and other relevant institutions are developing grid computing technology to meet the IT needs of biology and medical science. Apart from research activities, the project also includes peta grid technology business development and educational training in bioinformatics.
DataTAG project
This project addresses issues that arise in high-performance inter-Grid networking, including sustained and reliable high-performance data replication, end-to-end advanced network services, and novel monitoring techniques. It also addresses issues of interoperability between Grid middleware layers, such as information and security services.
GARUDA
GARUDA [3] is a collaboration of science researchers and experimenters on a nationwide grid of computational nodes, mass storage and scientific instruments that aims to provide the technological advances required to enable data- and compute-intensive science for the 21st century. One of GARUDA's most important challenges is to strike the right balance between research and the daunting task of deploying that innovation into some of the most complex scientific and engineering endeavors being undertaken today.
5 Design and Implementation
The overall architecture of the system, showing the portal, the grid data services and the databases, is shown in Figure 1.
The databases shown in the figure are heterogeneous and located remotely. The aim of our project is to design a portal with a single sign-on facility to access these databases, ultimately performing a join across them and presenting the output requested by the user. The Grid Data Services layer is built on OGSA-DAI, a software framework based on the Open Grid Services Architecture (OGSA), which forms the interface between the application and the data resources. The grid-based framework is provided by a software toolkit called Globus, and the front-end interface is provided by GridSphere.
Figure 2: Grid Admin USE CASE diagram (actor: Grid Administrator; use cases: Log in, Logout)
• Log in
• Create a new user
• View details of an existing user
• Delete an existing user
• Add new resource
• Change his password
• Logout
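The administrator operations listed above can be modelled as a small in-memory service. This is a sketch only: the AdminConsole class and its methods are hypothetical, logging out is omitted because the sketch keeps no session state, and the unsalted SHA-256 hash merely keeps the example short; a production portal would use a salted, deliberately slow password hash or certificate-based login.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory model of the administrator use cases (not the portal's real code).
final class AdminConsole {
    private final Map<String, String> passwordHashes = new HashMap<>(); // user name -> password hash
    private final Set<String> resourceUrls = new LinkedHashSet<>();      // data resource URLs, insertion order

    // "Create a new user": fails if the user name is already taken.
    boolean createUser(String user, String password) {
        if (passwordHashes.containsKey(user)) return false;
        passwordHashes.put(user, hash(password));
        return true;
    }

    // "View details of an existing user": here the only detail kept is the name.
    Optional<String> viewUser(String user) {
        return passwordHashes.containsKey(user) ? Optional.of("user: " + user) : Optional.empty();
    }

    // "Delete an existing user".
    boolean deleteUser(String user) {
        return passwordHashes.remove(user) != null;
    }

    // "Add new resource": returns false if the URL is already registered.
    boolean addResource(String dataResourceUrl) {
        return resourceUrls.add(dataResourceUrl);
    }

    List<String> resources() {
        return List.copyOf(resourceUrls);
    }

    // "Log in": compares the hash of the supplied password with the stored one.
    boolean logIn(String user, String password) {
        String stored = passwordHashes.get(user);
        return stored != null && stored.equals(hash(password));
    }

    // "Change password": requires the current password.
    boolean changePassword(String user, String oldPassword, String newPassword) {
        if (!logIn(user, oldPassword)) return false;
        passwordHashes.put(user, hash(newPassword));
        return true;
    }

    // Unsalted, hex-encoded SHA-256: short for the sketch, NOT suitable for real password storage.
    private static String hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        AdminConsole admin = new AdminConsole();
        admin.createUser("alice", "initial-pw");
        admin.addResource("jdbc:mysql://hosta:3306/ogsadai"); // hypothetical resource URL
        System.out.println(admin.viewUser("alice").orElse("no such user"));
        System.out.println(admin.changePassword("alice", "initial-pw", "new-pw"));
        System.out.println(admin.resources());
    }
}
```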
Figure 4: User USE CASE diagram
• Grid Computing
• Globus Toolkit
• OGSA-DAI
• GridSphere
based on the configuration choices given: security domains, degrees of isolation desired, type of policies and their scope, and contractual obligations between users and providers of the infrastructures. These issues are not fundamentally architectural in nature.
Grid computing involves an evolving set of open standards for Web services and interfaces that make services, or computing resources, available over the Internet. One definition of grid computing is distributed computing across virtualized resources.
The goal is to create the illusion of a simple yet large and powerful
virtual computer out of a collection of connected (and possibly heteroge-
neous) systems sharing various combinations of resources.
VIRTUAL ORGANIZATION
The users of the grid can be organized dynamically into a number of vir-
tual organizations, each with different policy requirements. These virtual
organizations can share their resources collectively as a larger grid.
The participants and users of the grid can be members of several real
and virtual organizations. The grid can help in enforcing security rules
among them and implement policies, which can resolve priorities for both
resources and users.
Administrators can change any number of policies that affect how the
different organizations might share or compete for resources.
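To make the idea of policies resolving priorities concrete, here is a hypothetical Java sketch in which each VO carries an administrator-set priority and a user who belongs to several VOs receives the highest priority among them. The class, the VO names and the max-of-memberships rule are assumptions for illustration; real grids can combine policies in other ways.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of VO policies: each VO has an administrator-set priority,
// and a user who belongs to several VOs is resolved to the highest of them.
final class VoPolicies {
    private final Map<String, Integer> voPriority = new HashMap<>();      // VO -> priority
    private final Map<String, Set<String>> memberships = new HashMap<>(); // user -> VOs

    // An administrator creates or changes the policy of a VO.
    void setPriority(String vo, int priority) {
        voPriority.put(vo, priority);
    }

    // A user joins a VO; users may belong to several VOs at once.
    void join(String user, String vo) {
        memberships.computeIfAbsent(user, u -> new HashSet<>()).add(vo);
    }

    // Resolution rule assumed for this sketch: the highest priority among the user's VOs,
    // floored at 0; a VO without a policy counts as 0.
    int effectivePriority(String user) {
        int best = 0;
        for (String vo : memberships.getOrDefault(user, Set.of())) {
            best = Math.max(best, voPriority.getOrDefault(vo, 0));
        }
        return best;
    }

    public static void main(String[] args) {
        VoPolicies policies = new VoPolicies();
        policies.setPriority("bio-vo", 5);
        policies.setPriority("physics-vo", 2);
        policies.join("alice", "bio-vo");
        policies.join("alice", "physics-vo");
        System.out.println(policies.effectivePriority("alice")); // 5
        policies.setPriority("bio-vo", 1); // the administrator lowers the bio-vo policy
        System.out.println(policies.effectivePriority("alice")); // 2
    }
}
```

Because priorities are looked up at resolution time, an administrator's policy change takes effect for every member of that VO without touching the membership data.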
vices and libraries for resource monitoring, discovery, and management,
plus security and file management.
Installation
To install from a binary package:
1. Obtain the Globus Toolkit 4 binary package from the Globus site.
4. Create the installation directory and change its ownership to user and group globus.
[globus@hosta] $ su
Password:
[root@hosta] # mkdir $GLOBUS_LOCATION
[root@hosta] # chown globus:globus $GLOBUS_LOCATION
[root@hosta] # exit
exit
[globus@hosta] $
OGSA-DAI is a middleware product that allows data resources, such as relational or XML databases, to be accessed via web services. An OGSA-DAI web service allows data to be queried, updated, transformed and delivered. OGSA-DAI can be used to provide web services that offer data integration services to clients, and it provides a means for users to Grid-enable their data resources.
• Data within each of these types of resource can be queried and up-
dated.
Figure 7: OGSA-DAI Architecture
OGSA-DAI Architecture
Data Layer
The data layer consists of the data resources that can be exposed via
OGSA-DAI. Currently these include:
data resource. OGSA-DAI includes data resource accessors for relational
databases, XML databases and file systems. This component is an exten-
sibility point, so users can develop their own data resource accessors in
order to expose new kinds of data resource.
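The extensibility point described above can be sketched as a plug-in interface that the service layer calls without knowing which kind of resource sits behind it. The interface and class names below are hypothetical and do not reproduce OGSA-DAI's actual data resource accessor API; the example accessor exposes an in-memory key/value store, queried by key prefix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a pluggable accessor layer (not the OGSA-DAI API).
final class AccessorSketch {

    // Extension point: one accessor per kind of data resource.
    interface DataResourceAccessor {
        String resourceType();
        List<String> query(String expression);
    }

    // Example of exposing a new kind of resource: an in-memory key/value store
    // whose "query expression" is a key prefix.
    static final class KeyValueAccessor implements DataResourceAccessor {
        private final Map<String, String> store;

        KeyValueAccessor(Map<String, String> contents) {
            this.store = new TreeMap<>(contents); // sorted, so results come back in key order
        }

        @Override
        public String resourceType() {
            return "KeyValue";
        }

        @Override
        public List<String> query(String prefix) {
            List<String> out = new ArrayList<>();
            for (Map.Entry<String, String> e : store.entrySet()) {
                if (e.getKey().startsWith(prefix)) out.add(e.getKey() + "=" + e.getValue());
            }
            return out;
        }
    }

    // The service layer only sees the interface, so new accessors plug in without changes here.
    private final Map<String, DataResourceAccessor> registry = new HashMap<>();

    void register(DataResourceAccessor accessor) {
        registry.put(accessor.resourceType(), accessor);
    }

    List<String> query(String type, String expression) {
        DataResourceAccessor accessor = registry.get(type);
        if (accessor == null) throw new IllegalArgumentException("no accessor for " + type);
        return accessor.query(expression);
    }

    public static void main(String[] args) {
        AccessorSketch service = new AccessorSketch();
        service.register(new KeyValueAccessor(Map.of("user:1", "alice", "user:2", "bob", "host:1", "hosta")));
        System.out.println(service.query("KeyValue", "user:")); // [user:1=alice, user:2=bob]
    }
}
```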
Business Logic Layer → Presentation Layer Interface
• Data service resource property values such as the schema of the un-
derlying data resource.
• Information about the activities that are supported by the data ser-
vice resource. These are the activities that can be requested by the
user within perform documents.
Presentation Layer
Client Layer
Figure 8: Interaction between Data Resources
queue size are specified when a data service is deployed and can be ad-
justed later through properties of the web service deployment descriptor.
Installation of OGSA-DAI
• Java 1.4.x or above. OGSA-DAI WSRF has been tested on various versions of Java 1.4 and on Java 1.5. We used Java 1.5.0 for our project.
• Apache Ant 1.5 or 1.6. We used Apache Ant 1.6.5 for our project.
• Globus Toolkit 4.0.3 Java Web Services Core
• Apache Tomcat
To use OGSA-DAI WSRF with Tomcat we have to deploy GT4 onto
Tomcat as shown below:
• Set a GLOBUS_LOCATION environment variable to point to the location of your GT4 distribution.
$ export GLOBUS_LOCATION=/path/to/Globus/directory
• Set a CATALINA_HOME environment variable to point to the location of Tomcat.
$ export CATALINA_HOME=/path/to/Tomcat/directory
• Run the following commands:
$ cd /path/to/Globus/directory
$ ant -f share/globus_wsrf_common/tomcat/tomcat.xml deployTomcat -Dtomcat.dir=/path/to/Tomcat
$ cd /path/to/OGSA-DAI/binary/directory
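Since the deployment commands depend on GLOBUS_LOCATION and CATALINA_HOME, a quick pre-flight check can catch a missing or mistyped variable before running ant. This is a hypothetical helper, not part of GT4 or OGSA-DAI; it only checks that each variable is set and names an existing directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check before deploying GT4 onto Tomcat (not part of GT4).
final class DeployPreflight {
    private static final String[] REQUIRED = {"GLOBUS_LOCATION", "CATALINA_HOME"};

    // Returns human-readable problems; an empty list means every variable names a directory.
    static List<String> check(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                problems.add(name + " is not set");
            } else if (!new File(value).isDirectory()) {
                problems.add(name + " does not point to a directory: " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("GLOBUS_LOCATION and CATALINA_HOME look ready");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Taking the environment as a Map rather than calling System.getenv() inside check keeps the logic testable with made-up values.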
3. Request that the data service expose the data service resource. This
instructs the data service to expose the data service resource and so
allows clients to interact with the data service resource, thereby in-
teracting with a data resource.
(a) The characteristics of a data service resource are specified in a data service resource file: a simple properties file consisting of property=value pairs. You can reuse the same file to deploy the same data service resource under different names, or to customise a specific configuration in small or major ways. The OGSA-DAI WSRF distribution directory contains a fully commented data.service.resource.properties file.
The properties are as follows:
• dai.resource.id: name for the data service resource.
• dai.data.resource.type = [Relational | XML | Files | MultiResource]: the type of data resource to which the data service resource provides access.
• dai.product.name: data resource product name (optional; ignored for files and multi resources).
• dai.product.vendor: data resource product vendor (optional; ignored for files and multi resources).
• dai.product.version: data resource product version (optional; ignored for files and multi resources).
• dai.data.resource.uri: data resource URI. This must be compatible with the driver class specified next (ignored for multi resources).
• dai.driver.class: data resource driver class name (ignored for files and multi resources).
• dai.credential: Grid certificate credentials of a user permitted to access the data resource. If omitted, any user will be allowed access (ignored for multi resources).
• dai.user.name: data resource user name. Optional only if the database requires no user name (ignored for multi resources).
• dai.password: corresponding data resource password. Optional if no user name is required or if the password is null (ignored for multi resources).
• dai.data.service.resource.uri.one: for multi resources only, the URL of a service exposing the first of the aggregated relational resources.
• dai.data.service.resource.id.one: for multi resources only, the ID of the first of the aggregated relational resources.
• dai.data.service.resource.description.one: for multi resources only, a description of the first of the aggregated relational resources (optional).
• dai.data.service.resource.uri.two: for multi resources only, the URL of a service exposing the second of the aggregated relational resources.
• dai.data.service.resource.id.two: for multi resources only, the ID of the second of the aggregated relational resources.
• dai.data.service.resource.description.two: for multi resources only, a description of the second of the aggregated relational resources (optional).
(b) Run the following command from within the OGSA-DAI WSRF
binary distribution directory:
$ ant deployResource -Ddai.container=/path/to/Web/services/container
-Ddai.resource.file=DAI-SERVICE-RESOURCE-FILE
• dai.container specifies the path to your Web services con-
tainer.
• dai.resource.file specifies the location of a data service re-
source properties file.
(c) Exposing data services
• Run the following command from within the OGSA-DAI
WSRF binary distribution directory:
$ ant exposeResource -Ddai.container=/path/to/Web/services/container
-Ddai.service.name=service/name
-Ddai.resource.id=ResourceID
• dai.container specifies the path to your Web services container.
• dai.service.name specifies the name of the service.
• dai.resource.id is the ID of the data service resource that the data service is to expose.
(d) The Web services container must be shut down and restarted before clients can access the new data service resource via the data service. After that, use the listResourceClient to test whether the new data service resource was successfully exposed.
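Because a mistake in the data service resource file only shows up at deployment time, the file can be checked beforehand with java.util.Properties. The validation rules below are an assumption derived from the property list above (for example, that dai.driver.class is needed for Relational and XML resources but not for Files or MultiResource); the class is hypothetical, and the sample values, including the JDBC URL and the com.mysql.jdbc.Driver class name, are illustrative only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical pre-deployment check for a data service resource properties file.
final class ResourceFileCheck {
    private static final Set<String> TYPES = Set.of("Relational", "XML", "Files", "MultiResource");

    // Returns a list of problems; an empty list means the file passed these (assumed) rules.
    static List<String> validate(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in); // accepts "key=value" and "key = value" lines and # comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> errors = new ArrayList<>();
        require(p, "dai.resource.id", errors);
        String type = p.getProperty("dai.data.resource.type", "").trim();
        if (!TYPES.contains(type)) {
            errors.add("dai.data.resource.type must be Relational, XML, Files or MultiResource");
            return errors;
        }
        if (type.equals("MultiResource")) {
            // A multi resource aggregates two relational resources exposed by other services.
            for (String n : new String[] {"one", "two"}) {
                require(p, "dai.data.service.resource.uri." + n, errors);
                require(p, "dai.data.service.resource.id." + n, errors);
            }
        } else {
            require(p, "dai.data.resource.uri", errors);
            if (!type.equals("Files")) {
                require(p, "dai.driver.class", errors); // assumed required for Relational and XML
            }
        }
        return errors;
    }

    private static void require(Properties p, String key, List<String> errors) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            errors.add("missing " + key);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only.
        String file = String.join("\n",
                "dai.resource.id = MySQLResource",
                "dai.data.resource.type = Relational",
                "dai.data.resource.uri = jdbc:mysql://localhost:3306/ogsadai",
                "dai.driver.class = com.mysql.jdbc.Driver",
                "dai.user.name = ogsadai",
                "dai.password = secret");
        System.out.println(validate(new StringReader(file))); // [] when all checks pass
    }
}
```

Optional properties such as dai.credential, dai.user.name and dai.password are deliberately not checked, since the list above allows them to be omitted.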
• Run the following command from within the OGSA-DAI WSRF bi-
nary distribution directory:
$ ant withdrawResource -Ddai.container=/path/to/Web/services/container
-Ddai.service.name=service/name
-Ddai.resource.id=ResourceID
– dai.container specifies the path to your Web services container
– dai.service.name specifies the name of the service
– dai.resource.id is the ID of the data service resource that the
data service is no longer to expose.
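The deploy, expose and withdraw commands above share one pattern: an ant target followed by -Dname=value property definitions, each passed as a single argument with no spaces around the equals sign. The hypothetical helper below assembles those argument lists, for example to hand them to java.lang.ProcessBuilder from the OGSA-DAI WSRF binary distribution directory; the paths and IDs are the same placeholders used in the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds the ant command lines used to manage data service resources.
final class OgsaDaiAnt {

    // Builds "ant <target> -Dk1=v1 -Dk2=v2 ..."; each property is ONE argument,
    // with no spaces around '=', which is the form ant expects for -D definitions.
    static List<String> command(String target, Map<String, String> properties) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ant");
        cmd.add(target);
        for (Map.Entry<String, String> e : properties.entrySet()) {
            cmd.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }

    // Pairs up key, value, key, value...; LinkedHashMap keeps the arguments in the given order.
    private static Map<String, String> props(String... keyValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    static List<String> deploy(String container, String resourceFile) {
        return command("deployResource", props("dai.container", container, "dai.resource.file", resourceFile));
    }

    static List<String> expose(String container, String serviceName, String resourceId) {
        return command("exposeResource", props("dai.container", container,
                "dai.service.name", serviceName, "dai.resource.id", resourceId));
    }

    static List<String> withdraw(String container, String serviceName, String resourceId) {
        return command("withdrawResource", props("dai.container", container,
                "dai.service.name", serviceName, "dai.resource.id", resourceId));
    }

    public static void main(String[] args) {
        List<String> cmd = expose("/path/to/Web/services/container", "service/name", "ResourceID");
        System.out.println(String.join(" ", cmd));
        // To run it (not done here), from the OGSA-DAI WSRF binary distribution directory:
        // new ProcessBuilder(cmd).directory(new java.io.File("/path/to/OGSA-DAI/binary/directory")).inheritIO().start();
    }
}
```

Passing a list of arguments to ProcessBuilder, rather than one shell string, avoids quoting problems when paths contain spaces.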
5.2.4 GridSphere
The GridSphere portal framework [4] provides an open-source, portlet-based Web portal. GridSphere enables developers to quickly develop and package third-party portlet web applications that can be run and administered within the GridSphere portlet container.
supporting remote job execution, file staging and providing access to information services. It provides a high-level application programming interface (API) for developing customized Grid-enabled portlets.
• Once Eclipse finds all the JARs needed to compile GridSphere, the errors should disappear.
• Now right-click build.xml within the gridsphere module and choose the second option to run Ant. The second option lets you edit the default build target: deselect help, select install, and run Ant.
6 Results and Evaluation
The overall testing effort of this project is as follows:
Functional Testing: The different types of functional tests that were car-
ried out are discussed briefly in the following sub-sections.
These types of tests map to the design of the system and test the system's conformance to that design. Hence integration tests are useful in validating the design of the system.
In system testing, the focus is on ensuring the smooth overall working of the system, including a successful run-through of the entire system.
System testing is black-box testing, in which the tests are performed from the end user's view of the system. System testing is aimed at revealing defects/bugs that cannot be attributed to,
Functional tests: these test the functionality of the system. The following points may be noted in the case of functional tests:
• The database drivers (JARs) for these databases are copied into the OGSA-DAI/lib directory of the OGSA-DAI binary distribution.
• -username - database username.
• -password - database password.
• -tablename - name of table to be created, for example littleblackbook
(for relational DBMS only).
• -rows - number of rows of data to be inserted into the table (for rela-
tional DBMS only).
• -collectionname - name of collection to be created, for example little-
blackbook (for XML DBMS only).
• -documents - number of documents to be inserted into the collection
(for XML DBMS only).
• -help - show these options and the default settings.
• File containing comma-separated values: CreateTestCsvDB Client
This client creates a file containing comma-separated values.
$ java uk.org.ogsadai.client.dbcreate.CreateTestCsvDB -directory . -filename littleblackbook.csv -delimiter "," -rows 10000
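To show what such a test file looks like, here is a hypothetical Java stand-in that writes a given number of delimited rows. It is not the CreateTestCsvDB implementation: the four columns (an ID, a name, an address and a phone number) and their generated values are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for a test-data generator: writes numbered, delimited rows.
final class TestCsvWriter {

    // Writes `rows` lines of the (assumed) form id<d>name<d>address<d>phone, one per line.
    static void write(Writer out, String delimiter, int rows) {
        try {
            for (int i = 1; i <= rows; i++) {
                out.write(String.join(delimiter,
                        Integer.toString(i),
                        "name" + i,
                        i + " Example Street",
                        String.format("%07d", i)));
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Mirrors the options: -directory . -filename littleblackbook.csv -delimiter "," -rows 10000
        Path file = Paths.get(".", "littleblackbook.csv");
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            write(w, ",", 10000);
        }
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```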
7 Timeline
As the timeline graph in Figure 9 shows, we have completed phase 1 of our implementation by configuring OGSA-DAI to connect to heterogeneous databases and by configuring GridSphere. The second phase involves implementing a distributed query processor (DQP) to execute queries in parallel over OGSA-DAI data services and other services on the Grid, and the third phase involves writing a portlet application programming interface (API) for accessing the data resources through a front-end interface. The second and third phases of the project continue in the internship period (Jan 06 - May 06).
The challenges we faced during phase 1 are described in our blog [6].
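The planned distributed query processing can be pictured as running sub-queries against two data services in parallel and then joining their results. The sketch below is a hypothetical illustration, not the DQP the project intends to use: each supplier stands in for a query against one data service, and the combination step is an inner hash join on the first column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical illustration of distributed query processing: two sub-queries run
// in parallel and their results are combined with an inner hash join on column 0.
final class ParallelJoinSketch {

    static List<String> queryAndJoin(Supplier<List<String[]>> leftQuery,
                                     Supplier<List<String[]>> rightQuery) {
        // Start both sub-queries concurrently (on the common fork-join pool).
        CompletableFuture<List<String[]>> left = CompletableFuture.supplyAsync(leftQuery);
        CompletableFuture<List<String[]>> right = CompletableFuture.supplyAsync(rightQuery);
        // join() waits for each result; the join itself then runs locally.
        return hashJoin(left.join(), right.join());
    }

    // Build a hash table on the left input's key column, then probe it with each right row.
    static List<String> hashJoin(List<String[]> left, List<String[]> right) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : left) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<String> joined = new ArrayList<>();
        for (String[] probe : right) {
            for (String[] match : table.getOrDefault(probe[0], List.of())) {
                // Output: all left columns, then the right columns except the shared key.
                StringBuilder sb = new StringBuilder(String.join(",", match));
                for (int i = 1; i < probe.length; i++) {
                    sb.append(',').append(probe[i]);
                }
                joined.add(sb.toString());
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Stand-ins for queries against two different data services.
        Supplier<List<String[]>> people = () -> List.of(
                new String[] {"1", "alice"}, new String[] {"2", "bob"});
        Supplier<List<String[]>> orders = () -> List.of(
                new String[] {"1", "disk"}, new String[] {"1", "cpu"}, new String[] {"3", "ram"});
        System.out.println(queryAndJoin(people, orders)); // [1,alice,disk, 1,alice,cpu]
    }
}
```

Running the sub-queries concurrently means the total wait is roughly that of the slower data service rather than the sum of both.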
8 Conclusion
The project aimed to develop a portal for accessing distributed heterogeneous data resources securely through a general query mechanism. This requirement was addressed by deploying OGSA-DAI on the grid, through which data is accessed from distributed resources. The grid ensures security through certificate-based authorization. To provide a general query mechanism over the data sources, a distributed query processing mechanism will be implemented in future. GridSphere has been configured as the front-end portal framework; in future it will be tied to the underlying infrastructure so that the grid infrastructure is transparent to the user.
References
[1] Biogrid project. http://www.biogrid.jp/.
9 Screen Shots
The following are screenshots of our system.
Figure 11: ScreenShot2
To add a resource, the data resource URL is given as input.
Figure 12: ScreenShot3
Once resources are added, they are listed in this format.
Figure 13: ScreenShot4
To query a particular resource, select it by highlighting the option and enter the corresponding query to retrieve the data from the resource.
Figure 14: ScreenShot5
The retrieved data is displayed in table format.
Figure 15: ScreenShot6
We can update the data in a resource by submitting the corresponding SQL update query to it.
Figure 16: ScreenShot7
The data is updated.
Figure 17: ScreenShot8
Now we can see the result of the updated details.