
Managing Scientific Models in Bio-phenomena Interpretation

M. Cavalcanti, COPPE Sistemas - UFRJ, Brazil; M. Campos, DCC/IM - UFRJ, Brazil; M. Mattoso, COPPE Sistemas - UFRJ, Brazil; M. Carvalho and P. Barreto, CENPES-Petrobras, Brazil.

Introduction
Corrosion monitoring on oil platforms along the Brazilian coast is one of the main concerns of scientists at CENPES-Petrobras. Some of these scientists are biologists who study corrosion caused by bacteria. To identify the main cause of bio-corrosion events such as oil spills, scientists have to collect heterogeneous, distributed data and apply an adequate scientific model. First, they collect water or pipe samples from the region under investigation. Laboratory analyses then produce numerical data sets from these samples, which are interpreted or analyzed by means of scientific models in order to derive new data or reach some useful conclusion. The analysis of oil spills usually requires combining multiple models originating from different disciplines. The choice of a model is usually guided by an archive of previous case studies. Once models are chosen, scientists can run the corresponding programs.

However, choosing the right models and running the appropriate programs are done empirically. Model characteristics are described in several different ways, and the experience gained from a successful model application is either lost or stored in paper reports. Consequently, scientists have difficulty with model management activities such as comparing different models, finding the associations between models and programs and, more importantly, taking advantage of a large number of previous experiences. Moreover, in a multidisciplinary and distributed scientific environment, scientists need to understand models outside their area of expertise and use remote data and programs. Thus, it is important to describe and represent scientific models as well as their associated program implementations.

Architecture
In this scenario, the target user is not the scientific modeler but the scientific application user. These users need access to scientific resources for direct use in real cases. Scientific modelers, on the other hand, should be able to publish models. However, to make these resources really useful, an architecture is required that includes mechanisms for describing and managing them. Moreover, considering that our target is a multidisciplinary and distributed scientific environment, such an architecture should be: (i) portable, i.e., easily installed on any operating system; (ii) interoperable, i.e., easily connected to other applications; and (iii) flexible, i.e., easily configurable to domain-specific characteristics. To address all these requirements, we propose a web-based middleware architecture composed of a set of interconnected modules.

Two main modules manage scientific resources: the Resource Operation Module and the Resource Description Module. The Resource Operation Module deals with data and programs. The Resource Description Module plays the role of a metadata repository manager, dealing with data and program descriptions, and also with model descriptions. The architecture also has a second layer composed of three other modules: Publication, Execution and Navigation. The Navigation Module interacts with scientists by allowing them to browse scientific resources and their corresponding descriptions. After browsing models and data, the user chooses a program and the specific data to be used as input. According to the user's choices, the Navigation Module interacts with the Execution Module, which is responsible for verifying program constraints by querying the Resource Description Module. The Execution Module reports back to the Navigation Module, helping the user with the selection process. Then, if the choice is validated, the Execution Module interacts with the Resource Operation Module by issuing an execute job command. After the Resource Operation Module starts the program execution with the specified input data, the Execution Module can keep track of the ongoing experiment by issuing job query commands. As the execution of an experiment may take days to finish, a job-monitoring interface is very useful.
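
The interaction just described (constraint check, execute job, job query) can be illustrated with a minimal Python sketch. It is not the prototype's code: every class, method and field name below is hypothetical, the only constraint modelled is an assumed "accepted input type", and the Resource Operation Module is replaced by an in-memory stand-in.

```python
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"


@dataclass(frozen=True)
class ProgramDescription:
    """Hypothetical metadata record; the real descriptions follow SPM."""
    name: str
    accepted_input_types: frozenset  # assumed form of a program constraint


class ResourceDescriptionModule:
    """Stand-in metadata repository holding program descriptions."""

    def __init__(self):
        self._programs = {}

    def publish(self, description):
        self._programs[description.name] = description

    def constraints_for(self, program_name):
        return self._programs[program_name]


class ResourceOperationModule:
    """Stand-in for the module that runs programs on data."""

    def __init__(self):
        self._jobs = {}

    def execute_job(self, program_name, dataset_name):
        # A real module would start a (possibly remote) program here.
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = JobStatus.RUNNING
        return job_id

    def query_job(self, job_id):
        return self._jobs[job_id]

    def mark_finished(self, job_id):
        # Simulation hook only: lets the sketch mimic a job completing.
        self._jobs[job_id] = JobStatus.FINISHED


class ExecutionModule:
    """Checks the user's choice, then delegates execution and monitoring."""

    def __init__(self, descriptions, operations):
        self.descriptions = descriptions
        self.operations = operations

    def validate(self, program_name, input_type):
        # Verify program constraints by querying the description module.
        constraints = self.descriptions.constraints_for(program_name)
        return input_type in constraints.accepted_input_types

    def run(self, program_name, dataset_name, input_type):
        if not self.validate(program_name, input_type):
            raise ValueError(
                f"{program_name} does not accept input of type {input_type}"
            )
        return self.operations.execute_job(program_name, dataset_name)

    def status(self, job_id):
        # Corresponds to a job query command.
        return self.operations.query_job(job_id)
```

In this sketch a caller would publish a `ProgramDescription`, call `run` with a program, a data set and its type, and then poll `status` with the returned job identifier, which is the role the text assigns to a job-monitoring interface.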

In summary, the Execution Module guides the user in the correct use of the available models, providing an on-the-fly interface for executing them. Finally, the Execution Module should be able to publish the finished experiment by interacting with the Publication Module. The Publication Module is responsible for providing a user interface for publishing scientific resources. When a publisher enters resource descriptions, the module checks these inputs by interacting with both the Resource Operation and Resource Description Modules. Once the inputs are validated, the Resource Description Module stores them. The Scientific Publication Model (SPM) is the metamodel (schema) behind the Resource Description Module, and is described in more detail in [1]. The main contribution of SPM is the explicit semantic representation of scientific models. The distinction between program and model allows models to be represented at both the theoretical and the operational level. Each resource instance is expressed as an XML document [4], which is validated and stored in accordance with the SPM XML Schema [5].
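
To make the idea of an XML resource description concrete, the following Python sketch parses a model description and checks that a few fields are present. The element and attribute names are invented for illustration and are not taken from the SPM XML Schema, which is defined in [1] and [5]; the check is a simple structural test with the standard library, not full XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical model description. Element and attribute names are
# illustrative only and do NOT come from the actual SPM XML Schema.
MODEL_XML = """
<model id="corrosion-rate-model">
  <name>Corrosion rate model</name>
  <discipline>Microbiology</discipline>
  <implementedBy program="corrosion-rate-v1"/>
</model>
"""

# Assumed set of mandatory child elements for this sketch.
REQUIRED_CHILDREN = ("name", "discipline", "implementedBy")


def missing_fields(xml_text):
    """Return the required fields that are absent from a description."""
    root = ET.fromstring(xml_text.strip())
    missing = [tag for tag in REQUIRED_CHILDREN if root.find(tag) is None]
    if "id" not in root.attrib:
        missing.append("@id")
    return missing


def implementing_programs(xml_text):
    """List the programs that a model description links to."""
    root = ET.fromstring(xml_text.strip())
    return [el.get("program") for el in root.findall("implementedBy")]
```

Keeping the link from a model to its programs as a separate element mirrors the program/model distinction in the text: the model entry carries the theoretical description, while the linked programs carry the operational one.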

Conclusion
This work proposes a scientific resource management architecture that focuses on overcoming the difficulties of environmental applications. The idea is to provide better metadata support for managing distributed basic scientific resources, i.e., programs and data. The proposed architecture is useful not only for scientists who interpret bio-phenomena, but also for many other environmental scientists and decision makers. Through this architecture, it becomes possible to monitor the usage of scientific resources. As mentioned before, the choice of a model is usually based on previous cases. Therefore, the history of model usage is another very important scientific resource. Moreover, by investigating these historical data, it becomes possible to identify usage patterns, such as the frequent use of a sequence of models, and to include them as valuable resources. We believe that our architecture, with minor enhancements, will be able to deal with all these resources. A prototype of this architecture is under development, and uses two existing systems as its management modules. The Resource Description Module is supported by the Goa System [2], an ODBMS prototype developed at COPPE-Sistemas, UFRJ, Brazil. The Resource Operation Module is supported by LeSelect [3], a mediator-based heterogeneous distributed database architecture developed at INRIA, France.
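
As an illustration of the usage patterns mentioned above, the sketch below counts how often one model is applied directly after another across recorded experiments. It is only one possible reading of "frequent use of a sequence of models": the history format (one ordered list of model names per experiment), the model names and the threshold are all assumptions, not part of the proposed architecture.

```python
from collections import Counter


def frequent_model_pairs(histories, min_count=2):
    """Return consecutive model pairs seen at least min_count times.

    histories: iterable of lists, each holding the model names used in
    one experiment, in order of application (assumed format).
    """
    counts = Counter()
    for sequence in histories:
        # zip pairs each model with the one applied right after it.
        counts.update(zip(sequence, sequence[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical usage history for three past experiments.
history = [
    ["sample-normalisation", "bacterial-growth", "corrosion-rate"],
    ["bacterial-growth", "corrosion-rate"],
    ["sample-normalisation", "corrosion-rate"],
]
patterns = frequent_model_pairs(history)
# patterns == {("bacterial-growth", "corrosion-rate"): 2}
```

Pairs that pass the threshold could then be stored as resources in their own right, alongside models, programs and data.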

References
1. Cavalcanti, M., Mattoso, M., Campos, M., Llirbat, F., Simon, E. Sharing Scientific Models in Environment Applications. In: Proc. of ACM Symposium on Applied Computing, Madrid, Spain, March 2002.
2. Goa System. In: http://www.cos.ufrj.br/~goa, accessed in Jan. 2002.
3. LeSelect. In: http://caravel.inria.fr/Fprototype_LeSelect.html, accessed in Jan. 2002.
4. W3C XML Specification. In: http://www.w3c.org/XML, accessed in Jan. 2002.
5. W3C XML Schema Specification. In: http://www.w3c.org/XML/Schema, accessed in Jan. 2002.
