Proposal For A Scientific Software Lifecycle Model: November 2017

Conference Paper · November 2017


DOI: 10.1145/3144763.3144767

All content following this page was uploaded by Anshu Dubey on 04 January 2018.

Proposal for a Scientific Software Lifecycle Model
Anshu Dubey
Mathematics and Computer Science Division
Argonne National Laboratory
Lemont, IL 60439
Flash Center for Computational Science
University of Chicago
Chicago, IL 60637
Email: adubey@anl.gov

Lois Curfman McInnes
Mathematics and Computer Science Division
Argonne National Laboratory
Lemont, IL 60439
Email: curfman@mcs.anl.gov

Abstract: Improvements in computational capabilities have led to rising complexity in scientific modeling, simulation, and analytics, and thus in the software implementing them. In addition, a paradigm shift in platform architectures has added another dimension to complexity, to the point where software productivity (or the time, effort, and cost for software development, maintenance, and support) has emerged as a growing concern for computational science and engineering. Clearly communicating about the lifecycle of scientific software provides a foundation for community dialogue about processes and practices for various lifecycle phases that can improve developer productivity and software sustainability, both key aspects of overall scientific productivity. While the mainstream software engineering community has produced lifecycle models that meet the needs of software projects in business and industry, none of the available models adequately describes the lifecycle of scientific computing software. In particular, software for end-to-end computations for obtaining scientific results has no formalized development model. Examining development approaches employed by teams implementing large multicomponent codes reveals a great deal of similarity in their strategies. In earlier work, we organized related approaches into workflow schematics, with loose coupling between submodels for development of scientific capabilities and reusable infrastructure. Here we consider an orthogonal approach, formulating models that capture the workflow of software development in slightly different scenarios, and we propose a scientific software lifecycle model based on agile principles.

Index Terms: scientific computing, software engineering, software lifecycle

I. INTRODUCTION

The topic of software lifecycles for business and commercial software is well researched, with many models that meet the needs of different types of projects. A lifecycle model decomposes software development into distinct phases, where each phase has its own requirements, specifications, and methods. Many reasons make such decomposition into phases desirable. For example, each phase can control its own quality and result in higher quality software overall. Similarly, in larger projects, phases can help define roles for developers and bring clarity to the development process. Some standard lifecycle models are waterfall [12], where each stage is completed before the next stage can begin; V-shaped [17], which is an extension of the waterfall model that also incorporates testing phases for each development stage; iterative [14], where development stages are cycled over for subsets of requirements until the final product is obtained; spiral [4], where iterations occur for ongoing and new requirements; big bang [16], where development occurs without a defined process; and agile, which allows cycling through any group of phases, emphasizing incremental changes [1]. (See [18], [13] for a general description of various software lifecycle models.)

Because of the unique requirements of scientific software, a mismatch exists between the needs of scientific software developers and the theory of mainstream software engineering. Aspects of some lifecycle models apply; for example, the general philosophy of the agile approach fits well. But the methods used in nonscientific software under the agile approach do not fit nearly as well for scientific code. The biggest challenge in having well-defined methods and timelines for scientific software development is that often the numerical methods and abstractions being used in implementations are themselves subjects of research, and therefore not fully specified ahead of time. There have been efforts to adopt the agile model for research-driven software. For example, the TriBITS [2] effort has produced a package that also incorporates an agile lifecycle model for research-driven software development. The model addresses concerns of software that downstream becomes a component in a larger software collection. The Blue Brain Project [8], [10] is adapting this model for their own computational needs. In general this model is suitable for software that implements research ideas and becomes a reusable component in other larger collections of interoperable software.

However, many projects exist in scientific domains where the primary software objective is to be the means for conducting research instead of being the product or the goal of the research. End-to-end simulation codes fall into this category. They may use libraries and other third-party software as components, but the codes have different usage models and user expectations. In most successful scientific software development projects, there is an implicit understanding of the software lifecycle, even if it is not articulated. In earlier work [7], presented as an idea paper at the WSSSPE workshop in 2016, we devised schematics of scientific software development workflows with a view toward engaging the community in examining this important aspect of software
productivity. Here we refine ideas from the earlier work and take the next step toward devising a lifecycle model applicable to simulation software and data analytics associated with scientific simulations. Our methodology follows a three-step process. Requirement gathering, design, implementation, and verification & validation are four distinct phases in a typical development cycle for all kinds of software. In the first step we map the activities during the development of scientific software to these well known and well understood phases. In the second step we examine the existing lifecycle models and evaluate their applicability to the conceptualized workflows. In the third step we use the insights from the first two steps to propose a lifecycle model that covers important aspects of scientific software development and maintenance.

II. WORKFLOW FOR SCIENTIFIC SOFTWARE DEVELOPMENT

We begin by looking at the workflow for typical multiphysics scientific software development projects. Examples of such development projects include FLASH [6], Uintah [3], Pluto [11], Ramses [15], Cactus [5], and many more. All of these codes use high performance computing (HPC) platforms for running simulations. Figure 1 captures essential features of the workflow that has been implicit during development in many such projects. All boxes in the figure can be, and usually are, under research. The research topic may be of general interest going beyond the project, such as numerical methods being applied, or it may be driven by the needs of the project itself. Many feedback loops in the workflow indicate ongoing research and refinement in corresponding sections of the workflow based on insights gained during the project.

[Figure: flowchart with boxes Phenomena of Interest, Equations, Numerical Solvers, Implementation, Verification, and Validation; arrow labels include observation, model, model fidelity, discretize, optimization, accuracy, stability, performance, and output.]
Fig. 1. Workflow for developing multiphysics software: overall perspective.

The process starts with devising a mathematical model for the phenomena of interest. The equations are discretized, and numerical methods are devised and implemented for solving the equations. Here the workflow for scientific software development begins to diverge from that of mainstream software development. The verification of scientific software addresses not only expected behavior, but also convergence and stability of the numerical solvers. A failure in either takes the workflow back to numerical solvers, which may need to be revised or redesigned. Similarly, validation of output against observations may reveal that the discretization or approximations used in the mathematical model are inadequate, which in turn can completely reset the workflow to the first step.

Figure 1 illustrates a rough mapping of the workflow for software development of scientific capabilities onto four basic phases: requirements gathering, design, implementation, and verification. Setting aside issues such as release, maintenance, and user support, these phases apply to any standard software process. What differs in the realm of scientific software is the feedback among various phases. From the perspective of these distinct phases of development, the workflow can be simplified into interactions among the phases as shown in Figure 2. Each phase in the figure shows entities that are resolved in the corresponding phase.

[Figure: four phases and the entities resolved in each.
Requirements gathering: model, framework; data, expectations; workflow.
Design: approximations, numerics; storage, curation, retrieval, analysis; steps in scientific process.
Implementation: solvers, infrastructure; algorithms, data structures; tools, interfaces.
Verification and validation: convergence, order, correction; validation, observations; provenance.]
Fig. 2. Interaction among development phases for multiphysics software: overall perspective.

Another important aspect of scientific software functionality is reusable infrastructure, or using a loose definition of the word framework (e.g., see [9]), the entity that provides basic services (such as data structures related to discretizations and data layout, parallelization, and I/O), enables composability, and allows orchestration of calculations. A flexible and extensible framework is a critical component of scientific software, with unique requirements in its development cycle, therefore deserving its own separate workflow and design space. Framework development comes closest to other general business and commercial software, in that the control flow from one phase to another is linear, as shown in Figure 3. This reusable infrastructure is the most stable part of the resulting scientific software, and once it has been implemented, a change to the framework is a major undertaking. Modifications to a framework would normally require starting at the requirement gathering phase. Not surprisingly, the diagram of interaction among phases is also linear, as shown in Figure 4.

We also examine the scientific process workflow from the perspective of data used in computational science, which may be data generated by or input to simulations, or observational data used for validation, or all used together for advancing scientific understanding. Some of the processes
are similar to computations, e.g., much of the analysis starts with a hypothesis, models are mathematical, and analysis may involve numerical methods also used in simulations. Some other processes are different, for example archiving and retrieval, which have no equivalent in simulations. The schematic in Figure 5 captures the workflow characteristics of scientific data management and analysis, and Figure 6 shows the corresponding phase interactions. More feedback loops exist for data analytics than in framework design because insights and inferences can lead to modifying or replacing algorithms used in analysis (similar to the feedback loops for scientific capabilities, shown in Figure 2).

[Figures 3 and 5: flowcharts; recoverable box labels include Services Provided, Separation of Concerns, Interoperability, Interface, Processing Needed, Data Layout/Storage, Accessibility, Analytics, Implementation, and Verification.]
Fig. 3. Workflow for developing multiphysics software: infrastructure.
Fig. 5. Workflow for developing multiphysics software: data management and analytics.

[Figure 4: phases and the entities resolved in each.
Requirements gathering: interoperability concerns, interfaces, process constraints.
Design: separation of concerns, data scoping and ownership, encapsulation.
Implementation: backbone, interfaces, helper functions.
Verification: interoperability, expected behavior, robustness.]
Fig. 4. Interaction among development phases for multiphysics software: infrastructure.

[Figure 6: phases and the entities resolved in each.
Requirements gathering: data volume, needed analytics, accessibility.
Design: storage modes, accessor functions; data scoping and retrieval algorithms; modification, control.
Implementation: archive, analysis; interfaces; helper functions.
Verification: testing, expected behavior, robustness.]
Fig. 6. Interaction among development phases for multiphysics software: data management and analytics.

III. EXISTING SOFTWARE LIFECYCLE MODELS

We now examine existing software lifecycle models to see what, if any, mapping is possible between the models and the development workflow of scientific software.

a) Waterfall model: is the simplest of the software lifecycle models, and it is also the least applicable to scientific software. The main reason is that it relies upon a relationship between phases where the next phase cannot begin until the preceding phase is complete. Because scientific software is by design meant for exploration of new ideas and insights, linear progression is incompatible with its goals. Framework design and development comes the closest to being able to follow this model.

b) V-shaped model: differs from the waterfall model in having testing phases corresponding to each development phase. Because it also needs to have no unknown requirements, it has similar drawbacks for adoption by scientific software as the waterfall model.

c) Iterative model: operates by allowing the waterfall model to proceed for a subset of requirements, and then going back to the beginning for the next set of requirements. It overcomes one problem of the other two models discussed so far in that it allows going back to the first phase. However, it still lacks the flexibility to accommodate the evolving requirements that are common in scientific software.

d) Spiral model: is a refinement of the iterative model where phases are repeated for previously implemented requirements as well as new requirements over and over until the project objectives are met. However, this model is still not adequate for scientific software, because, as seen in Figure 2, feedback loops exist among multiple phases in the workflow,
so the spiral may end up folding back on itself.

e) Big bang model: is the model in which the vast majority of scientific software development projects have operated until recently. This model does not have a well-defined process or requirements and thus is inherently risky. This model can be acceptable for small projects with just a few developers; however, as demonstrated by the many failed large projects in the scientific world, it clearly does not apply to any moderate to large project.

f) Agile model: comes closest to being applicable to scientific software development because it allows cycling through any group of phases and emphasizes incremental changes. Its philosophy applies, though many of the methods that implement the philosophy do not. For example, sprints have very little use in software that is used for research and is being researched.

IV. PROPOSED LIFECYCLE MODEL

The proposed lifecycle model for scientific software, shown in Figure 7, is derived from agile methodology and includes steps beyond the initial development cycle discussed in Section II. The initial development phase is taken from Figure 2, since the scientific capabilities of multiphysics software have the most demanding interaction among phases. The other aspects of scientific software (infrastructure, shown in Figure 4, and analysis, shown in Figure 6) have a subset of that complexity of interaction among phases. Two-way arrows represent tight coupling and feedback loops that exist among various phases in the development cycle. Note that any of the arrows or phases can be nullified in a traversal of the cycle depending upon the need of the project.

[Figure: an Initial Development box containing Requirements gathering, Design, Implementation, and Verification/validation; outside boxes are Integration of New Research, Capability Addition, Release (Distribution), and Maintenance (Ongoing Testing, User Support, Issues and Bug Resolution).]
Fig. 7. Overall lifecycle model for scientific software derived by analyzing workflow in Figure 1 and mapping it to lifecycle phases in Figure 2. Figures 4 and 6 show details of interactions among development phases of infrastructure and analysis.

Boxes outside the development phase represent later stages in the software lifecycle, with their arrows pointing to the phase where they are more likely to plug into the development cycle. For example, the two-way arrow between the maintenance and release boxes indicates user interactions, with issues and bugs reported back. Sometimes the issues may be resolved with discussion; at other times a new implementation may be needed, hence the arrow from the maintenance box to the implementation box. Normally the implementation phase will resolve most reported issues and bugs; occasionally, however, the severity of an issue may require going back earlier in the development cycle, to the design or even the requirement gathering phase. The diagonal arrows among the boxes permit escalation of development complexity as needed. Similarly, capability addition is normally expected to plug into the development cycle at the design phase, while integration of new research is likely to cause going back all the way to requirements gathering. Because this model permits nullifying arrows and phases as needed, it provides the flexibility of bypassing one or more phases for either capability addition or integrating new research if needed. Therefore, for any stage in software development, the cycle can be made simpler or more complex depending upon the needs of the moment. Similar to the agile methodology, our model supports frequent releases whenever there is a stable code version.

V. CONCLUSIONS AND FUTURE WORK

Through the process of mapping typical workflows for development of scientific software, especially as it applies to the most complex multiphysics codes, we have unraveled the dependencies and feedback loops within the lifecycle of such software. We have synthesized a lifecycle model that, by permitting null instances of phases and connecting arrows, unifies many complex workflows into a simple schematic. This lifecycle model captures the essential features and phases of the most complex scientific software development.

One aspect of scientific software that we have not addressed in our current model is that of refactoring existing software. The software in question could be a legacy code or well-constructed software that nevertheless has to be refactored because of the exigencies of platform requirements. An important common feature of such development is that the new structure needs to be built while retaining large chunks of original code. This approach provides a path to incremental adoption, necessary in most scientific refactoring projects. Considering a lifecycle model for refactoring will be a next step.

ACKNOWLEDGMENTS

This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research. The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan.
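
ILLUSTRATIVE SKETCH

As a reading aid for the lifecycle model of Section IV, the phases and arrows can be treated as a small directed graph in which any arrow may be nullified for a given traversal. The sketch below is one possible encoding and not part of the proposed model itself. The names Phase, ARROWS, ENTRY_POINTS, and is_valid_traversal are hypothetical, and two choices are assumptions rather than statements from the text: that every pair of development phases is joined by a two-way arrow, and that release is reached from verification.

```python
from enum import Enum
from itertools import permutations


class Phase(Enum):
    REQUIREMENTS = 1
    DESIGN = 2
    IMPLEMENTATION = 3
    VERIFICATION = 4
    RELEASE = 5
    MAINTENANCE = 6


DEV = (Phase.REQUIREMENTS, Phase.DESIGN,
       Phase.IMPLEMENTATION, Phase.VERIFICATION)

# Assumption: all development phases are pairwise linked by two-way arrows.
ARROWS = set(permutations(DEV, 2))
ARROWS |= {
    (Phase.VERIFICATION, Phase.RELEASE),        # assumption: release follows verification
    (Phase.RELEASE, Phase.MAINTENANCE),         # users report issues and bugs
    (Phase.MAINTENANCE, Phase.RELEASE),         # issue resolved by discussion
    (Phase.MAINTENANCE, Phase.IMPLEMENTATION),  # usual path for a bug fix
    (Phase.MAINTENANCE, Phase.DESIGN),          # escalation for severe issues
    (Phase.MAINTENANCE, Phase.REQUIREMENTS),    # escalation for severe issues
}

# Where activities outside initial development plug into the cycle.
ENTRY_POINTS = {
    "initial development": Phase.REQUIREMENTS,
    "integration of new research": Phase.REQUIREMENTS,
    "capability addition": Phase.DESIGN,
}


def is_valid_traversal(path, nullified=frozenset()):
    """Return True if every consecutive step follows a non-nullified arrow."""
    active = ARROWS - set(nullified)
    return all((a, b) in active for a, b in zip(path, path[1:]))


# A bug fix that needs only a new implementation.
bug_fix = [Phase.MAINTENANCE, Phase.IMPLEMENTATION,
           Phase.VERIFICATION, Phase.RELEASE]
print(is_valid_traversal(bug_fix))
```

Representing nullification as a set of removed arrows keeps the full graph fixed while letting each traversal be as simple or as complex as the project needs at that moment.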
REFERENCES
[1] Agile methodology. http://agilemethodology.org/.
[2] R. A. Bartlett, M. A. Heroux, and J. M. Willenbring. Overview of the TriBITS lifecycle model: A lean/agile software lifecycle model for research-based computational science and engineering software. In E-Science (e-Science), 2012 IEEE 8th International Conference on, pages 1–8. IEEE, 2012.
[3] M. Berzins, J. Luitjens, Q. Meng, T. Harman, C. Wight, and J. Peterson. Uintah - a scalable framework for hazard analysis. In TG '10: Proc. of 2010 TeraGrid Conference, New York, NY, USA, 2010. ACM.
[4] B. W. Boehm. A spiral model of software development and enhancement. Computer, 21(5):61–72, 1988.
[5] Cactus Computational Toolkit, 2013.
[6] A. Dubey, K. Antypas, M. Ganapathy, L. Reid, K. Riley, D. Sheeler, A. Siegel, and K. Weide. Extensible component-based architecture for FLASH, a massively parallel, multiphysics simulation code. Parallel Computing, 35(10-11):512–522, 2009.
[7] A. Dubey and L. McInnes. Idea paper: Software lifecycle for scientific simulation software. Working Towards Sustainable Software for Science: Practice and Experience (WSSSPE4), http://wssspe.researchcomputing.org.uk/wp-content/uploads/2016/06/WSSSPE4_paper_16.pdf.
[8] M.-O. Gewaltig and R. Cannon. Current practice in software development for computational neuroscience and how to improve it. PLoS Comput Biol, 10(1):e1003376, 2014.
[9] D. E. Keyes, L. C. McInnes, C. Woodward, W. Gropp, E. Myra, M. Pernice, et al. Multiphysics simulations: Challenges and opportunities. The International Journal of High Performance Computing Applications, 27(1):4–83, 2013.
[10] H. Markram. The blue brain project. Nature Reviews Neuroscience, 7(2):153–160, 2006.
[11] A. Mignone, C. Zanni, P. Tzeferacos, B. van Straalen, P. Colella, and G. Bodo. The PLUTO Code for Adaptive Mesh Computations in Astrophysical Fluid Dynamics. The Astrophysical Journal Supplement Series, 198:7, Jan. 2012.
[12] K. Petersen, C. Wohlin, and D. Baca. The waterfall model in large-scale development. In International Conference on Product-Focused Software Process Improvement, pages 386–400. Springer, 2009.
[13] Robert Half International. 6 basic SDLC methodologies: Which one is best? https://www.roberthalf.com/technology/blog/6-basic-sdlc-methodologies-the-pros-and-cons.
[14] I. Spence and K. Bittner. What is iterative development? https://www.ibm.com/developerworks/rational/library/may05/bittner-spence/.
[15] R. Teyssier. Cosmological hydrodynamics with adaptive mesh refinement. A new high resolution code called RAMSES. Astronomy and Astrophysics, 385:337–364, Apr. 2002.
[16] Tutorials Point. SDLC: Big bang model. https://www.tutorialspoint.com/sdlc/sdlc_bigbang_model.htm.
[17] Tutorials Point. SDLC: V-model. https://www.tutorialspoint.com/sdlc/sdlc_v_model.htm.
[18] Tutorials Point. Software development life cycle tutorial. http://www.tutorialspoint.com/sdlc/index.htm.
