
How to Improve Prediction Accuracy in the Analysis of Computer Experiments

Exploitation of Low-Order Effects and Dimensional Analysis
by

Gilberto Alexi Rodríguez Arelis,

B.Sc., Tecnológico de Monterrey, 2007


M.Sc., Tecnológico de Monterrey, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF


THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies

(Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA


(Vancouver)
December 2020
© Gilberto Alexi Rodríguez Arelis, 2020
The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

How to Improve Prediction Accuracy in the Analysis of Computer Experiments
Exploitation of Low-Order Effects and Dimensional Analysis

submitted by Gilberto Alexi Rodríguez Arelis in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Statistics.

Examining Committee:

William J. Welch, Statistics, UBC
Supervisor

James V. Zidek, Statistics, UBC
Supervisory Committee Member

Daniel J. McDonald, Statistics, UBC
University Examiner

Lutz Lampe, Electrical and Computer Engineering, UBC
University Examiner

Additional Supervisory Committee Member:

Marie Auger-Méthé, Statistics, UBC
Supervisory Committee Member

Abstract
For a wide range of natural phenomena and engineering processes, physical experimentation is difficult or even impossible. To overcome this, we can rely on mathematical models that simulate these systems via computer experiments. Nonetheless, if the experimenter wants to explore many runs, complex computer codes can be excessively resource- and time-consuming.
Since the 1980s, Gaussian Stochastic Processes have been used in computer experiments as surrogate models. One objective is to predict outputs at untried input runs, given a model fitted to a training design run through the computer code. We can exploit different modelling strategies to improve prediction accuracy, e.g., the regression component or the correlation function. This thesis comprehensively exploits two additional strategies that the existing literature has not fully addressed in computer experiments.
One of these strategies is implementing non-standard correlation structures in model training and testing. Since the introduction of factorial designs for physical experiments in the first half of the 20th century, there have been basic guidelines for modelling based on three effect principles: Sparsity, Heredity, and Hierarchy. We explore these principles in a Gaussian Stochastic Process by suggesting and evaluating novel correlation structures.
Our second strategy focuses on output and input transformations via
Dimensional Analysis. This methodology pays attention to fundamental
physical dimensions when modelling scientific and engineering systems. It
goes back at least a century but has recently caught statisticians’ attention,
particularly in the design of physical experiments. The core idea is to analyze
dimensionless quantities derived from the original variables.
While the non-standard correlation structures capture additive and low-order interaction effects, applying the three principles above relies on a proper selection of effects. Similarly, the implementation of Dimensional Analysis is far from straightforward; choosing the derived quantities is particularly challenging. Hence, we rely on Functional Analysis of Variance as a variable selection tool for both strategies. With the “right” variables, the Gaussian Stochastic Process’s prediction accuracy improves in several case studies, which allows us to establish new modelling frameworks for computer experiments.

Lay Summary
We use computer experiments to simulate natural phenomena and engi-
neering processes where physical experimentation is too costly or infeasi-
ble. One example is the prediction of hurricane flood hazards as a function
of diverse environmental factors. These experiments rely on mathematical
models via computer codes. Nonetheless, these codes can be excessively resource-consuming. Thus we can rely on Gaussian Stochastic Processes, which are computationally faster statistical surrogates for these codes.
In this thesis, we pay attention to the prediction accuracy of these sur-
rogates using two strategies: novel data correlation structures and Dimen-
sional Analysis. Both have significant challenges in terms of variable selec-
tion, which we address with a statistical tool called Functional Analysis of
Variance. We apply our strategies in several case studies to improve prediction accuracy, and we propose new modelling frameworks for computer experiments that yield more accurate surrogate models.

Preface
This doctoral dissertation has been written under the supervision of Dr.
William J. Welch. Potential approaches were discussed in weekly research
meetings. I was responsible for writing the code for all the case studies,
along with necessary mathematical formulations. Furthermore, I was in
charge of the thesis composition. Subsequent revisions were discussed with
Professor Welch.
Chapters 3 and 4 are based on a paper manuscript, already written, on non-standard correlation structures in a Gaussian Stochastic Process. It details the basis of the principles coming from traditional factorial experiments described in Chapter 1. The case studies are centred on the Michalewicz, Friedman, and Franke functions. They are the main thread in this work, whose final objective is to improve prediction accuracy in a Gaussian Stochastic Process with the help of Functional Analysis of Variance. Two additional case studies from Chapter 4 are likely to be included in this manuscript, since they are applications that go beyond mathematical test functions.
Chapter 5 was initially projected as a single manuscript on the use of Dimensional Analysis in a Gaussian Stochastic Process, along with Functional Analysis of Variance, based on three case studies. However, further research allowed us to split this chapter into two paper manuscripts (already in progress) as follows:

• The first two case studies in this chapter are an appropriate example of the use of Dimensional Analysis via Buckingham’s Π-theorem. They set up a basis for the second manuscript.

• The third case study, on hurricane storm surge, is the culmination of the previous two cases. Its complexity justifies a separate paper manuscript. This case study’s data stem from a collaboration with Whitney Huang (Clemson University; South Carolina, USA) and Taylor Asher (The University of North Carolina at Chapel Hill, USA).

Table of Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Lay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 The Use of a Factorial Effect Framework . . . . . . . . . . . 3
1.1.1 Historical Background of Experimental Guidelines . . 3
1.1.2 Formal Definitions of the Effect Principles . . . . . . 4
1.2 Dimensional Analysis . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Why Should Dimensionality Be Important for Statisticians? . . . 7
1.2.2 Early Works in Dimensional Analysis . . . . . . . . . 7
1.3 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . 9

2 The Gaussian Stochastic Process Model . . . . . . . . . . . 12


2.1 The Regression Component . . . . . . . . . . . . . . . . . . . 13
2.2 Choice of Regression Component . . . . . . . . . . . . . . . . 13
2.3 Correlation in Random Function . . . . . . . . . . . . . . . . 14
2.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16


2.4.1 Maximum Likelihood Estimation . . . . . . . . . . . 17


2.4.2 Optimization Procedure . . . . . . . . . . . . . . . . . 17
2.5 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5.1 Best Linear Unbiased Predictor . . . . . . . . . . . . 19
2.5.2 Prediction Accuracy . . . . . . . . . . . . . . . . . . . 20
2.6 The Effect of the Nugget Term . . . . . . . . . . . . . . . . . 21
2.7 Training and Testing Designs . . . . . . . . . . . . . . . . . . 21

3 Main-Effect Gaussian Processes . . . . . . . . . . . . . . . . . 23


3.1 Basis Functions and Interpolation . . . . . . . . . . . . . . . 24
3.2 Implication of the Standard Correlation Structure . . . . . . 25
3.3 Main-Effect Correlation Structures . . . . . . . . . . . . . . . 26
3.4 Previous Works . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5 Michalewicz Function . . . . . . . . . . . . . . . . . . . . . . 30
3.5.1 Overview of the Composite Gaussian Process Model . 32
3.5.2 Performance of the Composite Gaussian and Standard
Stationary Gaussian Stochastic Processes . . . . . . . 32
3.5.3 Simulation Settings . . . . . . . . . . . . . . . . . . . 33
3.5.4 Prediction Results . . . . . . . . . . . . . . . . . . . . 35
3.6 Weighted Michalewicz Function . . . . . . . . . . . . . . . . 40
3.6.1 Simulation Settings . . . . . . . . . . . . . . . . . . . 42
3.6.2 Prediction Results . . . . . . . . . . . . . . . . . . . . 42
3.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . 46

4 Gaussian Processes with Low-Order Joint Effects . . . . . 48


4.1 Previous Works . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1.1 Progressive Covariance Function . . . . . . . . . . . . 49
4.1.2 Disjoint Correlation Structure . . . . . . . . . . . . . 50
4.2 Joint-Effect Correlation Structures up to All 2-Input Interac-
tions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 Friedman Function . . . . . . . . . . . . . . . . . . . . . . . . 53
4.3.1 Overview of the Multivariate Adaptive Regression Splines
Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3.2 Simulation Settings . . . . . . . . . . . . . . . . . . . 54
4.3.3 Prediction Results . . . . . . . . . . . . . . . . . . . . 55
4.4 Joint-Effect Correlation Structures up to Selected 2-Input In-
teractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5 Functional Analysis of Variance . . . . . . . . . . . . . . . . 57
4.6 Franke Function . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.6.1 Simulation Settings . . . . . . . . . . . . . . . . . . . 62


4.6.2 Prediction Results . . . . . . . . . . . . . . . . . . . . 67


4.7 Joint-Effect Correlation Structures up to All 3-Input Interac-
tions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.8 Joint-Effect Correlation Structure with a Residual Effect Term . . 70
4.9 Output Transformerless Circuit Function . . . . . . . . . . . 72
4.9.1 Overview of the Projection Pursuit Regression Model . . 72
4.9.2 Simulation Settings . . . . . . . . . . . . . . . . . . . 73
4.9.3 Prediction Results . . . . . . . . . . . . . . . . . . . . 78
4.10 Joint-Effect Correlation Structures up to Selected 3-Input In-
teractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.11 Nilson-Kuusk Model . . . . . . . . . . . . . . . . . . . . . . . 82
4.11.1 Simulation Settings . . . . . . . . . . . . . . . . . . . 82
4.11.2 Prediction Results . . . . . . . . . . . . . . . . . . . . 87
4.12 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . 89

5 Dimensional Analysis . . . . . . . . . . . . . . . . . . . . . . . 90
5.1 Previous Applications of Dimensional Analysis . . . . . . . . 93
5.2 Buckingham’s Π-Theorem . . . . . . . . . . . . . . . . . . . . 96
5.3 Borehole Function . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3.1 Dimensional Analysis . . . . . . . . . . . . . . . . . . 100
5.3.2 Simulation Settings . . . . . . . . . . . . . . . . . . . 104
5.3.3 Prediction Results . . . . . . . . . . . . . . . . . . . . 106
5.3.4 Extrapolation . . . . . . . . . . . . . . . . . . . . . . 109
5.4 Heat Transfer in a Solid Sphere . . . . . . . . . . . . . . . . 111
5.4.1 Dimensional Analysis . . . . . . . . . . . . . . . . . . 113
5.4.2 Simulation Settings . . . . . . . . . . . . . . . . . . . 118
5.4.3 Prediction Results . . . . . . . . . . . . . . . . . . . . 120
5.4.4 Extrapolation . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Storm Surge Model . . . . . . . . . . . . . . . . . . . . . . . 123
5.5.1 The Joint Probability Method . . . . . . . . . . . . . 124
5.5.2 Advanced Circulation Model Settings . . . . . . . . . 124
5.5.3 Computation of Additional Inputs . . . . . . . . . . . 125
5.5.4 Distribution of Maximum Storm Surge . . . . . . . . 127
5.5.5 Dimensional Analysis . . . . . . . . . . . . . . . . . . 129
5.5.6 Simulation Settings . . . . . . . . . . . . . . . . . . . 136
5.5.7 Prediction Results . . . . . . . . . . . . . . . . . . . . 138
5.5.8 Extrapolation . . . . . . . . . . . . . . . . . . . . . . 142
5.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . 145


6 Final Remarks and Future Work . . . . . . . . . . . . . . . . 147


6.1 Summary of the Thesis . . . . . . . . . . . . . . . . . . . . . 147
6.2 Discussion and Future Work . . . . . . . . . . . . . . . . . . 150

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

Appendices

A Correlation Structures . . . . . . . . . . . . . . . . . . . . . . . 162


A.1 Main-Effect Correlation Structures . . . . . . . . . . . . . . . 162
A.2 Joint-Effect Correlation Structures Up to All 2-Input Inter-
actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
A.3 Joint-Effect Correlation Structures up to Selected 2-Input In-
teractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
A.4 Joint-Effect Correlation Structures with a Residual Effect
Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
A.5 Joint-Effect Correlation Structures Up to All 3-Input Inter-
actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
A.6 Joint-Effect Correlation Structures up to Selected 3-Input In-
teractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

B Heat Transfer in a Solid Sphere . . . . . . . . . . . . . . . . . 169

C Storm Surge Model . . . . . . . . . . . . . . . . . . . . . . . . 173

List of Tables

1.1 Overview of Studies . . . . . . . . . . . . . . . . . . . . . . . 11

3.1 Michalewicz Function: Simulation Settings . . . . . . . . . . . 34

4.1 Friedman Function: Simulation Settings . . . . . . . . . . . . 55


4.2 Expanded Franke Function: Simulation Settings . . . . . . . 63
4.3 OTL Circuit Function: Prediction Performance Comparison
by Non-Parametric Smoothing Method . . . . . . . . . . . . . 74
4.4 OTL Circuit Function: Simulation Settings . . . . . . . . . . 75
4.5 OTL Circuit Function: Summary Statistics of Prediction Ac-
curacy by Type of GaSP at n = 240 . . . . . . . . . . . . . . 79
4.6 Nilson-Kuusk Model: Simulation Settings . . . . . . . . . . . 83
4.7 Nilson-Kuusk Model: Summary Statistics of Prediction Ac-
curacy by Type of GaSP . . . . . . . . . . . . . . . . . . . . . 88

5.1 Borehole Function: Inputs, Dimensions, Units, and Ranges . 99


5.2 Baseline Borehole: Simulation Settings . . . . . . . . . . . . . 104
5.3 Expanded Log-Input Borehole: Simulation Settings . . . . . . 105
5.4 Heat Transfer in a Solid Sphere: Inputs, Dimensions, Units,
and Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.5 Heat Transfer in a Solid Sphere: General Simulation Settings 118
5.6 Heat Transfer in a Solid Sphere: Specific Simulation Settings
by Non-DA and DA . . . . . . . . . . . . . . . . . . . . . . . 119
5.7 Storm Surge Model: Inputs, Dimensions, Units, and Ranges . 129
5.8 Storm Surge Model: Specific Simulation Settings by Non-DA
and DA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.9 Storm Surge Model: Summary Statistics of Prediction Error
by Non-DA and DA . . . . . . . . . . . . . . . . . . . . . . . 143

6.1 OTL Circuit Function: Summary Statistics of Prediction Ac-


curacy by Type of GaSP at n = 240 Including TAAG . . . . 151

List of Figures

3.1 Michalewicz Function: 2-Dimensional Plot . . . . . . . . . . . 30


3.2 Michalewicz Function: Marginal Additive Components
up to Dimension d = 7 . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Michalewicz Function: Prediction Accuracy by Type of GaSP
in d = 3, 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Michalewicz Function: Prediction Accuracy by Type of GaSP
in d = 5, 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Michalewicz Function: Prediction Accuracy by Type of GaSP
in d = 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6 Weighted Michalewicz Function: 2-Dimensional Plot . . . . . 40
3.7 Weighted Michalewicz Function: Marginal Additive Compo-
nents up to Dimension d = 7 . . . . . . . . . . . . . . . . . . 41
3.8 Weighted Michalewicz Function: Prediction Accuracy by Type
of GaSP in d = 3, 4 . . . . . . . . . . . . . . . . . . . . . . . . 44
3.9 Weighted Michalewicz Function: Prediction Accuracy by Type
of GaSP in d = 5, 6 . . . . . . . . . . . . . . . . . . . . . . . . 45
3.10 Weighted Michalewicz Function: Prediction Accuracy by Type
of GaSP in d = 7 . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1 Friedman Function: Prediction Accuracy by Type of GaSP . 56


4.2 Franke Function: 2-Dimensional Plot . . . . . . . . . . . . . . 61
4.3 Expanded Franke Function: FANOVA Summary Plot by Type
of Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4 Expanded Franke Function: FANOVA Percentage Contribu-
tions by Type of Effect at n = 160 . . . . . . . . . . . . . . . 66
4.5 Expanded Franke Function: Prediction Accuracy by Type of
GaSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.6 OTL Circuit Function: FANOVA Percentage Contributions
by Type of Effect at n = 60 . . . . . . . . . . . . . . . . . . . 77
4.7 OTL Circuit Function: Prediction Accuracy by Type of GaSP
at n = 60, 120, 240 . . . . . . . . . . . . . . . . . . . . . . . . 78


4.8 OTL Circuit Function: Prediction Accuracy by Type of GaSP


at n = 240. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.9 Nilson-Kuusk Model: FANOVA Summary Plots by Type of
Effect at n = 200 . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.10 Nilson-Kuusk Model: FANOVA Plots by Type of Effect at
n = 200 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.11 Nilson-Kuusk Model: Prediction Accuracy by Type of GaSP 87

5.1 Borehole Function: FANOVA Percentage Contributions of


Main Effects at n = 80 . . . . . . . . . . . . . . . . . . . . . . 101
5.2 Borehole Function: FANOVA Percentage Contributions of In-
put Interactions at n = 80 . . . . . . . . . . . . . . . . . . . . 102
5.3 Borehole Function: Prediction Accuracy by Type of DA . . . 108
5.4 Borehole Function: Prediction Accuracy by Type of DA with
an Extrapolated Testing Set . . . . . . . . . . . . . . . . . . . 110
5.5 Heat Transfer in a Solid Sphere: FANOVA Percentage Con-
tributions by Type of Effect at n = 70 . . . . . . . . . . . . . 113
5.6 Heat Transfer in a Solid Sphere: Estimated Main Effect of Ra-
dius of Sphere (r) on Temperature of Sphere (Ts ) and Fitted
Values coming from Least Squares Regression of Ts against
1/r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.7 Heat Transfer in a Solid Sphere: Scatterplots of Four Output
Setups Against Time (t) . . . . . . . . . . . . . . . . . . . . . 117
5.8 Heat Transfer in a Solid Sphere: Prediction Accuracy by Type
of DA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.9 Heat Transfer in a Solid Sphere: Prediction Accuracy by Type
of DA with an Extrapolated Testing Set . . . . . . . . . . . . 122
5.10 Storm Surge Model: North Carolina’s Coastal Locations . . . 123
5.11 Storm Surge Model: Distribution of Maximum Storm Surge
by Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.12 Storm Surge Model: Distribution of Locations at Landfall . . 128
5.13 Storm Surge Model: FANOVA Percentage Contributions of
Main Effects and 2-Input Interactions in Locations 7 and 8 . 131
5.14 Storm Surge Model: FANOVA Percentage Contributions of
Main Effects and Higher-Order Input Interactions in Loca-
tions 9 and 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.15 Storm Surge Model: Estimated Main Effect of Central Pres-
sure Deficit (∆p ) on Maximum Storm Surge (ηimax ) in Loca-
tions 7, 8, 9, and 10 . . . . . . . . . . . . . . . . . . . . . . . 133


5.16 Storm Surge Model: Spatial Distribution of Storm Tracks and


Pressure Deficit Levels . . . . . . . . . . . . . . . . . . . . . . 134
5.17 Storm Surge Model: North Carolina’s Coastal Locations 9
and 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.18 Storm Surge Model: Prediction Accuracy in
Locations 7 and 8 . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.19 Storm Surge Model: Prediction Accuracy in
Locations 9 and 10 . . . . . . . . . . . . . . . . . . . . . . . . 141
5.20 Storm Surge Model: Prediction Error with an Extrapolated
Testing Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.1 OTL Circuit Function: Prediction Accuracy by Type of GaSP


at n = 60, 120, 240 Including TAAG . . . . . . . . . . . . . . 152

B.1 Heat Transfer in a Solid Sphere: Estimated Main Effect of


Ratio of Distance from Center and Sphere Radius (R) and
Radius of Sphere (r) on Temperature of Sphere (Ts ) . . . . . 169
B.2 Heat Transfer in a Solid Sphere: Estimated Main Effect of
Time (t) and Temperature of Medium (Tm ) on Temperature
of Sphere (Ts ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
B.3 Heat Transfer in a Solid Sphere: Estimated Main Effect of Ini-
tial Sphere Temperature minus Temperature of Medium (∆T )
and Thermal Conductivity (k) on Temperature of Sphere (Ts ) 171
B.4 Heat Transfer in a Solid Sphere: Estimated Main Effect of
Convective Heat Transfer Coefficient (hc ) on Temperature of
Sphere (Ts ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

C.1 Storm Surge Model: FANOVA Percentage Contributions of


Main Effects and 2-Input Interactions in Locations 1 and 2 . 174
C.2 Storm Surge Model: FANOVA Percentage Contributions of
Main Effects and Higher-Order Input Interactions in Loca-
tions 3 and 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
C.3 Storm Surge Model: FANOVA Percentage Contributions of
Main Effects and 2-Input Interactions in Locations 5 and 6 . 176
C.4 Storm Surge Model: Estimated Main Effect of Central Pres-
sure Deficit (∆p ) on Maximum Storm Surge (ηimax ) in Loca-
tions 1, 2, 3, and 4 . . . . . . . . . . . . . . . . . . . . . . . . 177
C.5 Storm Surge Model: Estimated Main Effect of Central Pres-
sure Deficit (∆p ) on Maximum Storm Surge (ηimax ) in Loca-
tions 5 and 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 178


C.6 Storm Surge Model: Prediction Accuracy in


Locations 1 and 2 . . . . . . . . . . . . . . . . . . . . . . . . . 179
C.7 Storm Surge Model: Prediction Accuracy in
Locations 3 and 4 . . . . . . . . . . . . . . . . . . . . . . . . . 180
C.8 Storm Surge Model: Prediction Accuracy in
Locations 5 and 6 . . . . . . . . . . . . . . . . . . . . . . . . . 181

Glossary
ABCD Automatic Bayesian Covariance Discovery (p. 153)

ADCIRC Advanced Circulation (p. 92)

BIC Bayesian Information Criterion (p. 153)

BLUP Best Linear Unbiased Predictor (p. 19)

CGP Composite Gaussian Process (p. 30)

CME Conditional Main Effect (p. 150)

DA Dimensional Analysis (p. 3)

DOE Design of Experiments (p. 94)

E-JE2 Equal Weights on Joint Effects up to All 2-Input Interactions (p. 52)

E-JE2R Equal Weights on Joint Effects up to All 2-Input Interactions with a Residual Effect Term (p. 70)

E-JE2S Equal Weights on Joint Effects up to Selected 2-Input Interactions (p. 57)

E-JE2SR Equal Weights on Joint Effects up to Selected 2-Input Interactions with a Residual Effect Term (p. 71)

E-JE3 Equal Weights on Joint Effects up to All 3-Input Interactions (p. 69)

E-JE3S Equal Weights on Joint Effects up to Selected 3-Input Interactions (p. 81)


E-ME Equal Weights on Main Effects (p. 27)

FANOVA Functional Analysis of Variance (p. 58)

GaSP Gaussian Stochastic Process (p. 1)

GP Gaussian Process (p. 1)

IQR Interquartile Range (p. 79)

JPM Joint Probability Method (p. 123)

L-BFGS Limited Memory Broyden-Fletcher-Goldfarb-Shanno (p. 17)

LHD Latin Hypercube Design (p. 22)

LOESS Locally Estimated Scatterplot Smoothing (p. 115)

MARS Multivariate Adaptive Regression Splines (p. 53)

Max Maximum (p. 79)

Min Minimum (p. 79)

MLE Maximum Likelihood Estimate (p. 17)

mLHD Maximin Latin Hypercube Design (p. 22)

MSE Mean Squared Error (p. 19)

NC North Carolina (p. 10)

N-RMSE Normalized Root Mean Squared Error (p. 20)

OTL Output Transformerless (p. 10)

PowExp Power Exponential (p. 15)

PPR Projection Pursuit Regression (p. 72)


rLHD Random Latin Hypercube Design (p. 22)

RMSE Root Mean Squared Error (p. 20)

SI International System of Units (p. 6)

SqExp Squared Exponential (p. 15)

Std Standard (p. 14)

SWAN Simulating Waves Nearshore (p. 154)

TAG Transformed Additive Gaussian (p. 150)

TAAG Transformed Approximately Additive Gaussian (p. 150)

U-JE2 Unequal Weights on Joint Effects up to All 2-Input Interactions (p. 52)

U-JE2S Unequal Weights on Joint Effects up to Selected 2-Input Interactions (p. 57)

U-JE3 Unequal Weights on Joint Effects up to All 3-Input Interactions (p. 69)

U-JE3S Unequal Weights on Joint Effects up to Selected 3-Input Interactions (p. 81)

U-ME Unequal Weights on Main Effects (p. 27)

U-RMSE Unnormalized Root Mean Squared Error (p. 32)

WAM Wave Model (p. 154)

WRF Weather Research and Forecasting (p. 154)

Acknowledgements
I would like to express my gratitude and appreciation to my supervisor,
Dr. William J. Welch, who provided magnificent support, guidance, encour-
agement, and feedback during all these years of my doctoral research. All
our conversations and meetings were amazingly crucial and fruitful during
the development of this dissertation. I would also like to thank my super-
visory committee, Dr. Marie Auger-Méthé and Dr. James V. Zidek, for
their thoughtful and excellent comments and observations on this research.
Their recommendations were a vital part of this work. Moreover, I would
like to acknowledge Whitney Huang and Taylor Asher for their collaborative
support and guidance on this dissertation’s final case study.
Finally, I would also like to thank all the amazing people I had the opportunity to work with and learn from in the Department of Statistics at UBC. I express my appreciation to all the faculty members who were
my professors during my program coursework. Also, I am grateful to the
Applied Statistics and Data Science (ASDa) group for all the consulting
projects I had the opportunity to collaborate on. A big thank you to Peggy
Ng, Ali Hauschildt, Andrea Sollberger, Mairead Roche, and Sima Jafari
in the department’s main office for their outstanding support and tireless effort.

Dedication
First of all, the most significant recognition goes to my family. I have had the most amazing parents all my life; they have struggled all their lives and raised the most amazing children. María del Carmen and Gilberto, all this journey has been for you, and I am so happy to have you in my heart. My brother and I always try to be the best professionals in life: Airel as a physician and I as a statistician. I am pretty sure we have never disappointed you in all this time. And Airel, you are also a vital part of my heart; never forget that.
Then, all my friends on this journey have been another important part of it. I am grateful to still have you all on my side. Many of them have been far away all these years, but we have managed to stay close, never more than a text message away. On the other hand, all my local friends and colleagues have been the best support ever, and I am grateful to have you here. Many of you are already far away, though, while others are still here next to me when I need them. It is impossible to list names, but you know who you are, and that is what matters. You are also in my heart.

Chapter 1

Introduction
Physical experiments can be hard to apply in a wide range of natural phe-
nomena and engineering processes. Difficulties arise in cases where physical
experimentation may be too resource-consuming or even impossible. One
example is the prediction of hurricane flood hazard as a function of diverse
environmental factors. To overcome these issues, we can rely on mathemat-
ical models that simulate a given physical system via computer codes.
Bastos and O’Hagan (2009) state that computer codes attempt to examine complex systems in different scientific fields. In general, the experimenter can assume that the output is generated by a complex unknown function of a specific input set. Hence, a statistical model can be
used as an emulator to approximate these expensive computer codes. The
specific surrogate model in computer experiments was adopted and adapted
from spatial statistics, namely kriging, where we view the response as a real-
ization of a stochastic process. This model is known as a Gaussian Stochastic
Process (GaSP) introduced by Sacks et al. (1989). It is also known merely
as a Gaussian Process (GP).
A computer experiment’s primary objective, and thus a GaSP emulator,
depends on the specific research context. Prediction can be an objective,
i.e., predicting outputs at untried inputs given a model fitted with a training
design coming from a computer code. We can also use a computer exper-
iment for optimization purposes, where a surface is fitted over a region of
interest to find a minimum or a maximum, as noted by Jones et al. (1998).
We explore the prediction accuracy paradigm in a GaSP model throughout
this thesis.
The assumption of independent random errors between runs (at different
input configurations) is common in traditional design and analysis of phys-
ical experiments. Unlike a physical experiment, a computer experiment is often deterministic given a set of inputs, including any boundary conditions (McKay et al., 1979; Sacks et al., 1989). Therefore, replicate runs of a com-
puter code are wasteful as they would yield the same output. Nonetheless,
the GaSP emulator has a random component to account for uncertainty at untried input vectors where the computer code is not run.

We assume a correlation between the available points in our data for
model training under the stochastic process approach. A correlation structure is a function of the distance between these points, and we use it to explain the dependence between their corresponding outputs. The use of a specific
correlation structure can be validated through cross-validation of the train-
ing data. Moreover, the correlation structure setup in a GaSP can be an
essential attribute in prediction accuracy, but it has not been fully explored
in the existing literature (Duvenaud, 2014; Kandasamy et al., 2015).
In addition, a common practice when fitting a GaSP in a computer experiment is the use of inputs and outputs in a “raw” form. Nevertheless, is it possible to improve prediction accuracy by input and output transformations?
Moreover, it might be crucial to consider the dimensional relationships in
a system when it comes to implementing a given modelling strategy. Since
a computer experiment has to deal with complex natural phenomena and
engineering processes, it is desirable to exploit the meaningful dimensional
relationships between all factors involved.
As noted above, we restrict our attention to prediction accuracy im-
provement in a computer experiment. Since a GaSP is used as a surrogate
model for computer codes that are too resource-consuming to execute at
many input configurations, or when a given code is not available to run any-
more, accuracy in terms of output prediction at untried inputs is essential
if the experimenter wants to explore different input settings. Hence, we can
see why the previously mentioned random component is a key part of GaSP
modelling. Moreover, while other active learning methods can also be used
as surrogate models for computer emulators, we have to point out that one
of the GaSP’s most appealing attributes is its ability to handle non-linear
and complex outputs with small training sizes. For computer experiments,
GaSPs can be more accurate than other non-parametric regression meth-
ods (e.g., Ben-Ari and Steinberg, 2007), and GaSPs are related to Bayesian
neural networks (Neal, 2012).
The main thrust of this thesis is GaSP prediction accuracy improvement
with two different strategies:

1. For given input and output variables, the choice of a suitable corre-
lation structure plays a key role in a GaSP. One of the critical ap-
plications of a computer code is the replacement of certain physical
experiments. Hence, we could modify the correlation structure by fol-
lowing the available guidelines for fractional factorial designs in physi-
cal experiments, namely three effect principles (Box and Meyer, 1985;
Hamada and Wu, 1992, 2000). Such alternative correlation structures are not common since a GaSP has a standard correlation structure, as noted in Chapter 2.

2. The use of input and output transformations in GaSP modelling needs
a principled framework. Dimensional Analysis (DA) can be a suitable
approach since it is motivated by scientifically valid input and output
relationships. Physical fundamental quantities play a key role in DA.
The literature offers some previous DA-related work (Shen et al., 2014;
Shen and Lin, 2018; Shen et al., 2018), but the identification of the
fundamental physical quantities that are the basis for transformed
variables from the many possible choices is not part of the original
theory. This thesis will explore this matter.

1.1 The Use of a Factorial Effect Framework


We can find some work in the literature regarding the exploitation of mod-
elling options in a GaSP, e.g., Chen et al. (2016). Nonetheless, few works exploit more radical structural modifications of the correlation structure (Duvenaud, 2014; Kandasamy et al., 2015). Hence, by taking up some available guidelines from factorial experiments, this thesis explores novel correlation structures for a GaSP. We show that these correlation structures are competitive against the standard GaSP, in terms of prediction accuracy, in five different case studies.
Factorial experiments were primarily introduced during the first half of
the 20th century. Fisher (1926) presents one of the first notions of a factorial
experiment, which he calls a complex experiment. His practical example in-
volved winter oats, which are subject to three factors. According to Fisher
(1926), a complex experiment allows simultaneous comparisons of each fac-
tor’s levels instead of conducting single-factor experiments. Moreover, a full
factorial design allows estimation of all possible interactions between factors.
Further works, such as Yates (1935, 1937), introduce substantial new ideas that relax the requirement of making runs at all factorial combinations of the factor levels.

1.1.1 Historical Background of Experimental Guidelines


Some important guiding principles have been implied since the early stages
of factorial experiments. Yates (1935) notes that, within a factorial ex-
periment, treatment comparisons can be decomposed into main effects and
interactions of different orders. Nevertheless, to explain the confounding concept, it is pointed out that high-order interactions are usually assumed to be non-existent or negligible compared to experimental errors.
Yates (1937) indicates that a confounded experiment splits up differ-
ent treatment combinations into two or more blocks. Hence, the contrasts
between the blocks are aliased with high-order interactions since the facto-
rial experiment mainly focuses on the main effects and 2-factor interactions.
Unlike the idea illustrated by Fisher (1926), confounding will generate a
fractional factorial design whose primary properties are also described by
Finney (1943).
Box et al. (1978), as part of their work related to fractional factorial
designs, clarify that not all the effects are of substantial size in a full factorial
experiment. Furthermore, there is an effect hierarchy in an experimental
design: the absolute values of main effects tend to be larger than 2-factor
interactions, which also tend to be larger than 3-factor interactions, and so
on. There is also an association between the main effects and interactions,
and a Taylor series expansion of a response function. Therefore, main effects
correspond to the first-order terms of the expansion, 2-factor interactions to
the second-order terms, etc. If we only consider main effects and 2-factor
interactions in the factorial experiment, we will be ignoring the rest of the
terms in the series expansion.
Box et al. (1978) also mention that higher-order interactions often tend
to be negligible, and we can disregard them. Moreover, some experimental
variables do not have substantial effects when the total number is large. Both
ideas are contained in the concept of redundancy, an excess of interactions
and main effects that could be estimated in a physical experiment. The idea
of redundancy is highly exploited in a fractional factorial design.

1.1.2 Formal Definitions of the Effect Principles


As implied in Section 1.1.1, some guiding principles go back to shortly after
the introduction of factorial designs, but there were no formal definitions
until later works. Based on the Taylor series argument of Box et al. (1978),
three essential principles were defined as follows:
• Effect Sparsity Principle. This principle, coined by Box and Meyer
(1985), states that there is a small proportion of active effects in a
factorial experiment while the rest of them are inert. The principle
is illustrated by a common industrial problem with a large number
of factors, where it is necessary to find those few with large effects.
There is an analogue of this principle in the use of Pareto diagrams in
quality control studies.

• Effect Heredity Principle. This principle, coined by Hamada and
Wu (1992), states that at least one of the parent main effects is sig-
nificant for a given significant interaction. Yates (1937) implies this
principle by mentioning that large main effects may also produce sub-
stantial interactions, compared to small main effects that usually do
not show significant interactions.
• Effect Hierarchy Principle. This principle, coined by Hamada and
Wu (2000), indicates that lower-order effects in a factorial experiment are more likely to be significant than higher-order effects. Hence, we can
focus the factorial experiment on main effects, 2-factor, and possibly
3-factor interactions. Yates (1937) also implies this hierarchical prin-
ciple.
While these principles were developed largely for designing physical
fractional-factorial experiments, they are adapted for computer experiments
in this thesis for the same reason: to exploit any simplicity in the unknown
function being modelled. Furthermore, in the presence of a large number of
factors, a proper variable selection approach is necessary to fulfil the previ-
ous three principles. For example, which low-order interaction effects should
be explicitly modelled? Hence, we have to keep in mind the challenge of se-
lecting the highly active main effects and input interactions. This matter
will be addressed in Chapter 4.

1.2 Dimensional Analysis


Concerns about dimensionality (in the sense of scientific units) in statistical
theory and practice can be found in early works such as Finney (1977), who
states the following:
“I am surprised by the lack of attention given to ‘dimensions’
as a check on the theory and practice of statistics. The basic
ideas, readily appreciated, should form part of the stock-in-trade
of every statistician; those of us who teach should make our
students explicitly aware of them.”
Finney (1977) points out the dissimilarity between the importance at-
tributed to the “theory of dimensions” in physics and statistics:
• In terms of physics, it is stated that “[...] no textbook of physics or
mechanics was complete without a chapter on the ‘theory of dimen-
sions.’ Those who mastered it could go far towards constructing the essential features of a formula merely from the dimensionalities of the measurements that had to enter.”

• For the case of statisticians, it is pointed out that “[...] Those statis-
ticians who recognize how deeply rooted their science is in data are
doubtless well aware of the relevance of dimensionality considerations.
I know of no statistical text that makes even the most elementary
points, yet these can be enormously useful in rapid checking of asser-
tions and formulae.”

As a side note, we need to clarify the difference between a dimension and its empirical scales of measurement, which are simply known as “units”. There are seven fundamental physical dimensions: mass (M), length (L), time (T), temperature (Θ), electric current (Q), amount of substance (N), and luminous intensity (Iv). On the other hand, we can find more than one class of unit for each fundamental dimension; e.g., the International System of Units (SI) establishes the following units: kilogram (kg) for M, meter (m) for L, second (s) for T, kelvin (K) for Θ, ampere (A) for Q, mole (mol) for N, and candela (cd) for Iv.
Lee and Zidek (2020) add that statisticians use variables, e.g. X, to
represent numbers in their models and equations. Notwithstanding, a given
variable such as X denotes a natural phenomenon or process in other scien-
tific fields. This variable X has one or more embedded fundamental dimen-
sions. Besides, it is necessary to specify a type of scale: ordinal, ratio, or
interval. The scale selection may depend on the purpose of a given experi-
ment.
For instance, for the case of height, whose dimension is L, an ordinal scale could involve two levels: short and tall. A ratio scale implies a numerical measurement with its corresponding units (e.g., m for height), and a value
of zero indicates the absence of the variable. The interval case is also mea-
sured on a numerical scale. However, it allows the measurement to take on
negative values; thus, the value of zero does not indicate an absence of the
variable. The interval scale of Celsius degrees (C) is a common example.
From the physics perspective, Finney (1977) identifies three “mechan-
ical” fundamental dimensions of a ratio-scale type: L, M, and T. These
dimensions, and the remaining four above, can be combined in the form of products and/or quotients to generate what we call derived dimensions. For instance, area (A) is defined as L², density (ρ) as ML⁻³, frequency (f) as T⁻¹, etc. Sonin (2001) provides more examples of further derived dimensions. Any change of units of the fundamental dimensions, namely a scaling factor, will impact the derived ones in the form of a product and/or quotient of those fundamental dimensions’ corresponding scaling factors.

1.2.1 Why Should Dimensionality Be Important for Statisticians?
Even though essential tools such as test statistics must be equivalent under
any class of units, Finney (1977) enumerates an interesting list of statistical
cases where dimensionality has to be taken into account all the time. Oth-
erwise, we could encounter consistency problems when dealing with all the
variables in a given dataset. Depending on each statistical case’s nature,
either univariate or multivariate, we have to take into account different fea-
tures depending on the mathematical formulas involved.
For the univariate case, Finney (1977) assumes variable x on a ratio-scale
with dimension D. Basic statistics such as mean or median will keep the
same dimension. The author does not specify the class of mean, geometric or
arithmetic. Luce (1959) points out that both classes are appropriate for ratio
scales. In contrast, the arithmetic mean is only suitable for the interval-type.
A measure of dispersion, such as the standard deviation, will also keep the
same dimension. On the other hand, the variance will have dimension D².
For the geometric mean, a measure of dispersion is the geometric standard
deviation (Kirkwood, 1979), which is dimensionless.
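To make this bookkeeping concrete, here is a minimal Python sketch (our own illustration, not code from the thesis; the exponent-map representation of a dimension and the sample heights are assumptions) showing how the mean keeps dimension D while the variance acquires D²:

```python
import numpy as np

# A dimension is represented as a map from fundamental dimensions to
# integer exponents; height has dimension L, i.e., {"L": 1}.
def dim_pow(dim, k):
    """Raise a dimension to the kth power: all exponents multiply by k."""
    return {base: e * k for base, e in dim.items()}

heights = np.array([1.52, 1.68, 1.75, 1.81])  # ratio scale, units of m
dim_height = {"L": 1}

mean_h = heights.mean()      # keeps dimension L
var_h = heights.var(ddof=1)  # acquires dimension L^2
sd_h = np.sqrt(var_h)        # back to dimension L

print(mean_h, dim_height)
print(var_h, dim_pow(dim_height, 2))
print(sd_h, dim_height)
```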
A probability value must be dimensionless. Therefore, any formula giv-
ing a probability density function must be dimensionless. For instance, in a
function of this type, all addends in a polynomial factor where x could come
into play have to be dimensionless. Lee and Zidek (2020) also point out the
need to use dimensionless variables in transcendental functions. A critical
case of special importance in a GaSP is the correlation function. The GaSP
correlation functions to be described in Section 2.3 involve the exponential
function; using dimensionless (derived) inputs in them is, therefore, more
principled.

1.2.2 Early Works in Dimensional Analysis


Early works in DA go back at least a century to Buckingham (1914), who
proposes a theorem commonly known as Buckingham’s Π-theorem. This
work relies on a solid theoretical basis and matrix algebra. The theorem
states that dimensionless output and inputs can represent a system. DA is
also formalized in two other principles: the Principle of Similitude and the
Principle of Absolute Significance of Relative Magnitude.


The Principle of Similitude by Rayleigh (1915) states that important


laws of nature must be depicted in homogeneous equations involving in-
dependent fundamental physical dimensions. An equation is dimensionally
homogeneous when both sides have the same dimensional relationships along
with consistent units. Moreover, a homogeneous equation of d variables in-
volving k fundamental units on the right-hand side can be reduced to d − k
variables. The reduction uses k base quantities, whose corresponding fun-
damental physical dimensions are independent and come from the seven
mentioned earlier.
The Principle of Absolute Significance of Relative Magnitude by Bridg-
man (1931) states the fact of independence for base quantities as follows:

“A number Q obtained by inserting the numerical values of base
quantities into a formula is a physical quantity if the ratio of
any two samples of it remains constant when base unit sizes are
changed.”

As an introductory example of DA, let us consider the gravity displace-
ment for a falling object on the Earth whose output y is the final vertical
displacement with dimension [y] = L. The relationship between the output
y and the d = 4 inputs is depicted as follows:

y = y0 + V0 t − gt²/2,    (1.1)
where

y0 : initial distance, [y0] = L
V0 : initial velocity, [V0] = LT⁻¹
t : time of displacement, [t] = T
g : gravitational acceleration, [g] = LT⁻².

Buckingham’s Π-theorem requires positive variables as inputs. The in-
puts V0, t, and g are on a ratio-scale and fulfil this condition. Further-
more, y0 is a positive ratio-scale displacement from an origin. Note that
equation (1.1) involves two fundamental physical dimensions: L and T, i.e.,
k = 2. A direct application of the theorem would require choosing d − k = 2
out of the four inputs above as base quantities.
However, for the sake of this example on the Earth, where g is constant, we only have three active inputs in the system: y0, V0, and t. Therefore, we
could select two out of these three inputs (V0 and t) as base quantities, so that equation (1.1) becomes dimensionless:

y/(V0 t) = y0/(V0 t) + 1 − gt/(2V0),

which yields one dimensionless output q0 and two dimensionless inputs q1 and q2:

q0 = y/(V0 t),   q1 = y0/(V0 t),   q2 = gt/(2V0).
Buckingham’s Π-theorem says that the relationship can be represented
using only these three variables. Thus, this new formulation reduces the sys-
tem’s dimensionality with composite output and inputs (i.e., dimensionless
variables based on products and quotients between the original ones). As a
side note, the input g could be considered non-constant in another environ-
ment, and it has to be taken into account in a DA. Note that y0 and t might
be an alternative pair of base quantities given their fundamental dimensions
L and T, respectively. Notwithstanding, DA’s performance in prediction ac-
curacy heavily relies on a given base quantity selection. Chapter 5 explores
this matter in further detail.
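As a numerical check of this example, the short Python sketch below (our own illustration with arbitrary input values, not code from the thesis) confirms that the dimension matrix of (y0, V0, t, g) has rank k = 2 and that the dimensionless identity q0 = q1 + 1 − q2 reproduces equation (1.1):

```python
import numpy as np

# Dimension matrix of the inputs (y0, V0, t, g): rows hold the
# exponents of L and T, since y0 = L, V0 = L T^-1, t = T, g = L T^-2.
dim_matrix = np.array([[1, 1, 0, 1],     # exponents of L
                       [0, -1, 1, -2]])  # exponents of T
k = np.linalg.matrix_rank(dim_matrix)    # k = 2 fundamental dimensions

# Arbitrary positive inputs in SI units for a falling object.
y0, V0, t, g = 10.0, 3.0, 1.2, 9.81
y = y0 + V0 * t - g * t**2 / 2           # equation (1.1)

# Dimensionless output and inputs using the base quantities V0 and t.
q0 = y / (V0 * t)
q1 = y0 / (V0 * t)
q2 = g * t / (2 * V0)

assert k == 2
assert np.isclose(q0, q1 + 1 - q2)       # the dimensionless form holds
```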

1.3 Structure of the Thesis


This thesis focuses on improving prediction accuracy in a GaSP by following
either of two paths: non-standard correlation structures in interpolation, or
DA in interpolation and extrapolation. Hence, it is organized as follows:
• Chapter 2 provides the general modelling attributes of a GaSP. We
summarize the frequentist-based estimation method of maximum like-
lihood on the respective model parameters. Furthermore, we provide a
brief description of the optimization procedure used in our simulation
studies and details on experimental designs.

• Chapter 3 introduces non-standard GaSP correlation structures under
a framework adapted from factorial design. We start with structures
that rely entirely on main effects for a particular case: the Michalewicz
function.

• Chapter 4 expands the use of non-standard GaSP correlation struc-
tures to low-order joint effects. These structures also rely on the ef-
fect principles from factorial design along with a sensitivity analysis
technique. We start with two test functions: Friedman and Franke
functions. Then, we proceed with two case studies that go beyond the
aforementioned mathematical functions. The first is an engineering
application: the output transformerless (OTL) circuit function. The
second implements these correlation structures for an ecological code:
the Nilson-Kuusk model.

• Chapter 5 introduces DA as a second approach for prediction accuracy
improvement. The method relies on input and output transformations
with base quantities before model fitting, and the choice of the base
quantities is critical. Proper implementation of DA also benefits from
the use of sensitivity analysis. The initial simulation studies in this
chapter are related to the Borehole function and a thermodynamic
model. We close this chapter with a hurricane study using a GaSP
as a surrogate model to predict storm surges over the North Carolina
(NC) coast.

• Chapter 6 makes concluding remarks on our respective approaches and
points out difficulties and opportunities observed during the research.

Table 1.1 lists the studies along with their respective sections in each of the subsequent chapters.

Approach                  Chapter                          Case Study                         Section
Non-standard GaSP         (3): Main-Effect                 Michalewicz function               (3.5), (3.6)
correlation structures    Gaussian Processes
                          (4): Gaussian Processes with     Friedman function                  (4.3)
                          Low-Order Joint Effects          Franke function                    (4.6)
                                                           OTL circuit function               (4.9)
                                                           Nilson-Kuusk model                 (4.11)
DA                        (5): Input and output            Borehole function                  (5.3)
                          transformations prior            Heat transfer in a solid sphere    (5.4)
                          to model fitting                 Storm surge model                  (5.5)

Table 1.1: Overview of the studies.

Chapter 2

The Gaussian Stochastic Process Model
In a computer simulator, the input vector x and its corresponding output y are denoted as

x = (x1, . . . , xd)⊤ ∈ R^d   and   y = y(x),

respectively. Our training design D is the set of n input vectors

D = {x^(1), . . . , x^(n)},

whose outputs are held in the vector

y = (y1, . . . , yn)⊤ ∈ R^n.

In a GaSP, mainstream practice is to model training design D and out-
put vector y as the computer simulator provides them. Nonetheless, a DA
approach will not follow this practice, as shown in Chapter 5.
As modelled by Sacks et al. (1989), a GaSP defines a deterministic output
y(x) as a fixed regression component µ(x) along with the realization of a
random function Z(x):

Y(x) = µ(x) + Z(x),    (2.1)

where the marginal distribution of the random function is

Z(x) ∼ N(0, σ²),

and σ² is the unknown process variance.


Let x and x′ be two input vectors. The correlation between their respective random functions, Z(x) and Z(x′), is denoted as R(x, x′). Thus, the covariance between these two random functions is defined as

Cov[Z(x), Z(x′)] = σ² R(x, x′).    (2.2)


If the mean, variance, and covariances between the input vectors are taken as constant over the whole input space, then we have a stationary GaSP. Moreover, let R be the n × n symmetric and positive semidefinite correlation matrix whose elements are R(x, x′) for all possible pairs x and x′ in the training design D. We refer to R(x, x′) as the correlation structure of the GaSP throughout this thesis, and its setup is the central focus of Chapters 3 and 4.

2.1 The Regression Component


From model (2.1), a fixed regression component µ(x) can be generalized as follows:

µ(x) = β0 + ∑_{h=1}^{k} βh fh(x),    (2.3)

whose k functions fh(x) are known. Note that the regression component (2.3) can be expressed in the matrix form

µ(x) = β⊤ f(x),

where our coefficient vector

β = (β0, β1, . . . , βk)⊤ ∈ R^{k+1}

is unknown, and

f(x) = (1, f1(x), . . . , fk(x))⊤ ∈ R^{k+1}

is known. Furthermore, in terms of the training design D, we can construct the n × (k + 1) matrix F whose ith row corresponds to f(x^(i))⊤ for i = 1, . . . , n.

2.2 Choice of Regression Component


The choice of the regression component depends on the purpose of the exer-
cise. For instance, spatial data analysis in an epidemiological study would
focus on factors such as demographic profile densities over space (Goovaerts,
2014). These factors could be incorporated in the regression component
µ(x), depicted in model (2.1).
However, in the context of computer experiments, the work by Chen
et al. (2016) presents cases where the inclusion of regression terms appears
unnecessary. Moreover, Sacks et al. (1989) mention that a departure from the regression model is absorbed by Z(x). Hence, following the recommendations found in these two references, all cases in Chapters 3 and 4 will
a single constant term β0 in the regression component, i.e.,

µ(x) = β0.

On the other hand, besides a single constant term, the cases regarding the Borehole function and heat transfer in a solid sphere in Chapter 5 will implement a linear regression term as follows:

µ(x) = β⊤ f(x)
     = (β0, β1, . . . , βd) (1, x1, . . . , xd)⊤
     = β0 + β1 x1 + · · · + βd xd.

Therefore, the most general case for this component (involving vector β and
matrix F) will be described throughout subsequent sections in this chapter.
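For concreteness, here is a minimal sketch of how the regression matrix F could be assembled for these two choices; the function name and the toy design are our own, not code from the thesis:

```python
import numpy as np

def regression_matrix(D, linear=False):
    """Assemble the n x (k+1) regression matrix F from an n x d design D.

    linear=False corresponds to the constant component mu(x) = beta_0
    (a single column of ones); linear=True prepends the intercept column
    to the raw inputs, so mu(x) = beta_0 + beta_1 x_1 + ... + beta_d x_d.
    """
    n, _ = D.shape
    intercept = np.ones((n, 1))
    return np.hstack([intercept, D]) if linear else intercept

rng = np.random.default_rng(0)
D = rng.uniform(size=(5, 3))                  # n = 5 runs, d = 3 inputs
F_const = regression_matrix(D)                # shape (5, 1)
F_linear = regression_matrix(D, linear=True)  # shape (5, 4)
```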

2.3 Correlation in Random Function


In modelling data from a traditional physical experiment, random errors are
usually assumed to be independent between all input vectors in design D.
However, since a computer code is deterministic, any incompleteness in the
regression component will not be due to this conventional randomness. Note
that the GaSP still has a random component, as depicted in (2.1). From the
covariance in (2.2), there is a correlation between Z(x) and Z(x′). Hence, we cannot assume the independence above.
The standard correlation structure in a GaSP, abbreviated Std throughout, is defined for input vectors x and x′ as

RStd(x, x′) = ∏_{j=1}^{d} Rj(hj) ∈ [0, 1],    (2.4)

which is a function of the distance hj = xj − x′j for the jth input. A key
feature of the standard structure is that it is multiplicative across the d di-
mensions. It is important to mention that alternative correlation structures
will be introduced in Chapters 3 and 4. Another important characteristic is
that the correlation functions Rj (·) we will consider decay with increasing
hj . That implies that output values further apart in the xj direction will
be less correlated and hence differ more. In contrast, two output values at
nearby xj values are highly correlated and hence similar.


Jones et al. (1998) state that the correlation between a pair of points
cannot always be taken as a distance metric assigning the same weight to
all the d inputs. Thus, the distance is weighted using a hyperparameter
θj on the jth input. A large value of θj indicates a highly active input
variable. Furthermore, in terms of any correlation structure R(·, ·) in our d
dimensions, we have correlations in design D that are high when a pair of
input points is close and low when they are apart. A larger θj increases the
weighted distance between two inputs in dimension j, and hence the
correlation function decays faster as a function of distance. Again, the lower
correlation between values of the output implies they are less related, i.e.,
the function is more active in dimension j.
The literature offers more than one choice for Rj (hj ), including the
following:
• Power Exponential (PowExp) Correlation Function. This func-
tion is expressed as
RjPowExp (hj ) = exp(−θj |hj |^{pj} ), (2.5)
where θj ≥ 0 and pj ∈ [1, 2] for j = 1, . . . , d. The parameter pj
is related to the smoothness of the function with respect to the jth
input. A value of pj = 2 corresponds to smooth functions, and less
smoothness is represented by values close to or equal to 1.
• Squared Exponential (SqExp) Correlation Function. It is a
special case of the previous correlation function when pj = 2:
RjSqExp (hj ) = exp(−θj hj^2 ); (2.6)
with θj ≥ 0 for j = 1, . . . , d. It is also known as the Gaussian correla-
tion function.
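
As an illustration only (a transcription of (2.5) and (2.6), not the software used in this thesis), both correlation functions take one line of R each:

  # PowExp correlation (2.5); SqExp (2.6) is the special case p = 2.
  R_powexp <- function(h, theta, p) exp(-theta * abs(h)^p)
  R_sqexp  <- function(h, theta)    R_powexp(h, theta, p = 2)

  R_powexp(h = 0.5, theta = 2, p = 1.5)  # correlation at distance 0.5
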
The literature offers other options for the correlation function Rj (hj ),
e.g., the Matérn class suggested by Stein (1999), which also controls
smoothness, as does the PowExp class. The parametrization of the Matérn class
allows us to consider more than one possible correlation function. Never-
theless, the scope of this thesis is not on the correlation function Rj (hj ) but
on modifications of the multiplicative structure R(·, ·) (2.4).
We restrict our attention to the two cases mentioned above in our sub-
sequent simulation studies: the PowExp and SqExp correlation functions.
We use both correlation functions in Chapters 3 and 4 for the case stud-
ies to compare our non-standard correlation structures versus the standard
one. On the other hand, Chapter 5 is focused on the standard correlation
structure only with the PowExp case.


2.4 Estimation
According to Sacks et al. (1989), from the training design D, we assume an
output random vector

Y = (Y1 , . . . , Yn )> ∈ Rn

under a multivariate normal distribution, i.e.,

Y ∼ Nn (Fβ, σ 2 R).

Hence, the joint likelihood function L of the n points in the experimental


design D is
 
L(y | β, σ 2 , θ) = (2πσ 2 )^{−n/2} |R|^{−1/2} exp[ −(y − Fβ)> R−1 (y − Fβ) / (2σ 2 ) ],  (2.7)

which is a function of unknown parameters β, σ 2 , and θ.


Since we will work throughout this thesis with either RjSqExp (hj ) or
RjPowExp (hj ) as correlation functions, the following parameters are to be
estimated:

• The process variance represented by the scalar σ 2 .

• The vector β ∈ Rk+1 of regression coefficients.

• A vector of correlation hyperparameters θ that

– implies the estimation of

θ SqExp = (θ1 , . . . , θd )>

for correlation function RjSqExp (hj ),


– and
θ PowExp = (θ1 , . . . , θd , p1 , . . . , pd )>
for correlation function RjPowExp (hj ).


2.4.1 Maximum Likelihood Estimation


We use the frequentist-based method of maximum likelihood on the joint
likelihood function (2.7). Given θ, the method of maximum like-
lihood takes the first partial derivatives of the log-likelihood function (2.7)
with respect to σ 2 and β. Then, we set both partial derivatives to zero
and isolate the corresponding terms. This procedure yields the following
maximum likelihood estimates (MLEs):

β̂(θ) = (F> R−1 F)−1 F> R−1 y, (2.8)

and
σ̂ 2 (θ) = (1/n) (y − Fβ̂)> R−1 (y − Fβ̂). (2.9)
If we plug in the MLEs (2.8) and (2.9) into (2.7), we obtain the following
profile likelihood function:
 
L(y | β̂, σ̂ 2 , θ) = (2πσ̂ 2 )^{−n/2} |R|^{−1/2} exp(−n/2). (2.10)

Then, the natural logarithm of this profile likelihood function is

l(θ) = log L(y | β̂, σ̂ 2 , θ)


 

= −(n/2)[ log(2π) + log(σ̂ 2 ) ] − (1/2) log |R| − n/2. (2.11)

Both MLEs, β̂ and σ̂ 2 , depend on the true vector θ, and so does the pro-
file log-likelihood function (2.11), since it requires the computation of |R|
and involves σ̂ 2 . Hence, we have to obtain the estimate θ̂ by maximizing Equa-
tion (2.11) numerically. As noted in Section 2.4.2, there is more than one
numerical method and software resource to achieve this goal.
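
For orientation, a direct R transcription of (2.8), (2.9), and (2.11) is sketched below; this is not the internal code of any of the packages discussed next, and a production implementation would use a Cholesky factorization of R rather than solve() for numerical stability:

  # Profile log-likelihood l(theta) of (2.11), given R = R(theta).
  profile_loglik <- function(y, Fmat, R) {
    Ri   <- solve(R)                                   # R^{-1}
    beta <- solve(t(Fmat) %*% Ri %*% Fmat,
                  t(Fmat) %*% Ri %*% y)                # MLE (2.8)
    e    <- y - Fmat %*% beta
    n    <- length(y)
    s2   <- as.numeric(t(e) %*% Ri %*% e) / n          # MLE (2.9)
    ldR  <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
    -(n / 2) * (log(2 * pi) + log(s2)) - ldR / 2 - n / 2
  }

Maximizing this function over θ, e.g., with a multi-start optimizer, yields the estimate θ̂.
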

2.4.2 Optimization Procedure


R (R Core Team, 2019) has different packages that implement maximum like-
lihood estimation by minimizing the negative of the profile log-likelihood
function (2.11); such as gptk (Kalaitzis et al., 2014), GPfit (MacDon-
ald et al., 2015), DiceKriging (Roustant et al., 2012), and mlegp (Dan-
cik, 2013). The initial work in this thesis was carried out with package
mlegp. This package uses the Nelder-Mead simplex and the limited-memory
Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) methods in two optimization
stages:


1. The first stage corresponds to the simplex method of Nelder and Mead (1965).


The procedure follows heuristic ideas, and does not require the com-
putation of a gradient and a Hessian of the function l(θ) in (2.11) to
be optimized.

2. The second stage implements the L-BFGS method for the best solution
found by multiple simplex searches, described by Nocedal (1980) and
Liu and Nocedal (1989). It is a limited-memory quasi-Newton method
for large scale optimization where storage is critical. The optimization
procedure needs an available analytic expression for the gradient ∇l(θ)
of (2.11), which is also provided by mlegp.

The aforementioned R package poses numerical problems during the op-


timization procedure under some of our further modelling arrangements.
Therefore, it was necessary to use a different software resource to solve
this matter. All subsequent analyses are done via the C-based GaSP soft-
ware developed by Welch (2014), which primarily implements a Std GaSP.
Moreover, as in the case of mlegp, GaSP software was modified in terms of
modelling for the novel strategies in Chapters 3 and 4.
Quasi-Newton methods, such as L-BFGS, sometimes have a very slow
convergence. Furthermore, recall that these optimization methods require
a gradient and a Hessian computation with a significant processing time at
each step. The GaSP software implements a conjugate directions method to
overcome this matter. Given p parameters to find in the objective function’s
minimization, this class of methods follows a sequence of p linear minimiza-
tions in p chosen directions. This procedure is done instead of computing the
gradient ∇l(θ) and its corresponding Hessian, which demands the solution
of a linear system of equations by inverting the Hessian. Hestenes (1980)
provides more information about conjugate directions algorithms.
Since the minimization of the negative of the profile log-likelihood func-
tion (2.11) is a non-convex problem, we need to take a multi-start approach
to find the best local optimum. In all the case studies, we optimize (2.11)
with ten different starting points except for the Storm Surge model, where
we do it with four different starting points. Finding the best local optimal
solution is critical since this significantly impacts the GaSP’s prediction ac-
curacy. To illustrate the processing times in a single high-dimensional GaSP
replicate, a case such as the Michalewicz function (Section 3.5) in d = 7 with
n = 800 has a processing time of 5.61 h for the ten tries.


2.5 Prediction
As mentioned in Chapter 1, one of the objectives of a computer experiment
is the prediction of a given response. The first major step to achieve this
objective is done through the GaSP fitting that allows us to obtain the
respective MLE estimates, β̂ and σ̂ 2 , as well as the estimated correlation
hyperparameters in vector θ̂. Then, we can proceed with the corresponding
predictions on a testing set H of N further input vectors
H = {xt(1) , . . . , xt(N ) },
which has N unknown outputs
y(xt(1) ), . . . , y(xt(N ) ).

2.5.1 Best Linear Unbiased Predictor


Predictions are based on the best linear unbiased predictor (BLUP) derived
by Sacks et al. (1989). Therefore, at the test input vector xt ∈ Rd , the
BLUP is
ŷ(xt | θ̂) = µ̂(xt ) + r> (xt )R−1 (y − Fβ̂); (2.12)
where the vector
r(xt ) = [R(xt , x(1) ), . . . , R(xt , x(n) )]> ∈ Rn


contains the n correlations between the test input vector xt , and each one
of the n input vectors in training design D. Here R(·, ·) and r(·) are also
conditional on θ̂, but we suppress the dependence for notational simplicity.
The BLUP (2.12) is the best linear predictor in the sense that it mini-
mizes the mean squared error (MSE) between a further output and a linear
combination of the n training data, subject to unbiased estimation of the
mean of the new output. Similarly to the GaSP model (2.1), predictor
ŷ(xt | θ̂) has two parts:

• The regression component represented by


µ̂(xt ) = β̂> f (xt ), (2.13)
which contains the MLE for β.
• The interpolator
r> (xt )R−1 (y − Fβ̂),


where the correlations in r(xt ) have a certain structure; e.g., RStd (x, x0 ).


Note that the alternative modelling strategies in Chapters 3 and 4 rede-


fine R and r(xt ), but the formulas for maximum likelihood estimation and
prediction remain unchanged.
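
A minimal R sketch of the point prediction (2.12) follows; the correlation vector r and the regression vector ft at the test point are assumed to have been computed already, with whichever correlation structure and regression component are in use:

  # BLUP (2.12) at a test point, given training quantities.
  # r: n correlations R(xt, x_i); ft: f(xt); beta: MLE (2.8).
  blup <- function(r, R, y, Fmat, beta, ft) {
    as.numeric(t(ft) %*% beta + t(r) %*% solve(R, y - Fmat %*% beta))
  }
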

2.5.2 Prediction Accuracy


All modelling approaches throughout this thesis are compared using a pre-
diction accuracy metric, given the estimates obtained by the GaSP fitting.
We use a criterion based on the prediction error ŷ(xt(i) | θ̂) − y(xt(i) ) for i =
1, . . . , N . The prediction accuracy metric is the normalized root mean
squared error (N-RMSE) defined as
eN−RMSE = √( (1/N) ∑_{i=1}^{N} [ŷ(xt(i) | θ̂) − y(xt(i) )]^2 ) /
          √( (1/N) ∑_{i=1}^{N} [ȳ − y(xt(i) )]^2 );  (2.14)

where ȳ is the mean computed from vector y, the outputs of training set D,
ȳ = (1/n) ∑_{i=1}^{n} yi .

The denominator in metric (2.14) is the test root mean squared error
(RMSE) of the trivial predictor ȳ and sets eN−RMSE on a range of 0 to
roughly 1, regardless of the scale of the original response. A value of 1 will
indicate no better performance than the trivial predictor.
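
In R, the metric is essentially one line; yhat, ytest, and ytrain below are hypothetical vectors holding the BLUPs on H, the true test outputs, and the training outputs:

  # Normalized RMSE (2.14): test RMSE relative to the trivial
  # predictor ybar computed from the training outputs.
  n_rmse <- function(yhat, ytest, ytrain) {
    sqrt(mean((yhat - ytest)^2)) / sqrt(mean((mean(ytrain) - ytest)^2))
  }
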
The N-RMSE is meant to measure the overall prediction accuracy of a
GaSP in a testing set H. Alternatively to this metric, we could measure
the prediction uncertainty of a testing input vector xt for the BLUP (Sacks
et al., 1989; Cressie, 1993). The computation of its corresponding standard
error is
SE[ŷ(xt | θ̂)] = σ̂ √( 1 − r> (xt )R−1 r(xt ) + [1 − F> R−1 r(xt )]^2 / (F> R−1 F) ), (2.15)

which incorporates the uncertainty of the estimation of β̂.


From Sacks et al. (1989), the predictive distribution of the BLUP is
Gaussian, or approximately Gaussian if uncertainty in θ is taken into ac-
count. Hence, the nominal 95% prediction interval is
ŷ(xt | θ̂) ± 1.96 × SE[ŷ(xt | θ̂)]. (2.16)


The standard error (2.15) does not take into account the uncertainty asso-
ciated with plugging in the estimate θ̂, which is explored by Abt (1999). All
accuracy evaluations in this thesis are based on the point prediction (2.12)
and assessment metrics like (2.14). Comparing validity of uncertainty inter-
vals from various methods is further work.
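
A sketch of (2.15) and (2.16) in R for the constant-mean case, where Fmat is the n × 1 column of ones so that F> R−1 F is a scalar; this mirrors the formulas above rather than any production implementation:

  # Standard error (2.15) of the BLUP at a test point.
  se_blup <- function(r, R, Fmat, s2hat) {
    Ri  <- solve(R)
    num <- (1 - as.numeric(t(Fmat) %*% Ri %*% r))^2
    den <- as.numeric(t(Fmat) %*% Ri %*% Fmat)
    sqrt(s2hat * (1 - as.numeric(t(r) %*% Ri %*% r) + num / den))
  }
  # Nominal 95% interval (2.16): yhat + c(-1, 1) * 1.96 * se_blup(...)
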

2.6 The Effect of the Nugget Term


Any optimization procedure from Section 2.4.2 needs many computations of
the smooth non-linear scalar function l(θ) (2.11). Therefore, we repeatedly
have to obtain the determinant and the inverse of the correlation matrix R
for the profile likelihood function and the respective MLEs. However, the
optimization procedures can be susceptible to certain numerical problems
related to R.
By definition, the correlation matrix R is positive semidefinite. Nev-
ertheless, the computation of |R| and R−1 can be unstable due to an ill-
conditioned R. This can happen when two input vectors in the experimental
design D are too close causing near-singularity. Hence, in order to overcome
an ill-conditioned R, we can introduce a small nugget or jitter parameter
δ ∈ (0, 1) replacing R by
Rδ = R + δIn ,
where In is the n × n identity matrix.
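
In R, the adjustment is a one-line modification of the correlation matrix (a sketch):

  # Add a nugget delta to stabilize |R| and R^{-1} (Section 2.6).
  add_nugget <- function(R, delta) R + delta * diag(nrow(R))
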
Throughout this thesis, the nugget term δ is incorporated into the vector
of hyperparameters θ to be estimated for all the simulation studies. This
yields the following vectors depending on the correlation function:

• When using the SqExp correlation function, the estimated augmented
vector will be
θ̂δ^SqExp = (θ̂^SqExp , δ̂)> ;

• and the PowExp correlation function will depend on
θ̂δ^PowExp = (θ̂^PowExp , δ̂)> .

2.7 Training and Testing Designs


It is necessary to provide additional information on the training and testing
designs used in further test functions and the initial setup of the training
set size n. Unless specified otherwise, our simulation studies use Latin


hypercube designs (LHDs) for D in training and H in testing, which are


generated with the R package DoE.wrapper developed by Groemping (2017).
The LHD’s underlying idea is introduced by McKay et al. (1979) and
is related to a Latin square with exactly one sample point in each row
and column, just as a Latin square has one of each treatment in every row
and column. The LHD is readily generalized to the d-dimensional case.
Therefore, the LHD is merely a combinatorial structure represented by a
matrix with dimensions n × d for D, and N × d for H; where n or N are
the number of rows (runs or input vectors) and d is the number of columns
(inputs).
The simplest LHD class is the random Latin hypercube design (rLHD),
used in the thesis for the testing set H. Given a computer simulator in Rd ,
if we want to ensure that all portions of the input ranges are sampled, each
range is divided into N equally probable strata. We create a d-dimensional
rLHD by drawing one point in each of the N strata, and this is repeated
independently for the d dimensions. Once we have the N points in each of
the d dimensions, they are randomly permuted by column.
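
The construction just described fits in a few lines of base R; this sketch is only illustrative, since DoE.wrapper provides production versions:

  # Random Latin hypercube design on [0, 1]^d: one point in each of
  # the N strata per dimension, then a random permutation by column.
  rlhd <- function(N, d) {
    strata <- (matrix(runif(N * d), N, d) + (seq_len(N) - 1)) / N
    apply(strata, 2, sample)         # permute each column
  }
  D <- rlhd(N = 10, d = 3)           # 10 runs in [0, 1]^3
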
In the case of the training set D, we use a class of LHD called maximin
Latin hypercube design (mLHD) introduced by Morris and Mitchell (1995)
and implemented in DoE.wrapper. The mLHD attempts to optimize the
sample by maximizing the minimum distance between the design points,
i.e., a maximin criterion within the class of LHDs.
As suggested by Chapman et al. (1994) and Jones et al. (1998), we use the
rule of thumb for D of n = 10d as the initial setup in a computer experiment
unless specified otherwise. Loeppky et al. (2009) explore the importance of
choosing an adequate initial sample size n and provide evidence supporting
the effectiveness of this rule of thumb. Essentially, Loeppky et al. (2009)
conclude that the rule of n = 10d gives good initial prediction accuracy in
tractable problems and also helps diagnose intractability. Furthermore, there
is a significant accuracy improvement as n is increased only for tractable
functions.

Chapter 3

Main-Effect Gaussian
Processes
When modelling a deterministic function, one usually trains the GaSP model
in Chapter 2 using data from design D, with subsequent testing on design H.
We could apply certain transformations on the outputs or inputs. As we shall
see, selecting a non-standard correlation structure can have a key impact
on the GaSP fitting and prediction performance. Therefore, as claimed in
Chapter 1, GaSP prediction accuracy can be improved by some innovative
adjustments to the correlation structure.
Chen et al. (2016) tune different modelling settings, in a set of case
studies, to improve the GaSP’s prediction accuracy: the regression compo-
nent µ(x) (2.3) described in Section 2.1, different choices of the correlation
function Rj (hj ) introduced in Section 2.3, different training sizes n, and
more than one class of training design D besides the LHDs described in
Section 2.7. Based on the accuracy results obtained in those different case
studies, Chen et al. (2016) conclude that the common choices of a non-
constant regression component or the SqExp correlation function are often
sub-optimal.
For the regression setting, the authors point out a misleading use of a
non-constant component in Joseph et al. (2008), which is focused on im-
proving prediction accuracy for metamodels in product design optimization.
This work addresses a mainstream test function in computer experiments
literature: the Borehole function (Worley, 1987; Morris et al., 1993). Joseph
et al. (2008) claim that their experiment with a GaSP implementing selected
linear terms in the regression component improves prediction accuracy com-
pared to a GaSP with a constant term. Nevertheless, Chen et al. (2016)
obtain opposite results when replicating this experiment.
Such anecdotal examples, like the one found in Joseph et al. (2008),
make us think about the need for alternative modelling choices different
from the regression component or the correlation function. Hence, we can
explore non-standard correlation structures that go beyond the multiplica-
tive form RStd (x, x0 ) (2.4) to improve prediction accuracy. The literature


offers a limited number of works on this matter (Duvenaud et al., 2011;


Kandasamy et al., 2015). Nonetheless, we go beyond these works with an
additional component in our non-standard structure: the effect principles
from Section 1.1.2.
As an initial step, Section 3.1 explains why the BLUP’s interpolation
property in a GaSP holds regardless of the class of correlation structure.
Then, Section 3.2 illustrates the implications of the standard correlation
structure and why its use might not be recommended based on the effect
principles from Chapter 1. Hence, this chapter will explore alternative struc-
tures that use main effects with untransformed inputs in the output function
y(x).
Section 3.3 introduces an additive GaSP correlation structure in two spe-
cific forms. We address a well-known case in the literature: the Michalewicz
function. As explained in Section 3.5, this test function presents substantial
predictive challenges when using the correlation structure RStd (x, x0 ). Fur-
thermore, as shown in Section 3.6, we make scaling changes to the original
Michalewicz function to conduct more complex and challenging simulation
studies.
We have to highlight that the use of non-standard correlation structures
in a GaSP (which rely on additivity and limited main effects and input inter-
actions) opens up a whole set of possible correlation arrangements, unlike
the usual structure RStd (x, x0 ). One could, for example, explore whether
limited 2 and 3-input interactions might be useful. These extensions are
explored in Chapter 4.
This chapter, along with Chapter 4, will show the effectiveness of using
either main or low-order interaction effects in a GaSP model in terms of
prediction accuracy compared to a Std GaSP. As a side note, Appendix A
provides a summary of all the correlation structures used throughout this
thesis for easy reference.

3.1 Basis Functions and Interpolation


Jones et al. (1998) note that the BLUP (2.12) is an interpolator of the train-
ing dataset D, and we show this property is retained with any non-standard
correlation structure as well. Conditioned on the estimate θ̂, consider the
prediction of the input vector xi in the training set D:

ŷ(xi | θ̂) = µ̂(xi ) + r> (xi )R−1 (y − Fβ̂)
          = µ̂(xi ) + ei> (y − Fβ̂), (3.1)


where ei is the ith unit vector, since r(xi ) is the ith column of the correlation
matrix R and hence R−1 r(xi ) = ei . This result holds as long as vector r(xi )
is a column of the correlation matrix R.
If we plug in the regression component (2.13) and simplify the second
term in the BLUP (3.1), then we obtain
ŷ(xi | θ̂) = β̂> f (xi ) + ei> (y − Fβ̂) = β̂> f (xi ) + [yi − β̂> f (xi )] = yi .

Thus, the interpolation property of the BLUP in (3.1) holds regardless of the
class of correlation structure, as long as we provide an invertible correlation
matrix R.
Equation (2.12) is the same as (3.1) but for predicting a new test input
vector xt . Note that the corresponding BLUP, ŷ(xt | θ̂), for any test input
vector xt is made up of basis functions from the regression component µ̂(xt )
and the correlation functions in the vector r(xt ). Hence, properties of the
correlation function (and the regression) determine the properties of the
BLUP.
Note that optimization of the log-likelihood function remains unchanged
except for the redefinition of the correlation function. Similarly, the BLUP
in (2.12) for testing set H has to incorporate the same non-standard cor-
relation structure. The use of a nugget term (as described in Section 2.6)
will produce a non-interpolator BLUP, but it is a trade-off for having a
better-conditioned correlation matrix R.

3.2 Implication of the Standard Correlation


Structure
Now, it is necessary to explain the reasoning behind the use of a non-
standard correlation structure. As previously said in Section 1.1, the three
effect principles of a factorial experiment will be used to construct the corre-
lation structure of a GaSP. Hence, is it fair to think that the standard corre-
lation structure does not follow these three principles? In a high-dimensional
framework, as in the case of a computer experiment where d ≫ 1, suppose
the main effects and lower-order interactions are more active than higher-
order interactions. Furthermore, we assume strong non-linearity (i.e., hy-
perparameter θj ≫ 0 for several of the xj ), making the output function hard
to predict.
In the case of the SqExp function (2.6), which is a function of the distance

hj = xj − x0j


for the jth input, a Taylor series expansion of the standard correlation struc-
ture (2.4) is:
RStd (x, x0 ) = ∏_{j=1}^{d} Rj (hj ) = exp( −∑_{j=1}^{d} θj hj^2 )
            ≈ 1 − ∑_{j=1}^{d} θj hj^2 + ∑_{j=1}^{d−1} ∑_{k=j+1}^{d} θj hj^2 θk hk^2 + . . .
              ± ∏_{j=1}^{d} θj hj^2 + terms involving hj^4 for at least one j,

where higher-order products (with large θj , θk , etc.) will not be negligible in


the series, unless we have a large number n of input vectors densely filling the
space leading to distances in the correlation function asymptotically equal
to zero (i.e., hj ≈ 0, hk ≈ 0, etc.).
Furthermore, as argued in Section 3.1, a GaSP prediction is a combina-
tion of basis functions formed from the correlation function. Thus, if some
of the θj are large, RStd (x, x0 ) implies non-linearity and high-order interac-
tion, which can only be overcome by large n. Given the time complexity of
a GaSP algorithm, O(n^3 ), a large n is infeasible. Therefore, high accuracy
can only be achieved if there is some simple structure to exploit—the three
effect principles—in the correlation function.

3.3 Main-Effect Correlation Structures


In terms of the correlation structure in a GaSP, what are the implications
of working under a main-effect framework? Suppose the output has the
following additive form:
y(x) = ∑_{j=1}^{d} yj (xj ), (3.2)

where each subfunction yj (xj ) can be different and only depends on the
single input xj . Following the Effect Hierarchy principle, the main effects
would suffice to fit a predictive model for (3.2). However, the correlation
structure RStd (x, x0 ) is designed to handle input interactions up to order d
as shown in Section 3.2.
The first non-standard correlation structure to be considered has an


additive form with main effects only:


RE−ME (x, x0 ) = (1/d) ∑_{j=1}^{d} Rj (hj ) ∈ [0, 1]. (3.3)

Note that structure (3.3) assigns equal weights (i.e., E-ME throughout) to
each one of the d main effects with 1/d. The weights ensure scaling of the
left-hand side to have correlations of 1 on the diagonal and [0, 1] elsewhere.
Furthermore, on the right, we have a linear combination of positive semidef-
inite matrices, and hence the matrix on the left is also positive semidefinite.
Similar comments apply to the additive forms later in this chapter and
Chapter 4.
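
A sketch of (3.3) in R, using the SqExp correlation function of Section 2.3 for the one-dimensional terms; theta is the vector of hyperparameters θ1 , . . . , θd :

  # E-ME correlation (3.3): average of d one-dimensional SqExp
  # correlations, one per input.
  R_eme <- function(x, xp, theta) mean(exp(-theta * (x - xp)^2))

  # Training correlation matrix for an n x d design X:
  corr_matrix_eme <- function(X, theta) {
    n <- nrow(X)
    R <- diag(n)
    for (i in seq_len(n - 1)) for (j in (i + 1):n) {
      R[i, j] <- R[j, i] <- R_eme(X[i, ], X[j, ], theta)
    }
    R
  }
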
There is a critical connection between the assumed output function
in (3.2) and the BLUP in (2.12). The BLUP depends on the new input
vector xt only through the correlation vector r(xt ); all other vectors and
matrices are fixed after training. Hence, the BLUP as a function of xt is
a linear combination of the basis functions in r(xt ). The correlation func-
tion (3.3) is additive in functions of the various inputs xj , and a linear com-
bination of these functions is also additive in functions of the xj , matching
the form of (3.2). Similar ideas will apply in Chapter 4.
In general, though, estimating the d weights might yield a lower
eN−RMSE in (2.14). These weights need to be estimated along with the rest
of the GaSP parameters described in Chapter 2. Thus, in order to assess
whether unequal weights improve prediction accuracy, we can set up an
additional main-effect correlation structure. Hence, an additive correlation
structure with unequal weights on the main effects (U-ME) is defined as:
RU−ME (x, x0 ) = ∑_{j=1}^{d} ωj Rj (hj ) ∈ [0, 1], (3.4)

where there is a vector of weights

Ω = (ω1 , . . . , ωd )> ∈ [0, 1]d

subject to the constraint


∑_{j=1}^{d} ωj = 1. (3.5)

Since the weights in (3.4) are subject to constraint (3.5), the optimiza-
tion of the log-likelihood function for parameter estimation will also be con-
strained. Thus, we employ a multinomial logit transformation that allows


the optimization of unconstrained weights. The transformation is set up


with a raw weight vector

τ = (τ1 , . . . , τd−1 )> ∈ (−∞, ∞)d−1 ,

where
ωi = exp(τi ) / ( 1 + ∑_{k=1}^{d−1} exp(τk ) ) for i = 1, . . . , d − 1;
and
ωd = 1 / ( 1 + ∑_{k=1}^{d−1} exp(τk ) ).
Under this correlation structure, the d − 1 parameters in vector τ are
estimated and included in the vector of hyperparameters θ̂ (defined in Sec-
tion 2.4 for various correlation functions).
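
A sketch of this transformation in R; the optimizer works with the unconstrained raw vector τ, and the constrained weights are recovered as:

  # Multinomial logit map from tau in R^{d-1} to weights omega in
  # [0, 1]^d summing to 1, i.e., satisfying constraint (3.5).
  tau_to_omega <- function(tau) {
    e <- exp(tau)
    c(e, 1) / (1 + sum(e))
  }
  sum(tau_to_omega(c(0.3, -1.2, 0.8)))   # equals 1 (here d = 4)
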


3.4 Previous Works


The notion of an additive correlation structure has been explored by Du-
venaud et al. (2011) and Kandasamy et al. (2015). Both works essentially
propose additive GaSPs, but they differ on how the correlation structure is
built. Our main effect correlation structures have one element in common
with those explored by Duvenaud et al. (2011), which is addressed next.
The approach proposed by Kandasamy et al. (2015) also relies on input
interactions so that it will be introduced in Chapter 4.
Let x and x0 be two input vectors in Rd , and their respective random
function values are Z(x) and Z(x0 ). For the jth input, the correlation
function Rj (hj ) is defined as in Section 2.3. Duvenaud et al. (2011) propose
an additive GaSP whose covariance function incorporates all the possible kth
order interactions for k = 1, . . . , d. Therefore, this progressive covariance
function is the sum of d terms with the following hierarchy:
CovAdd1 [Z(x), Z(x0 )] = σ1^2 ∑_{u=1}^{d} Ru (hu )
CovAdd2 [Z(x), Z(x0 )] = σ2^2 ∑_{u=1}^{d−1} ∑_{v=u+1}^{d} Ru (hu ) · Rv (hv )
...
CovAddd [Z(x), Z(x0 )] = σd^2 ∏_{u=1}^{d} Ru (hu ),

where σk^2 is the variance assigned to all the interactions of order k. We can
see that the main-effect correlation structure RE−ME (x, x0 ) in (3.3) is part
of CovAdd1 [Z(x), Z(x0 )], but adjusted by the variance σ1^2 instead of 1/d.
Further drawbacks and details on the prediction accuracy performance of
this approach will be provided in Chapter 4.


3.5 Michalewicz Function


The Michalewicz function was used as a test function in a particular type
of GaSP known as the Composite Gaussian Process (CGP), proposed by
Ba and Joseph (2012). It is a mathematical function designed to be difficult
to optimize (or predict). Similarly, the Friedman and Franke functions in
Chapter 4 test other non-standard correlation structures that also incorpo-
rate input interactions.
The complex surface of the Michalewicz function is used as a benchmark
in global and local optimization within a high-dimensional setting (Gao
et al., 2011; Jamil and Yang, 2013), which is also applicable for testing a
GaSP’s prediction accuracy. The test function has the following form:
ym (x) = −∑_{j=1}^{d} sin(xj ) sin^{2q} (jxj^2 /π) where xj ∈ [0, π]. (3.6)
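
A direct R implementation of (3.6) is handy for reproducing the studies below; x is a vector in [0, π]^d:

  # Michalewicz test function (3.6) with steepness parameter q.
  michalewicz <- function(x, q = 10) {
    j <- seq_along(x)
    -sum(sin(x) * sin(j * x^2 / pi)^(2 * q))
  }
  michalewicz(c(2.20, 1.57))   # near the known 2-d minimum of about -1.80
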

Figure 3.1: 2-dimensional Michalewicz function (3.6) as plotted by Surjanovic and Bingham (2017).

The Michalewicz function is multimodal and has d! local minima. The
parameter q sets the steepness of the valleys. Larger values of q therefore
make the search for the local minima more complicated, since points in the
narrow valleys give little information on these locations.
Figure 3.1 illustrates the function with d = 2 and q = 10, as plotted
by Surjanovic and Bingham (2017). The marginal additive components of
a 7-dimensional case are illustrated in Figure 3.2; and one can see that as
we increase the value of d, the function becomes more complicated to work
with. Moreover, it is clear that function (3.6) follows the additive form (3.2).


Figure 3.2: Marginal additive components of the Michalewicz function (3.6) up to dimension d = 7 with q = 10.


3.5.1 Overview of the Composite Gaussian Process Model


The literature offers different approaches to deal with non-stationarity in a
GaSP regarding the mean, variances, and covariances. For instance, Joseph
et al. (2008) use variable selection for the functional form of the regression
component µ(x) in (2.1) with an approach called blind kriging. Other com-
ponents require a more in-depth investigation, such as the non-stationary
variance, which is approached by Ba and Joseph (2012) in their CGP along
with a non-stationary mean.
The CGP is a combination of two independent GaSPs: the first is a
global GaSP, Zglobal (x), whose constant mean is µ, with its own variance κ2
and correlation structure G(·). The global component captures a smoother
global trend and replaces the term µ(x) in model (2.1). The second GaSP,
Zlocal (x), represents local adjustments with mean 0, variance σ 2 , and corre-
lation structure L(·).
Therefore, the CGP model is defined as
Y (x) = Zglobal (x) + Zlocal (x),
where
Zglobal (x) ∼ GP(µ, κ2 G(·))
Zlocal (x) ∼ GP(0, σ 2 L(·)).




The global GaSP provides a more flexible regression component,


whereas the local GaSP can be considered the random component in the
regular GaSP. Thus, while the additive models in (3.3) and (3.4) are also
composite GaSPs, there are important differences from CGP. The two com-
ponents of CGP each involve all the inputs, much like a standard GaSP
model, whereas (3.3) and (3.4) have one input per component.

3.5.2 Performance of the Composite Gaussian and


Standard Stationary Gaussian Stochastic Processes
The underlying idea of the CGP model seems to be adequate when it comes
to difficult cases such as the Michalewicz function (3.6). However, as noted
by Chen et al. (2016), Ba and Joseph (2012) use an unnormalized root mean
squared error (U-RMSE) that only takes into account the numerator in the
N-RMSE (2.14):
eU−RMSE = √( (1/N) ∑_{i=1}^{N} [ŷ(xt(i) ) − y(xt(i) )]^2 ). (3.7)


Ba and Joseph (2012) use the Michalewicz function under q = 10 with


d = 10 in a simulation study using their model. The study fits the CGP using
50 different random LHDs D (n = 100) which generate eU−RMSE values on a
range of 0.72 − 0.74. However, the trivial predictor in (2.14) with N = 5, 000
random test points provides a normalization factor on a range of 0.71 − 0.74.
Therefore, the CGP does not provide a better prediction performance than
the trivial predictor.

3.5.3 Simulation Settings


As previously stated, our first set of simulation studies in this chapter ad-
dresses the Michalewicz function (3.6). We primarily attempt to establish
whether there is a better prediction accuracy by fitting a main-effect GaSP,
i.e. RE−ME (x, x0 ) (3.3) or RU−ME (x, x0 ) (3.4), rather than using RStd (x, x0 )
(2.4). Moreover, we compare the performance of the main-effect GaSPs
versus the CGP model. We restrict our attention to the case with q = 10:
ym (x) = −∑_{j=1}^{d} sin(xj ) sin^{20} (jxj^2 /π) where xj ∈ [0, π]. (3.8)


Setting                        Values
Dimensionality                 d = 3, 4, 5, 6, 7
Inputs                         x ∈ [0, π]^d
Output                         ym (x) as in (3.8)
GaSP Correlation Functions     SqExp and PowExp
GaSP Correlation Structures    RStd (x, x0 ), RE−ME (x, x0 ), and RU−ME (x, x0 )
Additional Approach            CGP
Training Sets D                20 different mLHDs for each d = 3, 4 of
                               n = 100, 200, 400 runs and for each d = 5, 6, 7
                               of n = 100, 200, 400, 800 runs
Testing Set H                  1 rLHD corresponding to each d = 3, 4, 5, 6, 7,
                               of N = 10, 000 runs

Table 3.1: Simulation settings for Michalewicz function.

Table 3.1 summarizes the simulation settings. Since the test function be-
comes more complex as d is increased, we set up studies with d = 3, 4, 5, 6, 7.
Each d has 20 different training sets D for each of n = 100, 200, 400 runs for
d = 3, 4; n = 100, 200, 400, 800 runs for d = 5, 6, 7; and its own d-dimensional
testing set H of N = 10, 000 runs. Our non-standard correlation structures
aim to outperform the Std GaSP beginning with small training sizes. How-
ever, since this case allows us to obtain mLHDs with large training sizes
n via the package DoE.wrapper (Groemping, 2017), we expand our simula-
tion studies to n = 800 to show to what extent the main-effect correlation
structures outperform the Std GaSP.
Each one of the studies allows us to keep track of the evolution of the
prediction performance for the four GaSP approaches: Std, CGP, E-ME,
and U-ME. We use the SqExp (2.6) and PowExp (2.5) correlation functions
for the Std, E-ME, and U-ME GaSPs. The CGP model only uses the SqExp
correlation function as well as the correlation structure RStd (x, x0 ) for its two
independent GaSPs, which is implemented in the R package CGP (Ba and
Joseph, 2018). We do not conduct the comparison study with d = 1, since


this case is trivial. The case d = 2 is also left out because the risk of an
ill-conditioned correlation matrix R increases.

3.5.4 Prediction Results


For each prediction accuracy figure, we show the eN−RMSE in percent
(eN−RMSE × 100%) on the left y-axis and the log10 (eN−RMSE ) on the right
y-axis by GaSP approach. Furthermore, the plots are faceted by corre-
lation functions with the x-axis depicting the corresponding training sizes
n on a log10 (n)-log10 (eN−RMSE ) scale so breaks are equally-spaced. Since
the scale on the y-axis is logarithmically adjusted, the breaks on the right-
hand side are also equally-spaced. The log10 (n)-log10 (eN−RMSE ) scale shows
rates of convergence, and this is helpful in the assessment of high predic-
tion accuracy cases. Finally, lines depict the mean of either the eN−RMSE or
log10 (eN−RMSE ) of our 20 repeated experiments on each n along with their
corresponding boxplots.
Figure 3.3a shows the prediction accuracy for d = 3. For the SqExp
correlation function, we can see that the E-ME and U-ME GaSPs outperform
the Std GaSP and CGP, which show overlapped and poor accuracies. From
n = 100 to 400, the rate of convergence of eN−RMSE is 1/n^9.2 for the E-ME
GaSP (log-log scales are used in this class of plots so rates of convergence can
be estimated from slopes). Note that we do not require a large training size n
to achieve good accuracy with the main-effect approaches. The training size
n = 800 was also tried but had numerical instability with both correlation
functions. Overall, the E-ME GaSP with the SqExp correlation function
clearly performs best.
Figure 3.3b shows the prediction accuracy for d = 4. For both correlation
functions, the main-effect GaSPs still outperform the other two approaches.
In the SqExp case, both main-effect structures have a rough rate of 1/n^4.2
from n = 100 to 200, which is increased to 1/n^13.33 for the E-ME GaSP from
n = 200 to 400. As in d = 3, the training size n = 800 was also tried but
had numerical instability with both correlation functions. Again, CGP is
not competitive at all in terms of the SqExp correlation function. Overall,
the E-ME GaSP with the SqExp correlation function clearly performs best.
Figure 3.4a shows the prediction accuracy for d = 5. CGP and Std
GaSP still show a non-competitive and poor performance in the SqExp case
compared to their main-effect counterparts. For both correlation functions,
from n = 100 to 400, both additive GaSPs show a similar behaviour with
a rough rate of convergence of 1/n^4.2 . However, we obtain a large accuracy
spread for E-ME at n = 200 with the SqExp function. We encounter differ-


ent results for the E-ME and U-ME GaSPs from n = 400 to 800 between the
correlation functions. Overall, the E-ME GaSP with the SqExp correlation
function clearly performs best.
Figure 3.4b shows the prediction accuracy for d = 6. Firstly, with
both correlation functions, the additive GaSPs outperform the other two
approaches. In the SqExp case, from n = 100 to 400, the U-ME GaSP
shows the best rate of convergence (1/n^2.08 ). From n = 400 to 800, while
the spread in accuracy slightly persists, the E-ME GaSP has the best rate
(1/n^12.5 ). Overall, the E-ME GaSP with the SqExp correlation function
clearly performs best.
Figure 3.5 shows the prediction accuracy for d = 7. In the case of the
SqExp correlation function, the CGP and Std GaSP are noticeably out-
performed by their main-effect counterparts even though the E-ME GaSP
shows a huge spread in its results at n = 800. In terms of the rate of con-
vergence, from n = 100 to 400, the U-ME GaSP has the best performance
(1/n^0.83 ). This rate drastically increases to 1/n^9.17 from n = 400 to 800. For
the PowExp function, both main-effect GaSPs show a similar performance
with a rate of 1/n^1.67 from n = 100 to 400. Overall, both the U-ME GaSP
with the SqExp correlation function and the E-ME GaSP with the PowExp
correlation function clearly perform best.


Figure 3.3: Prediction accuracy in percent (eN−RMSE × 100%) versus n on
a log10 (n)-log10 (eN−RMSE ) scale by type of GaSP (CGP, Std, E−ME, U−ME)
for the Michalewicz function (3.8), with SqExp and PowExp correlation
functions; panel (a) d = 3 and panel (b) d = 4. Each boxplot shows results
for 20 mLHDs of n = 100, 200, 400 runs for training, and N = 10, 000 runs
for testing. CGP is only implemented with SqExp.


Figure 3.4: Prediction accuracy in percent (eN−RMSE × 100%) versus n on
a log10 (n)-log10 (eN−RMSE ) scale by type of GaSP (CGP, Std, E−ME, U−ME)
for the Michalewicz function (3.8), with SqExp and PowExp correlation
functions; panel (a) d = 5 and panel (b) d = 6. Each boxplot shows results
for 20 mLHDs of n = 100, 200, 400, 800 runs for training and N = 10, 000
runs for testing. CGP is only implemented with SqExp.


Figure 3.5: Prediction accuracy in percent (eN−RMSE × 100%) versus n on
a log10 (n)-log10 (eN−RMSE ) scale by type of GaSP (CGP, Std, E−ME, U−ME)
for the Michalewicz function (3.8) in d = 7, with SqExp and PowExp
correlation functions. Each boxplot shows results for 20 mLHDs of
n = 100, 200, 400, 800 runs for training and N = 10, 000 runs for testing.
CGP is only implemented with SqExp.


3.6 Weighted Michalewicz Function


Our previous simulation studies show that the main-effect correlation struc-
ture RE−ME (x, x0 ) often has the best predictive performance. Moreover,
both main-effect structures clearly outperform RStd (x, x0 ) either with the
CGP or Std GaSP. What would happen if we scale the additive components
in Michalewicz function (3.6)? Specifically speaking, would the U-ME GaSP
outperform the E-ME GaSP under this new arrangement?
We can make a slight modification to this test function, by increasing
the ranges of each additive component in the following way:
ywm (x) = −∑_{j=1}^{d} j sin(xj ) sin^{2q} (jxj^2 /π) where xj ∈ [0, π]. (3.9)

This weighted case provides a framework to check whether unequal weights


in the correlation structure improve the prediction accuracy in a main-effect
GaSP.
The additional factor in function (3.9) multiplies the maximum am-
plitude of the sinusoidal curve for the jth dimension by j. Figure 3.6 illustrates
this test function with d = 2 and q = 10, where we can see deeper valleys
on the x2 -axis. Moreover, according to the marginal additive components of
the specific case of d = 7 and q = 10 in Figure 3.7, we would expect input
variables with higher indices to be more active in the GaSP fitting.

Figure 3.6: 2-dimensional weighted Michalewicz function (3.9).


Figure 3.7: Marginal additive components of the weighted Michalewicz function (3.9) up to dimension d = 7 with q = 10.


3.6.1 Simulation Settings


Our second set of simulation studies addresses the weighted Michalewicz func-
tion (3.9). We primarily attempt to establish whether there is a better pre-
diction accuracy by fitting a main-effect GaSP, i.e. RE−ME (x, x0 ) (3.3) and
RU−ME (x, x0 ) (3.4), rather than using RStd (x, x0 ) (2.4).
Furthermore, we compare the performance of the main-effect GaSPs ver-
sus the CGP model. We restrict our attention to the case with q = 10:
ywm (x) = −∑_{j=1}^{d} j sin(xj ) sin^{20} (jxj^2 /π) where xj ∈ [0, π]. (3.10)

Secondly, we aim to determine whether using U-ME GaSP provides a better


prediction accuracy than E-ME GaSP. Note we keep all previous simulation
settings, as well as the same training and testing sets detailed in Table 3.1,
except for the output ywm (x) from function (3.10).

3.6.2 Prediction Results


Figure 3.8a shows the prediction accuracy for d = 3. For the SqExp cor-
relation function, our main-effect GaSPs clearly outperform the Std GaSP
and CGP, which show overlapped and poor accuracies as in the previous
simulation study. From n = 100 to 400, the E-ME GaSP has a rate of
1/n^9.58 . Again, we do not need a large training size n to achieve good
accuracy on these approaches. The training size n = 800 was also tried
but had numerical instability with both correlation functions. Overall, the
E-ME GaSP with the SqExp correlation function clearly performs best.
Figure 3.8b shows the prediction accuracy for d = 4. For both correlation
functions, the main-effect GaSPs still outperform the rest of the approaches
except for the E-ME GaSP versus the Std GaSP from n = 200 to 400 in
the PowExp function (both approaches provide equal performances). In
the SqExp case, both main-effect structures have a rough rate of 1/n^4.17
from n = 100 to 200, which is increased to 1/n^14.17 for the E-ME GaSP at
n = 400. As in d = 3, the training size n = 800 was also tried but had
numerical instability with both correlation functions. Overall, the E-ME
GaSP with the SqExp correlation function clearly performs best.
Figure 3.9a shows the prediction accuracy for d = 5. In the SqExp case,
we see that both main-effect approaches outperform the CGP and Std GaSP
even though they have different behaviours as we increased the training size
n. From n = 100 to 200, both main-effect approaches have the same rate
of 1/n^1.67 . However, from n = 200 to 800, the E-ME GaSP has the best


rate (1/n^10 ). Overall, the E-ME GaSP with the SqExp correlation function
clearly performs best.
Figure 3.9b shows the prediction accuracy for d = 6. As with smaller d
for the SqExp case, the main-effect structures clearly outperform the CGP
and Std GaSP. We see the same behaviour in the main-effect GaSPs as
in d = 5: large accuracy spreads in the U-ME GaSP with increasing n,
unlike the E-ME GaSP. Note the equal performance of these approaches
from n = 100 to 200 of 1/n^1.67 , which drastically changes to 1/n^10 for the
E-ME GaSP. Overall, the E-ME GaSP with the SqExp correlation function
clearly performs best.
Figure 3.10 shows the prediction accuracy for d = 7. Regarding the
SqExp function, the CGP and Std GaSP are outperformed by the main-
effect approaches. For the PowExp case, the Std GaSP becomes competitive
with a rate of 1/n^2.5 from n = 200 to 800. Nonetheless, this is not enough
to outperform the E-ME GaSP, which shows a rate of 1/n^5 from n = 200 to
800. Overall, the E-ME GaSP with the PowExp correlation function clearly
performs best.


Figure 3.8: Prediction accuracy in percent (eN−RMSE × 100%) versus n on a
log10 (n)-log10 (eN−RMSE ) scale by type of GaSP (CGP, Std, E−ME, U−ME)
for the weighted Michalewicz function (3.10), with SqExp and PowExp
correlation functions; panel (a) d = 3 and panel (b) d = 4. Each boxplot
shows results for 20 mLHDs of n = 100, 200, 400 runs for training and
N = 10, 000 runs for testing. CGP is only implemented with SqExp.


Figure 3.9: Prediction accuracy in percent (eN−RMSE × 100%) versus n on a
log10 (n)-log10 (eN−RMSE ) scale by type of GaSP (CGP, Std, E−ME, U−ME)
for the weighted Michalewicz function (3.10), with SqExp and PowExp
correlation functions; panel (a) d = 5 and panel (b) d = 6. Each boxplot
shows results for 20 mLHDs of n = 100, 200, 400, 800 runs for training and
N = 10, 000 runs for testing. CGP is only implemented with SqExp.


Figure 3.10: Prediction accuracy in percent (eN−RMSE × 100%) versus n on a
log10 (n)-log10 (eN−RMSE ) scale by type of GaSP (CGP, Std, E−ME, U−ME)
for the weighted Michalewicz function (3.10) in d = 7, with SqExp and
PowExp correlation functions. Each boxplot shows results for 20 mLHDs of
n = 100, 200, 400, 800 runs for training and N = 10, 000 runs for testing.
CGP is only implemented with SqExp.

3.7 Concluding Remarks


The non-standard correlation structures based on main effects show a signif-
icant improvement in prediction accuracy for a challenging case, such as the
Michalewicz function in its two forms: the original and weighted cases. The
PowExp correlation function, compared to SqExp, increases prediction accuracy in
the Std GaSP in both test functions. In terms of each approach except for
CGP, which is not competitive in any of the studies conducted, we can make
the following summarizing points:

• Std GaSP. Prediction accuracy with the SqExp correlation function


is not competitive against the E-ME and U-ME GaSPs for any of
the dimensions considered. On the other hand, the PowExp function
greatly improves prediction accuracy and makes the Std GaSP more
competitive against its additive counterparts (often at n = 400, 800).


This result applies to both Michalewicz functions. However, this im-


provement is not enough to outperform the E-ME and U-ME GaSPs.

• E-ME GaSP. We obtain a good prediction accuracy with both corre-


lation functions for d = 3, 4 and n = 100, 200, 400 for both Michalewicz
functions. Our E-ME GaSP often tends to work well as we increase
n in higher dimensions with the SqExp function. However, we can
find some cases where the PowExp counterpart performs better (for
instance, see Figure 3.5 with d = 7 for the original test function).

• U-ME GaSP. The behaviour of this GaSP approach is usually as


good as the E-ME GaSP. We have some cases where either the E-ME
or U-ME GaSP performs better by changing the correlation function.
For example, in Figure 3.5, both E-ME and U-ME work well with
the PowExp and SqExp functions, respectively. On the other hand,
there are cases where both E-ME and U-ME GaSPs have equivalent
performances with the SqExp function. However, the change to Pow-
Exp makes a huge difference for the E-ME GaSP (see Figure 3.10 with
d = 7 for the weighted test function). Overall, there is no figure in
these two test functions where the U-ME GaSP is substantially better
than the E-ME GaSP.

As we can see, the main-effect structures outperform the Std GaSP in


these case studies. Moreover, regarding the critical assessment of the ac-
curacy between the E-ME and U-ME GaSPs, we see that the inclusion of
weights for the main effects in a non-standard correlation structure does not
improve prediction accuracy in these specific simulation studies.
We can go beyond the specific results obtained with these main-effect
correlation structures. Note that the GaSP’s prediction accuracy in a com-
plex and high-dimensional case used as a benchmark in optimization testing,
such as the Michalewicz function, can be improved by following the Effect
Hierarchy principle. Nonetheless, the full application of the three effect
principles will be addressed in Chapter 4 in further cases.

Chapter 4

Gaussian Processes with


Low-Order Joint Effects
This chapter expands the main-effect GaSPs of Chapter 3 to allow the inputs
to have low-order joint effects on the output. A joint effect, say between x1
and x2 , is the overall effect of these two variables on the output from their
main effects and any interaction effect. Thus allowing joint effects up to a
particular order is implicitly allowing interaction effects to the same order.
“Joint” and “interaction” will often be used interchangeably in this chapter.
Section 4.1 continues with the description of previous approaches found
in the literature, which were already introduced in Section 3.4. Sections 4.2
and 4.4 introduce joint-effect correlation structures that provide more flex-
ibility by allowing 2-input interactions. Structures from Section 4.4 aim
to follow the three effect principles described in Section 1.1.2. Therefore,
they will require a sensitivity analysis to identify those low-order interac-
tions that significantly contribute to the GaSP predictor’s variability over
the input domain. Section 4.5 provides additional details on the sensitivity
analysis approach to be used in our subsequent simulation studies.
Sections 4.3 and 4.6 introduce two new test functions known as the
Friedman and Franke functions, respectively. Both cases are studied with
different non-standard correlation frameworks, depending on the function’s
complexity. Note that these frameworks will involve joint-effect GaSPs up
to 2-input interactions only. As we will see, prediction accuracy is greatly
improved with these non-standard structures, which incorporate more than
one class of effect, compared to a Std GaSP.
Since the Michalewicz, Friedman, and Franke functions can be considered
“toy examples”, we also provide extensions to the non-standard correlation
structures with two more practical applications: the OTL circuit function
in Section 4.9 and the Nilson-Kuusk model in Section 4.11. The OTL cir-
cuit function is an engineering-based application, whereas the Nilson-Kuusk
model is an ecological code for plant canopy.
Both applications provide interesting insights into the three effect princi-
ples from Section 1.1.2 as well. Now, is it possible to expand the joint-effect


structures to other types of effects? We introduce additional structures that


augment the joint-effect framework. Sections 4.7, 4.8, and 4.10 provide ad-
ditional joint-effect correlation structures with a higher level of complexity;
going beyond 2-input interactions. Some of these structures will again re-
quire a sensitivity analysis approach.

4.1 Previous Works


So far, we have explored the use of non-standard correlation structures whose
components are main effects only. However, we have not entirely applied the
three effect principles since the previous main-effect correlation structures do
not include any further input interactions. The previous works by Duvenaud
et al. (2011) and Kandasamy et al. (2015) explore the possibility of including
any class of interactions in the correlation structure.

4.1.1 Progressive Covariance Function


From Section 3.4, note that all interactions of order k are implicitly allowed
by a covariance function of the following form:

CovAddk [Z(x), Z(x0 )] = σk^2 ∑_{1≤u1 <u2 <···<uk ≤d} ∏_{m=1}^{k} Rum (hum ),

whose term σk^2 is the variance assigned to the kth order. Therefore, a full
covariance function is the following sum:
CovAdd [Z(x), Z(x0 )] = ∑_{k=1}^{d} CovAddk [Z(x), Z(x0 )]. (4.1)

We can see that function (4.1) has 2^d − 1 terms, a number that increases expo-
nentially with d and becomes intractable, as recognized by Duvenaud et al.
(2011). This computation also increases processing times in the optimization
procedures of the profile log-likelihood function (2.11).
Furthermore, Duvenaud et al. (2011) do not specify additional details
about the variances σ1^2 , . . . , σd^2 , such as specific ranges or estimation proce-
dures. However, they provide some examples with normalized σk^2 ∈ [0, 1]
where it is stated that these “order variance” hyperparameters control how
much of the process variance comes from a kth order interaction.
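
For small d, the kth-order term can be computed naively by enumerating all k-subsets, as in the R sketch below; Duvenaud et al. (2011) work with more efficient formulations, so this brute-force version is only illustrative:

  # Naive kth-order additive covariance term: sum over all k-subsets
  # of the product of one-dimensional correlations R_u(h_u).
  cov_add_k <- function(Rvals, k, sigma2_k) {
    # Rvals: vector of d one-dimensional correlations R_j(h_j)
    subsets <- combn(length(Rvals), k)
    sigma2_k * sum(apply(subsets, 2, function(u) prod(Rvals[u])))
  }
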


A GaSP using covariance function CovAdd [Z(x), Z(x0 )] (4.1) can pro-
vide an informative framework with each σk^2 depicting what orders of in-
teraction are the most important. This covariance proves to be useful in
low-dimensional examples. However, any high-dimensional case leads to a
complex computation under a full order covariance function. Another draw-
back of this approach is the low reduction of the MSE compared to a Std
GaSP with the SqExp correlation function. This is shown in four case stud-
ies with d = 4, 8, 10 inputs.

4.1.2 Disjoint Correlation Structure


Another additive structure that allows joint effects is presented by Kan-
dasamy et al. (2015), but a response y(x) is observed with random error.
The model is targeted to maximise a function g : X → R where X is a
rectangular region in Rd . Kandasamy et al. (2015) assume, without loss
of generality, that X = [0, 1]d . The function g can be non-convex and its
gradient is not available. It is observed non-deterministically as

y(x) = g(x) + ε,

where the random noise ε is assumed to be N (0, η 2 ).


The key structural assumption on this model is the decomposition of g(x) into distinct subsets of inputs, according to the following additive form:

g(x) = g^{(1)}(x^{(1)}) + ··· + g^{(M)}(x^{(M)}),

where a subset x^{(j)} ∈ X^{(j)} is a lower-dimensional component with d_j inputs for j = 1, ..., M. Note that the groups are disjoint, and each one has its correlation structure R^{Std}(x^{(j)}, x'^{(j)}) (2.4). Furthermore, Kandasamy et al. (2015) point out a high-dimensional approach where each d_j is bounded by a threshold d_0, i.e., d_j ≤ d_0 ≪ d.
Unlike the progressive correlation structure proposed by Duvenaud et al.
(2011), we are allowed to use each input variable in only one subset. The
disjoint condition on the subsets is restrictive if we work under a GaSP
with this additive correlation structure, which does not allow inputs to take
part in more than one individual GaSP. This lack of flexibility is a notable
drawback of this approach.
For instance, with d = 4 where inputs x_1 and x_2 are part of a given subset, the correlation structure R^{Std}(x, x') would be able to handle the corresponding main effects and 2-input interaction. However, suppose we are interested in the 2-input interaction between x_1 and x_4; this additive approach does not allow us to include input x_1 in another subset. We would have to put x_1, x_2, and x_4 together, also introducing possibly unwanted x_2 × x_4 and 3-way interaction effects.

4.2 Joint-Effect Correlation Structures up to All 2-Input Interactions
Analogous to Section 3.3, suppose the output y depends on the input vector x = (x_1, ..., x_d)^⊤, and we have the following set of pairs of input indices:

B = {{j, k} for some 1 ≤ j < k ≤ d} where |B| = b.

Now, y(x) is determined by a sum of d subfunctions y_j(x_j) and b subfunctions y_{j,k}(x_j, x_k) in the following way:

y(x) = \sum_{j=1}^{d} y_j(x_j) + \sum_{\{j,k\} ∈ B} y_{j,k}(x_j, x_k).   (4.2)

Function (4.2) has an additive form, but the inputs are allowed to interact
in specific pairs. Thus, we explore the possibility of expanding the idea of
an additive correlation structure by fully applying the aforementioned effect
principles.
As already introduced in Section 3.3, a main-effect correlation structure with d single-input main effects corresponds to

R^{E−ME}(x, x') = \frac{1}{d} \sum_{j=1}^{d} R_j(h_j) ∈ [0, 1]

or

R^{U−ME}(x, x') = \sum_{j=1}^{d} ω_j R_j(h_j) ∈ [0, 1].

We can see that the inputs do not interact under these previous struc-
tures. Hence, cases such as the Michalewicz function allow this class of
non-standard correlation structures to be competitive against the Std GaSP
in terms of prediction accuracy.
A joint-effect correlation structure goes beyond the previous additive forms with single-input main effects. It incorporates specific interactions between the input vectors x and x' ∈ R^d, along with their respective correlation hyperparameters. The notion of a joint-effect correlation structure is proposed by Duvenaud et al. (2011), but they clearly do not apply the Effect Sparsity and Hierarchy principles.
Let x and x' be two input vectors in R^d, and their respective values of the random function are Z(x) and Z(x'). For the jth input, the correlation function R_j(h_j) is defined as in Section 2.3. Firstly, we define the joint-effect correlation structure with equal weights using all 2-input interactions (E-JE2). This correlation structure assigns fixed weights to the main effects and all 2-input interactions on a 50-50 basis:

R^{E−JE2}(x, x') = \frac{1}{2d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{2\binom{d}{2}} \sum_{j=1}^{d−1} \sum_{k=j+1}^{d} R_j(h_j) · R_k(h_k) ∈ [0, 1].   (4.3)

For the jth input, the correlation function R_j(h_j) in the first addend of structure (4.3) contains the difference h_j = x_j − x'_j at the input vectors x and x', which is viewed as a main effect. If the jth and kth inputs have a 2-way interaction in this structure, then their corresponding functions R_j(h_j) and R_k(h_k) are supposed to pick up the correlation induced by that interaction in the second addend of the structure. A similar idea is applied to model 3-way interactions in the correlation structures of Sections 4.7 and 4.10.
Note that the two types of terms—main effects and 2-input interactions—
each receive weight 1/2, with the 1/2 distributed equally across all the cor-
relation functions in the type of term. We specify that the correlation func-
tions Rj (hj ) are shared between the main effects and 2-input interactions.
Hence, we use the same correlation hyperparameters in both types of effects.
It is worth mentioning that the correlation structure (4.3), as well as the
subsequent ones, all have different features compared to the CGP by Ba and
Joseph (2012).
As in the case of the U-ME GaSP with structure (3.4), we can incorporate
unequal weights in the aforementioned joint-effect structure. The joint-effect
correlation structure with unequal weights using all 2-input interactions (U-
JE2) is defined as:
R^{U−JE2}(x, x') = \frac{λ_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{λ_2}{\binom{d}{2}} \sum_{j=1}^{d−1} \sum_{k=j+1}^{d} R_j(h_j) · R_k(h_k) ∈ [0, 1],   (4.4)

where the weights λ_1 and λ_2 are subject to the constraint λ_1 + λ_2 = 1.


For maximum likelihood optimization, we also use a multinomial logit transformation on these weights that fulfils the previous constraint. The multinomial logit transformation is set up for the parameter τ ∈ (−∞, ∞), where λ_1 is computed as

λ_1 = \frac{\exp(τ)}{1 + \exp(τ)};

and λ_2 is defined as

λ_2 = \frac{1}{1 + \exp(τ)}.
Furthermore, we follow the same optimization procedure for estimating
the parameter τ along with the vectors already defined in Section 2.4 for
various correlation functions. Unlike the U-ME GaSP, the number of pa-
rameters does not significantly increase since we are weighting by type of
effect only.
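To make this reparameterization concrete, a minimal R sketch follows; the helper name is ours and not part of any package.

    # Map an unconstrained parameter tau to weights (lambda_1, lambda_2)
    # with lambda_1 + lambda_2 = 1, as used by the U-JE2 structure (4.4)
    logit_weights <- function(tau) {
      lambda1 <- exp(tau) / (1 + exp(tau))
      c(lambda1 = lambda1, lambda2 = 1 - lambda1)
    }

    logit_weights(0)   # equal weights: 0.50 and 0.50
    logit_weights(2)   # main effects up-weighted: about 0.88 and 0.12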

4.3 Friedman Function


The Friedman function is used as a 5-dimensional test function by Friedman
(1991) for the multivariate adaptive regression splines (MARS) model, as
well as by Friedman et al. (1983). Let x = (x_1, ..., x_5)^⊤ be the input vector that generates the output as:

y_{fri}(x) = 10 \sin(π x_1 x_2) + 20 (x_3 − 0.5)^2 + 10 x_4 + 5 x_5,  where x_i ∈ [0, 1] for i = 1, ..., 5.   (4.5)
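For reference, a direct R implementation of (4.5) is sketched below; the function name is ours.

    # Friedman function (4.5); x is a numeric vector of length 5 in [0, 1]^5
    friedman <- function(x) {
      10 * sin(pi * x[1] * x[2]) + 20 * (x[3] - 0.5)^2 + 10 * x[4] + 5 * x[5]
    }

    friedman(rep(0.5, 5))   # one evaluation at the centre of the input space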

Analogously to the Michalewicz function, the complexity of this function provides an adequate framework to test the non-standard correlation structures from Section 4.2. The function (4.5) has an almost additive form,
similar to the Michalewicz functions from Sections 3.5 and 3.6, but there is
an interaction between two inputs: x1 and x2 . By following the Effect Hier-
archy and Heredity principles, we can fit joint-effect GaSPs with correlation
structures (4.3) and (4.4). Moreover, we compare their prediction accuracy
results against the Std GaSP with structure (2.4) and MARS.


4.3.1 Overview of the Multivariate Adaptive Regression Splines Model
MARS was introduced by Friedman (1991). This approach addresses the
curse of dimensionality when the researcher wants to approximate an output
subject to a considerable number of inputs. MARS facets a curved surface
in a high-dimensional space using connecting tangent planes to the surface,
analogous to splines in the one-dimensional case where these curves are
joined at the knots.
Let x = (x_1, ..., x_d)^⊤ ∈ X ⊂ R^d be the input vector. This approach is a non-parametric regression technique, which is intended to deal with non-linearity and interactions within the inputs when modelling a given output y(x) in the following way:

Y(x) = f(x) + ε,

where the inputs are contained in the deterministic function f(·), which can be non-linear, along with a random component ε whose mean is zero.
Note that the MARS model is intended to tackle cases where we assume a random part, as in any regular regression technique. It aims to construct a predictor f̂(x) on the domain X. This predictor is decomposed into M basis functions B_i(x) in the following way:

f̂(x) = \sum_{i=1}^{M} c_i B_i(x),

where the coefficients {c_i}_{i=1}^{M} are adjusted in the predictor's expansion in order to provide the best fit to the data.
MARS was intended for functions observed with error, unlike the deterministic (4.5). Nevertheless, by allowing MARS to consider interaction effects involving up to two inputs at a time, the assumed functional form is analogous to the one we will be assuming for our joint-effect GaSPs. For fitting the MARS model, we use the R package earth implemented by Milborrow (2019). Furthermore, MARS fitting is set up by allowing three evenly spaced knots for each input.
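A minimal sketch of such a fit, assuming a training data frame df with inputs x1-x5 and output y (our names); degree = 2 caps interactions at two inputs, and a negative minspan in earth requests evenly spaced knots, here three per input, matching the settings above.

    library(earth)   # Milborrow (2019)

    # Hypothetical data frame df with columns x1, ..., x5 and output y
    fit_mars <- earth(y ~ ., data = df,
                      degree  = 2,    # up to 2-input interactions
                      minspan = -3)   # up to three evenly spaced knots per input

    pred <- predict(fit_mars, newdata = newdf)   # newdf: hypothetical test inputs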

4.3.2 Simulation Settings


Table 4.1 shows the simulation settings for this case study. We aim to
determine how competitive the E-JE2 and U-JE2 GaSPs are against the Std


GaSP and MARS (up to 2-input interactions) in test prediction accuracy. The rule of thumb for the smallest training set size is used here, resulting in n = 10d = 50, with 20 different mLHDs for each one of the four sizes: 50, 100, 200, and 400 runs. Moreover, the testing set size is fixed at N = 10,000. Note we are also using the SqExp and PowExp correlation functions for the different GaSPs, as in our previous simulation studies.

Setting                       Values
Dimensionality                d = 5
Inputs                        x ∈ [0, 1]^5
Output                        y_fri(x) as in (4.5)
GaSP Correlation Functions    SqExp and PowExp
GaSP Correlation Structures   R^{Std}(x, x'), R^{E−JE2}(x, x'), and R^{U−JE2}(x, x')
Additional Approach           MARS up to 2-input interactions
Training Sets D               20 different mLHDs with n = 50, 100, 200, 400 runs
Testing Set H                 1 rLHD with N = 10,000 runs

Table 4.1: Simulation settings for the Friedman function.

4.3.3 Prediction Results


Figure 4.1 shows the prediction accuracy for each of the approaches in Ta-
ble 4.1, and its format is similar to that used for both Michalewicz functions.
Note that the MARS results are repeated in the two panels for SqExp and
PowExp.
The MARS model does not outperform any class of GaSP fitting. More-
over, there is no accuracy difference within each GaSP approach for Sq-
Exp versus PowExp. Note that the Std GaSP starts out with an average
eN−RMSE of 0.03 at n = 50, and ends up with an average eN−RMSE of 0.00006
at n = 400. In terms of the E-JE2 and U-JE2 GaSPs, there is a steady and
equivalent rate of 1/n^{3.89} (going from an average eN−RMSE of 0.01 at n = 50 to 0.000003 at n = 400).
Hence, by following the Effect Hierarchy and Heredity principles with
the joint-effect GaSPs, we can obtain 1/3 to 1/20 of the N-RMSE of the
Std GaSP. It might be possible to achieve an even larger improvement if we
also apply the Effect Sparsity principle since function (4.5) has a single 2-
input interaction. Recall that structures (4.3) and (4.4) take into account all
possible 2-input interactions. We can make slight changes to our joint-effect
GaSP correlation structures, as shown in subsequent sections.

Figure 4.1: Prediction accuracy in percent (eN−RMSE × 100%) versus n on a log10(n)-log10(eN−RMSE) scale by type of GaSP (MARS, Std, E−JE2, U−JE2) for the Friedman function (4.5), with SqExp and PowExp correlation functions. Each boxplot shows results from 20 mLHDs of n = 50, 100, 200, 400 runs for training and N = 10,000 runs for testing.

4.4 Joint-Effect Correlation Structures up to Selected 2-Input Interactions
The joint-effect correlation structures (4.3) and (4.4) take into account the
Effect Heredity and Hierarchy principles. Nonetheless, the Effect Sparsity
principle is not applied since all possible 2-input interactions are computed


in both structures. Recall that a function of the form (4.2) has specific pairs
of inputs interacting with each other, so it would be reasonable to make a
joint-effect correlation structure with selected 2-input interactions.
Under equal weights on the effect framework, a joint-effect correlation
structure with selected 2-input interactions (E-JE2S) is set up in the follow-
ing way:
R^{E−JE2S}(x, x') = \frac{1}{2d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{2b} \sum_{\{j,k\} ∈ B} R_j(h_j) · R_k(h_k) ∈ [0, 1],   (4.6)

with the set

B = {{j, k} for some 1 ≤ j < k ≤ d} where |B| = b.

We assign weights to main effects and 2-input interactions on a 50-50 basis, as in R^{E−JE2}(x, x') (4.3). Furthermore, each main effect and 2-input interaction has an equal weight within the corresponding half. Note that as the number of elements in set B decreases, the equal weight assigned to each 2-input interaction increases.
Analogously to correlation structure R^{U−JE2}(x, x') in (4.4), we can set up unequal weights for the selected joint-effect structure under an effect framework. A joint-effect correlation structure with unequal weights using selected 2-input interactions (U-JE2S) is defined as:

R^{U−JE2S}(x, x') = \frac{λ_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{λ_2}{b} \sum_{\{j,k\} ∈ B} R_j(h_j) · R_k(h_k) ∈ [0, 1],   (4.7)

where the weights λ_1 and λ_2 are subject to the constraint λ_1 + λ_2 = 1. The weights and the other correlation parameters are optimized as in Section 4.2.

4.5 Functional Analysis of Variance


Now, a further question arises as to how we can determine the input inter-
actions contained in set B. Hence, we can perform a sensitivity analysis to
identify the low-order interactions that significantly contribute to the GaSP
predictor variance. Sensitivity analysis is used to quantify how much of the


output's variability is due to each input. Furthermore, we can use it to determine the output variations as more than two inputs are varied. Since our
joint-effect structures (4.6) and (4.7) rely on the selection of certain 2-input
interactions, we can select those important ones through sensitivity analysis.
One tool that is used for sensitivity analysis is the functional analysis of variance (FANOVA). This class of sensitivity analysis decomposes the total variance of the GaSP predictor into contributions from main or joint effects (2-input interactions in the specific case of our structures mentioned above). A FANOVA provides the percentage of the total functional variance attributed to each main or joint effect, determining its respective importance.
Even though the experimenter can measure the jth input's stand-alone activity through the correlation parameter θ_j, FANOVA has a crucial advantage: it measures the activity of a given input on the GaSP predictor variance in terms of an easily interpreted main effect or its interaction effects.
We describe the procedure found in Schonlau and Welch (2006), where further details in terms of effect estimation are provided. Let x be the input vector, with d > 1, which can be partitioned as

x = (x_e, x_{e'})^⊤,

where vector x_e contains the inputs of interest and x_{e'} contains the remaining ones. It is necessary to define the way variables in x_{e'} are handled, so we can obtain the effect on the subset in x_e. One possible approach is integrating out the inputs in x_{e'}.
The variance decomposition of the output y(x) is made over the total input space χ, which must be the direct product of one-dimensional regions:

χ = ⊗_{j=1}^{d} χ_j,

where χ_j is the input region corresponding to input x_j (taken as a continuous or discrete range of possible points). For a discrete input space, we have to consider a summation instead of an integration. For our specific case, we consider continuous input ranges where integration is needed. Integration with respect to different weight functions is possible, but we always take w_j(x_j) to be uniform over the range of x_j, which implies equal interest across the whole input range.
Given the decomposition of vector x, the marginal effect of the inputs of interest on the response, ȳ_e(x_e), is defined as:

ȳ_e(x_e) = \int_{⊗_{j∉e} χ_j} y(x_e, x_{e'}) \prod_{j∉e} w_j(x_j) dx_j   for x_e ∈ ⊗_{j∈e} χ_j.   (4.8)


Note that a marginal main effect is obtained when there is only one input in x_e, or a marginal joint effect otherwise.
We use (4.8) to compute the marginal effects that decompose the output y(x) in the following way:

y(x) = μ_0 + \sum_{j=1}^{d} μ_j(x_j) + \sum_{j=1}^{d−1} \sum_{j'=j+1}^{d} μ_{jj'}(x_j, x_{j'}) + ··· + μ_{1...d}(x_1, ..., x_d).   (4.9)

The element μ_0 represents the overall average:

μ_0 = \int_{χ} y(x) w(x) dx.

The overall average is used to compute the respective corrected main effects in the following way:

μ_j(x_j) = ȳ_j(x_j) − μ_0   for x_j ∈ χ_j.

Furthermore, the corrected joint effects are computed using the previous overall average and corrected main effects:

μ_{jj'}(x_j, x_{j'}) = ȳ_{jj'}(x_j, x_{j'}) − μ_j(x_j) − μ_{j'}(x_{j'}) − μ_0   for x_j, x_{j'} ∈ χ_j ⊗ χ_{j'}.   (4.10)

Note that this computation can be expanded to higher-order interactions


which are corrected by lower-order effects.
The effects in (4.10) are orthogonal with respect to the weight function w(x), and they decompose the total variance of y(x) as:

\int_{χ} (y(x) − μ_0)^2 w(x) dx = \sum_{j=1}^{d} \int_{χ_j} μ_j^2(x_j) w_j(x_j) dx_j
  + \sum_{j=1}^{d−1} \sum_{j'=j+1}^{d} \int_{χ_j ⊗ χ_{j'}} μ_{jj'}^2(x_j, x_{j'}) w_j(x_j) w_{j'}(x_{j'}) dx_j dx_{j'}
  + ··· + \int_{χ} μ_{1...d}^2(x_1, ..., x_d) \prod_{j=1}^{d} w_j(x_j) dx_j.   (4.11)


Each addend in (4.11) gives the share of the total variance due to one effect, which provides its respective importance. Therefore, this decomposition is a useful tool for a proper construction of set B in the joint-effect correlation structures (4.6) and (4.7). Note that the marginal effects can be estimated up to a given degree of interaction, e.g., up to the 2-input joint effects for the structures mentioned above.
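To make the computation concrete, the sketch below estimates one variance share in (4.11) by simple Monte Carlo for the Friedman function (4.5) under uniform weights; it works on the true function rather than a GaSP predictor, and all object names are ours.

    set.seed(1)
    X   <- matrix(runif(1e4 * 5), ncol = 5)   # uniform w_j(x_j) over [0, 1]
    y   <- apply(X, 1, friedman)              # friedman() as sketched in Section 4.3
    mu0 <- mean(y)                            # overall average
    total_var <- mean((y - mu0)^2)            # left-hand side of (4.11)

    # Corrected main effect mu_j(x_j): average out the other inputs, subtract mu0
    main_effect <- function(j, xj, n_int = 2000) {
      Xs <- matrix(runif(n_int * 5), ncol = 5)
      Xs[, j] <- xj
      mean(apply(Xs, 1, friedman)) - mu0
    }

    # Variance share of the x4 main effect via a grid approximation
    grid <- seq(0, 1, length.out = 50)
    share_x4 <- mean(sapply(grid, function(g) main_effect(4, g)^2)) / total_var
    round(100 * share_x4, 1)   # percentage contribution of x4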


4.6 Franke Function


As seen in Section 4.3 with the Friedman function, the use of the joint-effect correlation structures up to all 2-input interactions outperforms the Std GaSP and MARS. However, all methods already predict with high accuracy in that case. Thus, we need a more challenging test function.

Figure 4.2: The 2-dimensional Franke function as plotted by Surjanovic and Bingham (2017).

Therefore, we introduce the Franke function, used as a 2-dimensional


test function by Franke (1979) and Haaland and Qian (2011). Franke (1979)
introduced this function as the principal case study in his comparative work
on interpolation methods with scattered data. Haaland and Qian (2011)
used this function for their multi-step procedure in “large-scale” computer
experiments (i.e., with a large number of input runs) to obtain interpolated
predictions for a testing set. As illustrated in Figure 4.2 by Surjanovic and
Bingham (2017), the function has two Gaussian peaks with different heights
and a smaller dip. It has the following form:


y_{fra}(x_1, x_2) = 0.75 \exp\left(−\frac{(9x_1 − 2)^2}{4} − \frac{(9x_2 − 2)^2}{4}\right)
  + 0.75 \exp\left(−\frac{(9x_1 + 1)^2}{49} − \frac{9x_2 + 1}{10}\right)
  + 0.5 \exp\left(−\frac{(9x_1 − 7)^2}{4} − \frac{(9x_2 − 3)^2}{4}\right)
  − 0.2 \exp\left(−(9x_1 − 4)^2 − (9x_2 − 7)^2\right),
  where x_1, x_2 ∈ [0, 1].   (4.12)

To create a test function with only some 2-input interaction effects, we will expand the Franke function to eight dimensions. Let x = (x_1, ..., x_8)^⊤ ∈ [0, 1]^8 be the input vector that generates an output of the form

y_{8f}(x) = y_{fra}(x_1, x_2) + y_{fra}(x_3, x_4) + y_{fra}(x_5, x_6) + y_{fra}(x_7, x_8).   (4.13)
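A direct R translation of (4.12) and (4.13) is sketched below for reproducibility; the function names are ours.

    # 2-dimensional Franke function (4.12), with x1, x2 in [0, 1]
    franke <- function(x1, x2) {
      0.75 * exp(-(9 * x1 - 2)^2 / 4 - (9 * x2 - 2)^2 / 4) +
      0.75 * exp(-(9 * x1 + 1)^2 / 49 - (9 * x2 + 1) / 10) +
      0.50 * exp(-(9 * x1 - 7)^2 / 4 - (9 * x2 - 3)^2 / 4) -
      0.20 * exp(-(9 * x1 - 4)^2 - (9 * x2 - 7)^2)
    }

    # Expanded 8-dimensional version (4.13); x is a numeric vector of length 8
    franke8 <- function(x) {
      franke(x[1], x[2]) + franke(x[3], x[4]) +
        franke(x[5], x[6]) + franke(x[7], x[8])
    }

    franke8(rep(0.5, 8))   # one evaluation at the centre of [0, 1]^8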

Function (4.13) is our last abstract mathematical function before moving on to the applied case studies in this chapter. Again, the function's high dimensionality and complexity provide an adequate framework to test the non-standard correlation structures from Section 4.4. We have an additive form, just as with the Friedman function, and there are four specific 2-input interactions. Hence, we can test the selective joint-effect GaSPs: E-JE2S (4.6) and U-JE2S (4.7). This testing allows us to follow the Effect Hierarchy, Heredity, and Sparsity principles.

4.6.1 Simulation Settings


We detail the simulation settings for the expanded 8-dimensional Franke function (4.13) to compare the test prediction accuracy of our non-standard E-JE2S and U-JE2S GaSPs, using structures (4.6) and (4.7) respectively, versus the Std GaSP. To put the advantages of our non-standard correlation structures in perspective, we also use the CGP and MARS (up to 2-input interactions) models.
The rule of thumb for the smallest training set size is used, giving n =
10d = 80 with 20 different mLHDs for each one of the four training set sizes:
80, 160, 320, and 640 runs. Furthermore, the testing set size is fixed at N =
10, 000. Note we are also using SqExp and PowExp correlation functions as
in our previous simulation studies in terms of an Oracle (described next),
E-JE2S, U-JE2S, and Std GaSPs; unlike CGP that only uses SqExp in the R


package CGP (Ba and Joseph, 2018). We aim to determine how competitive
the joint-effect E-JE2S and U-JE2S GaSPs are against MARS (up to 2-input
interactions), CGP, Std, and Oracle GaSPs. Table 4.2 shows the simulation
settings.

Setting                       Values
Dimensionality                d = 8
Inputs                        x ∈ [0, 1]^8
Output                        y_8f(x) as in (4.13)
GaSP Correlation Functions    SqExp and PowExp
GaSP Correlation Structures   R^{Std}(x, x') with one GaSP model (Std) and four independent GaSPs (Oracle); R^{E−JE2S}(x, x') and R^{U−JE2S}(x, x')
Additional Approaches         CGP and MARS up to 2-input interactions
Training Sets D               20 different mLHDs with n = 80, 160, 320, 640 runs
Testing Set H                 1 rLHD with N = 10,000 runs

Table 4.2: Simulation settings for the expanded Franke function.

In contrast to determining the important input interactions empirically via FANOVA, the Oracle GaSP "knows" the effects that are needed. It fits four 2-dimensional independent Std GaSPs for the four terms in (4.13). Let D be the n × 8 design matrix that contains the input vectors from training set D. The four GaSP fittings require n × 2 submatrices D_{pq} in the partitioned design matrix

D = [D_{12}  D_{34}  D_{56}  D_{78}].

Note that corresponding to D_{pq}, there is an output vector y_{fra}(x_p, x_q) (4.12) to fit a GaSP model with only two inputs.
This arrangement will give us the following global prediction at a test input vector x_t, based on the sum of four separate BLUPs computed with (2.12) from the corresponding subvectors x_t^{(pq)} ∈ [0, 1]^2:

ŷ_{8f}^{(Oracle)}(x_t) = ŷ_{fra}(x_t^{(12)}) + ŷ_{fra}(x_t^{(34)}) + ŷ_{fra}(x_t^{(56)}) + ŷ_{fra}(x_t^{(78)}).

Given the form of function (4.13) and how the four GaSP predictions are
obtained, we will not get better prediction accuracy using any other form of
GaSP fitting considered here.
Returning to the practical case where the Oracle is unknown, since we
are dealing with selected 2-input interactions in the joint-effect correlation
structures (4.6) and (4.7), we have to define the pairs of inputs used in
the respective set B. Thus, under a Std GaSP using SqExp and PowExp
correlation functions for the sake of comparison, we run FANOVA up to
2-input interactions on each one of the 20 repeated experiments per training
set size. Note that this analysis is implemented via the GaSP software by
Welch (2014). Integration in FANOVA is with respect to uniform weights
over the inputs’ respective ranges; the same will be done for subsequent case
studies in this chapter.

Figure 4.3: FANOVA summary plot by type of effect (main effects, 2-input interactions, higher-order interactions) for the expanded Franke function (4.13) using a Std GaSP, with SqExp and PowExp correlation functions. Each boxplot shows results from 20 mLHDs of n = 80, 160, 320, 640 runs for training.


Figure 4.3 summarizes the variance percentages for each type of effect in all our repeated experiments. The lines depict the evolution of the mean percentage across all training set sizes. Moreover, the plots are faceted by correlation function, and we have the following results:

• For both correlation functions, we can see that the main effects account for 90% or more of the process variance on average. Nonetheless, there is a subtle increasing trend in the SqExp case, unlike its PowExp counterpart, across all training set sizes. Either way, the percentage assigned to the eight main effects is high for both correlation functions.

• This leaves roughly another 10% of the predictor variance, which the 2-input interactions clearly dominate. Note that, in the SqExp case, the mean percentage remains quite constant across the four training set sizes. On the other hand, with the PowExp function, the mean percentage starts at about 5% at n = 80 and ends at about 10% at n = 640.

• Finally, for both correlation functions, the percentage corresponding to higher-order interactions is negligible.

A FANOVA approach has to produce consistent results on the importance of specific 2-input interactions across a set of replicates at a small training size n, since the experimenter needs to rely on one experiment, possibly with limited input runs for sensitivity analysis. Figure 4.4 illustrates the variance percentages of the eight main effects for the 20 experiments with n = 160, as well as all the 2-input interactions, for both correlation functions. Note that PowExp shows four clearly separated magnitudes of effect:

• Figure 4.4a indicates that the main effects for inputs x_2, x_4, x_6, and x_8 each account for about 15% of the predictor variance on average; the remaining four main effects account for about 7.5% each.

• For the 2-input interactions in Figure 4.4b, there are clearly two magnitudes of effect:

  1. Those with medians between 1 and 2% of the predictor variance; namely x_1 · x_2, x_3 · x_4, x_5 · x_6, and x_7 · x_8.
  2. The remaining pairs, which all have negligible percentages.


Figure 4.4: FANOVA percentage contributions by type of effect for the expanded Franke function (4.13) using a Std GaSP, with SqExp and PowExp correlation functions: (a) main effects; (b) 2-input interactions. Each boxplot shows results from 20 mLHDs of n = 160 runs for training.

Thus, the effects with the largest percentage contributions are our eight main effects (x_1, x_2, x_3, x_4, x_5, x_6, x_7, and x_8), plus four 2-input interactions (x_1 · x_2, x_3 · x_4, x_5 · x_6, and x_7 · x_8). Now, we can construct our set B for R^{E−JE2S}(x, x') and R^{U−JE2S}(x, x') as:

B = {{1, 2}, {3, 4}, {5, 6}, {7, 8}}.   (4.14)

Thus, the specific correlation structures for this test function are defined as:

R^{E−JE2S}(x, x') = \frac{1}{16} \sum_{j=1}^{8} R_j(h_j) + \frac{1}{8} [R_1(h_1) · R_2(h_2) + R_3(h_3) · R_4(h_4) + R_5(h_5) · R_6(h_6) + R_7(h_7) · R_8(h_8)] ∈ [0, 1],

and

R^{U−JE2S}(x, x') = \frac{λ_1}{8} \sum_{j=1}^{8} R_j(h_j) + \frac{λ_2}{4} [R_1(h_1) · R_2(h_2) + R_3(h_3) · R_4(h_4) + R_5(h_5) · R_6(h_6) + R_7(h_7) · R_8(h_8)] ∈ [0, 1].
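As an implementation sketch (ours, not from a package), the E-JE2S structure above can be evaluated for a pair of input vectors once the one-dimensional correlations are chosen; here we use the SqExp form R_j(h_j) = exp(−θ_j h_j^2) for illustration.

    # E-JE2S correlation (4.6) with B = {{1,2}, {3,4}, {5,6}, {7,8}} and
    # SqExp one-dimensional correlations; theta would be estimated by MLE
    cor_eje2s <- function(x, xp, theta,
                          B = list(c(1, 2), c(3, 4), c(5, 6), c(7, 8))) {
      R <- exp(-theta * (x - xp)^2)   # R_j(h_j) for j = 1, ..., d
      d <- length(x); b <- length(B)
      sum(R) / (2 * d) +              # equal-weight main effects
        sum(sapply(B, function(p) R[p[1]] * R[p[2]])) / (2 * b)
    }

    cor_eje2s(rep(0.2, 8), rep(0.6, 8), theta = rep(1, 8))   # value in [0, 1]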

4.6.2 Prediction Results


Figure 4.5 shows the prediction accuracy for each of the approaches in Table 4.2, and its format is similar to that used for the Friedman function. There are two panels showing the results for the SqExp and PowExp correlation families across the GaSP approaches; the results from MARS with up to 2-input interactions are repeated in the two panels. Note that MARS has poor performance, comparable to the Std GaSP. Furthermore, there is no substantial accuracy difference between the SqExp and PowExp functions within each GaSP approach.
CGP, with only the SqExp correlation function, also has the same poor performance as the Std GaSP, showing a rate of convergence of 1/n^{0.28} (with an average eN−RMSE of 0.56 at n = 80, ending up with an average eN−RMSE of 0.32 at n = 640). The rate of convergence slightly improves to 1/n^{0.55} with the PowExp function and the Std GaSP, which starts out with an average eN−RMSE of 0.32 at n = 80 and ends up with 0.18 at n = 640. Not surprisingly, the Oracle GaSP clearly outperforms the rest of the approaches with a rough rate of 1/n^{4.12}. Note that the Oracle GaSP starts out with an eN−RMSE of 0.06 at n = 80, and ends up with 0.00001 at n = 640.
In terms of the joint-effect GaSPs (E-JE2S and U-JE2S, which show
equivalent and overlapped performances), the improvement in prediction
accuracy versus all methods except the Oracle is noticeable. Regardless of
the correlation function, both joint-effect GaSPs show the same prediction
accuracy: an average eN−RMSE of 0.18 at n = 80 and 0.003 at n = 640
(showing a rate of 1/n^{1.92}). As we previously pointed out, the prediction
accuracy of the Oracle GaSP will not be outperformed. Nonetheless, we
manage to be decently close with an appropriate GaSP correlation structure.

Figure 4.5: Prediction accuracy in percent (eN−RMSE × 100%) versus n on a log10(n)-log10(eN−RMSE) scale by type of GaSP (Oracle, CGP, MARS, Std, E−JE2S, U−JE2S) for the expanded Franke function (4.13), with SqExp and PowExp correlation functions. Each boxplot shows results from 20 mLHDs of n = 80, 160, 320, 640 runs for training and N = 10,000 runs for testing. CGP is only implemented with SqExp.


4.7 Joint-Effect Correlation Structures up to All 3-Input Interactions
The use of non-standard correlation structures in a GaSP allows a variety of arrangements, different from the usual R^{Std}(x, x') (2.4), which go beyond the ones introduced in Sections 4.2 and 4.4. Hence, how useful is it to expand our approach from the E-JE2 and U-JE2 GaSPs to 3-input interactions? Even though the Effect Hierarchy principle states that lower-order effects tend to be more significant than higher-order effects, our subsequent two case studies show that the incorporation of 3-input interactions might give extra prediction accuracy compared with the Std GaSP.
Therefore, we try a joint-effect correlation structure with equal weights using up to all 3-input interactions (E-JE3) in the following way:

R^{E−JE3}(x, x') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3\binom{d}{2}} \sum_{j=1}^{d−1} \sum_{k=j+1}^{d} R_j(h_j) · R_k(h_k)
  + \frac{1}{3\binom{d}{3}} \sum_{j=1}^{d−2} \sum_{k=j+1}^{d−1} \sum_{l=k+1}^{d} R_j(h_j) · R_k(h_k) · R_l(h_l) ∈ [0, 1].   (4.15)

Note that the three types of terms—main effects, 2-input interactions,


and 3-input interactions—each receive weight 1/3, with the 1/3 distributed
equally across all the correlation functions in the type of term. As in the pre-
vious joint-effect structures, its correlation function is shared across all types
of effect for any input. Thus, we use the same correlation hyperparameters
for the three types of effects.
Analogous to the U-JE2 GaSP from Section 4.2, we can incorporate unequal weights in the joint-effect structure (4.15), i.e., U-JE3. It has correlation function

R^{U−JE3}(x, x') = \frac{λ_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{λ_2}{\binom{d}{2}} \sum_{j=1}^{d−1} \sum_{k=j+1}^{d} R_j(h_j) · R_k(h_k)
  + \frac{λ_3}{\binom{d}{3}} \sum_{j=1}^{d−2} \sum_{k=j+1}^{d−1} \sum_{l=k+1}^{d} R_j(h_j) · R_k(h_k) · R_l(h_l) ∈ [0, 1],   (4.16)

where the weights fulfil the constraint \sum_{j=1}^{3} λ_j = 1.


For maximum likelihood optimization, we again use a multinomial logit transformation to satisfy the constraint. Optimization is performed on a raw weight vector

τ = (τ_1, τ_2)^⊤ ∈ (−∞, ∞)^2,

where

λ_i = \frac{\exp(τ_i)}{1 + \sum_{k=1}^{2} \exp(τ_k)}   for i = 1, 2;

and

λ_3 = \frac{1}{1 + \sum_{k=1}^{2} \exp(τ_k)}.

Under this correlation structure, the two parameters in vector τ are estimated along with the vectors already defined in Section 2.4 for various correlation functions.
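The sketch below generalizes the helper from Section 4.2 to a raw weight vector τ of length K − 1, returning K weights that sum to one; again, the function name is ours.

    # Multinomial logit: tau has length K - 1; returns K weights summing to 1
    mlogit_weights <- function(tau) {
      denom <- 1 + sum(exp(tau))
      c(exp(tau) / denom, 1 / denom)
    }

    mlogit_weights(c(0, 0))    # three equal weights: 1/3, 1/3, 1/3
    mlogit_weights(c(1, -1))   # lambda_1 up-weighted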

4.8 Joint-Effect Correlation Structure with a Residual Effect Term
We also introduce two additional non-standard structures that have a par-
ticular feature, as described next. Analogous to linear models in physical
experiments, where we model the output as the sum of systematic compo-
nents (the regression part) plus a residual, the use of an additional correlated
term for residual effects is an appealing alternative.
Hence, as an alternative to the explicit higher-order effects in Section 4.7,
we introduce two correlation structures that have a residual part correspond-
ing to higher-order interactions, beyond all the 2-input interactions. Our
joint-effect correlation structure using all 2-input interactions with a resid-
ual effect term (E-JE2R) assigns fixed equal weights to the main effects,
2-input interactions, and residual component in the following way:
R^{E−JE2R}(x, x') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3\binom{d}{2}} \sum_{j=1}^{d−1} \sum_{k=j+1}^{d} R_j(h_j) · R_k(h_k)
  + \frac{1}{3} \prod_{j=1}^{d} R_{jRes}(h_j) ∈ [0, 1].   (4.17)

The correlation structure (4.17) resembles the structure R^{E−JE2}(x, x') (4.3), but there is an additional addend \prod_{j=1}^{d} R_{jRes}(h_j) representing the residual component under a standard multiplicative structure.


Building on the previous ideas from Section 4.2, we have the following set of pairs of input indices:

B = {{j, k} for some 1 ≤ j < k ≤ d} where |B| = b.

Under equal weights on the effect framework, a joint-effect correlation structure with selected 2-input interactions and a residual effect term (E-JE2SR) is set up in the following way:

R^{E−JE2SR}(x, x') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3b} \sum_{\{j,k\} ∈ B} R_j(h_j) · R_k(h_k)
  + \frac{1}{3} \prod_{j=1}^{d} R_{jRes}(h_j) ∈ [0, 1].   (4.18)

As a side note, we are not expanding this approach to unequal weights since both structures pose numerical problems in the log-likelihood optimization. We are also making simplifications to the parameters of the correlation function in the residual component. For instance, in the case of the SqExp correlation function, this component will be:

\prod_{j=1}^{d} R_{jRes}^{SqExp}(h_j, θ') = \prod_{j=1}^{d} \exp(−θ' h_j^2) = \exp\left(−θ' \sum_{j=1}^{d} h_j^2\right).

On the other hand, for the PowExp correlation function, the residual component becomes:

\prod_{j=1}^{d} R_{jRes}^{PowExp}(h_j, θ', p') = \prod_{j=1}^{d} \exp(−θ' h_j^{p'}) = \exp\left(−θ' \sum_{j=1}^{d} h_j^{p'}\right).

Note that, depending on the correlation function, we are respectively setting up common correlation and smoothness parameters θ' and p' across the d dimensions. The common parameter(s) across input dimensions distinguishes this residual term from correlation structure R^{Std}(x, x') (2.4). This alternative setup also has computational advantages but restricts the weights assigned to the d input distances h_j. Hence, we are assuming the same input activity across all input dimensions within these residual effects.
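A small sketch of structure (4.18) with the SqExp residual term follows; the helper and its arguments are ours, and in practice θ, θ', and the set B come from the estimation steps described above.

    # E-JE2SR correlation (4.18): equal thirds for main effects, the selected
    # 2-input interactions in B, and an SqExp residual with common theta_res
    cor_eje2sr <- function(x, xp, theta, theta_res, B) {
      h <- x - xp
      R <- exp(-theta * h^2)
      sum(R) / (3 * length(x)) +
        sum(sapply(B, function(p) R[p[1]] * R[p[2]])) / (3 * length(B)) +
        exp(-theta_res * sum(h^2)) / 3
    }

    cor_eje2sr(runif(6), runif(6), theta = rep(1, 6), theta_res = 0.5,
               B = list(c(1, 2), c(3, 4)))   # hypothetical pair set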


4.9 Output Transformerless Circuit Function


Even though the Michalewicz, Friedman, and Franke functions provide a
testing framework to show the predictive advantages of our non-standard
correlation structures, these functions can be considered “toy examples”.
Therefore, we proceed with the OTL circuit function, addressed by Ben-Ari and Steinberg (2007) and Moon (2010), as one of the two further applications of our non-standard correlation structures, as previously stated in Chapter 1.
This engineering-based function models Vo , the midpoint voltage from
an output transformerless push-pull circuit, as follows:
V_o(x) = \frac{(V_{b1} + 0.74) β (R_{c2} + 9)}{β(R_{c2} + 9) + R_f} + \frac{11.35 R_f}{β(R_{c2} + 9) + R_f} + \frac{0.74 R_f β (R_{c2} + 9)}{[β(R_{c2} + 9) + R_f] R_{c1}},   (4.19)

where V_{b1} = 12 R_{b2} / (R_{b1} + R_{b2}). The inputs, units, and ranges are

R_{b1} = resistance b1 in kΩ ∈ [50, 150]
R_{b2} = resistance b2 in kΩ ∈ [25, 70]
R_f = resistance f in kΩ ∈ [0.5, 3]
R_{c1} = resistance c1 in kΩ ∈ [1.2, 2.5]
R_{c2} = resistance c2 in kΩ ∈ [0.25, 1.2]
β = current gain in A ∈ [50, 300].   (4.20)
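A direct R translation of (4.19) is sketched below for reproducibility; the function name and argument order are ours.

    # OTL circuit function (4.19): midpoint voltage Vo of the push-pull circuit
    otl_circuit <- function(Rb1, Rb2, Rf, Rc1, Rc2, beta) {
      Vb1   <- 12 * Rb2 / (Rb1 + Rb2)
      denom <- beta * (Rc2 + 9) + Rf
      (Vb1 + 0.74) * beta * (Rc2 + 9) / denom +
        11.35 * Rf / denom +
        0.74 * Rf * beta * (Rc2 + 9) / (denom * Rc1)
    }

    otl_circuit(100, 47.5, 1.75, 1.85, 0.725, 175)   # mid-range inputs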

We usually encounter data sparsity and high dimensionality in computer experiments. The literature offers non-parametric smoothing methods, different from kriging (namely, a Std GaSP), that attempt to overcome these two facts. Ben-Ari and Steinberg (2007) use the OTL function to empirically compare the predictive performance of three common smoothing methods, which are used for high-dimensional data in computer experiments: kriging, MARS (previously described in Section 4.3.1), and projection pursuit regression (PPR).

4.9.1 Overview of the Projection Pursuit Regression Model


As in Sections 3.5.1 and 4.3.1 for the CGP and MARS models, we provide
an overview of PPR before detailing all our simulation arrangements. This
approach is introduced by Friedman and Stuetzle (1981), and is considered


a method of non-parametric multiple regression. The authors argue that


parametric approaches assume a known functional form for the regression
surface. Thus, we only need to estimate the corresponding parameters.
However, this assumed functional form might not be appropriate for a given
case and lead to spurious results.
Non-parametric approaches, such as PPR, are meant to overcome the
issue above since they do not need extensive assumptions about the form
of the regression surface. PPR models this surface in the form of a sum of
general smooth functions of linear combinations of the inputs while following
an iterative estimation algorithm. Let x = (x_1, ..., x_d)^⊤ ∈ X ⊂ R^d be the input vector. PPR models the output y(x) as

Y(x) = β_0 + \sum_{j=1}^{k} β_j f_j\left(\sum_{l=1}^{d} α_{jl} x_l\right) + ε = β^⊤ f(α^⊤ x) + ε,

where the coefficient vector β = (β_0, β_1, ..., β_k)^⊤ ∈ R^{k+1} is unknown. Furthermore, there is a random component ε whose mean is zero.
The k functions in vector f(α^⊤ x) are flexible smooth splines. Moreover, the coefficients in vector

α_j = (α_{j,1}, ..., α_{j,d})^⊤ ∈ R^d   for j = 1, ..., k

represent a one-dimensional projection of the d inputs, scaled so that

‖α_j‖ = \sqrt{α_{j,1}^2 + ··· + α_{j,d}^2} = 1.

By using a training design D of n input vectors, the PPR iterative estimation algorithm chooses β, f, and α to minimize the following error function:

\min_{β, f, α} E = \sum_{i=1}^{n} \left( y_i − β^⊤ f(α^⊤ x_i) \right)^2.

4.9.2 Simulation Settings


The OTL circuit function (4.19) has an additive form, but we have higher-
order input interactions in all addends. Therefore, can we still apply the
Effect Heredity and Hierarchy principles as with the Friedman function? To
answer this critical question, we use the joint-effect correlation structures
up to all 3-input interactions from Section 4.7. We also include the struc-
tures with the residual term introduced in Section 4.8. Note that the Effect
Sparsity principle can be fulfilled with R^{E−JE2SR}(x, x') (4.18).


We aim to explore the advantages of our joint-effect correlation struc-


tures, so we also look at the non-parametric methods used by Ben-Ari and
Steinberg (2007). Their simulation study compares the predictive perfor-
mance of a Std GaSP under a SqExp correlation function, MARS, and PPR
using 120 runs in a single LHD. These 120 runs are split only once into a
training set of 80 runs, a test set of 20 runs, and an evaluation set of 20
runs. Their study’s main goal is to compare the prediction accuracy of the
three models as follows:

1. The training set is used to find the respective tuning parameters of


MARS and PPR models. These parameters are found via multiple
model fittings with sets of different values, and the chosen ones provide
the best prediction accuracy on the test set.

2. With the final tuning parameters, a single MARS and PPR model is
fitted on the merged training and test sets. The Std GaSP is directly
fitted using these two datasets as well.

3. Accuracy is assessed via unnormalized RMSE (3.7) (U-RMSE) using


the 20 evaluation runs.

Approach                                        U-RMSE
PPR                                             0.060
MARS                                            0.024
Std GaSP under the SqExp correlation function   0.009

Table 4.3: Prediction performance comparison by non-parametric smoothing method for the OTL circuit function (4.19). These results are reported by Ben-Ari and Steinberg (2007) with 120 runs from a single LHD split into a training set of 80 runs, a test set of 20 runs, and an evaluation set of 20 runs.

Table 4.3 shows the U-RMSE results, obtained by Ben-Ari and Stein-
berg (2007), for assessing prediction accuracy of the three approaches in
the evaluation set of 20 runs. We can see that the Std GaSP outperforms
PPR and MARS. We would expect to obtain similar conclusions regarding
the accuracy differences concerning PPR and MARS compared to the Std
GaSP in our simulation studies.


Table 4.4 shows the simulation settings for this case study. The rule of thumb for the smallest training set size is used, giving n = 10d = 60, with 20 different mLHDs for each of the three training set sizes: 60, 120, and 240 runs. Furthermore, the test set size is fixed at N = 10,000. Note that, unlike Ben-Ari and Steinberg (2007), we are using the PowExp correlation function for the Std GaSP since this setting shows the most competitive prediction accuracy for the OTL circuit function against our non-standard correlation structures.

Setting                       Values
Dimensionality                d = 6
Inputs                        x as in (4.20)
Output                        V_o(x) as in (4.19)
GaSP Correlation Structures   R^{Std}(x, x') using PowExp versus R^{E−JE2R}(x, x'), R^{E−JE2SR}(x, x'), R^{E−JE3}(x, x'), and R^{U−JE3}(x, x') using SqExp
Additional Approaches         PPR and MARS up to 2 and 3-input interactions
Training Sets D               20 different mLHDs with n = 60, 120, 240 runs
Testing Set H                 1 rLHD with N = 10,000 runs

Table 4.4: Simulation settings for the OTL circuit function.

We will also include PPR and MARS as additional approaches; a sketch of both fitting calls follows this list:

• The PPR model is fitted with the R base package stats via function ppr(), with 6 as the maximum number of terms to select from when fitting the model. The smoothing method is Friedman's "super smoother" found in Friedman (1984).

• For fitting the MARS model, we use the R package earth (Milborrow, 2019). Furthermore, MARS fitting is set up by allowing three evenly spaced knots for each input. It is important to state that Ben-Ari and Steinberg (2007) do not clarify the maximum order of interaction, so we are setting it up according to our specific subsequent simulation arrangements to match the order of interaction in the GaSP models (i.e., up to 2 or 3-input interactions).
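A minimal sketch of the two calls, assuming a training data frame df with the six inputs and output Vo (our names); nterms below is an illustrative choice, since only the maximum number of terms is stated above.

    # PPR with Friedman's super smoother; select from at most 6 terms
    fit_ppr <- ppr(Vo ~ ., data = df,
                   nterms = 2, max.terms = 6, sm.method = "supsmu")

    # MARS matched to the GaSP interaction order (here up to 3 inputs)
    library(earth)
    fit_mars <- earth(Vo ~ ., data = df, degree = 3, minspan = -3)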

This case study aims to compare our joint-effect GaSPs with all 3-input
interactions (i.e., E-JE3 and U-JE3 from Section 4.7) and residual terms (i.e.,
E-JE2R and E-JE2SR from Section 4.8) against MARS (up to 2 or 3-input
interactions), PPR, and Std GaSP with PowExp. Note that the joint-effect
GaSPs are only implemented with the SqExp correlation function.
As in the case of the Franke function, we are dealing with selected 2-input interactions in the joint-effect correlation structure R^{E−JE2SR}(x, x') (4.18). Hence, we need to define the pairs of inputs used in the respective set B. Thus, under a Std GaSP using both SqExp and PowExp correlation functions for the sake of comparison, we run FANOVA up to 3-input interactions on each of the 20 repeated experiments with training set size n = 60.
Figure 4.6 illustrates the variance percentages of the six main effects for
the 20 experiments with n = 60, as well as the top six input interactions ranked by their medians; all of them involve two inputs only.
Figure 4.6a shows the boxplots of the variance percentages corresponding to
the six main effects, where both correlation functions yield similar results.
One can see that input Rc1 is the most active with medians between 55 and
70% followed by Rf with medians between 30 and 40%. Note that Rb2 and
Rb1 are in third and fourth places with medians above 0.25%.
According to Figure 4.6b, which shows the top six input interactions, the FANOVA results follow the Effect Heredity and Hierarchy principles. The interaction R_f · R_{c1} is the most active, with median variance contribution slightly below 0.50% for SqExp and 0.75% for PowExp. If we take into account the remaining five 2-input interactions that stand out (R_{b2} · R_{c1}, R_{b1} · R_{c1}, R_{b2} · R_f, R_{b1} · R_f, and R_{b1} · R_{b2}) besides R_f · R_{c1}, these are all the \binom{4}{2} = 6 possible pairwise combinations of the top four aforementioned main effects.
Even though the last three 2-input interactions in Figure 4.6b show small percentages, their inclusion in our set B, along with the residual component, is important for the E-JE2SR GaSP to be competitive against the Std GaSP, as we will see further on. Hence, by applying the Effect Sparsity principle, FANOVA suggests the following effect set for R^{E−JE2SR}(x, x') (4.18):

B = {{R_f, R_{c1}}, {R_{b2}, R_{c1}}, {R_{b1}, R_{c1}}, {R_{b2}, R_f}, {R_{b1}, R_f}, {R_{b1}, R_{b2}}}.


Figure 4.6: FANOVA percentage contributions by type of effect for the OTL circuit function (4.19) using a Std GaSP, with SqExp and PowExp correlation functions: (a) main effects; (b) 2-input interactions. Each boxplot shows results from 20 mLHDs of n = 60 runs for training.

4.9.3 Prediction Results


Figure 4.7 shows the prediction accuracy in percent (eN−RMSE × 100%) for this case study on the log10(n)-log10(eN−RMSE) scale across n = 60, 120, 240. We are only displaying a single series for MARS since both approaches (up to 2 and 3-input interactions) provide identical results. First of all, in line with the results found by Ben-Ari and Steinberg (2007), we can see that the Std GaSP outperforms both MARS and PPR. Furthermore, MARS outperforms PPR. The rate of convergence with the Std GaSP is 1/n^{1.67}, which shows an average eN−RMSE of 0.01 at n = 60 and 0.0009 at n = 240.

Figure 4.7: Prediction accuracy in percent (eN−RMSE × 100%) versus n on a log10(n)-log10(eN−RMSE) scale by type of GaSP (PPR, MARS, Std, E−JE2R, E−JE2SR, E−JE3, U−JE3) for the OTL circuit function (4.19). Each boxplot shows results from 20 mLHDs of n = 60, 120, 240 runs for training and N = 10,000 runs for testing. The Std GaSP is implemented with PowExp; the E-JE2R, E-JE2SR, E-JE3, and U-JE3 GaSPs are implemented with SqExp.

In terms of the joint-effect GaSPs, all approaches have better accuracy than PPR and MARS. Of the joint-effect GaSPs up to all 3-input interactions, only the U-JE3 outperforms the Std GaSP. The E-JE3 GaSP has a slower rate of convergence (1/n^{1.46}), with an average eN−RMSE of 0.01 at n = 60 which ends up at an average eN−RMSE of 0.001 at n = 240. On the other hand, the U-JE3 GaSP shows an equivalent rate when compared to the Std GaSP, but with an average eN−RMSE of 0.007 at n = 60 and 0.0007 at n = 240. Note the fast and sustained rates of convergence of the E-JE2R (1/n^{2.29}) and E-JE2SR (1/n^{2.08}) GaSPs. Even though the E-JE2SR GaSP is outperformed by the other GaSP approaches at n = 60, it manages to provide the best prediction accuracy at n = 240 with an average eN−RMSE of 0.0006.
Table 4.5 provides the summary statistics of the eN−RMSE s (2.14) by
GaSP approach at n = 240 from the 20 replicates, where those non-standard
structures outperforming the Std GaSP are highlighted. The summary
statistics correspond to the minimum (Min), first quartile (Q1 ), median,
mean, third quartile (Q3 ), maximum (Max), and interquartile range (IQR).

Approach Min Q1 Median Mean Q3 Max IQR


Std 0.0008 0.0009 0.0009 0.0009 0.0010 0.0010 0.0001
E-JE2R 0.0010 0.0011 0.0012 0.0012 0.0012 0.0017 0.0002
E-JE2SR 0.0005 0.0006 0.0006 0.0006 0.0007 0.0008 0.0001
E-JE3 0.0009 0.0010 0.0010 0.0010 0.0011 0.0013 0.0001
U-JE3 0.0005 0.0006 0.0007 0.0007 0.0008 0.0009 0.0002

Table 4.5: Summary statistics of eN−RMSE s (2.14) by type of GaSP for the
OTL circuit function (4.19). Each figure summarizes the results from 20
mLHDs of n = 240 runs for training and N = 10, 000 runs for testing. Std
GaSP is implemented with PowExp. E-JE2R, E-JE2SR, E-JE3, and U-JE3
GaSPs are implemented with SqExp.

Figure 4.8 shows the prediction accuracy for this case study at n = 240. We compare the eN−RMSEs (2.14), in their original range of 0 to roughly 1, on the y-axis by GaSP approach on the x-axis with side-by-side boxplots. Overall, the E-JE2SR and U-JE3 GaSPs provide the best prediction accuracy among all the GaSP approaches, with medians of 0.0006 and 0.0007. The medians are reduced by 33.33% and 22.22%, respectively, for E-JE2SR and U-JE3 compared to the Std GaSP. To check that this difference in medians with respect to the Std GaSP is not due to chance, a Wilcoxon test can be applied, resulting in a p-value < .001 for both joint-effect approaches.
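For reference, a sketch of such a test in R, assuming the eN−RMSEs are paired by shared training design (hypothetical vectors; 20 values each in practice):

    # Wilcoxon signed-rank test, pairing results by shared training design
    e_std   <- c(0.0008, 0.0009, 0.0010, 0.0009)   # hypothetical values
    e_je2sr <- c(0.0005, 0.0006, 0.0007, 0.0006)
    wilcox.test(e_std, e_je2sr, paired = TRUE)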


Figure 4.8: Prediction accuracy by type of GaSP for the OTL circuit function (4.19). Each boxplot shows results from 20 mLHDs of n = 240 runs for training and N = 10,000 runs for testing. The Std GaSP is implemented with PowExp; the E-JE2R, E-JE2SR, E-JE3, and U-JE3 GaSPs are implemented with SqExp.

4.10 Joint-Effect Correlation Structures up to Selected 3-Input Interactions
We introduce two additional non-standard correlation structures used in the last case study of this chapter. Building on the previous ideas from Section 4.2, we assume that output y(x) depends on the input vector x = (x_1, ..., x_d)^⊤, and we have the following sets of pairs and triples of input indices, respectively:

B = {{j, k} for some 1 ≤ j < k ≤ d} where |B| = b, and
C = {{j, k, l} for some 1 ≤ j < k < l ≤ d} where |C| = c.

Now, y(x) is determined by a sum of d subfunctions y_j(x_j), b subfunctions y_{j,k}(x_j, x_k), and c subfunctions y_{j,k,l}(x_j, x_k, x_l), respectively, as:

y(x) = \sum_{j=1}^{d} y_j(x_j) + \sum_{\{j,k\} ∈ B} y_{j,k}(x_j, x_k) + \sum_{\{j,k,l\} ∈ C} y_{j,k,l}(x_j, x_k, x_l).   (4.21)


Function (4.21) has an additive form, but the inputs are interacting in specific pairs and triples. Therefore, it would be reasonable to make a joint-effect correlation structure up to selected 3-input interactions.
With equal weights, a joint-effect correlation structure up to selected 3-input interactions (E-JE3S) reduces its number of 2 and 3-input interactions in the following way:

R^{E−JE3S}(x, x') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3b} \sum_{\{j,k\} ∈ B} R_j(h_j) · R_k(h_k) + \frac{1}{3c} \sum_{\{j,k,l\} ∈ C} R_j(h_j) · R_k(h_k) · R_l(h_l) ∈ [0, 1].   (4.22)

We assign equal weights to main effects and both interaction orders. Note that as the numbers of elements in sets B and C decrease, the equal weights assigned to each 2 and 3-input interaction respectively increase.
We can set up unequal weights for this class of correlation structure (U-JE3S) as:

R^{U−JE3S}(x, x') = \frac{λ_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{λ_2}{b} \sum_{\{j,k\} ∈ B} R_j(h_j) · R_k(h_k) + \frac{λ_3}{c} \sum_{\{j,k,l\} ∈ C} R_j(h_j) · R_k(h_k) · R_l(h_l) ∈ [0, 1],   (4.23)

where the weights fulfil the constraint \sum_{j=1}^{3} λ_j = 1. The weights and the other correlation parameters are optimized as in Section 4.7.
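A sketch of structure (4.22), extending the earlier helper for (4.6) with a triple set C (again with SqExp one-dimensional correlations; all names are ours):

    # E-JE3S correlation (4.22): equal thirds for main effects and the
    # selected 2- and 3-input interactions in sets B and C
    cor_eje3s <- function(x, xp, theta, B, C) {
      R <- exp(-theta * (x - xp)^2)
      sum(R) / (3 * length(x)) +
        sum(sapply(B, function(p) prod(R[p]))) / (3 * length(B)) +
        sum(sapply(C, function(t) prod(R[t]))) / (3 * length(C))
    }

    cor_eje3s(runif(5), runif(5), theta = rep(1, 5),
              B = list(c(1, 2), c(2, 5)),   # hypothetical pairs
              C = list(c(1, 2, 5)))         # hypothetical triple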


4.11 Nilson-Kuusk Model


The Nilson-Kuusk model is an ecological code used by Bastos and O’Hagan
(2009) in their work on GaSP modelling diagnostics. To interpret remote
sensing data, researchers use complex canopy reflectance models via com-
puter codes. They also use these models to calibrate vegetation parameters
using reflectance measurements. Kuusk (1996) describes this model in fur-
ther detail.
Chen et al. (2016) point out this case as a slow code, which means
generating new runs for a testing set is too resource-consuming. Therefore,
we have to rely on an accurate GaSP as a surrogate model. As stated by
Bastos and O’Hagan (2009), this case has plant canopy reflectance (CR)
as output and the following five inputs: solar zenith angle (view zenith),
leaf area index (LAI), relative leaf size (sl), Markov clumping parameter
(lmbd z), and parameter λ (lambda).

4.11.1 Simulation Settings


Since this is a slow code, we have to depend on a limited amount of data
and cross-validation to obtain different combinations of runs for training and
testing through randomization. These randomized combinations provide the
basis for the evaluation of variability in prediction performance.
We use the training and testing data found in Bastos and O’Hagan
(2009), which are LHDs composed of 150 and 100 runs, respectively. Fol-
lowing the study made by Chen et al. (2016) in terms of training, we start
with the LHD of 150 runs and augment it with 50 further randomized runs
from the other LHD of 100 runs. The remaining 50 runs of the 100-run
LHD are left as a testing set, and we repeat this process 25 times. Thus,
we calculate the N-RMSE based on 25 different testing sets, which accounts
for variability in prediction accuracy. Table 4.6 shows the corresponding
simulation settings.
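A sketch of this resampling scheme, assuming matrices X150 and X100 hold the two LHDs with matching output vectors (our object names):

    set.seed(123)
    for (r in 1:25) {
      idx   <- sample(100, 50)            # 50 runs augment the training set
      train <- rbind(X150, X100[idx, ])   # n = 200 training runs
      test  <- X100[-idx, ]               # remaining N = 50 runs for testing
      # ... fit each GaSP on train, predict on test, and store the eN-RMSE ...
    }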
We aim to determine how competitive both types of selective joint-effect
GaSPs, either up to 2 (E-JE2S and U-JE2S) or 3-input (E-JE3S and U-
JE3S) interactions, are against the Std GaSP. As in the Franke function, it
is necessary to define the following sets of input interactions:

• input pairs in set B for correlation structures R^{E−JE2S}(x, x') (4.6) and R^{U−JE2S}(x, x') (4.7), as well as

• input pairs and triples in sets B and C, respectively, for correlation structures R^{E−JE3S}(x, x') (4.22) and R^{U−JE3S}(x, x') (4.23).


We run FANOVA up to 3-input interactions, under a Std GaSP using SqExp and PowExp correlation functions, on each one of the 25 randomized training sets. For running FANOVA, we work under these specific input ranges:

view_zenith ∈ [−60, 60]
LAI ∈ [1, 8]
sl ∈ [0.05, 0.35]
lmbd_z ∈ [0.4, 1]
lambda ∈ [555, 850].

Setting                       Values
Dimensionality                d = 5
Inputs                        view_zenith, LAI, sl, lmbd_z, and lambda
Output                        CR
GaSP Correlation Functions    SqExp and PowExp
GaSP Correlation Structures   R^{Std}(x, x'), R^{E−JE2S}(x, x'), R^{U−JE2S}(x, x'), R^{E−JE3S}(x, x'), and R^{U−JE3S}(x, x')
Training Sets D               25 randomized LHDs with n = 200 runs
Testing Sets H                25 randomized LHDs with N = 50 runs

Table 4.6: Simulation settings for the Nilson-Kuusk model.

Figure 4.9 shows the boxplots of the variance percentages accounted for
by estimated main effects, 2 and 3-input interaction effects from all the 25
randomized LHDs of n = 200 runs. Both correlation functions yield similar
results, and the main effects together have median around 92% whereas the
2-input interactions have median around 8%. Note that the 3-input interactions show small percentages, but one of them will emerge for the selective joint-effect structures R^{E−JE3S}(x, x') (4.22) and R^{U−JE3S}(x, x') (4.23).
Figure 4.10a shows the boxplots of the variance percentages correspond-
ing to the five main effects where both correlation functions yield similar


results. One can see that input lambda is the most active one with medians
around 87% followed by LAI with medians close to 4%.
According to Figure 4.10b, which shows the top eight input interac-
tions, the interaction LAI · lambda is the most active with median variance
contribution of about 7.5%. Furthermore, both inputs appear in the 3-input interaction view_zenith · LAI · lambda, and either one of them appears in the remaining six 2-input interactions.
Again, the FANOVA results follow the Effect Heredity and Hierarchy principles. Furthermore, applying the Effect Sparsity principle, FANOVA suggests the following effect sets:

B = {{LAI, lambda}, {view_zenith, lambda}, {sl, lambda}, {view_zenith, LAI}, {lmbd_z, lambda}, {view_zenith, lmbd_z}, {LAI, sl}};

and

C = {{view_zenith, LAI, lambda}}.
Regarding the input interactions for sets B and C, the boxplots in Fig-
ure 4.10b show that the variance percentages of these effects are never equal
to zero across the 25 randomized LHDs of n = 200 runs for training.

84
4.11. Nilson-Kuusk Model

SqExp PowExp
93.00

92.75

92.50
Variance Percentage

92.25

92.00

91.75

91.50

91.25

91.00

Main Main
Effect

SqExp PowExp
10

8
Variance Percentage

2−Input 3−Input Higher−Order 2−Input 3−Input Higher−Order


Effect

Figure 4.9: FANOVA summary plots for the Nilson-Kuusk model using a
Std GaSP, with SqExp and PowExp correlation functions. Each boxplot
shows results from 25 randomized LHDs of n = 200 runs for training.

[Figure 4.10 appears here: (a) boxplots of main-effect variance percentages for lambda, LAI, view_zenith, sl, and lmbd_z; (b) boxplots of variance percentages for the top 2- and 3-input interactions, led by LAI·lambda. One panel each for SqExp and PowExp.]

Figure 4.10: FANOVA plots by type of effect for the Nilson-Kuusk model using a Std GaSP, with SqExp and PowExp correlation functions. Each boxplot shows results from 25 randomized LHDs of n = 200 runs for training.

4.11.2 Prediction Results

Figure 4.11 shows the prediction accuracy for this case study. We compare the e_{N-RMSE} values (2.14), in their original range of 0 to roughly 1, on the y-axis by GaSP approach on the x-axis with side-by-side boxplots. Moreover, the plots are faceted by correlation function. Unlike our previous cases, there is a single training size n = 200. Another difference is the generation of 25 different randomized testing sets of N = 50 runs, one for each of those 25 training sets. For the sake of comparison, we set up two thresholds on the y-axis for the e_{N-RMSE}: a dashed line at 0.07 and a solid one at 0.06.

[Figure 4.11 appears here: side-by-side boxplots of e_{N-RMSE} by GaSP approach (Std, E-JE2S, U-JE2S, E-JE3S, U-JE3S), one panel each for SqExp and PowExp, with reference lines at 0.07 (dashed) and 0.06 (solid).]

Figure 4.11: Prediction accuracy by type of GaSP for the Nilson-Kuusk model, with SqExp and PowExp correlation functions. Each boxplot shows results from 25 randomized LHDs of n = 200 runs for training and 25 randomized LHDs of N = 50 runs for testing.

Table 4.7 provides the summary statistics of the e_{N-RMSE} values (2.14) by GaSP approach and correlation function from the 25 replicates, where the non-standard structures outperforming the Std GaSP are highlighted. We summarize the following results:

• SqExp Correlation Function. The E-JE3S and U-JE3S GaSPs outperform E-JE2S and U-JE2S, and show a reduction in median N-RMSE of 15.94% when compared to the Std GaSP. To check that this difference in medians relative to the Std GaSP is not due to chance, a Wilcoxon test can be applied, resulting in a p-value < .001 for both joint-effect approaches (E-JE3S and U-JE3S); a sketch of this test appears after this list.

• PowExp Correlation Function. All the joint-effect approaches show improvements in their medians relative to their SqExp counterparts. The E-JE3S and U-JE3S GaSPs show a reduction of 9.83% in median N-RMSE compared to the Std GaSP; the same Wilcoxon test again results in a p-value < .001 for both joint-effect approaches (E-JE3S and U-JE3S).
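One plausible form of this test, as a minimal sketch: rmse_std and rmse_eje3s are assumed numeric vectors holding the 25 e_{N-RMSE} values for the Std and E-JE3S GaSPs, paired because the replicates share the same training/testing splits:

# Paired, one-sided Wilcoxon signed-rank test of whether E-JE3S has a
# smaller N-RMSE than the Std GaSP across the 25 common splits.
wilcox.test(rmse_eje3s, rmse_std, paired = TRUE, alternative = "less")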

SqExp
Approach Min Q1 Median Mean Q3 Max IQR
Std 0.061 0.065 0.069 0.069 0.073 0.084 0.008
E-JE2S 0.058 0.071 0.075 0.075 0.078 0.092 0.007
U-JE2S 0.059 0.071 0.074 0.075 0.077 0.091 0.006
E-JE3S 0.042 0.053 0.058 0.058 0.063 0.074 0.010
U-JE3S 0.041 0.053 0.058 0.057 0.061 0.072 0.008
PowExp
Approach Min Q1 Median Mean Q3 Max IQR
Std 0.036 0.057 0.061 0.060 0.066 0.073 0.009
E-JE2S 0.060 0.069 0.073 0.075 0.080 0.103 0.011
U-JE2S 0.061 0.068 0.072 0.075 0.080 0.102 0.012
E-JE3S 0.036 0.052 0.055 0.055 0.059 0.072 0.007
U-JE3S 0.036 0.053 0.055 0.056 0.061 0.071 0.008

Table 4.7: Summary statistics of eN−RMSE s (2.14) by type of GaSP for the
Nilson-Kuusk model, with SqExp and PowExp correlation functions. Each
figure summarizes the results from 25 randomized LHDs of n = 200 runs for
training and 25 randomized LHDs of N = 50 runs for testing.

Additionally, the joint-effect structures used for the Friedman and OTL circuit functions were also tried in this case study. The R_{E-JE3}(x, x') and R_{U-JE3}(x, x') GaSPs also outperformed the Std GaSP. However, when we compare these non-standard GaSPs against their selective counterparts, the respective Wilcoxon tests by correlation function conclude that the R_{E-JE3S}(x, x') (4.22) and R_{U-JE3S}(x, x') (4.23) GaSPs have smaller median N-RMSEs than the R_{E-JE3}(x, x') (4.15) and R_{U-JE3}(x, x') (4.16) GaSPs.

4.12 Concluding Remarks

Our joint-effect approaches open the door to more than main effects, allowing improved prediction accuracy for further mathematical functions. Thus, cases like the Friedman and Franke functions can be approached with different joint-effect structures that include up to 2-input interactions. We can see that prediction accuracy is greatly improved for both functions, compared to the Std GaSP, in the following ways:
• Friedman Function. The Friedman function shows that joint-effect
GaSPs, up to all 2-input interactions, greatly improve prediction ac-
curacy even with small training sizes. Note that we apply the Effect
Hierarchy and Heredity principles in this case. Other approaches, such
as MARS, are not competitive against any class of GaSP.
• Franke Function. The Franke function shows that joint-effect GaSPs
up to selected 2-input interactions offer a competitive prediction ac-
curacy against an Oracle GaSP, while completely fulfilling the three
effect principles: Sparsity, Heredity, and Hierarchy. Note FANOVA’s
key role in the application of the Effect Sparsity principle. As an ad-
ditional remark, CGP and MARS are not competitive at all in this
case.
The aforementioned joint-effect GaSPs are then expanded in two ways: by incorporating 3-input interactions, and via a model with up to 2-input interactions plus a special residual component. These expansions play a key role in improving prediction accuracy against the Std GaSP for two additional case studies: the OTL circuit function and the Nilson-Kuusk model. We highlight the following points for each of these cases:
• OTL Circuit Function. PPR and MARS models are not competi-
tive at all against the Std GaSP nor any joint-effect GaSP. Our joint-
effect correlation structures, U-JE3 and E-JE2SR, have a better pre-
diction accuracy than the Std GaSP at n = 120, 240. In this case,
we can see that the inclusion of weights on the effect-level improves
prediction accuracy for U-JE3 versus E-JE3.
• Nilson-Kuusk Model. Since this model is considered a slow code, cross-validation is useful in this case study. For both correlation functions, our joint-effect correlation structures with up to selected 3-input interactions provide better prediction accuracy than the Std GaSP in terms of median e_{N-RMSE}.

Chapter 5

Dimensional Analysis
Our second approach to improve prediction accuracy is DA, which pays at-
tention to fundamental physical dimensions when modelling scientific and
engineering systems. As stated in Chapter 1, early works in DA were done by
Buckingham (1914), Rayleigh (1915), and Bridgman (1931). These works do
not focus on mechanical tools for modelling but provide important DA foun-
dations. A critical point to highlight is that a dimensionally homogeneous
system of a natural phenomenon is depicted by relationships between the
seven fundamental physical dimensions. Nonetheless, a given physical law
coming from this system cannot depend on specific scales of measurement
known as “units”. Statistical modelling cannot be exempt from this fact.
Still, there are examples where the figure of the “unconscious statistician”
could arise (Lee and Zidek, 2020).
Even though statisticians usually do not take DA into account when
it comes to data modelling, we will see in this chapter its great potential.
Hence, DA can be explored as an alternative strategy in model training and
testing. DA allows us to deduce the physical relationships between a given
system’s variables while shedding light on their complex dynamics. Note
that “dimension” is being used in two senses here. DA is an analysis of
measurement units, and hence “dimensionless” refers to a possibly derived
variable with no units. A consequence of DA is that it also leads to dimension
reduction, where the dimensionality d of the inputs is reduced.
Some DA-related statistical modelling can be found in the literature
(Shen et al., 2014; Shen and Lin, 2018; Shen et al., 2018). These previous
works have addressed fundamental challenges in specific DA practices (e.g.,
the choice of base quantities or additional input/output transformations).
Shen et al. (2014) provide background and basic guidelines for DA, along
with examples of physical experiments. Furthermore, the work by Shen
et al. (2018) specifically applies DA work to computer experiments using
novel designs for model training.
The non-standard correlation structures in Chapters 3 and 4 are focused
on the prediction accuracy improvement related to interpolation, i.e., the
testing set H has the same d input ranges as its corresponding training set D. Therefore, predictions are obtained with input vectors under the same
ranges used for model training. Nonetheless, the experimenter might be
interested in predictions outside of the trained input ranges. This interest
leads to extrapolation. Note that Shen et al. (2018) argue that a model
under dimensional constraints performs better with extrapolated input vec-
tors, mainly when complexity arises in response surfaces. Hence, we will
target the use of DA to improve prediction accuracy for interpolation and
extrapolation.
DA can be thought of as a data pre-processing step: the analysis is carried out in terms of more fundamental dimensionless quantities derived from the original variables, and possibly designs for them too. The derivation of these dimensionless quantities takes into account the system's fundamental physical dimensions, out of the total of seven listed in Section 1.2. With the "right" dimensionless quantities, prediction accuracy will hopefully improve.
While the above goals make much sense for scientific applications and
statistical modelling, DA’s implementation is far from straightforward; choos-
ing the derived quantities is particularly problematic. Empirical approaches
to finding “good” derived variables in computer experiments will be de-
scribed in this chapter, and the improvements in prediction accuracy will be
demonstrated in three case studies. These choices will be implemented via
FANOVA, as in our joint-effect correlation structures with selected input
interactions from Chapter 4. Note that constructing dimensionless derived
inputs resembles feature engineering from machine learning. However, our
implementation is not automatic but manual based on science.
The paradigm of choosing the "right" dimensionless quantities is most evident in the cases described in Sections 5.3 and 5.4. Firstly, we will
compare the accuracy of interpolated predictions with a trained model using
a FANOVA-based DA to others under Non-DA and alternative DAs from
the literature (Shen et al., 2018; Tan, 2017). The FANOVA-based DA out-
performs these other approaches. Then, since DA is related to the scaling
of variables, we also illustrate sustained accuracy gains when extrapolating
substantially outside the training data. The FANOVA-based DA’s outper-
formance, when compared to the other approaches, is even more striking for
extrapolated predictions.
Overall, we illustrate how the DA can improve prediction accuracy in
a computer experiment via three cases. Moreover, the use of sensitivity
analysis tools like FANOVA is helpful in key DA steps. This chapter is
structured as follows:

• Section 5.1 summarizes two previous works where DA is applied in different fields: physiology and hydrodynamics. In hydrodynamics, there is a specific application related to the design of experiments.

• Section 5.2 summarizes Buckingham's Π-theorem, which is the basis for DA.

• Section 5.3 introduces the Borehole function as our first DA case study.
The Borehole function is commonly used in computer experiments lit-
erature. We first show that logarithmic input transformations and
a further input space expansion can improve prediction accuracy in
a non-DA framework. Then, along with the use of a FANOVA-based
DA, we make additional accuracy improvements. We also compare our
FANOVA-based DA with an alternative DA (Shen et al., 2018). More-
over, we study extrapolation, where FANOVA-based DA substantially
improves prediction accuracy.

• Section 5.4 provides our second case study with a thermodynamic model depicting the temperature of a solid sphere immersed in a fluid. In this case, using a FANOVA-based DA with an output transformation provides significant improvements in prediction accuracy compared to a non-DA framework. We also compare our FANOVA-based DA with an alternative DA (Tan, 2017). As in the Borehole function, we also study extrapolation, where a FANOVA-based DA combined with input/output transformations and an input space expansion provides a good accuracy improvement.

• Section 5.5 illustrates our final case study in this thesis, a Storm Surge model with data coming from the Advanced Circulation (ADCIRC) model (Luettich et al., 1992). Here a GaSP is used as a surrogate model to predict storm surges along the North Carolina (NC) coast. This case applies a full DA approach, to the extent possible, to improve the prediction accuracy of a GaSP compared to a non-DA framework. A further study shows the advantage of the novel approaches in extrapolation: a model trained only with low- to moderate-intensity storms predicts surge from the highest-intensity storms.

The main thrust of this thesis is the GaSP’s prediction accuracy im-
provement by either using more than one non-standard correlation structure
or different DA arrangements. FANOVA is the sensitivity analysis tool in
common between both strategies. Note that the experimenter could combine
both strategies, but at the cost of significantly increasing the number of possible modelling settings. Furthermore, the previous case studies do not have
a suitable framework based on the seven fundamental physical dimensions.
Hence, unlike our previous simulation studies in Chapters 3 and 4, all
cases throughout this chapter will only involve the use of a Std GaSP. DA is
focused on strategies for choosing variables for modelling, i.e., input/output
transformations and input space expansions. Moreover, based on our pre-
vious results regarding the Std GaSP in Chapters 3 and 4, the use of the
PowExp correlation function provides better accuracy than the SqExp coun-
terpart for this class of GaSP. Therefore, we will also restrict our attention
to PowExp.

5.1 Previous Applications of Dimensional Analysis
Finney (1977) was concerned about the lack of use of DA in statistics. However, Shen et al. (2014) introduce the following two important examples:

• Asmussen and Heebøll-Nielsen (1955) use DA in terms of anatomical and physiological relationships between body parts and/or organ functions, which are ratio-scale continuous variables (namely, covariates).

• On the other hand, Islam and Lye (2009) use DA in a hydromechanics experiment regarding the thrust of a propeller. DA allows a useful input reduction in a high-dimensional framework subject to factor-type dimensionless design variables.

The work by Asmussen and Heebøll-Nielsen (1955) focuses on children aged 7 to 16 years. It hypothesizes that, regardless of the actual anatomical and physiological measurements in a child's body, there is a functional relationship between a given pair of measurements. These relationships are based on the facts of "geometrical similarity and necessity":

• In cases where a pair of measurements share a fundamental physical dimension such as L, both measurements will be proportional to each other. This fact is what the authors call "geometrical similarity"; e.g., the length of the legs will have a linear relationship with body height (h).

• The fact of "geometrical necessity" involves a power-based relationship between both measurements when their dimensional nature requires so. For instance, cases such as the area of the lungs, whose expression as a fundamental dimension is L^2, will increase/decrease as a function of the second power of another measurement like body height (i.e., h^2).

Asmussen and Heebøll-Nielsen (1955) collected a sample of about 400 boys within the age range mentioned above to test their hypothesis. The sample was subject to various anatomical and physiological measurements, i.e., body height, sitting height, trunk length, vital capacity, maximum expiratory and inspiratory forces, muscular strength, etc. Statistical analyses were performed via a simple linear regression model in the following logarithmic form:

\[
y = a \times x^{b} \quad \Longrightarrow \quad \log(y) = \log(a) + b \log(x);
\]

where x and y are a given pair of anatomical and/or physiological measurements, whereas a and b are constants estimated via weighted least squares. Note that the authors are not using dimensionless covariates in the logarithmic transformation, which might not be appropriate for a transcendental function.
Islam and Lye (2009) extend the use of DA to an engineering problem in hydromechanics as a design of experiments (DOE) applied to the thrust of a propeller. They state that the inputs, i.e., the controllable variables in the experiment, have complex relationships in the system. Thus, establishing an analytical expression between inputs and output is not straightforward. For this specific case, the output is the thrust coefficient K_T (whose fundamental dimensions are MLT^{-2}; i.e., [K_T] = MLT^{-2}). The system has 14 inputs involving the properties of the fluid, blade and pod geometry, as well as operating conditions. They apply DA by, for example, considering speed of advance ([V_A] = LT^{-1}), rotational speed ([r_S] = T^{-1}), and size of the propeller represented by the diameter ([D] = L), which generate the following dimensionless input:

\[
q_{V_A} = \frac{V_A}{r_S\, D}.
\]

As will be explained in the next section on Buckingham's Π-theorem, the use of DA reduces the Islam and Lye (2009) example problem to 11 inputs while generating dimensionless design variables. Then, DOE is applied with a two-level quarter-fraction factorial design in 8 of the 11 inputs, since the experimenter can control those variables. The results obtained with this experimental design identify five significant factors, which are used to develop a second-order response surface model for the output K_T with acceptable prediction accuracy.


5.2 Buckingham’s Π-Theorem


Early works in DA go back at least a century to the Π-theorem of Buckingham (1914), which uses dimensionless variables along with a dimension reduction. Buckingham (1914) assumes a system of d positive inputs x_1, ..., x_d whose respective dimensions (units) are D_1, ..., D_d, and whose corresponding output y has dimension D_y. The theorem states that an equation depicting this system involving the d inputs and output y will be a function of them plus a given number of derived quantities, only ratio variables r', r'', ...; i.e.,

\[
f(y, x_1, \ldots, x_d, r', r'', \ldots) = 0. \qquad (5.1)
\]

An important aspect of the theorem is that all the system inputs need
to be included in equation (5.1). Thus, this equation will depict the rela-
tionships between those inputs in terms of the seven fundamental physical
dimensions already listed in Section 1.2 whose coefficients (i.e., other con-
stants and parameters) will be dimensionless. These coefficients represent
the fixed interrelations of the units used to measure the fundamental physical
dimensions. However, they do not depend on any specific system of mea-
surement (e.g., SI or Imperial System). The theorem is explained by Shen
et al. (2014), from the statistical perspective, in the form of the following
four steps:
(I) System's Dimensional Setup. The initial step in DA requires inspection of the fundamental dimensions involved in the system for inputs and output. Define f(·) as the function to be estimated:

\[
y = f(x_1, \ldots, x_d). \qquad (5.2)
\]

Let E_1, ..., E_p be the p ≤ 7 fundamental dimensions existing in the system. Therefore, the d dimensions for all the inputs can be expressed as the following products:

\[
D_i = \prod_{j=1}^{p} E_j^{e_{i,j}} \quad \text{for } i = 1, \ldots, d;
\]

as well as the corresponding output

\[
D_y = \prod_{j=1}^{p} E_j^{e_{0,j}}.
\]

Note we introduce p exponents e_{i,j} for the ith input, as well as p exponents e_{0,j} for output y.

(II) Determination of Base Quantities. This step can be explained using matrix algebra (Meinsma, 2019). Let E = (e_{i,j}) be the dimensional matrix of exponents, where exponent e_{i,j} corresponds to the ith input dimension (i = 1, ..., d) and the jth fundamental dimension in the system, with rank(E) = p ≤ d. Note that the output's dimensional exponents (i.e., i = 0) are not included in this matrix, since we are already taking the output's dimensions into account with the inputs' exponents under the assumption of dimensional homogeneity in the system. We reorder the rows so that the first p are linearly independent and the remaining d − p are represented as

\[
e_{k,j} = \sum_{i=1}^{p} d_{k,i}\, e_{i,j} \quad \text{for } k = p+1, \ldots, d \text{ and } j = 1, \ldots, p;
\]

where the p terms d_{k,i} allow the kth row to be expressed in terms of the first p rows; a small numerical sketch of this step follows below. The rank p is the number of inputs that have to be considered as base quantities in the system. As stated by Shen et al. (2014), the dimensions of these base quantities have to fulfil the following conditions:

• Representativity. The remaining set of d − p input dimensions D_{p+1}, ..., D_d results from a combination of the first p input dimensions D_1, ..., D_p of the base quantities. It is important to highlight that the output dimension D_y also needs to be represented by the dimensions of these p base quantities. Otherwise, the dimensional homogeneity in the system would be violated.

• Independence. The dimension of a given base quantity cannot be expressed as a combination of the other base quantities' dimensions.
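As a minimal numerical sketch of this step in R, consider the toy dimensional matrix below (illustrative only; it is not one of this thesis's case studies):

# Rows are inputs, columns the fundamental dimensions (here L and T).
E <- rbind(x1 = c(1,  0),   # [x1] = L
           x2 = c(0,  1),   # [x2] = T
           x3 = c(1, -1),   # [x3] = L T^-1
           x4 = c(2, -1))   # [x4] = L^2 T^-1

p <- qr(E)$rank                          # p = 2 base quantities
base <- E[1:p, , drop = FALSE]           # first p rows assumed independent

# d_{k,i} solving e_{k,j} = sum_i d_{k,i} e_{i,j} for the remaining rows:
d <- t(solve(t(base), t(E[(p + 1):nrow(E), , drop = FALSE])))
d   # e.g., the x3 row (1, -1) says [x3] = [x1]^1 [x2]^-1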

(III) Input and Output Transformations. When the p inputs x_1, ..., x_p (whose dimensions fulfil the two conditions above) are considered as base quantities, the remaining d − p inputs are transformed into dimensionless inputs as

\[
q_k = x_k \prod_{i=1}^{p} x_i^{-d_{k,i}} \quad \text{for } k = p+1, \ldots, d.
\]

We also set a dimensionless output

\[
q_0 = y \prod_{j=1}^{p} x_j^{-d_{0,j}};
\]

where the p exponents d_{0,1}, ..., d_{0,p} have a role analogous to the p exponents d_{k,1}, ..., d_{k,p} for the kth input.

(IV) System's Transformed Function. The fundamental result of Buckingham's Π-theorem is that the system's function (5.2) can be re-expressed as

\[
q_0 = g(q_{p+1}, \ldots, q_d).
\]

See Meinsma (2019) for two proofs of this result: one based on the matrix of exponents E defined above under a physical perspective, and another using a scaling matrix under a more abstract mathematical perspective.

A critical point makes Step II less than straightforward: the selection of base quantities might not be unique in a given case. This matter is mentioned by Shen and Lin (2018). They suggest that the experimenter's subject-matter expertise may provide dimensionless inputs in Step III with specific physical meanings. However, is it possible to use another statistical tool to select base quantities? As in the setup of our selective joint-effect correlation structures in Chapter 4, one could rely on FANOVA (see Section 4.5) to identify these base quantities. Each of our three subsequent case studies will specify how FANOVA was used for this step.

5.3 Borehole Function

The Borehole function appears as a sample problem in works such as Worley (1987) and Morris et al. (1993). Unlike the Nilson-Kuusk model from Section 4.11, this case is considered a fast code by Chen et al. (2016). The main attribute of this class of codes is that large testing sets H, and a considerable number of training sets D, can be obtained in the form of LHDs. This can be done via the R package DoE.wrapper (Groemping, 2017) by providing the appropriate input ranges. As noted by Morris et al. (1993), this test function provides a quick-assessment framework for prediction accuracy. Hence, it is often used in the computer experiments literature.

The Borehole function models the flow of water through a drilled borehole from the ground surface to two aquifers. The output y_b, the flow rate through the borehole in m^3 year^-1 (i.e., [y_b] = L^3 T^{-1}), is based upon eight input parameters as follows:

\[
y_b(\mathbf{x}) = \frac{2\pi T_u (H_u - H_l)}{\log(r/r_w)\left[1 + \dfrac{2 L T_u}{\log(r/r_w)\, r_w^2 K_w} + \dfrac{T_u}{T_l}\right]}; \qquad (5.3)
\]

where the inputs and their respective dimensions, units, and ranges are
detailed in Table 5.1.

Input  Description                            Dimensions  Units        Range
r_w    radius of borehole                     L           m            [0.05, 0.15]
r      radius of influence                    L           m            [100, 50000]
T_u    transmissivity of upper aquifer        L^2 T^-1    m^2 year^-1  [63070, 115600]
H_u    potentiometric head of upper aquifer   L           m            [990, 1110]
T_l    transmissivity of lower aquifer        L^2 T^-1    m^2 year^-1  [63.1, 116]
H_l    potentiometric head of lower aquifer   L           m            [700, 820]
L      length of borehole                     L           m            [1120, 1680]
K_w    hydraulic conductivity of borehole     L T^-1      m year^-1    [9855, 12045]

Table 5.1: Inputs, dimensions, units, and ranges for the Borehole function (5.3).
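For reference, a direct R sketch of (5.3), with arguments named as in Table 5.1 (the function name is ours):

borehole <- function(rw, r, Tu, Hu, Tl, Hl, L, Kw) {
  lr <- log(r / rw)  # natural logarithm, as in (5.3)
  2 * pi * Tu * (Hu - Hl) /
    (lr * (1 + 2 * L * Tu / (lr * rw^2 * Kw) + Tu / Tl))
}

# Flow rate (m^3 year^-1) at the midpoints of the ranges in Table 5.1:
borehole(rw = 0.1, r = 25050, Tu = 89335, Hu = 1050,
         Tl = 89.55, Hl = 760, L = 1400, Kw = 10950)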

The Borehole function does not pose significant predictive challenges for a GaSP, mostly for interpolated predictions. Nevertheless, it will exemplify the application of a DA by following the steps detailed in Section 5.2. As we will see, prediction accuracy is greatly improved by conducting appropriate input transformations and an input space expansion without DA. The application of DA, along with the previous input arrangements, allows additional improvement in prediction accuracy. Section 5.3.1 provides the FANOVA results used for the selection of base quantities in this DA.

Our simulation studies start under a baseline arrangement, i.e., training and testing sets as provided by the DoE.wrapper package with the original variables; then we proceed with logarithmic input transformations and a final logarithmic input space expansion. In terms of how we treat our inputs for training and further testing, this progressive presentation of results gives us a clearer perspective on the advantages of working on the logarithmic scale, especially in a DA framework.

5.3.1 Dimensional Analysis

To select the base quantities for our DA, we run a FANOVA up to 3-input interactions on 20 repeated experiments under a training size of n = 80 runs, the PowExp correlation function, and a constant regression term. Here the original output y_b and the original inputs in Table 5.1 are used. Note that, even with a small training size n, FANOVA gives consistent results over all 20 repeat experiments.

Again, integration in FANOVA is with respect to uniform weights over the inputs' respective ranges; the same will be done for all examples in this chapter. Recall that FANOVA decomposes a GaSP predictor's total variance into contributions from main effects, 2-input interactions, 3-input interactions, etc. We can then obtain the percentage contribution attributed to a main or interaction effect as an estimate of its relative importance.

Step I: System's Dimensional Setup

According to the output and inputs previously detailed in Table 5.1, the system's fundamental dimensions are length (L) and time (T), resulting in p = 2.

Step II: Determination of Base Quantities

Since there are p = 2 fundamental dimensions in the system, we have d − p = 6 transformed dimensionless inputs after selecting two base quantities for our DA. Therefore, in terms of FANOVA, it is necessary to check which main effects have the largest contributions to the function's variability and are hence most important. We obtain the following results:

• Main Effects. Figure 5.1 shows the distribution of the variance percentages across the 20 repeated experiments corresponding to the eight inputs in the system. Note there is small variability in the results. We can see that the four most important main effects involve the fundamental dimension L: r_w, H_u, H_l, and L. However, radius of borehole (r_w) roughly accounts for a median of 82.5%, which makes it the most dominant main effect. In terms of the dimension time T, hydraulic conductivity of borehole (K_w) is the most important main effect involving this fundamental dimension, with percentages around 0.95%.

[Figure 5.1 appears here: boxplots of main-effect variance percentages for r_w, H_l, H_u, L, K_w, T_l, r, and T_u.]

Figure 5.1: FANOVA percentage contributions of main effects for the Borehole function (5.3) using a Std GaSP, with PowExp correlation function and a constant regression term. Each boxplot shows results over 20 repeat experiments with different mLHDs of n = 80 runs for training.

• Input Interactions. Figure 5.2 shows the distribution of the variance percentages across the 20 repeated experiments corresponding to the top eight 2- or 3-input interactions. The 2-input interactions r_w · H_u and r_w · H_l account for roughly 1.2% of the variance, which makes them the most important input interactions. Note that these interactions only involve the fundamental dimension L, but the fourth 2-input interaction r_w · K_w involves dimensions L and T.

Since r_w has a predominant role in FANOVA, as does K_w to a lesser degree, we use them as base quantities. This DA implementation will be referred to as FANOVA DA. Shen et al. (2018) also use the Borehole function as one of their examples. They select the inputs potentiometric head of upper aquifer (H_u) and transmissivity of upper aquifer (T_u) as base quantities, based on smallest range ratios. Note their base quantity selection is not determined by input importance, but we implement Shen's DA for comparison.

[Figure 5.2 appears here: boxplots of variance percentages for the top eight input interactions: r_w·H_u, r_w·H_l, r_w·L, r_w·K_w, H_l·L, H_u·L, r_w·H_u·L, and r_w·H_l·L.]

Figure 5.2: FANOVA percentage contributions of input interactions for the Borehole function (5.3) using a Std GaSP, with PowExp correlation function and a constant regression term. Each boxplot shows results from 20 mLHDs of n = 80 runs for training.

Step III: Input and Output Transformations

Based on the previous base quantity selections for both DA frameworks, we make the following input and output transformations:

• FANOVA DA. Let q_0^{(F)} be our new transformed output, defined as

\[
q_0^{(F)} = \frac{y_b(\mathbf{x})}{r_w^2 K_w}, \qquad (5.4)
\]

and the dimensionless inputs q_i^{(F)} (i = 1, ..., 6) are obtained as

\[
q_1^{(F)} = \frac{r}{r_w} \qquad
q_2^{(F)} = \frac{T_u}{r_w K_w} \qquad
q_3^{(F)} = \frac{H_u}{r_w} \qquad
q_4^{(F)} = \frac{T_l}{r_w K_w} \qquad
q_5^{(F)} = \frac{H_l}{r_w} \qquad
q_6^{(F)} = \frac{L}{r_w}. \qquad (5.5)
\]

• Shen's DA. Let q_0^{(S)} be the new transformed output, defined as

\[
q_0^{(S)} = \frac{y_b(\mathbf{x})}{H_u T_u}, \qquad (5.6)
\]

and the dimensionless inputs q_i^{(S)} (i = 1, ..., 6) are obtained as

\[
q_1^{(S)} = \frac{r_w}{H_u} \qquad
q_2^{(S)} = \frac{r}{H_u} \qquad
q_3^{(S)} = \frac{T_l}{T_u} \qquad
q_4^{(S)} = \frac{H_l}{H_u} \qquad
q_5^{(S)} = \frac{L}{H_u} \qquad
q_6^{(S)} = \frac{K_w H_u}{T_u}. \qquad (5.7)
\]
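A minimal R sketch of the FANOVA DA transformations (5.4) and (5.5); X is an assumed data frame with input columns named as in Table 5.1 plus the output column yb:

fanova_da <- function(X) {
  with(X, data.frame(
    q0 = yb / (rw^2 * Kw),  # dimensionless output (5.4)
    q1 = r  / rw,           # dimensionless inputs (5.5)
    q2 = Tu / (rw * Kw),
    q3 = Hu / rw,
    q4 = Tl / (rw * Kw),
    q5 = Hl / rw,
    q6 = L  / rw))
}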

Step IV: System's Transformed Function

The transformed functions for the two DA frameworks are

\[
q_0^{(F)} = g\big(q_1^{(F)}, q_2^{(F)}, q_3^{(F)}, q_4^{(F)}, q_5^{(F)}, q_6^{(F)}\big),
\]

i.e.,

\[
\frac{y_b(\mathbf{x})}{r_w^2 K_w} = g\!\left(\frac{r}{r_w}, \frac{T_u}{r_w K_w}, \frac{H_u}{r_w}, \frac{T_l}{r_w K_w}, \frac{H_l}{r_w}, \frac{L}{r_w}\right)
\]

for FANOVA DA, and

\[
q_0^{(S)} = h\big(q_1^{(S)}, q_2^{(S)}, q_3^{(S)}, q_4^{(S)}, q_5^{(S)}, q_6^{(S)}\big),
\]

i.e.,

\[
\frac{y_b(\mathbf{x})}{H_u T_u} = h\!\left(\frac{r_w}{H_u}, \frac{r}{H_u}, \frac{T_l}{T_u}, \frac{H_l}{H_u}, \frac{L}{H_u}, \frac{K_w H_u}{T_u}\right)
\]

for Shen's DA.

5.3.2 Simulation Settings

In all simulations, we implement two different regression components: constant and linear terms. Here we try a linear-trend regression model to see whether previous arguments suggesting trend terms are unnecessary carry over to DA. Note that all training and testing sets will be the same for all DA approaches, regardless of the input/output arrangement in Section 5.3.1. Subsequent sections provide further details.

Baseline Borehole

Baseline Borehole starts without any logarithmic transformation of the inputs. We compare the prediction accuracy of three different approaches: Non-DA, FANOVA DA, and Shen's DA. Table 5.2 shows the simulation settings for this arrangement. The rule of thumb for the smallest training set size is used here, n = 10d = 80, with 20 different mLHDs; similarly for the other three training sizes. Moreover, the testing set size is fixed at N = 10,000.

Setting               Values
Dimensionality        Non-DA: d = 8
                      FANOVA DA and Shen's DA: d = 6
Inputs                Non-DA: x as in Table 5.1
                      FANOVA DA: q^(F) as in (5.5)
                      Shen's DA: q^(S) as in (5.7)
Outputs               Non-DA: y_b as in (5.3)
                      FANOVA DA: q_0^(F) as in (5.4)
                      Shen's DA: q_0^(S) as in (5.6)
Regression Component  Constant and Linear
Training Sets D       20 different mLHDs with n = 80, 160, 320, 640 runs
Testing Set H         1 rLHD with N = 10,000 runs

Table 5.2: Simulation settings for Baseline Borehole.

Log-Input Borehole

Taking logarithms of the inputs is particularly relevant for the input radius of influence (r), which ranges over several orders of magnitude. Thus, the first variation of the baseline settings is a logarithmic transformation of the inputs with an untransformed output.

We compare the prediction accuracy of three approaches: Non-DA, FANOVA DA, and Shen's DA. A logarithmic input transformation in Non-DA is not expected to improve its prediction accuracy as much as in a DA setup. Recall the importance of setting up dimensionless inputs before a logarithmic transformation, which is addressed in an example presented by Shen et al. (2014).

We also keep all the simulation settings from Baseline Borehole, as well as the same training and testing sets detailed in Table 5.2, except for the inputs: log(x) as in Table 5.1, log(q^(F)) as in (5.5) for FANOVA DA, and log(q^(S)) as in (5.7) for Shen's DA.

Expanded Log-Input Borehole

Table 5.3 summarizes the simulation settings for this arrangement.

Setting         Values
Dimensionality  Non-DA: d = 64
                FANOVA DA and Shen's DA: d = 36
Inputs          Non-DA: log(x) as in Table 5.1, and all pairwise sums
                and differences of the elements of log(x)
                FANOVA DA: log(q^(F)) as in (5.5), and all pairwise sums
                and differences of the elements of log(q^(F))
                Shen's DA: log(q^(S)) as in (5.7), and all pairwise sums
                and differences of the elements of log(q^(S))

Table 5.3: Simulation settings for Expanded Log-Input Borehole.

In this arrangement, the input vectors are still log-transformed in the same three approaches: Non-DA, FANOVA DA, and Shen's DA. Note the Expanded Log-Input Borehole resembles the Log-Input arrangement but with a significant expansion of the number of inputs. However, when the regression model has linear terms, those terms correspond to the log-transformed inputs only; the expansion is not carried out for the regression component.

Since we are working on the log scale, we can rely on the logarithmic properties for a product and a quotient in the respective forms of sums and differences of the transformed inputs. This input expansion yields

\[
d = 8 + 2\binom{8}{2} = 64 \quad \text{for Non-DA}
\]

and

\[
d = 6 + 2\binom{6}{2} = 36 \quad \text{for FANOVA DA and Shen's DA},
\]

which targets all possible 2-input interactions (products and quotients) plus the main effects on the logarithmic scale; a sketch of this expansion follows below. Even though the processing times in maximum likelihood optimization increase as the number of inputs increases, this impact is not as large as that of increasing the training set size n.
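A minimal R sketch of this expansion (the helper below is our own, not part of DoE.wrapper); logX is an assumed n × d matrix of log-transformed inputs:

expand_log_inputs <- function(logX) {
  d <- ncol(logX)
  pairs <- combn(d, 2)                              # all input pairs
  sums  <- logX[, pairs[1, ]] + logX[, pairs[2, ]]  # log-products
  diffs <- logX[, pairs[1, ]] - logX[, pairs[2, ]]  # log-quotients
  cbind(logX, sums, diffs)                          # d + 2 * choose(d, 2) columns
}

ncol(expand_log_inputs(matrix(rnorm(80 * 8), 80, 8)))  # 64, as above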

5.3.3 Prediction Results

Figure 5.3 shows the prediction accuracy in percent (e_{N-RMSE} × 100%) for this case study on the log10(n)-log10(e_{N-RMSE}) scale across n = 80, 160, 320, 640 for the three strategies (Non-DA, FANOVA DA, and Shen's DA), with the three possible input transformations (Baseline Borehole, Log-Input Borehole, and Expanded Log-Input Borehole) in the rows of the figure and constant versus linear regression models in the columns. The boxplots show the variation in the assessment metric e_{N-RMSE} across 20 repeat experiments from different mLHDs; to avoid overlap of the boxplots for a given sample size, they are offset horizontally. The offset is corrected for the lines joining the e_{N-RMSE} sample means. Furthermore, the x-axis depicts the corresponding training sizes n. Note that the y-axis scale is logarithmically adjusted, so the right-hand side breaks are equally spaced.

The first row of Figure 5.3 shows that FANOVA DA noticeably outperforms the other two approaches without any input transformation. Taking account of the different y-axis scales in the figure, it is also apparent in the next two rows that all combinations of the three DA strategies and two linear models benefit considerably from the logarithmic transformation of the inputs. The DA strategies benefit further from the expanded log-inputs. FANOVA DA gains by far the most: it achieves an average e_{N-RMSE} of 0.00006 at n = 80 and ends with 0.000002 at n = 640. Hence this method has near-perfect accuracy even with n = 80. We also note that the inclusion of a linear regression component does not improve prediction accuracy over the constant case in any of our modelling strategies.

The two axes in Figure 5.3 are on logarithmic scales, and all lines joining the sample means of e_{N-RMSE} are approximately linear throughout the range. Hence, the slope r of a line indicates the approximate rate of convergence n^r of e_{N-RMSE} (or RMSE) to zero with sample size. With expanded log-inputs, for example, all methods show similar rates of convergence: all achieve e_{N-RMSE} going down faster than n^{-1} (note this is for root MSE, not MSE).
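As a minimal sketch, this slope can be estimated by least squares on the log-log scale; the e_{N-RMSE} means below are hypothetical values for illustration, not results from Figure 5.3:

n    <- c(80, 160, 320, 640)
rmse <- c(6e-05, 1.8e-05, 6e-06, 2e-06)   # hypothetical e_N-RMSE means
coef(lm(log10(rmse) ~ log10(n)))[2]       # slope r in e_N-RMSE ~ n^r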

[Figure 5.3 appears here: three rows of panels (Baseline Borehole, Log-Input Borehole, Expanded Log-Input Borehole), each with Constant and Linear columns, showing boxplots of e_{N-RMSE} (%) against n for Non-DA, Shen's DA, and FANOVA DA.]

Figure 5.3: Prediction accuracy in percent (e_{N-RMSE} × 100%) versus n on a log10(n)-log10(e_{N-RMSE}) scale by type of DA for the Borehole function (5.3), for three types of input transformation and constant versus linear regression components. Each boxplot shows results from 20 mLHDs of n = 80, 160, 320, 640 runs for training and N = 10,000 runs for testing. The lines join e_{N-RMSE} sample mean percentages.

5.3.4 Extrapolation

As clarified at the beginning of this chapter, we now turn to extrapolation. Shen et al. (2018) argue that a model satisfying dimensional constraints performs better when input ranges are extrapolated, especially for a complex response surface. FANOVA DA and Shen's DA both satisfy the dimensional constraints and are compared with Non-DA.

We obtain extrapolated predictions based on these three modelling strategies with the same 20 training sets per size n. Simulation settings are as before, and the input ranges are as in Table 5.1 for training. Two inputs have extended ranges, however, in the testing set H for extrapolation; we keep the same testing set size of N = 10,000 runs. Our FANOVA results in Figure 5.1 show that radius of borehole (r_w) and potentiometric head of lower aquifer (H_l) are two important inputs, and H now has the following extrapolated ranges for them:

\[
r_w \in [0.15, 0.25] \qquad H_l \in [820, 880]. \qquad (5.8)
\]
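A minimal sketch of building such a testing set, using the lhs package for brevity rather than the DoE.wrapper calls used elsewhere; the ranges follow Table 5.1 except for the two extrapolated inputs in (5.8):

library(lhs)
ranges <- list(rw = c(0.15, 0.25), Hl = c(820, 880),       # extrapolated (5.8)
               r  = c(100, 50000), Tu = c(63070, 115600),  # original ranges
               Hu = c(990, 1110),  Tl = c(63.1, 116),
               L  = c(1120, 1680), Kw = c(9855, 12045))

U <- randomLHS(10000, length(ranges))              # N = 10,000 runs in [0, 1]^8
H <- mapply(function(u, rg) rg[1] + u * diff(rg),  # rescale to each range
            asplit(U, 2), ranges)
colnames(H) <- names(ranges)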

Figure 5.4 compares the extrapolated prediction accuracy in percent (e_{N-RMSE} × 100%) on the log10(n)-log10(e_{N-RMSE}) scale across n = 80, 160, 320, 640 for the three strategies. Not surprisingly, accuracy deteriorates slightly for all strategies relative to interpolation. Again FANOVA DA outperforms the other two approaches. In conjunction with the expanded log-inputs, it still achieves an average e_{N-RMSE} of about 0.001 at n = 80 and 0.0001 at n = 640, i.e., high accuracy for the smallest sample size and excellent accuracy for the largest. In contrast, the other two methods only achieve an average e_{N-RMSE} of about 0.03 at n = 80. Furthermore, the use of a linear regression component does not improve prediction accuracy compared to the constant case; using linear trends to aid extrapolation makes little difference here.

[Figure 5.4 appears here: three rows of panels (Baseline Borehole, Log-Input Borehole, Expanded Log-Input Borehole), each with Constant and Linear columns, showing boxplots of extrapolated e_{N-RMSE} (%) against n for Non-DA, Shen's DA, and FANOVA DA.]

Figure 5.4: Extrapolation prediction accuracy in percent (e_{N-RMSE} × 100%) versus n on a log10(n)-log10(e_{N-RMSE}) scale by type of DA for the Borehole function (5.3), for three types of input transformation and constant versus linear regression components. Each boxplot shows results from 20 mLHDs of n = 80, 160, 320, 640 runs for training, and testing is extrapolation as in (5.8). The lines join e_{N-RMSE} sample mean percentages.

5.4 Heat Transfer in a Solid Sphere

Input  Description                                             Dimensions     Units          Range
R      ratio of distance from center and sphere radius         Dimensionless  None           [0.01, 1]
r      radius of sphere                                        L              m              [0.05, 0.2]
t      time                                                    T              s              [1, 600]
T_m    temperature of medium                                   Θ              K              [240, 270]
ΔT     initial sphere temperature minus temperature of medium  Θ              K              [50, 80]
h_c    convective heat transfer coefficient                    M T^-3 Θ^-1    kg s^-3 K^-1   [100, 160]
k      thermal conductivity                                    M L T^-3 Θ^-1  kg m s^-3 K^-1 [30, 100]
c      specific heat                                           L^2 T^-2 Θ^-1  m^2 s^-2 K^-1  400
ρ      density                                                 M L^-3         kg m^-3        8000

Table 5.4: Inputs, dimensions, units, and ranges for the Solid Sphere function (5.11).

The second case study was used by Tan (2017) to illustrate DA for polynomial emulation of a complex computer code. As in the case of a GaSP, a polynomial metamodel is intended to provide quick output predictions with training datasets composed of a small number of input runs. Nonetheless, polynomial metamodels might need to be of high degree if one wants to obtain a good approximation of the computer code.

To make the polynomial flexible enough while controlling its size, Tan (2017) uses two approaches to reduce the number of terms:

• Effect Heredity Principle. Previously detailed in Section 1.1.2, this principle is formulated for polynomial models by Peixoto (1987).

• Dimensional Analysis. By using Buckingham's Π-theorem, the basis functions to estimate in a polynomial metamodel are based only on the d − p dimensionless inputs coming from DA (see Step II in Section 5.2).
Tan (2017) presents an example to test his approach: a physical system depicting the temperature of a solid sphere (immersed in a fluid) at a given distance from its center (Çengel, 2003). The sphere has a higher temperature than the fluid when immersed at time 0. The model assumes heat transfer by convection between the sphere and the fluid, whereas heat transfer by conduction is assumed within the sphere. The output is temperature of the sphere (T_s) in K (i.e., [T_s] = Θ), which depends on nine inputs; their dimensions, units, and ranges are detailed in Table 5.4. Note that the inputs specific heat (c) and density (ρ) are kept constant in the system.

Tan (2017) provides the system of equations leading to the output T_s in terms of the dimensionless output

\[
q_0^{(T)} = \frac{T_s}{\Delta T}, \qquad (5.9)
\]

and the dimensionless inputs q_i^{(T)} (i = 1, ..., 5)

\[
q_1^{(T)} = \frac{T_m}{\Delta T} \qquad
q_2^{(T)} = R \qquad
q_3^{(T)} = \frac{h_c\, r}{k} \qquad
q_4^{(T)} = \frac{k\, t}{c\, \rho\, r^2} \qquad
q_5^{(T)} = \frac{h_c^2}{\Delta T\, c^3 \rho^2}. \qquad (5.10)
\]

Here q_0^{(T)} and q_1^{(T)} are dimensionless temperature ratios, q_2^{(T)} is a dimensionless distance ratio from the center of the sphere relative to its radius, q_3^{(T)} is the dimensionless Biot number, q_4^{(T)} is the dimensionless Fourier number, and q_5^{(T)} is the dimensionless convective heat transfer coefficient.

Then, q_0^{(T)} is given by

\[
q_0^{(T)} = q_1^{(T)} + \sum_{i=1}^{\infty} \frac{4\left(\sin \eta_i - \eta_i \cos \eta_i\right)}{2\eta_i - \sin(2\eta_i)}\, e^{-\eta_i^2 q_4^{(T)}}\, \frac{\sin\!\big(\eta_i q_2^{(T)}\big)}{\eta_i\, q_2^{(T)}}; \qquad (5.11)
\]

where η_i is the solution of the equation

\[
1 - \eta_i \cot \eta_i = q_3^{(T)} \quad \text{for } i = 1, \ldots, \infty
\]

with η_i ∈ ((i−1)π, iπ). Note that, for numerical computations of the output, Tan (2017) approximates the series in (5.11) with four terms.
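A minimal R sketch of (5.11) under the same four-term truncation; the helper below is our own, with each η_i found by uniroot() on its bracketing interval:

solid_sphere_q0 <- function(q1, q2, q3, q4, n_terms = 4) {
  out <- q1
  for (i in seq_len(n_terms)) {
    # eta_i solves 1 - eta * cot(eta) = q3 on ((i - 1) * pi, i * pi)
    eta <- uniroot(function(e) 1 - e / tan(e) - q3,
                   lower = (i - 1) * pi + 1e-8,
                   upper = i * pi - 1e-8)$root
    out <- out + 4 * (sin(eta) - eta * cos(eta)) / (2 * eta - sin(2 * eta)) *
      exp(-eta^2 * q4) * sin(eta * q2) / (eta * q2)
  }
  out
}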

5.4.1 Dimensional Analysis

To identify another set of base quantities for DA, we run a FANOVA up to 3-input interactions on 20 repeated experiments under a training size of n = 70 runs, the PowExp correlation function, and a constant regression term. Here the original output T_s and the original inputs in Table 5.4 are used. Then, we obtain the percentage contribution attributed to a main or interaction effect, which is an estimate of its relative importance. We obtain the following results:

[Figure 5.5 appears here: (a) boxplots of main-effect variance percentages for T_m, t, ΔT, r, R, h_c, and k; (b) boxplots of variance percentages for the top 2- and 3-input interactions, led by r·t.]

Figure 5.5: FANOVA percentage contributions by type of effect for the Solid Sphere function (5.11) using a Std GaSP, with PowExp correlation function and a constant regression term. Each boxplot shows results from 20 mLHDs of n = 70 runs for training.

• Main Effects. Figure 5.5a shows boxplots of the percentage contributions to the variance of the predictor for the seven inputs that vary in the system. We can see that, among the top four main effects, temperature of medium (T_m) and initial sphere temperature minus temperature of medium (ΔT) involve the fundamental dimension Θ, with median percentage contributions of around 30% and 17.5%, respectively. Furthermore, the second most important main effect is time (t), with a median close to 30%. Finally, radius of sphere (r) has a median of 15% and involves the fundamental dimension L.

• Input Interactions. Figure 5.5b shows boxplots of the percentage contributions to the variance of the predictor for the top eight 2- or 3-input interactions. The largest, r · t, accounts for a median slightly above 2.75%, which is consistent with the important r and t main effects. Note that ΔT is involved, along with t and r, in the two next-largest interactions.

Based on the FANOVA results, we select base quantities for DA. The system has p = 4 fundamental dimensions: Θ, L, T, and M. Therefore, we have to choose four inputs as base quantities from the set of nine in Table 5.4. FANOVA suggests the inputs T_m, t, r, and h_c. In Appendix B we show the main-effect plots for the seven varying inputs in this system. In those figures, we clearly see that R and r have a non-linear effect on the output. The input t also shows modest non-linear behaviour, and this matter will be dealt with in the DA setup as well.
We gain insight into the behaviour of the effect of r by inspecting its main-effect plot in Figure 5.6 from FANOVA. Additionally, this figure includes the fit of a simple linear regression of the observed main effect against 1/r. The good fit suggests that T_s is approximately linear in 1/r, which will be taken into account in the DA when considering the dimensionless output. Thus, the DA has five dimensionless inputs:

\[
q_1^{(F)} = R \qquad
q_2^{(F)} = \frac{\Delta T}{T_m + \Delta T} \qquad
q_3^{(F)} = \frac{k}{h_c\, r} \qquad
q_4^{(F)} = \sqrt{\frac{c\, t^2\, T_m}{r^2}} \qquad
q_5^{(F)} = \sqrt[3]{\frac{h_c\, t^3\, T_m}{\rho\, r^3}}. \qquad (5.12)
\]

There are two key points we have to highlight in the input setup (5.12):

• q_2^{(F)} is the proportion that ΔT represents in the initial sphere temperature T_m + ΔT.

• Based on our findings in Figure 5.6, q_3^{(F)}, q_4^{(F)}, and q_5^{(F)} contain the input r as a reciprocal. Note that we apply square and cube root transformations to q_4^{(F)} and q_5^{(F)}, respectively, so that r appears as 1/r. Furthermore, these transformations make the input t linear, as suggested in the next step for the output.

[Figure 5.6 appears here: estimated main effect of r on T_s, with pointwise confidence limits and a fitted 1/r regression curve.]

Figure 5.6: Estimated main effect (solid line) of radius of sphere (r) on temperature of sphere (T_s) from FANOVA with PowExp correlation function and a constant regression term. The dashed lines show approximate pointwise 95% confidence limits, and the dotted line shows the fitted values from a simple least-squares regression of T_s on 1/r.

For the dimensionless output, it is insightful to do exploratory data analysis of the evolution of T_s over time t. Figure 5.7 plots T_s, or three dimensionless quantities derived from T_s, versus t. The data plotted are from the first training set of n = 70 runs, along with a locally estimated scatterplot smoothing (LOESS) regression. The first panel, Figure 5.7a, of the untransformed (and hence not dimensionless) T_s, shows a clear downward relationship, as would be expected, with wide scatter uniformly over the range of input t. Even more scatter and less trend is apparent in the second panel, Figure 5.7b, for the dimensionless output q_0^{(T)} in (5.9).

An alternative dimensionless output,

\[
q_0^{(F)} = \frac{T_s - T_m}{\Delta T}, \qquad (5.13)
\]

is shown in Figure 5.7c. Its maximum value is 1 by definition, and T_m and ΔT are used as suggested by FANOVA. Again there is a downward trend with t, but overall there is less scatter than in Figures 5.7a and 5.7b, particularly for small values of t.

Finally, Meinsma (2019) provides a much simpler example involving the temperature over time of an object in a medium of temperature 0 (presumably K; the author does not give units). According to Newton's law of cooling, the rate of cooling for this object is proportional to the difference between its temperature and the temperature of the medium at time t. According to the idealized physics, log(q_0^{(F)}) should be approximately linear in t. The LOESS regression shown in Figure 5.7d depicts an approximately negative linear relationship. There is still scatter (due to the other inputs varying in the design), but it is arguably slightly less than in any of the other panels. The dimensionless output in our DA will therefore be log(q_0^{(F)}); a sketch of this exploratory check follows below.
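This exploratory check can be sketched in R as follows; D is an assumed data frame with columns Ts, Tm, dT (for ΔT), and t from one training design:

D$q0F <- (D$Ts - D$Tm) / D$dT           # dimensionless output (5.13)
fit   <- loess(log(q0F) ~ t, data = D)  # LOESS smooth of log(q0F) on t
plot(D$t, log(D$q0F), xlab = "t", ylab = "log(q0F)")
o <- order(D$t)
lines(D$t[o], fitted(fit)[o])           # approximately linear, negative slope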

[Figure 5.7 appears here: four scatterplots with LOESS fits: (a) T_s against t; (b) q_0^{(T)} against t; (c) q_0^{(F)} against t; (d) log(q_0^{(F)}) against t.]

Figure 5.7: Scatterplots of T_s or three derived dimensionless outputs against time (t) for the Solid Sphere function (5.11) based on data from an mLHD of n = 70 runs.

5.4.2 Simulation Settings

In all simulations, we implement two different regression components: constant and linear terms. All training and testing datasets will be the same and use designs in terms of the original variables for all the DA approaches, regardless of any given input/output transformation. We compare the prediction accuracy of four different approaches: Non-DA, Tan's DA, Full-DA, and Expanded Full-DA. Based on the above exploratory analysis, we choose log(q_0^{(F)}) as the dimensionless output for our DA.

Setting               Values
Regression Component  Constant and Linear
Training Sets D       20 different mLHDs with n = 70, 140, 280, 560 runs
Testing Set H         1 rLHD with N = 10,000 runs

Table 5.5: General simulation settings.

Table 5.5 shows the general simulation settings for this case study. The rule of thumb for the starting training set size is applied to the number of non-constant inputs in the Non-DA framework, resulting in n = 70, with 20 different mLHDs for each of the four sizes: 70, 140, 280, and 560 runs. Moreover, the testing set size is fixed at N = 10,000. Table 5.6 shows the specific simulation settings for each DA approach in terms of dimensionality, inputs, and outputs. We highlight the following points regarding these DA approaches:

• Non-DA has only the d = 7 inputs that vary.

• Full-DA involves a logarithmic transformation of the dimensionless output q_0^{(F)} along with the dimensionless inputs in (5.12).

• Following the success of the variable transformations in Expanded Log-Input Borehole in Section 5.3, the Expanded Full-DA approach applies a similar framework in order to determine whether prediction accuracy is improved.
118
5.4. Heat Transfer in a Solid Sphere

Model Dimensionality Inputs Output


x (T )
Ts = q0 ∆T
Non-DA d=7 (excluding c and ρ)
as in (5.11)
as in Table 5.4
(T )
q(T ) q0
Tan’s DA
as in (5.10) as in (5.9)
d=5
(F ) 
q(F ) log q0
Full-DA
as in (5.12) as in (5.13)
log q(F )


as in (5.12),
(F ) 
Expanded and all pairwise sums log q0
d = 25
Full-DA and differences as in (5.13)
of the elements  of
log q (F )

Table 5.6: Specific simulation settings by Non-DA and DA.

5.4.3 Prediction Results

Figure 5.8 compares the prediction accuracy in percent (e_{N-RMSE} × 100%) for this case study on the log10(n)-log10(e_{N-RMSE}) scale across n = 70, 140, 280, 560 of the models using DA (Tan's DA, Full-DA, and Expanded Full-DA) with a model using the original variables (Non-DA), in two panels for constant versus linear regression terms. Note that both regression components provide approximately equivalent prediction accuracy in Non-DA. Our Full-DA with a linear regression component provides the best prediction accuracy among all the approaches, particularly for small n.

[Figure 5.8 appears here: Constant and Linear panels with boxplots of e_{N-RMSE} (%) against n for Non-DA, Tan's DA, Full-DA, and Expanded Full-DA.]

Figure 5.8: Prediction accuracy in percent (e_{N-RMSE} × 100%) versus n on a log10(n)-log10(e_{N-RMSE}) scale by type of DA for the Solid Sphere function (5.11), with constant and linear regression components. Each boxplot shows results from 20 mLHDs of n = 70, 140, 280, 560 runs for training and N = 10,000 runs for testing. The lines join e_{N-RMSE} sample mean percentages.

On the other hand, when using a constant regression term, we obtain better prediction accuracy for Tan's DA and Expanded Full-DA. However, these approaches are not competitive at all against Full-DA, or even against Non-DA. We note that Tan's DA was not developed for GP models, but the findings here emphasize that the choice of variables in DA is extremely important.

5.4.4 Extrapolation

As with the Borehole function, the Solid Sphere function can be used for an additional simulation study on extrapolation. We obtain extrapolated predictions with the same 20 training sets per size n for Non-DA, Tan's DA, Full-DA, and Expanded Full-DA. The simulation settings in Tables 5.5 and 5.6 are kept in this study, except for changes to four specific input ranges on the testing set H, as described next.

Our FANOVA results in Figures 5.5a and 5.5b show that temperature of medium (T_m), time (t), initial sphere temperature minus temperature of medium (ΔT), and radius of sphere (r) are the top four main effects, and the 2-input interaction r · t has the largest variance percentage among low-order interactions. Therefore, we obtain predictions for a new testing set H with the following extrapolated ranges:

\[
T_m \in [270, 280] \qquad t \in [600, 750] \qquad \Delta T \in [40, 50] \qquad r \in [0.2, 0.25]; \qquad (5.14)
\]

while the rest of the inputs keep their original ranges.


Figure 5.9 compares the extrapolated prediction accuracy in percent
(eN−RMSE × 100%) on the log10(n)-log10(eN−RMSE) scale across
n = 70, 140, 280, 560. It contrasts the three models using DA (Tan's DA,
Full-DA, and Expanded Full-DA) with a model using the original variables
(Non-DA), in two panels for constant versus linear regression terms. Note
that both regression components provide approximately equivalent prediction
accuracy in Non-DA and Expanded Full-DA. Our Expanded Full-DA provides the
best prediction accuracy among all the approaches, regardless of the
regression component. On the other hand, the linear case does not improve the
prediction accuracy for Tan's DA and Full-DA.
Compared to Interpolation in Figure 5.8, we see a clear deterioration in
prediction accuracy in the Extrapolation case except for Expanded Full-DA.
Nonetheless, specifically in the constant case, there are clear differences be-
tween the four approaches. Tan’s DA does not offer a competitive accuracy
with an average eN−RMSE of 0.24 at n = 560, while Non-DA has similarly
poor performance with an average eN−RMSE of 0.18 at the same training size.
On the other hand, Full-DA has good prediction accuracy in Extrapolation with
a rate of convergence of 1/n^0.83, going from an average eN−RMSE of 0.07 at
n = 70 to 0.01 at n = 560. The Expanded Full-DA has the best prediction
accuracy in Extrapolation with a rate of convergence of 1/n^0.97, going from
an average eN−RMSE of 0.06 at n = 70 to 0.007 at n = 560.
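As a minimal sketch of how such a rate of convergence can be estimated, one
can fit a least-squares line on the log10(n)-log10(eN−RMSE) scale and read
off its slope; the error values below are illustrative placeholders, not the
averages reported above.

    import numpy as np

    n = np.array([70.0, 140.0, 280.0, 560.0])
    err = np.array([0.07, 0.035, 0.019, 0.010])  # hypothetical mean e_N-RMSE values

    # Slope of log10(err) against log10(n); a slope of -r corresponds to 1/n^r
    slope, intercept = np.polyfit(np.log10(n), np.log10(err), deg=1)
    print(f"estimated rate: 1/n^{-slope:.2f}")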
Thus, both Full-DA and Expanded Full-DA have good extrapolation
accuracy with n = 70 and excellent accuracy with n = 560. Compared
to Interpolation, where Full-DA with a linear regression component is the
best strategy, the Extrapolation case is well approached with Full-DA and
Expanded Full-DA with a constant regression term.

[Figure 5.9 appears here: two panels (Constant, Linear) of boxplots; y-axes eN−RMSE (%) and log10(eN−RMSE); x-axis n; legend: Non−DA, Tan's DA, Full−DA, Expanded Full−DA.]

Figure 5.9: Prediction accuracy in percent (eN−RMSE × 100%) versus n on
a log10(n)-log10(eN−RMSE) scale by type of DA for the Solid Sphere function
(5.11), with constant and linear regression components. Each boxplot shows
results from 20 mLHDs of n = 70, 140, 280, 560 runs for training, and testing
is extrapolation as in (5.14). The lines join eN−RMSE sample mean
percentages.


5.5 Storm Surge Model


The Borehole function introduces a full DA, using FANOVA as a variable
selection tool along with logarithmic output and input transformations and an
input expansion, which improves prediction accuracy. The Solid Sphere
function shows that a full DA, with a setup based on FANOVA and Newton's law
of cooling, outperforms a non-DA framework. Both cases show remarkably better
performance in terms of extrapolation when using DA. Can these two
physical/engineering functions give us useful prediction insights in more
complex systems?

[Figure 5.10 appears here: map of the ten storm surge locations on the North Carolina coast; x-axis Longitude, y-axis Latitude.]

Figure 5.10: Storm surge locations on the North Carolina coast plotted
with ggmap (Kahle and Wickham, 2013).

Nowadays, there is a growing concern over the potentially catastrophic
effects of climate change. Hurricane hazards, especially coastal flooding,
are of particular concern. Computer codes that simulate a hurricane's
physical system have been developed to evaluate the phenomenon's magnitude.
These codes have a predefined set of inputs and produce the storm surge level
as an output (at times before and after the hurricane's landfall). Our case
study is based on storm surge data from the ADCIRC model, used with the Joint
Probability Method (JPM), to predict storm surges on the NC coast at ten
different locations, as shown in Figure 5.10. The study is mainly focused on
the advantages of DA in extrapolation, even though our interpolation results
also show improvements in prediction accuracy at critical locations.

5.5.1 The Joint Probability Method


The JPM was developed by Ho and Myers (1975) and Myers (1975). It uses
a small set of hurricane parameters (namely, inputs to the computer code)
to obtain the maximum storm surge of a hurricane (ηmax ). In initial JPM
studies, the inputs are:
cp = hurricane central barometric pressure
Rmax = radius to maximum wind speed from the center of the storm
vf = forward storm speed
θ = angle of storm track at the coast.
Details and examples of the use of the JPM can be found in Resio et al.
(2009).

5.5.2 Advanced Circulation Model Settings


ADCIRC is a time-dependent, finite-element, long-wave numerical model
(Westerink et al., 1992, 2008). Our case study's main purpose is to use a
GaSP as a surrogate model to predict maximum storm surge levels from ADCIRC.
We use 324 different storm tracks (namely, simulations with different
starting locations off the East Coast of the United States and the Caribbean
Sea). A track starts three days before and finishes one day after landfall,
with surge output every 30 minutes.
The output storm surge level (ηi) in m, for the ith geographic location
(i = 1, . . . , 10), is subject to the following inputs across the 324 storm
tracks at each time point t:

λ0^(t) = longitude of the eye of the storm in degrees ∈ [−82.64, −62.74]
φ0^(t) = latitude of the eye of the storm in degrees ∈ [17.38, 38.62]
cp = hurricane central barometric pressure in mb ∈ [922.8, 1012.3]
Rmax = radius to maximum wind speed in km ∈ [15.02, 131.45]
B = Holland B parameter ∈ [0.49, 1.82].
(5.15)
The Holland B parameter, introduced by Holland (1980), has a key role in
the determination of the maximum wind speed in a hurricane as noted by
Resio et al. (2009).


All storm tracks have an additional input

fp = far-field pressure in mb = 1013, (5.16)

which is constant over time for all 324 storm tracks. Moreover, the fixed
coordinates in longitude and latitude for each location (λi , φi ) are used to
create derived inputs. They are:

(λ1 , φ1 ) = (−75.86◦ , 36.54◦ ) (λ2 , φ2 ) = (−75.75◦ , 36.18◦ )


(λ3 , φ3 ) = (−75.46◦ , 35.60◦ ) (λ4 , φ4 ) = (−75.53◦ , 35.22◦ )
(λ5 , φ5 ) = (−76.00◦ , 35.07◦ ) (λ6 , φ6 ) = (−76.54◦ , 34.58◦ )
(λ7 , φ7 ) = (−77.34◦ , 34.58◦ ) (λ8 , φ8 ) = (−77.80◦ , 34.53◦ )
(λ9 , φ9 ) = (−78.03◦ , 33.89◦ ) (λ10 , φ10 ) = (−78.54◦ , 33.85◦ ).
(5.17)

5.5.3 Computation of Additional Inputs


Based on the previous variables in (5.15), (5.16), and (5.17), we construct
six additional inputs. The first one is the central pressure deficit ∆p in
mb, which is computed as

∆p = fp − cp ∈ [0.7, 90.2].

This input deserves particular attention since it is considered a measure of
the storm's intensity, which increases over time as the storm approaches
landfall. Once the storm hits land, ∆p decreases over time.
Let j and k be two time points in a single storm track, with j < k.
Moreover, let (λ0^(j), φ0^(j)) and (λ0^(k), φ0^(k)) be the two pairs of
coordinates corresponding to the eye of the storm at those time points. We
use the Haversine formula to compute the shortest distance d_{j,k} between
these two pairs of coordinates on a sphere:

\[
d_{j,k} = 2R_E \sin^{-1}\!\left[\sqrt{\sin^2\!\left(\frac{\phi_0^{(k)} - \phi_0^{(j)}}{2}\right) + \cos\phi_0^{(j)}\,\cos\phi_0^{(k)}\,\sin^2\!\left(\frac{\lambda_0^{(k)} - \lambda_0^{(j)}}{2}\right)}\right],
\tag{5.18}
\]

where the radius of the Earth

RE = 6371 km.
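The following minimal Python sketch is a direct transcription of (5.18), with
coordinates in degrees; the function name is ours.

    import math

    R_E = 6371.0  # radius of the Earth in km

    def haversine_km(lam1, phi1, lam2, phi2):
        # Shortest spherical distance between (lam1, phi1) and (lam2, phi2), in km
        lam1, phi1, lam2, phi2 = map(math.radians, (lam1, phi1, lam2, phi2))
        a = (math.sin((phi2 - phi1) / 2.0) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2.0) ** 2)
        return 2.0 * R_E * math.asin(math.sqrt(a))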


We obtain the forward storm speed, in m/s, from equation (5.18) for each run
in a storm track: it is the average speed travelled by the eye of the storm,
computed from three days before landfall to one day after. The range of this
input, over all the 324 storm tracks, is

vf ∈ [2.42, 8.51].

We also consider how close a storm is at landfall to a given location. Let
(λ0^(0), φ0^(0)) be the coordinates corresponding to the eye of the storm at
landfall, t = 0. The distance from the eye to the ith geographic location
with coordinates (λi, φi) is computed as in (5.18):

\[
d_{0,i} = 2R_E \sin^{-1}\!\left[\sqrt{\sin^2\!\left(\frac{\phi_i - \phi_0^{(0)}}{2}\right) + \cos\phi_0^{(0)}\,\cos\phi_i\,\sin^2\!\left(\frac{\lambda_i - \lambda_0^{(0)}}{2}\right)}\right].
\]

Furthermore, we add a sign to this distance. The input signed distance in km,
between the ith location and the location of the eye of the storm at
landfall, is defined as:

\[
D_i = \begin{cases} d_{0,i} & \text{if } \lambda_i > \lambda_0^{(0)}, \\ -d_{0,i} & \text{otherwise}. \end{cases}
\]

Thus Di is positive if the ith location is to the east of the eye of the
storm at landfall and negative otherwise, which is important because it
accounts for the onshore versus offshore direction of the wind at the
location. The range of this input, over all the 324 storm tracks, is

Di ∈ [−258.82, 492.79].
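Building on the haversine_km sketch above, the sign convention for Di can be
expressed as follows; the landfall coordinates in the usage example are
hypothetical.

    def signed_distance_km(lam_i, phi_i, lam0, phi0):
        # Positive when the i-th location lies to the east of the eye at landfall
        d = haversine_km(lam0, phi0, lam_i, phi_i)
        return d if lam_i > lam0 else -d

    # Location 7 at (-77.34, 34.58) with a hypothetical landfall at (-78.50, 34.00)
    print(round(signed_distance_km(-77.34, 34.58, -78.50, 34.00), 1))  # positive (east)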

We compute the storm track angles before and after landfall, θb and θa
respectively. Each is a true bearing to a point: the angle, measured in
degrees clockwise from the north line, of the vector formed by two pairs of
coordinates. Both angles are signed, e.g., NE = 45◦ and NW = −45◦. The two
inputs use the following pairs of coordinates (a sketch of this computation
follows the list below):
• Storm track angle before landfall uses coordinates (λ0^(−3), φ0^(−3)) of
the eye of the storm three days before landfall (i.e., −3) and coordinates
(λ0^(0), φ0^(0)) at landfall (i.e., 0), with the following range across the
324 storm tracks:

θb ∈ [−34.09, 5.38].


• Storm track angle after landfall uses coordinates (λ0^(0.02), φ0^(0.02))
half an hour after landfall (i.e., 0.02 days) and (λ0^(1), φ0^(1)) one day
after landfall (i.e., 1), with the following range across the 324 storm
tracks:

θa ∈ [−28.01, 30.09].

We also add the signed bearing angle of the ith location with respect to the
hurricane's location at landfall in degrees, αi. We use the coordinates of
the ith location (λi, φi) along with (λ0^(0), φ0^(0)). The input has the
following range across the 324 storm tracks:

αi ∈ [−177.05, 179.97].
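As a minimal sketch of these signed bearings (for θb, θa, and αi), the
standard initial-bearing formula can be used; this particular formula is an
assumption, since the exact computation is not spelled out here.

    import math

    def bearing_deg(lam1, phi1, lam2, phi2):
        # Initial bearing from (lam1, phi1) to (lam2, phi2), measured in degrees
        # clockwise from north; signed result in (-180, 180], so NE = 45, NW = -45
        lam1, phi1, lam2, phi2 = map(math.radians, (lam1, phi1, lam2, phi2))
        dlam = lam2 - lam1
        y = math.sin(dlam) * math.cos(phi2)
        x = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
        return math.degrees(math.atan2(y, x))

    # Hypothetical track segment heading west of north
    print(round(bearing_deg(-75.0, 30.0, -77.0, 34.0), 1))  # negative angle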

5.5.4 Distribution of Maximum Storm Surge


[Figure 5.11 appears here: boxplots of maximum storm surge ηi^max (m) on the y-axis by Location (1–10) on the x-axis.]

Figure 5.11: Distribution of maximum storm surge by location.

We extract the maximum storm surges by location and storm track. Hence,
we obtain ten different datasets for the ten locations, each with 324 data
points from the 324 tracks. Figure 5.11 shows the maximum storm surge by
location, and we can see that larger surges occur at the southern locations.
Locations 7 and 8 have the largest medians above 1 m with larger spreads
(locations 9 and 10 also have large spreads).
Figure 5.12 details the locations at landfall in blue for each storm track,
and we can see that the landfalls are to the south of the NC coast or in
South Carolina. We will restrict our subsequent studies to the larger-surge
locations 7, 8, 9, and 10. Appendix C provides additional results on the rest
of the locations.
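A minimal sketch of this extraction, assuming the surge series sit in a tidy
pandas DataFrame with illustrative column names:

    import pandas as pd

    def max_surge(surge):
        # surge: frame with columns 'location', 'track', and 'eta' (surge level in m)
        return (surge.groupby(["location", "track"], as_index=False)["eta"]
                     .max()
                     .rename(columns={"eta": "eta_max"}))

    demo = pd.DataFrame({"location": [7, 7, 8], "track": [1, 1, 1],
                         "eta": [0.5, 1.2, 0.8]})
    print(max_surge(demo))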

[Figure 5.12 appears here: map of the ten locations and the storm track landfall points (in blue); x-axis Longitude, y-axis Latitude.]

Figure 5.12: Distribution of locations at landfall in blue.


5.5.5 Dimensional Analysis


The output maximum storm surge ηi^max in m, for the ith geographic location,
is based upon eight inputs whose respective dimensions, units, and ranges
are detailed in Table 5.7.

Input   Description                                    Dimensions     Units    Range
Rmax    radius to maximum wind speed                   L              km       [18.18, 116.65]
B       Holland B parameter                            dimensionless  none     [0.56, 1.72]
∆p      central pressure deficit                       ML^−1 T^−2     mb       [3.3, 90.2]
vf      forward storm speed                            LT^−1          m s^−1   [2.42, 8.51]
Di      signed distance between the ith location       L              km       [−258.82, 492.79]
        and landfall
θb      signed storm track angle before landfall       dimensionless  none     [−34.09, 5.38]
θa      signed storm track angle after landfall        dimensionless  none     [−28.01, 30.09]
αi      signed angle of the ith location at landfall   dimensionless  none     [−177.05, 179.97]

Table 5.7: Inputs, dimensions, units, and ranges for the Storm Surge model.

The system has p = 3 fundamental dimensions: L, M, and T. Nonetheless, the
output ηi^max only involves the fundamental dimension L. A DA setup would
require three base quantities from the subset of four inputs that have
dimensions: Rmax, ∆p, vf, and Di. Unlike the Borehole and Solid Sphere
functions, this system does not consider a complete set of inputs that
determine our output of interest given the dimensional arrangement. Hence it
is not entirely possible to obtain a full DA, as we will see below.
As in our previous two case studies, we use FANOVA to identify those
main effects and input interactions that have a key role in the system. We
run a FANOVA up to 3-input interactions by location. The training data is
obtained from the set of 324 points corresponding to the maximum storm
surges per location. We then randomize n = 200 runs as a training set D and
repeat this process 25 times. We fit a GaSP with the PowExp correlation
function and a constant regression term using each training set.
FANOVA uses the eight inputs in Table 5.7. It is important to highlight
that, unlike the two previous case studies in this chapter, the 324 storm
tracks do not uniformly cover the 8-dimensional input space. We obtain the
percentage contribution attributed to a main or interaction effect, which
is an estimate of its relative importance. Figures 5.13 and 5.14 show the
FANOVA results for locations 7, 8, 9, and 10 in percentage contributions.
We have to highlight the following:
1. The input central pressure deficit (∆p ) dominates the FANOVA main
effect contributions for three locations out of four and often has sub-
stantial interaction contributions. It is, therefore, the first base quan-
tity. Moreover, the main-effect plots from FANOVA in Figure 5.15
indicate a positive linear relationship between ∆p and ηimax in the four
locations.
2. The inputs signed distance between the ith location and landfall (Di )
and radius to the maximum wind speed (Rmax ) both involve only L,
and only one can be a base quantity. Di has a much larger main
effect and interaction contributions over the four locations. Hence it
is chosen over Rmax as the second base quantity. Furthermore, Di · ∆p
stands out within the top six input interactions in the four locations.
3. Both previous selections leave forward storm speed (vf) as the only
option for a third base quantity, since the Holland B parameter is
dimensionless. However, as described next, if we take this forward speed as
a base quantity, it will not itself appear in our dimensionless input setup.
Figures C.1, C.2, and C.3 show the FANOVA results in percentage con-
tributions for the remaining northern locations. They also show that Di and
∆p play a key role on the variance of the predictor, as well as the interaction
Di · ∆p in locations 3, 4, 5, and 6. Figures C.4 and C.5 show the main-effect
plots for ∆p on output ηimax for these locations as well.


[Figure 5.13 appears here: barplots of variance percentages. (a) Location 7: main effects ∆p, D7, vf, B, α7, Rmax, θa, θb; largest 2-input interactions D7·∆p, D7·α7, ∆p·α7, B·D7, vf·∆p, vf·D7. (b) Location 8: main effects ∆p, α8, B, D8, vf, Rmax, θa, θb; largest 2-input interactions ∆p·α8, D8·α8, D8·∆p, B·α8, vf·α8, B·D8.]

Figure 5.13: FANOVA percentage contributions of main effects and 2-input
interactions for the Storm Surge model using a Std GaSP, with PowExp
correlation function and a constant regression term, with 25 randomized
datasets of n = 200 runs for training.


[Figure 5.14 appears here: barplots of variance percentages. (a) Location 9: main effects D9, ∆p, vf, B, Rmax, α9, θa, θb; largest 2- and 3-input interactions D9·∆p, vf·D9, B·D9, Rmax·D9, vf·∆p, vf·D9·∆p. (b) Location 10: main effects ∆p, α10, D10, vf, B, θa, Rmax, θb; largest 2-input interactions ∆p·α10, D10·∆p, B·α10, vf·α10, Rmax·α10, θa·α10.]

Figure 5.14: FANOVA percentage contributions of main effects and
higher-order input interactions for the Storm Surge model using a Std GaSP,
with PowExp correlation function and a constant regression term, with 25
randomized datasets of n = 200 runs for training.


[Figure 5.15 appears here: estimated main-effect plots of ∆p (x-axis) versus ηi^max (y-axis) in four panels: (a) Location 7, (b) Location 8, (c) Location 9, (d) Location 10.]

Figure 5.15: Estimated main effect (red) of central pressure deficit (∆p )
on maximum storm surge (ηimax ) from FANOVA, using a Std GaSP with
PowExp correlation function and a constant regression term with 95% con-
fidence limits (green). One randomized dataset of n = 200 runs is used for
training the Std GaSP.


The map in Figure 5.16 shows the 324 storm tracks over time coloured
by the levels of ∆p , along with landfall locations in blue. We can identify
two subsets of locations:

• The northern locations 1, 2, 3, and 4 are unusual. The storm tracks


and their landfalls are to the south and west of them; those that get
close to them have had to change direction near land, and ∆p has
decreased. These locations have the smallest median maximum storm
surges, as shown in Figure 5.11.

• On the other hand, the remaining six southern locations are within the
landfall coastal region associated with higher values of ∆p. They show
larger median maximum storm surges across the 324 storm tracks.

Figure 5.16: Spatial distribution of the 324 storm tracks along with pressure
deficit (∆p ) levels.


The facts above, plus the importance of Di and ∆p (and their corresponding
interaction) identified by FANOVA, shed light on the potential benefits of a
DA approach for a surrogate GaSP's prediction accuracy. One of these
potential benefits is the use of ∆p as a scaling factor on the output
ηi^max to adjust for storm intensity, as described next.
Nonetheless, DA will only be possible in this case study with certain limi-
tations, as pointed out in the previous selection of base quantities. Moreover,
four out of the eight inputs are already dimensionless and cannot be used
as base quantities since they do not fulfil the property of representativity
(Step II in Section 5.2).
Irish et al. (2009) make important remarks on the physical scaling laws for
the storm surge at any given location. Their simulated results on the Texas
coastline suggest that a storm surge at a location of interest scales with
storm intensity (i.e., ∆p) along with its proximity to the eye of the storm
at landfall (i.e., Di). Furthermore, they suggest a general dimensionless
storm surge ηi′ as follows:

\[
\eta_i' = \frac{\gamma \eta_i}{\Delta_p} + m_{x_i} \Delta_p,
\]

where γ is the specific weight of water and m_{x_i} is a constant for the
ith location obtained from linear regression.
In our subsequent DA approach we disregard γ (whose fundamental dimensions
are ML^−2 T^−2) and m_{x_i}, since they are taken as constants. Thus, by
using ∆p as our first base quantity, the maximum storm surge in the DA
approach suggested by FANOVA will be:

\[
\eta_i^{\max\,(F)} = \frac{\eta_i^{\max}}{\Delta_p}.
\tag{5.19}
\]

It is dimensionless when the constant γ is taken into account.


Following our FANOVA results, Di is the second base quantity, which yields
the following dimensionless input:

\[
R_{\max}^{(F)} = \frac{R_{\max}}{|D_i|}.
\tag{5.20}
\]

We use the absolute value of this base quantity, i.e., the distance without a
sign as typically defined, since the inputs need to be positive for
Buckingham's Π-theorem.
The use of ηi^max(F) as a dimensionless output along with five dimensionless
inputs (Rmax^(F), B, θb, θa, and αi) would suffice in a usual DA setup. The
inputs θb, θa, and αi are dimensionless since they are angles. They can take
on negative values, even though none of them is used as a base quantity.
Note that the forward storm speed vf was initially considered as the third
base quantity, and it is not used in this DA arrangement. However, given the
system's incompleteness, we have to include it as another dimensionless
input:

\[
v_f^{(F)} = \sqrt{\frac{v_f^2}{|D_i|\,g}},
\tag{5.21}
\]

which is adjusted by the constant acceleration due to gravity, g = 9.81
m/s^2, on the surface of the Earth.
We encounter a matter analogous to vf with the base quantity ∆p, in terms of
its absence from the set of dimensionless inputs. Thus, we add an input
depicting a ratio between the two base quantities ∆p and Di as follows:

\[
\Delta_p^{(F)} = \frac{\Delta_p}{|D_i|}, \quad \text{with } \left[\Delta_p^{(F)}\right] = \mathrm{ML^{-2}T^{-2}},
\tag{5.22}
\]

which can also be adjusted with the constant γ to make it dimensionless. The
inclusion of this input is necessary for our DA to be competitive against a
non-DA approach. It is essential to highlight that all base quantities
appear among the dimensionless inputs in our two previous case studies;
here, ∆p and vf would not appear unless we include ∆p^(F) and vf^(F),
respectively.
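A minimal sketch of the resulting variable construction (5.19)-(5.22) is
shown below, with illustrative names. The constants γ and g are handled
exactly as in the text; converting |Di| from km to m inside (5.21), so that
vf (in m/s) and g (in m/s^2) are unit-consistent, is our assumption, since
the unit handling is not spelled out.

    import numpy as np

    G = 9.81  # acceleration due to gravity in m/s^2

    def da_variables(eta_max, R_max, v_f, delta_p, D_i):
        abs_D = np.abs(D_i)                           # distance without a sign
        eta_max_F = eta_max / delta_p                 # output (5.19)
        Rmax_F = R_max / abs_D                        # input (5.20)
        vf_F = np.sqrt(v_f ** 2 / (abs_D * 1e3 * G))  # input (5.21), km -> m
        dp_F = delta_p / abs_D                        # input (5.22)
        return eta_max_F, Rmax_F, vf_F, dp_F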

5.5.6 Simulation Settings


We implement a constant regression term in the GaSP for all simulations. All
training and testing sets are the same for both models, regardless of any
given input/output transformation. We obtain the training and testing data
from the set of 324 points corresponding to the maximum storm surges per
location. We randomize n = 200 runs as a training set D; the remaining
N = 124 runs are left as a testing set H, and we repeat this process 25
times. Thus we calculate the N-RMSE based on 25 different testing sets,
which accounts for variability in prediction accuracy.
Table 5.8 shows the specific simulation settings for Non-DA and DA with
respect to dimensionality, inputs, and outputs. Each approach will have four
possible variable transformations:

• Baseline: No output or input transformations.


• Log-Output: Logarithmic transformation on both outputs (ηi^max and
ηi^max(F)).

• Log-Input: Logarithmic transformations on Rmax, B, ∆p, and vf for Non-DA;
and Rmax^(F), B, ∆p^(F), and vf^(F) for DA. Note that a logarithmic
transformation is not done on Di (for Non-DA), θb, θa, and αi since they
could take on negative values.

• Full-Log: Logarithmic transformations on both outputs as in Log-Output
along with the respective inputs as in Log-Input (a sketch of these
arrangements follows Table 5.8).

Model    Dimensionality   Inputs                                       Output
Non-DA   d = 8            Rmax, B, ∆p, vf, Di, θb, θa, and αi          ηi^max
                          as in Table 5.7
DA       d = 7            Rmax^(F) as in (5.20), vf^(F) as in (5.21),  ηi^max(F)
                          and ∆p^(F) as in (5.22), along with          as in (5.19)
                          B, θb, θa, and αi

Table 5.8: Specific simulation settings by Non-DA and DA.
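As a minimal sketch of the four arrangements above (assuming the inputs sit
in a pandas DataFrame and the output in a pandas Series, with illustrative
column names), logs are applied only to the strictly positive variables:

    import numpy as np

    LOG_COLS = {"Non-DA": ["Rmax", "B", "dp", "vf"],
                "DA": ["Rmax_F", "B", "dp_F", "vf_F"]}  # illustrative names

    def arrange(X, y, approach, arrangement):
        # D_i (Non-DA) and the angles theta_b, theta_a, alpha_i are never logged,
        # since they can take on negative values
        X, y = X.copy(), y.copy()
        if arrangement in ("Log-Input", "Full-Log"):
            X[LOG_COLS[approach]] = np.log(X[LOG_COLS[approach]])
        if arrangement in ("Log-Output", "Full-Log"):
            y = np.log(y)
        return X, y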


5.5.7 Prediction Results


Figures 5.18 and 5.19 show the prediction accuracy for the southern locations
7, 8, 9, and 10. Figures C.6, C.7, and C.8 exhibit the results for the northern
locations. We compare the eN−RMSE s (2.14), in their original range of 0 to
roughly 1, on the y-axis with side-by-side boxplots for each one of the four
arrangements on the x-axis: Baseline, Log-Output, Log-Input, and Full-Log.
The plots are horizontally faceted by approach: Non-DA and DA. Unlike
our previous case studies in this chapter, there is only one training size
n = 200 with a testing size of N = 124. We also indicate an accuracy
threshold band on the y-axis between 0.10 and 0.20 with orange dotted
lines and a solid line on 0.15, for reference purposes. In particular, we note
the following findings:
• Location 7. The Baseline shows the smallest median (0.163) for Non-
DA with IQR of 0.024. On the other hand, DA with Log-Input has a
median of 0.147 with IQR of 0.013. Note the reduction of about 9.82%
in the median by using DA.
• Location 8. In terms of the Log-Input and Full-Log, the Non-DA
shows medians of 0.153 and 0.124 with IQRs of 0.014 and 0.028, re-
spectively. In DA’s case, the Log-Input and Full-Log show a median
of 0.127 with IQRs of 0.019 and 0.02, respectively.
• Location 9. For this location, in particular, the DA does not outper-
form the Non-DA, whose Baseline has a median of 0.14 and IQR of
0.022. In DA’s case, the best approach is Log-Input with median and
IQR of 0.179 and 0.042, respectively.
• Location 10. In terms of the Log-Input and Full-Log, the Non-
DA shows medians of 0.172 and 0.165 with IQRs of 0.049 and 0.036,
respectively. In DA’s case, the Log-Input and Full-Log show similar
medians of 0.172 and 0.176 with IQRs of 0.038 and 0.04, respectively.
The relative performances of Non-DA and DA vary depending on the
location of interest. Overall, with Log-Input, DA has a better performance
in terms of median eN−RMSE and spread across our 25 repeated experiments
in locations 7 and 8. However, we see different behaviour in location 9,
where Non-DA outperforms DA. A similar situation occurs in location 10
but to a lesser degree.
The different behaviours can be explained by the geographic distribution
of the storm tracks and points of interest, which is also present in the re-
maining six northern locations in Appendix C. As shown in Figure 5.16, the
324 storm tracks used to train the GaSP models are not distributed uniformly
around all ten locations. Additionally, as shown in Figure 5.12, the tracks'
landfalls are centred between locations 5 and 10. We would expect this
accuracy variability by location since most of the inputs are completely
associated with the track (except for Di and αi).
Local topography and bathymetry at certain locations also lie behind this
accuracy variability. A detailed map of locations 9 and 10 is shown in
Figure 5.17. The map shows a river meeting the ocean at each location: the
Cape Fear River at location 9 and the Little River at location 10. Recall
this is a complex system that is incomplete from the dimensional point of
view. These limitations have an impact on how much we can improve the
GaSP's accuracy in this study. However, this issue matters less for our
extrapolation study in Section 5.5.8.

[Figure 5.17 appears here: detailed map of locations 9 and 10; x-axis Longitude, y-axis Latitude.]

Figure 5.17: Storm surge locations 9 and 10 on the North Carolina coast.


[Figure 5.18 appears here: boxplots of eN−RMSE for Baseline, Log−Output, Log−Input, and Full−Log, faceted by Non−DA and DA: (a) Location 7, (b) Location 8.]

Figure 5.18: Prediction accuracy for the Storm Surge model, with PowExp
correlation function and a constant regression term, with 25 randomized
datasets of n = 200 runs for training and N = 124 runs for testing.


[Figure 5.19 appears here: boxplots of eN−RMSE for Baseline, Log−Output, Log−Input, and Full−Log, faceted by Non−DA and DA: (a) Location 9, (b) Location 10.]

Figure 5.19: Prediction accuracy for the Storm Surge model, with PowExp
correlation function and a constant regression term, with 25 randomized
datasets of n = 200 runs for training and N = 124 runs for testing.


5.5.8 Extrapolation
An extrapolation study has to be conducted differently here. New data are
not available to test prediction accuracy over extended input ranges.
Instead, within the fixed dataset, we define training data and
extrapolated test data based on an identified key input coming from our
previous FANOVA results: the central pressure deficit, ∆p . Subject-matter
expertise (Irish et al., 2009) indicates that this input is a measure of the
storm intensity, and FANOVA shows that it is one of the most important
main effects for the four southern locations.
The input ∆p is the criterion used for defining the extrapolated runs
(among the 324 existing data points for each location). To obtain our testing
set H, we sort the tracks in descending order by ∆p . Then, the testing set
H corresponds to tracks with values in the first 20% while the rest is used as
the training set D. Thus this is a severe test, extrapolating to more extreme
hurricane events. We fit a single GaSP with the training set D and obtain
predictions with the testing set H. We use the Log-Input strategy for both
Non-DA and DA.
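A minimal sketch of this split (one row per storm track, with an illustrative
column name for ∆p):

    import pandas as pd

    def extrapolation_split(tracks, col="dp", frac=0.20):
        # Sort tracks by central pressure deficit, most intense storms first
        ordered = tracks.sort_values(col, ascending=False)
        n_test = int(round(frac * len(ordered)))
        H = ordered.iloc[:n_test]   # extrapolated testing set (top 20% in dp)
        D = ordered.iloc[n_test:]   # training set
        return D, H

    # Hypothetical demo with 5 tracks
    demo = pd.DataFrame({"track": range(5), "dp": [10.0, 80.0, 45.0, 90.2, 3.3]})
    D, H = extrapolation_split(demo)
    print(H["track"].tolist())  # the single most intense track here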
Figure 5.20 illustrates the prediction accuracy results. The y-axis shows
the difference between each prediction of the maximum storm surge and the
model's output (η̂i^max − ηi^max) in the testing set H, where negative values
indicate an under-prediction of flood hazard. Unlike previous plots, good
prediction accuracy is indicated by values around zero and the absence of
large negative errors. We indicate an accuracy threshold band on the y-axis
between −0.10 and 0.10 (i.e., 10 cm error in predicted surge) with orange
dotted lines and a solid line at 0, for reference purposes.
Table 5.9 provides the summary statistics of the prediction errors of
flood hazard (in m) from Non-DA and DA for the southern locations. We
can summarize the results in the following points:
• DA clearly outperforms Non-DA in locations 7, 8, and 10.
• Location 7. The difference between Non-DA and DA is clear: we can see
that Non-DA tends to underpredict storm surges when compared to DA. The
median of DA's prediction errors is −0.01, which is closer to zero than the
one corresponding to Non-DA, −0.13. In terms of spread, the IQR is smaller
for DA (0.25) than for Non-DA (0.34).
• Location 8. The medians behave like those for location 7: DA has
a better median (0.02) than Non-DA (-0.09). The IQR corresponding
to DA is slightly smaller than with Non-DA, 0.23 versus 0.27.


• Location 9. For this location, we do not find a large difference in


the medians. However, there is a clear difference in the means: -0.02
in DA versus -0.08 with Non-DA. The mean for non-DA is adversely
affected by the more frequent and more extreme under-predictions of
surge, as much as about 0.75 m.

• Location 10. We can see that DA tends to overpredict flood hazards with a
median of 0.04, whereas Non-DA underpredicts with a median of −0.03. The
spread in DA is smaller than in Non-DA, with IQRs of 0.14 versus 0.19,
respectively.

Location 7
Approach Min Q1 Median Mean Q3 Max IQR
Non-DA -0.69 -0.31 -0.13 -0.13 0.04 0.39 0.34
DA -0.45 -0.12 -0.01 -0.01 0.13 0.48 0.25
Location 8
Approach Min Q1 Median Mean Q3 Max IQR
Non-DA -0.70 -0.24 -0.09 -0.08 0.04 0.40 0.27
DA -0.36 -0.09 0.02 0.02 0.14 0.41 0.23
Location 9
Approach Min Q1 Median Mean Q3 Max IQR
Non-DA -0.75 -0.16 0.00 -0.08 0.06 0.31 0.21
DA -0.57 -0.13 -0.01 -0.02 0.06 0.71 0.19
Location 10
Approach Min Q1 Median Mean Q3 Max IQR
Non-DA -0.69 -0.14 -0.03 -0.07 0.05 0.48 0.19
DA -0.30 -0.03 0.04 0.06 0.12 0.66 0.14

Table 5.9: Summary statistics of the prediction errors (η̂imax −ηimax ) of flood
hazard (in m) by Non-DA and DA for the Storm Surge model, with PowExp
correlation function and a constant regression term, with the top 20% runs
in ∆p for testing and the remainder for training.
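A minimal sketch of how the summaries in Table 5.9 can be computed, assuming
a pandas DataFrame of prediction errors with illustrative column names:

    import pandas as pd

    def error_summary(errors):
        # errors: frame with columns 'approach' and 'err' (prediction error in m)
        g = errors.groupby("approach")["err"]
        out = g.agg(Min="min", Q1=lambda s: s.quantile(0.25), Median="median",
                    Mean="mean", Q3=lambda s: s.quantile(0.75), Max="max")
        out["IQR"] = out["Q3"] - out["Q1"]
        return out.round(2)

    demo = pd.DataFrame({"approach": ["Non-DA", "Non-DA", "DA", "DA"],
                         "err": [-0.69, 0.39, -0.45, 0.48]})
    print(error_summary(demo))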


[Figure 5.20 appears here: boxplots of prediction errors (η̂i^max − ηi^max) for Non−DA and DA in four panels: (a) Location 7, (b) Location 8, (c) Location 9, (d) Location 10.]

Figure 5.20: Prediction error of flood hazard (in m) for the Storm Surge
model, with PowExp correlation function and a constant regression term,
with the top 20% runs in ∆p for testing and the remainder for training.


5.6 Concluding Remarks


Implementing a DA approach is an interesting way to improve the predic-
tion accuracy of a GaSP. The Borehole function is an illustrative starting
step in exploring the benefits of DA. We can apply a full DA, along with
input transformations and a further input expansion, to obtain practically
perfect interpolated predictions. Note the key role of FANOVA as a tool for
selecting our base quantities, which shows its advantages when comparing
our approach to an alternative DA setup (Shen et al., 2018). Moreover, our
additional study on extrapolation in a testing set shows the practical advan-
tages of the FANOVA-based DA in terms of prediction accuracy. Whereas
accuracy is good for all methods when interpolating, it is much poorer for
extrapolation when not using DA. While the Borehole function’s modelling
strategies can be expanded to output transformations with or without DA’s
application, we did not find any prediction accuracy improvements other
than logarithmically transforming the inputs.
The Solid Sphere function is not as straightforward as the case study
above. The system has inputs that are not fully taken into account in a
GaSP, since they are kept constant by design. Even though FANOVA per-
centage contributions offer valuable insights for selecting the base quantities
in our DA strategy, we also have to rely on scientific laws to deliver a proper
variable arrangement. Regarding interpolated predictions, FANOVA-based
DA proves to have a better prediction accuracy when compared to other
approaches: non-DA and an alternative DA (Tan, 2017), particularly for
a small training size n. The extrapolation on a testing set shows that the
FANOVA-based DA performs better than the rest of the approaches, espe-
cially with logarithmic transformations with an expanded input space.
The Storm Surge case is a complex system, and of course, there are more
inputs involved that are not taken into account (e.g., the bathymetry of each
one of the locations) beyond those available for modelling. We manage to
use DA, again with the help of FANOVA, to adjust the output with a measure
of storm intensity among the inputs to the computer code. Recall
that we are focusing our attention on those southern NC points with higher
storm surges. Our implementation of a DA improves the prediction accuracy
of the surrogate GaSP while reducing the variability in results over the re-
peated experiments. Additionally, extrapolated testing sets in the southern
locations with the highest storm surges provide accurate predictions with
the DA approach. Of particular scientific note, DA produces fewer extreme
under-predictions of extrapolated storm hazard over the four locations.
Another advantage is the shorter computing time of training a surrogate GaSP
compared to running slow codes (defined in Section 4.11) to generate further
predictions. Of course, the experimenter could overcome the limitations
associated with slow codes; it would then be possible to generate new
computer runs as many times as necessary under different input values,
leaving the surrogate GaSP behind.
code subject to a whole range of input settings, there will always be a degree
of uncertainty due to other non-considered input vectors. The GaSP (2.1)
addresses this uncertainty in its random function. Finally, DA is meant
to generate dimensionless derived quantities that aim to capture the rela-
tionships of the system’s fundamental physical dimensions. These derived
quantities improve the GaSP’s prediction accuracy, even when extrapolating
outside of the training data, in our three case studies.

Chapter 6

Final Remarks and Future Work
What modelling strategies can we implement when a computer code is hard
to predict? This is a fundamental question in any given computer exper-
iment where prediction is the main objective, or is a key component of
another objective, when using a GaSP as a surrogate model. As we already
pointed out, a computer experiment may encounter high-dimensional input,
highly non-linear effects, and crucial input interactions. The main objec-
tive of this thesis is focused on the improvement of the GaSP prediction
accuracy. We do not follow a single strategy to achieve this, but two:
• The implementation of non-standard GaSP correlation structures in
Chapters 3 and 4 by following a statistical approach.
• The use of DA for input/output transformations in Chapter 5, which
uses both statistical and subject-matter scientific approaches.

6.1 Summary of the Thesis


There is a common thread between both modelling strategies: FANOVA.
This sensitivity analysis tool proves to be useful in most of our eight case
studies. Note that the two strategies use the FANOVA results in different
ways, so we implement a novel and flexible use of this tool throughout this
thesis. The selection of the key inputs in a system is critical. By taking
into account their importance, we manage to translate this insight into a
significant improvement of prediction accuracy.
The principles coming from traditional factorial experiments, listed in
Chapter 1, prove to be useful in the test functions found in Chapter 3
(Michalewicz function) and Chapter 4 (Franke and Friedman functions).
We start out with additive correlation structures based on main effects
only, RE−ME(x, x′) (3.3) and RU−ME(x, x′) (3.4), in order to address a quite
particular case: the Michalewicz function. Then, we introduce joint-effect
correlation structures, which are the sum of main effects and low-order
input interactions. These correlation arrangements align with common
practice in the traditional factorial experiments literature. Therefore, we
establish an interesting link between computer and physical experiments.
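As a minimal sketch of the idea (not the thesis implementation), a
main-effects-only correlation can be written as a weighted sum of
one-dimensional power-exponential components; the equal weights used below
are an assumption.

    import numpy as np

    def corr_main_effects(x, x_prime, theta, p=2.0):
        # x, x_prime: d-dimensional inputs; theta: positive scale parameters.
        # p = 2 gives SqExp components; 0 < p <= 2 gives PowExp components.
        x, x_prime, theta = (np.asarray(v, dtype=float) for v in (x, x_prime, theta))
        comps = np.exp(-theta * np.abs(x - x_prime) ** p)  # one 1-D correlation per input
        return comps.mean()  # additive structure: equally weighted main effects

    print(corr_main_effects([0.1, 0.5], [0.2, 0.9], theta=[5.0, 5.0]))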
In terms of our non-standard correlation structures, it is worth mention-
ing that we are assuming limited interaction effects while achieving com-
petitive prediction accuracy, even with a significant number of inputs and
non-linear effects. The above test functions provide an adequate framework
for assessing main and joint-effect correlation structures. These cases pro-
vide valuable insights on how to handle a given non-standard structure in
the following ways:
• The challenging Michalewicz function shows that main-effect correla-
tion structures outperform the Std GaSP and other complex models
like the CGP (Ba and Joseph, 2012). We partially apply the Effect
Hierarchy principle by using main effects only here.
• The Friedman function shows that joint-effect correlation structures
up to all 2-input interactions outperform the Std GaSP and the MARS
model (Friedman, 1991). We apply the Effect Hierarchy and Heredity
principles in this case.
• The Franke function shows that joint-effect correlation structures up
to selected 2-input interactions offer a competitive prediction accuracy
against an Oracle GaSP, while completely fulfilling the three effect
principles and outperforming the Std GaSP. Note FANOVA’s key role
here in the application of the Effect Sparsity principle. These joint-
effect structures also outperform other approaches, such as CGP and
MARS.
The extension of the non-standard correlation structures to further spe-
cific applications is done in the second half of Chapter 4. We introduce two
additional joint-effect correlation structures up to 2-input interactions with
a residual effect term to address the OTL circuit function. Furthermore,
we introduce joint-effect correlation structures up to all 3-input interactions
in this case. All these structures outperform the Std GaSP and other ap-
proaches, such as MARS and PPR (Friedman and Stuetzle, 1981). Our
final case in this chapter is a more complex computer code with limited
data points for training and testing: the Nilson-Kuusk model. We man-
age to improve prediction accuracy versus the Std GaSP with joint-effect
correlation structures up to selected 3-input interactions.
Recall that most of the GaSP approaches implement SqExp and PowExp
correlation functions in model training and testing. The use of the PowExp
function provides accuracy improvements, mostly in the Std GaSP.


However, the non-standard structures still outperform this approach while
not showing significant differences between both correlation functions in the
above case studies. Note the trade-off we have in the PowExp case since we
have to estimate a higher number of model parameters than with SqExp.
This matter significantly increases processing times in the optimization pro-
cedures when using PowExp.
Chapter 5 details the second modelling strategy addressed in this thesis:
DA. The implementation of this strategy is quite different compared to the
non-standard correlation structures. Again, FANOVA plays a key role in
selecting base quantities used in the input/output transformations, always
based on an initial Std GaSP with a PowExp correlation function. Our three
case studies in this chapter have the following overall highlights:

• The Borehole function provides the ideal framework for applying
Buckingham's Π-theorem, where the dimensional system is complete
since it depicts all the inputs involved in the determination of the
output. Therefore, the application of a full DA and logarithmic input
transformations with further input expansion provides highly accurate
interpolated and extrapolated predictions (compared to a non-DA ap-
proach and another DA found in the literature).

• While we implement a full DA with the Solid Sphere function as in


the case of the Borehole function, the estimated main effects on the
output coming from FANOVA allow us to set up further useful trans-
formations on our dimensionless inputs. Moreover, the definition of
our dimensionless output uses subject-matter scientific laws such as
Newton’s law of cooling. Our full DA provides a better prediction
accuracy against a non-DA approach and another DA found in the
literature, especially in extrapolation.

• Finally, the Storm Surge model fully exploits the use of a DA approach.
As mentioned above, FANOVA is important in identifying those in-
puts considered as base quantities in the system. This example is of
particular attention since it is an extremely complex case study, where
the experimenter does not know all the inputs involved in the process.
Thus, the DA is considered full in the sense that we manage to obtain
dimensionless positive inputs when possible. Recall there are three
bearing angles that are already dimensionless, but they can take on
negative values. Of most practical importance, our DA performs best in
extrapolated predictions for the most intense storms compared to a non-DA
approach.

6.2 Discussion and Future Work


Before introducing a method called Conditional Main Effect (CME) anal-
ysis for de-aliasing main effects in fractional factorial designs, Wu (2015)
provides detailed historical background on the three effect principles used
in Chapter 4. The author emphasizes the advantages of using these prin-
ciples in physical experiments. Nonetheless, assuming that the relationship
between the output and inputs can be quite complex in a computer code,
Wu (2015) states that “effect hierarchy and heredity principles may be too
simplistic to be useful” in computer experiments. He suggests that new
principles might be formulated as guidelines in this class of experiments.
Unlike Wu (2015), we do not consider the use of Effect Hierarchy and
Heredity Principles as “too simplistic to be useful” in computer experiments
but as a fair starting point to formulate a whole variety of non-standard cor-
relation structures. The introduction of a joint-effect correlation structure
with a residual effect term, RE−JE2R(x, x′) (4.17), is just a single example
of this whole variety. The flexibility of these structures is further
exploited with RE−JE2SR(x, x′) (4.18), which incorporates the use of FANOVA in the
selection of the most important input interactions, while ultimately fulfilling
the three effect principles: Sparsity, Heredity, and Hierarchy.
Recent work by Lin and Joseph (2019) introduces a correlation structure that
is exactly RU−ME(x, x′) (3.4), with the SqExp function only. Their model is
called the Transformed Additive Gaussian (TAG) Process. An important
characteristic is to model a transformed output, in an additive form, with a
parametric transformation for non-negative data: the Box-Cox transformation
(Box and Cox, 1964). As with previous works proposing changes to the
correlation structure (Duvenaud et al., 2011; Kandasamy et al., 2015), the
TAG Process proposes a structure that is not as flexible as ours: it works
on a main-effects basis only, whereas we allow incorporation of input
interactions and a residual component.
Lin and Joseph (2019) also propose an alternative to the TAG Process,
when the output is not fully additive: the Transformed Approximately Ad-
ditive Gaussian (TAAG) Process, which has similar CGP foundations. Its
fitting algorithm starts as TAG. Those model estimates are used as initial
values for a Std GaSP optimization, and TAAG uses the standard estimates
in a one-dimensional optimization. The authors address two of our test
functions in Chapter 4 with the TAAG Process: the 2-dimensional Franke (4.12)
and the OTL circuit (4.19) functions using a single replicate of training size
n = 10d.
In order to compare the performance of our non-standard correlation
functions against the TAG or TAAG Processes found in the R package TAG
(Lin and Joseph, 2020), each one of the case studies in Chapters 3 and 4 was
checked under the same training and testing conditions used for the other
GaSP approaches. As done by the authors, the starting Std GaSP in the
TAAG Process algorithm was obtained with the R package DiceKriging
(Roustant et al., 2012). We obtained the following results:

• Given the form of the Michalewicz function, in both original and weighted
forms, a TAG Process would suffice with its outputs initially transformed to
be non-negative. However, this model is no better than the Std GaSP and CGP.

• For the Friedman and the 8-dimensional Franke (4.13) functions, a TAAG
Process does not provide better prediction accuracy than the Std GaSP and
CGP.

Approach Min Q1 Median Mean Q3 Max IQR


Std 0.0008 0.0009 0.0009 0.0009 0.0010 0.0010 0.0001
E-JE2SR 0.0005 0.0006 0.0006 0.0006 0.0007 0.0008 0.0001
U-JE3 0.0005 0.0006 0.0007 0.0007 0.0008 0.0009 0.0002
TAAG 0.0044 0.0047 0.0050 0.0049 0.0051 0.0056 0.0004

Table 6.1: Summary statistics of eN−RMSE s (2.14) by type of GaSP for the
OTL circuit function (4.19). Each figure summarizes the results from 20
mLHDs of n = 240 runs for training and N = 10, 000 runs for testing. Std
GaSP is implemented with PowExp. TAAG, E-JE2SR, and U-JE3 GaSPs
are implemented with SqExp.

• In the case of the OTL circuit function, Table 6.1 provides the sum-
mary statistics of the eN−RMSE s (2.14) by GaSP approach at n = 240
from our 20 training designs. We can see that the median corre-
sponding to the Std GaSP is 82% smaller than the one obtained with
TAAG. Moreover, Figure 6.1 illustrates the prediction accuracy in per-
cent (eN−RMSE ×100%) on the log10 (n)-log10 (eN−RMSE ) scale of TAAG
across the training sizes n = 60, 120, 240 and all the values are above
the rest of the GaSP approaches.


• The Nilson-Kuusk model also shows that a TAAG Process has a poor
prediction accuracy compared to the rest of the GaSP approaches.

[Figure 6.1 appears here: boxplots of eN−RMSE (%) and log10(eN−RMSE) versus n = 60, 120, 240; legend: TAAG, Std, E−JE2SR, U−JE3.]

Figure 6.1: Prediction accuracy in percent (eN−RMSE × 100%) versus n on a
log10(n)-log10(eN−RMSE) scale for the OTL circuit function (4.19). Each
boxplot shows results from 20 mLHDs of n = 60, 120, 240 runs for training
and N = 10,000 runs for testing. Std GaSP is implemented with PowExp. TAAG,
E-JE2SR, and U-JE3 GaSPs are implemented with SqExp.

The assumption of stationarity in a GaSP has been an ongoing debate in the
literature. One could argue that using a non-constant regression
component might be useful when this assumption is not entirely fulfilled.
We test the use of a non-constant component for two cases in Chapter 5,
where there was only a significant improvement in the Solid Sphere function,
specifically in our DA arrangement for interpolated predictions.
Recall Section 2.2 where it is stated that Sacks et al. (1989) mention
that any departure from the regression component µ(x) in model (2.1) is
absorbed by the random function Z(x). On the other hand, many works in
the literature have tried to deal with non-stationarity in different ways (Lin
and Joseph, 2019; Ba and Joseph, 2012). However, these works were not
effective in our case studies.


Finally, the use of DA in computer experiments opens up an interesting field
of potential research efforts targeted to improve prediction accuracy.
Shen et al. (2018) pointed out the advantages of DA, though the selection of
key inputs to be used as base quantities is a matter that has to be worked out.
FANOVA is useful for achieving this goal, but one could explore additional
sensitivity analysis tools.
An important point has been addressed in Chapter 5 across the three case
studies: extrapolation. Chen et al. (2016) point out that extrapolation is
necessary when further computer runs cannot be obtained with new input
ranges. Thus, a GaSP fitted with interpolated runs has to be used as a
surrogate model. Moreover, they raise the concern of how to extrapolate.
As a first alternative, they suggest using different regression components.
This practice also poses some risks if one does not find the “right” terms.
On the other hand, they consider that looking at the main effects in a given
case might provide better guidelines.
The literature offers some other additional works where extrapolation is
addressed in a GaSP:

• Wilson and Adams (2013) introduce new base kernels, different from
the SqExp case, that allow pattern discovery. These kernels can ex-
trapolate accurate predictions in time series.

• Duvenaud (2014) explores the construction of covariance structures with
the addition and multiplication of different base kernels in a
greedy selection process: the automatic Bayesian covariance discov-
ery (ABCD). Model comparison is made via the Bayesian information
criterion (BIC). The author claims that ABCD aims to discover co-
variance structures for extrapolation in time series.

As implied above by Chen et al. (2016) in their second suggestion related to
main effects, FANOVA provides useful information on these, via the
main-effect plots and the percentage contributions to the variance of the
predictor up to a certain degree of input interactions, throughout our three
case studies in Chapter 5. Note the critical importance of extrapolation in a
surrogate GaSP in the Storm Surge model. We want to train a model such
that the inputs can be extended to some future and more extreme storms.
Compared to a non-DA framework, DA proves to be more accurate in terms
of prediction for extrapolation purposes in our case study. Moreover, this
point has a high potential for future extrapolation-related works.
Even though test functions are of mainstream use in the literature and
pose fundamental challenges, one needs to go towards more subtle case
studies depending on the experimenter's purpose. Thus, a final remark on the
work done in this thesis is applying both approaches (non-standard
correlation structures and DA) in cases that go beyond these regular test
functions:

• The Nilson-Kuusk model has a limited amount of data to experiment with,
highlighting the importance of using a GaSP as a surrogate model of a
complex system involving plant canopy.

• On the other hand, using the ADCIRC model to generate the storm
surge data is just a single computational approach to deal with this
class of natural phenomenon. Other approaches include the Weather
Research and Forecasting (WRF) model, Wave Model (WAM), and
Simulating Waves Nearshore (SWAN) numerical model. We can find
works in the literature comparing the prediction accuracy of these
models (Bhaskaran et al., 2013). Since DA is a data pre-processing
tool, it can also be applied along with a surrogate GaSP with these
other models.

Bibliography
Abt, M. (1999). Estimating the prediction mean squared error in Gaussian
stochastic processes with exponential correlation structure. Scandinavian
Journal of Statistics, 26(4):563–578.

Asmussen, E. and Heebøll-Nielsen, K. (1955). A dimensional analysis of
physical performance and growth in boys. Journal of Applied Physiology,
7(6):593–603.

Ba, S. and Joseph, V. R. (2012). Composite Gaussian process models for
emulating expensive functions. The Annals of Applied Statistics,
6(4):1838–1860.

Ba, S. and Joseph, V. R. (2018). CGP: Composite Gaussian Process Models. R
package version 2.1-1.

Bastos, L. S. and O'Hagan, A. (2009). Diagnostics for Gaussian process
emulators. Technometrics, 51(4):425–438.

Ben-Ari, E. N. and Steinberg, D. M. (2007). Modeling data from computer
experiments: An empirical comparison of kriging with MARS and projection
pursuit regression. Quality Engineering, 19(4):327–338.

Bhaskaran, P. K., Nayak, S., Bonthu, S. R., Murty, P., and Sen, D. (2013).
Performance and validation of a coupled parallel ADCIRC-SWAN model for
THANE cyclone in the Bay of Bengal. Environmental Fluid Mechanics,
13:601–623.

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal
of the Royal Statistical Society. Series B (Methodological), 26(2):211–252.

Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978). Statistics for
Experimenters: An Introduction to Design, Data Analysis, and Model Building.
Wiley Series in Probability and Mathematical Statistics: Applied Probability
and Statistics. Wiley.


Box, G. E. P. and Meyer, R. D. (1985). Some new ideas in the analysis of
screening designs. Journal of Research of the National Bureau of Standards,
90:495–502.

Bridgman, P. W. (1931). Dimensional Analysis. Yale University Press.

Buckingham, E. (1914). On physically similar systems; illustrations of the
use of dimensional equations. Phys. Rev., 4:345–376.

Çengel, Y. A. (2003). Heat Transfer: A Practical Approach. McGraw-Hill
Series in Mechanical Engineering. McGraw-Hill.

Chapman, W. L., Welch, W. J., Bowman, K. P., Sacks, J., and Walsh, J. E.
(1994). Arctic Sea ice variability: Model sensitivities and a multidecadal
simulation. Journal of Geophysical Research, 99C:919–935.

Chen, H., Loeppky, J. L., Sacks, J., and Welch, W. J. (2016). Analysis
methods for computer experiments: How to assess and what counts? Statistical
Science, 31(1):40–60.

Cressie, N. (1993). Statistics for Spatial Data. Wiley Series in Probability
and Statistics. Wiley.

Dancik, G. M. (2013). mlegp: Maximum Likelihood Estimates of Gaussian
Processes. R package version 3.1.5.

Duvenaud, D. (2014). Automatic Model Construction with Gaussian Processes.
PhD thesis, University of Cambridge, Cambridge, UK.

Duvenaud, D., Nickisch, H., and Rasmussen, C. E. (2011). Additive Gaussian
processes. ArXiv e-prints.

Finney, D. J. (1943). The fractional replication of factorial arrangements.
Annals of Eugenics, 12(1):291–301.

Finney, D. J. (1977). Dimensions of statistics. Journal of the Royal
Statistical Society, Series C (Applied Statistics), 26(3):285–289.

Fisher, R. A. (1926). The arrangement of field experiments. Journal of the
Ministry of Agriculture of Great Britain, 33:503–513.

Franke, R. (1979). A Critical Comparison of Some Methods for Interpolation
of Scattered Data. Final report. Defense Technical Information Center.


Friedman, J. H. (1984). A Variable Span Scatterplot Smoother. Technical
Report No. 5. Laboratory for Computational Statistics, Stanford University.

Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals
of Statistics, 19(1):1–67.

Friedman, J. H., Grosse, E., and Stuetzle, W. (1983). Multidimensional
additive spline approximation. SIAM Journal on Scientific and Statistical
Computing, 4.

Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression.
Journal of the American Statistical Association, 76(376):817–823.

Gao, Y., Gong, S., and Zhao, G. (2011). A novel and robust evolution
algorithm for optimizing complicated functions. Proceedings of 4th
International Workshop on Advanced Computational Intelligence, IWACI 2011.

Goovaerts, P. (2014). Geostatistics: A common link between medical
geography, mathematical geology, and medical geology. Journal of the
Southern African Institute of Mining and Metallurgy, 114:605–612.

Groemping, U. (2017). DoE.wrapper: Wrapper Package for Design of Experiments
Functionality. R package version 0.9.

Haaland, B. and Qian, P. Z. G. (2011). Accurate emulators for large-scale
computer experiments. The Annals of Statistics, 39(6):2974–3002.

Hamada, M. and Wu, C. F. J. (1992). Analysis of designed experiments with
complex aliasing. Journal of Quality Technology, 24:130–137.

Hamada, M. and Wu, C. F. J. (2000). Experiments: Planning, Analysis, and
Parameter Design Optimization. Wiley Series in Probability and Statistics.
Wiley.

Hestenes, M. R. (1980). Conjugate Direction Methods in Optimization.
Stochastic Modelling and Applied Probability. Springer-Verlag New York.

Ho, F. P. and Myers, V. A. (1975). Joint probability method of tide
frequency analysis applied to Apalachicola Bay and St. George Sound,
Florida. Technical Report NWS-18, NOAA Technical Report.

Holland, G. J. (1980). An analytic model of the wind and pressure profiles
in hurricanes. Monthly Weather Review, 108(8):1212–1218.

157
Bibliography

Irish, J. L., Resio, D. T., and Cialone, M. A. (2009). A surge response function approach to coastal hazard assessment - part 2: quantification of spatial attributes of response functions. Natural Hazards, 51(1):183–205.

Islam, M. and Lye, L. (2009). Combined use of dimensional analysis and modern experimental design methodologies in hydrodynamics experiments. Ocean Engineering, 36:237–247.

Jamil, M. and Yang, X.-S. (2013). A literature survey of benchmark functions for global optimization problems. International Journal of Mathematical Modelling and Numerical Optimisation, 4.

Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492.

Joseph, V. R., Hung, Y., and Sudjianto, A. (2008). Blind kriging: A new method for developing metamodels. Journal of Mechanical Design, 130(3):031102.

Kahle, D. and Wickham, H. (2013). ggmap: Spatial visualization with ggplot2. The R Journal, 5(1):144–161.

Kalaitzis, A., Honkela, A., Gao, P., and Lawrence, N. D. (2014). gptk: Gaussian Processes Tool-Kit. R package version 1.08.

Kandasamy, K., Schneider, J., and Poczos, B. (2015). High dimensional Bayesian optimisation and bandits via additive models. ArXiv e-prints.

Kirkwood, T. B. L. (1979). Geometric means and measures of dispersion. Biometrics, 35(4):908–909.

Kuusk, A. (1996). A computer-efficient plant canopy reflectance model. Computers & Geosciences, 22(2):149–163.

Lee, T. Y. and Zidek, J. V. (2020). Scientific versus statistical modelling: a unifying approach. arXiv, 2002.11259.

Lin, L.-H. and Joseph, V. R. (2019). Transformation and additivity in Gaussian processes. Technometrics, 0(0):1–11.

Lin, L.-H. and Joseph, V. R. (2020). TAG: Transformed Additive Gaussian Processes. R package version 0.2.0.

Liu, D. C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(3):503–528.

Loeppky, J. L., Sacks, J., and Welch, W. J. (2009). Choosing the sample size of a computer experiment: A practical guide. Technometrics, 51(4):366–376.

Luce, R. D. (1959). On the possible psychophysical laws. Psychological Review, 66(2):81–95.

Luettich, R., Jr., Westerink, J., and Scheffner, N. (1992). ADCIRC: An advanced three-dimensional circulation model for shelves, coasts, and estuaries. Report 1: Theory and methodology of ADCIRC-2DDI and ADCIRC-3DL. Dredging Research Program Technical Report DRP-92-6, page 143.

MacDonald, B., Ranjan, P., and Chipman, H. (2015). GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs.

McKay, M. D., Beckman, R. J., and Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245.

Meinsma, G. (2019). Dimensional and scaling analysis. SIAM Review, 61(1):159–184.

Milborrow, S. (2019). earth: Multivariate Adaptive Regression Splines. R package version 5.1.1.

Moon, H. (2010). Design and Analysis of Computer Experiments for Screening Input Variables. PhD thesis, Ohio State University, Columbus, OH, USA. AAI3425387.

Morris, M. D. and Mitchell, T. J. (1995). Exploratory designs for computational experiments. Journal of Statistical Planning and Inference, 43(3):381–402.

Morris, M. D., Mitchell, T. J., and Ylvisaker, D. (1993). Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics, 35:243–255.

Myers, V. A. (1975). Storm tide frequencies on the South Carolina coast. Technical Report NWS-16, NOAA.

Neal, R. (2012). Bayesian Learning for Neural Networks. Lecture Notes in Statistics. Springer New York.

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7:308–313.

Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151):773–782.

Peixoto, J. L. (1987). Hierarchical variable selection in polynomial regression models. The American Statistician, 41(4):311–313.

R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rayleigh (1915). The principle of similitude. Nature, 95(2368):66–68.

Resio, D. T., Irish, J., and Cialone, M. (2009). A surge response function approach to coastal hazard assessment - part 1: basic concepts. Natural Hazards, 51(1):163–182.

Roustant, O., Ginsbourger, D., and Deville, Y. (2012). DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. Journal of Statistical Software, 51(1):1–55.

Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4(4):409–423.

Schonlau, M. and Welch, W. J. (2006). Screening the input variables to a computer model via analysis of variance and visualization. In Dean, A. and Lewis, S., editors, Screening: Methods for Experimentation in Industry, Drug Discovery, and Genetics, chapter 14, pages 308–327. Springer New York.

Shen, W., Davis, T., Lin, D., and Nachtsheim, C. (2014). Dimensional analysis and its applications in statistics. Journal of Quality Technology, 46(3):185–198.

Shen, W. and Lin, D. K. J. (2018). A conjugate model for dimensional analysis. Technometrics, 60(1):79–89.

Shen, W., Lin, D. K. J., and Chang, C.-J. (2018). Design and analysis of computer experiment via dimensional analysis. Quality Engineering, 30(2):311–328.

Sonin, A. A. (2001). The Physical Basis of Dimensional Analysis. Department of Mechanical Engineering, MIT, Cambridge, MA, USA, 2nd edition.

Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics. Springer New York.

Surjanovic, S. and Bingham, D. (2017). Virtual library of simulation experiments: test functions and datasets. Retrieved from https://www.sfu.ca/%7Essurjano/index.html.

Tan, M. H. Y. (2017). Polynomial metamodeling with dimensional analysis and the effect heredity principle. Quality Technology & Quantitative Management, 14(2):195–213.

Welch, W. J. (2014). GaSP: Gaussian Stochastic Process. R package version 2.9.1.

Westerink, J., Luettich, R., Jr., Feyen, J., Atkinson, J., Dawson, C., Roberts, H., Powell, M., Dunion, J., Kubatko, E., and Pourtaheri, H. (2008). A basin to channel-scale unstructured grid hurricane storm surge model applied to southern Louisiana. Monthly Weather Review, 136:833–864.

Westerink, J. J., Luettich, R. A., Baptista, A. M., Scheffner, N. W., and Farrar, P. (1992). Tide and storm surge predictions using finite element model. Journal of Hydraulic Engineering, 118(10):1373–1390.

Wilson, A. G. and Adams, R. P. (2013). Gaussian process kernels for pattern discovery and extrapolation. In Proceedings of the 30th International Conference on Machine Learning, ICML '13, pages III-1067–III-1075. JMLR.org.

Worley, B. A. (1987). Deterministic uncertainty analysis. Technical Report CONF-871101-30, Oak Ridge National Laboratory, TN, USA.

Wu, C. F. J. (2015). Post-Fisherian experimentation: From physical to virtual. Journal of the American Statistical Association, 110(510):612–620.

Yates, F. (1935). Complex experiments. Supplement to the Journal of the Royal Statistical Society, 2(2):181–247.

Yates, F. (1937). The Design and Analysis of Factorial Experiments. Imperial Bureau of Soil Science Technical Communication. Imperial Bureau of Soil Science.

Appendix A

Correlation Structures
We provide a summary of all the correlation structures used throughout this thesis for easy reference. Let $\mathbf{x}$ and $\mathbf{x}'$ be two input vectors in $\mathbb{R}^d$; the standard (Std) correlation structure is

$$
R^{\text{Std}}(\mathbf{x}, \mathbf{x}') = \prod_{j=1}^{d} R_j(h_j) \in [0, 1], \tag{A.1}
$$

where $R_j(h_j)$ is a chosen correlation function (see Section 2.3) and depends on the distance $h_j = x_j - x_j'$ for the $j$th input.
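
To make the notation concrete, the following minimal R sketch (our own illustrative helper, not code from any package cited in this thesis) evaluates (A.1) with a power-exponential (PowExp) correlation function. The parameter values in the example are arbitrary placeholders; in practice the correlation parameters theta and smoothness parameters p would be estimated, e.g., by maximum likelihood.

# Illustrative R sketch: standard product correlation (A.1) with a
# PowExp correlation function R_j(h_j) = exp(-theta_j * |h_j|^p_j).
R_std <- function(x, x_prime, theta, p) {
  h <- abs(x - x_prime)      # componentwise distances |h_j|
  prod(exp(-theta * h^p))    # product over the d inputs
}

# Example: two points in d = 3 dimensions with placeholder parameters.
R_std(c(0.1, 0.5, 0.9), c(0.2, 0.4, 0.7), theta = c(1, 2, 0.5), p = c(2, 2, 1.5))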

A.1 Main-Effect Correlation Structures

The main-effect correlation structure with equal weights (E-ME) is given by:

$$
R^{\text{E-ME}}(\mathbf{x}, \mathbf{x}') = \frac{1}{d} \sum_{j=1}^{d} R_j(h_j) \in [0, 1]. \tag{A.2}
$$

Note that this structure assigns the equal weight of $1/d$ to each one of the $d$ main effects.

The main-effect correlation structure with unequal weights (U-ME) is defined as:

$$
R^{\text{U-ME}}(\mathbf{x}, \mathbf{x}') = \sum_{j=1}^{d} \omega_j R_j(h_j) \in [0, 1], \tag{A.3}
$$

where there is a vector of weights $\Omega = (\omega_1, \ldots, \omega_d)^\top \in [0, 1]^d$ subject to the constraint $\sum_{j=1}^{d} \omega_j = 1$.

For maximum likelihood optimization, we use a multinomial logit transformation on these weights that fulfils the previous constraint. The transformation is set up with a raw weight vector $\tau = (\tau_1, \ldots, \tau_{d-1})^\top \in (-\infty, \infty)^{d-1}$, where

$$
\omega_i = \frac{\exp(\tau_i)}{1 + \sum_{k=1}^{d-1} \exp(\tau_k)} \quad \text{for } i = 1, \ldots, d-1, \qquad \text{and} \qquad \omega_d = \frac{1}{1 + \sum_{k=1}^{d-1} \exp(\tau_k)}.
$$
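
A minimal R sketch of this logit map follows; the function name tau_to_omega is our own, hypothetical choice. Mapping the unconstrained raw vector to the simplex is what allows a standard unconstrained optimizer to be used for the likelihood.

# Illustrative R sketch: multinomial logit map from raw weights
# tau in R^(d-1) to simplex weights (omega_1, ..., omega_d) summing to 1.
tau_to_omega <- function(tau) {
  denom <- 1 + sum(exp(tau))
  c(exp(tau), 1) / denom
}

omega <- tau_to_omega(c(0.3, -1.2))  # d = 3 inputs
sum(omega)                           # equals 1 by construction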

A.2 Joint-Effect Correlation Structures Up to All 2-Input Interactions

The joint-effect correlation structure with equal weights using all 2-input interactions (E-JE2) is given by:

$$
R^{\text{E-JE2}}(\mathbf{x}, \mathbf{x}') = \frac{1}{2d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{2\binom{d}{2}} \sum_{j=1}^{d} \sum_{k=j+1}^{d} R_j(h_j) \cdot R_k(h_k) \in [0, 1]. \tag{A.4}
$$

Note that the two types of terms, main effects and 2-input interactions, each receive weight 1/2, with the 1/2 distributed equally across all the correlation functions in the type of term.

The joint-effect correlation structure with unequal weights using all 2-input interactions (U-JE2) is defined as:

$$
R^{\text{U-JE2}}(\mathbf{x}, \mathbf{x}') = \frac{\lambda_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{\lambda_2}{\binom{d}{2}} \sum_{j=1}^{d} \sum_{k=j+1}^{d} R_j(h_j) \cdot R_k(h_k) \in [0, 1], \tag{A.5}
$$

where the weights $\lambda_1$ and $\lambda_2$ are subject to the constraint $\lambda_1 + \lambda_2 = 1$.

For maximum likelihood optimization, we also use a multinomial logit transformation on these weights that fulfils the previous constraint. The transformation is set up for the parameter $\tau \in (-\infty, \infty)$, where

$$
\lambda_1 = \frac{\exp(\tau)}{1 + \exp(\tau)} \qquad \text{and} \qquad \lambda_2 = \frac{1}{1 + \exp(\tau)}.
$$
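
A minimal R sketch of E-JE2, assuming the same hypothetical PowExp setup as above, makes the weighting explicit: the average of the $d$ main-effect correlations and the average of the $\binom{d}{2}$ pairwise products each contribute one half.

# Illustrative R sketch: joint-effect structure E-JE2 (A.4).
R_eje2 <- function(x, x_prime, theta, p) {
  d <- length(x)
  r <- exp(-theta * abs(x - x_prime)^p)         # R_j(h_j), PowExp assumed
  pairs <- combn(d, 2)                          # all {j, k} with j < k
  main  <- mean(r)                              # (1/d) * sum of main effects
  inter <- mean(r[pairs[1, ]] * r[pairs[2, ]])  # mean over pairwise products
  0.5 * main + 0.5 * inter
}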

A.3 Joint-Effect Correlation Structures up to Selected 2-Input Interactions

Under equal weights on the effect framework, a joint-effect correlation structure with selected 2-input interactions (E-JE2S) reduces its number of 2-input interactions in the following way:

$$
R^{\text{E-JE2S}}(\mathbf{x}, \mathbf{x}') = \frac{1}{2d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{2b} \sum_{\{j,k\} \in B} R_j(h_j) \cdot R_k(h_k) \in [0, 1], \tag{A.6}
$$

with the set $B \subseteq \{\{j, k\} : 1 \le j < k \le d\}$, where $|B| = b$.

We assign weights to main effects and 2-input interactions on a 50-50 basis. Each main effect and 2-input interaction has an equal weight within the corresponding half.

The joint-effect correlation structure with unequal weights using selected 2-input interactions (U-JE2S) is defined as:

$$
R^{\text{U-JE2S}}(\mathbf{x}, \mathbf{x}') = \frac{\lambda_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{\lambda_2}{b} \sum_{\{j,k\} \in B} R_j(h_j) \cdot R_k(h_k) \in [0, 1], \tag{A.7}
$$

where the weights $\lambda_1$ and $\lambda_2$ are subject to the constraint $\lambda_1 + \lambda_2 = 1$. We use a multinomial logit transformation on the vector of weights under this approach, as in the case of $R^{\text{U-JE2}}(\mathbf{x}, \mathbf{x}')$.
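
Restricting to a selected set $B$ only changes which pairwise products enter the second term; a minimal R sketch follows, with the hypothetical convention that $B$ is supplied as a two-column matrix of index pairs.

# Illustrative R sketch: E-JE2S (A.6) with a selected set B of 2-input
# interactions, supplied as a 2-column matrix of index pairs (j, k).
R_eje2s <- function(x, x_prime, theta, p, B) {
  r <- exp(-theta * abs(x - x_prime)^p)
  main  <- mean(r)
  inter <- mean(r[B[, 1]] * r[B[, 2]])  # average over the b selected pairs
  0.5 * main + 0.5 * inter
}

# Example in d = 4 with b = 2 selected pairs {1, 2} and {1, 3}.
B <- rbind(c(1, 2), c(1, 3))
R_eje2s(runif(4), runif(4), theta = rep(1, 4), p = rep(2, 4), B = B)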


A.4 Joint-Effect Correlation Structures with a Residual Effect Term

The joint-effect correlation structure using all 2-input interactions with a residual effect term (E-JE2R) assigns fixed equal weights to the main effects, 2-input interactions, and residual component in the following way:

$$
R^{\text{E-JE2R}}(\mathbf{x}, \mathbf{x}') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3\binom{d}{2}} \sum_{j=1}^{d} \sum_{k=j+1}^{d} R_j(h_j) \cdot R_k(h_k) + \frac{1}{3} \prod_{j=1}^{d} R_{j\text{Res}}(h_j) \in [0, 1]. \tag{A.8}
$$

This correlation structure resembles the structure $R^{\text{E-JE2}}(\mathbf{x}, \mathbf{x}')$, but there is an additional third addend $\prod_{j=1}^{d} R_{j\text{Res}}(h_j)$ representing the residual component under a standard multiplicative structure.

Under equal weights on the effect framework, a joint-effect correlation structure with selected 2-input interactions and a residual effect term (E-JE2SR) is set up in the following way:

$$
R^{\text{E-JE2SR}}(\mathbf{x}, \mathbf{x}') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3b} \sum_{\{j,k\} \in B} R_j(h_j) \cdot R_k(h_k) + \frac{1}{3} \prod_{j=1}^{d} R_{j\text{Res}}(h_j) \in [0, 1], \tag{A.9}
$$

with the set $B \subseteq \{\{j, k\} : 1 \le j < k \le d\}$, where $|B| = b$.

We also simplify the parameters of the correlation function in the residual component. For instance, in the case of the SqExp correlation function, this component is represented as follows:

$$
\prod_{j=1}^{d} R^{\text{SqExp}}_{j\text{Res}}(h_j, \theta') = \prod_{j=1}^{d} \exp(-\theta' h_j^2) = \exp\left[-\theta' \sum_{j=1}^{d} h_j^2\right].
$$

On the other hand, for the PowExp correlation function, the residual component becomes:

$$
\prod_{j=1}^{d} R^{\text{PowExp}}_{j\text{Res}}(h_j, \theta', p') = \prod_{j=1}^{d} \exp(-\theta' h_j^{p'}) = \exp\left[-\theta' \sum_{j=1}^{d} h_j^{p'}\right].
$$

Note that, depending on the correlation function, we respectively set up common correlation and smoothness parameters $\theta'$ and $p'$ across the $d$ dimensions. The common parameter(s) across input dimensions distinguish this residual term from the correlation structure $R^{\text{Std}}(\mathbf{x}, \mathbf{x}')$.
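
Because the residual term shares a single common $\theta'$ (and, for PowExp, a single $p'$) across inputs, it collapses to one exponential of a summed distance. A minimal R sketch of E-JE2R under these assumptions (hypothetical helper, PowExp form, absolute distances):

# Illustrative R sketch: E-JE2R (A.8). The residual component uses one
# common correlation parameter theta0 (and smoothness p0) across all d
# inputs, unlike the per-input parameters of the other two terms.
R_eje2r <- function(x, x_prime, theta, p, theta0, p0) {
  d <- length(x)
  h <- abs(x - x_prime)
  r <- exp(-theta * h^p)
  pairs <- combn(d, 2)
  main  <- mean(r)
  inter <- mean(r[pairs[1, ]] * r[pairs[2, ]])
  resid <- exp(-theta0 * sum(h^p0))   # common-parameter residual term
  (main + inter + resid) / 3
}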

A.5 Joint-Effect Correlation Structures Up to All 3-Input Interactions

The joint-effect correlation structure with equal weights using up to all 3-input interactions (E-JE3) is given by:

$$
R^{\text{E-JE3}}(\mathbf{x}, \mathbf{x}') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3\binom{d}{2}} \sum_{j=1}^{d} \sum_{k=j+1}^{d} R_j(h_j) \cdot R_k(h_k) + \frac{1}{3\binom{d}{3}} \sum_{j=1}^{d} \sum_{k=j+1}^{d} \sum_{l=k+1}^{d} R_j(h_j) \cdot R_k(h_k) \cdot R_l(h_l) \in [0, 1]. \tag{A.10}
$$

Note that the three types of terms (main effects, 2-input interactions, and 3-input interactions) each receive weight 1/3, with the 1/3 distributed equally across all the correlation functions in the type of term. As in the case of the previous joint-effect structures, for any input, correlation functions are shared between the main effects and both orders of input interactions. Thus, we use the same correlation hyperparameters for the three types of effects.

The joint-effect correlation structure with unequal weights using all 3-input interactions (U-JE3) is given as:

$$
R^{\text{U-JE3}}(\mathbf{x}, \mathbf{x}') = \frac{\lambda_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{\lambda_2}{\binom{d}{2}} \sum_{j=1}^{d} \sum_{k=j+1}^{d} R_j(h_j) \cdot R_k(h_k) + \frac{\lambda_3}{\binom{d}{3}} \sum_{j=1}^{d} \sum_{k=j+1}^{d} \sum_{l=k+1}^{d} R_j(h_j) \cdot R_k(h_k) \cdot R_l(h_l) \in [0, 1], \tag{A.11}
$$

where the weights fulfil the constraint $\sum_{j=1}^{3} \lambda_j = 1$.

For maximum likelihood optimization, we again use a multinomial logit transformation to satisfy the constraint. Optimization is performed on a raw weight vector $\tau = (\tau_1, \tau_2)^\top \in (-\infty, \infty)^2$, where

$$
\lambda_i = \frac{\exp(\tau_i)}{1 + \sum_{k=1}^{2} \exp(\tau_k)} \quad \text{for } i = 1, 2, \qquad \text{and} \qquad \lambda_3 = \frac{1}{1 + \sum_{k=1}^{2} \exp(\tau_k)}.
$$
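
Extending the earlier sketches to third-order terms only requires products over triples of inputs, which base R's combn handles directly. As before, the helper is a hypothetical illustration and assumes a PowExp correlation function.

# Illustrative R sketch: E-JE3 (A.10), giving total weight 1/3 to each of
# main effects, all 2-input interactions, and all 3-input interactions.
R_eje3 <- function(x, x_prime, theta, p) {
  d <- length(x)
  r <- exp(-theta * abs(x - x_prime)^p)
  pr2 <- combn(d, 2, function(idx) prod(r[idx]))  # choose(d, 2) pair products
  pr3 <- combn(d, 3, function(idx) prod(r[idx]))  # choose(d, 3) triple products
  (mean(r) + mean(pr2) + mean(pr3)) / 3
}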

A.6 Joint-Effect Correlation Structures up to Selected 3-Input Interactions

With equal weights, a joint-effect correlation structure up to selected 3-input interactions (E-JE3S) reduces its number of 2- and 3-input interactions in the following way:

$$
R^{\text{E-JE3S}}(\mathbf{x}, \mathbf{x}') = \frac{1}{3d} \sum_{j=1}^{d} R_j(h_j) + \frac{1}{3b} \sum_{\{j,k\} \in B} R_j(h_j) \cdot R_k(h_k) + \frac{1}{3c} \sum_{\{j,k,l\} \in C} R_j(h_j) \cdot R_k(h_k) \cdot R_l(h_l) \in [0, 1], \tag{A.12}
$$

with the set of pairs $B \subseteq \{\{j, k\} : 1 \le j < k \le d\}$, where $|B| = b$, and the set of triples $C \subseteq \{\{j, k, l\} : 1 \le j < k < l \le d\}$, where $|C| = c$. We assign equal weights to main effects and both interaction orders.

We can set up unequal weights for this class of correlation structure (U-JE3S) as:

$$
R^{\text{U-JE3S}}(\mathbf{x}, \mathbf{x}') = \frac{\lambda_1}{d} \sum_{j=1}^{d} R_j(h_j) + \frac{\lambda_2}{b} \sum_{\{j,k\} \in B} R_j(h_j) \cdot R_k(h_k) + \frac{\lambda_3}{c} \sum_{\{j,k,l\} \in C} R_j(h_j) \cdot R_k(h_k) \cdot R_l(h_l) \in [0, 1], \tag{A.13}
$$

where the weights fulfil the constraint $\sum_{j=1}^{3} \lambda_j = 1$. We use a multinomial logit transformation on the vector of weights under this approach, as in the case of $R^{\text{U-JE3}}(\mathbf{x}, \mathbf{x}')$.
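
Combining selected interaction sets with unequal weights, a minimal R sketch of U-JE3S (again a hypothetical helper) takes $B$ as a two-column matrix of pairs, $C$ as a three-column matrix of triples, and the raw weights $(\tau_1, \tau_2)$ on the logit scale:

# Illustrative R sketch: U-JE3S (A.13) with weights (lambda1, lambda2,
# lambda3) obtained from raw parameters tau via the multinomial logit map.
R_uje3s <- function(x, x_prime, theta, p, tau, B, C) {
  lambda <- c(exp(tau), 1) / (1 + sum(exp(tau)))
  r <- exp(-theta * abs(x - x_prime)^p)
  main   <- mean(r)
  inter2 <- mean(r[B[, 1]] * r[B[, 2]])              # b selected pairs
  inter3 <- mean(r[C[, 1]] * r[C[, 2]] * r[C[, 3]])  # c selected triples
  lambda[1] * main + lambda[2] * inter2 + lambda[3] * inter3
}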

Appendix B

Heat Transfer in a Solid Sphere

[Figure B.1 appears here: two panels of estimated main effects. (a) Ratio of distance from center and sphere radius (R). (b) Radius of sphere (r).]

Figure B.1: Estimated main effect (red) on temperature of sphere ($T_s$) from FANOVA, using a Std GaSP with PowExp correlation function and a constant regression term, with approximate 95% confidence limits (green), for the Solid Sphere function (5.11). One mLHD of n = 70 runs is used for training the Std GaSP.



[Figure B.2 appears here: two panels of estimated main effects. (a) Time (t). (b) Temperature of medium ($T_m$).]

Figure B.2: Estimated main effect (red) on temperature of sphere ($T_s$) from FANOVA, using a Std GaSP with PowExp correlation function and a constant regression term, with approximate 95% confidence limits (green), for the Solid Sphere function (5.11). One mLHD of n = 70 runs is used for training the Std GaSP.


[Figure B.3 appears here: two panels of estimated main effects. (a) Initial sphere temperature minus temperature of medium ($\Delta T$). (b) Thermal conductivity (k).]

Figure B.3: Estimated main effect (red) on temperature of sphere ($T_s$) from FANOVA, using a Std GaSP with PowExp correlation function and a constant regression term, with approximate 95% confidence limits (green), for the Solid Sphere function (5.11). One mLHD of n = 70 runs is used for training the Std GaSP.


[Figure B.4 appears here: one panel of the estimated main effect against $h_c$.]

Figure B.4: Estimated main effect (red) of convective heat transfer coefficient ($h_c$) on temperature of sphere ($T_s$) from FANOVA, using a Std GaSP with PowExp correlation function and a constant regression term, with approximate 95% confidence limits (green), for the Solid Sphere function (5.11). One mLHD of n = 70 runs is used for training the Std GaSP.


Appendix C

Storm Surge Model

[Figure C.1 appears here: two bar-chart panels of variance percentage against main effects and the largest 2-input interactions. (a) Location 1. (b) Location 2.]

Figure C.1: FANOVA percentage contributions of main effects and 2-input interactions for the Storm Surge model using a Std GaSP, with PowExp correlation function and a constant regression term, with 25 randomized datasets of n = 200 runs for training.


[Figure C.2 appears here: two bar-chart panels of variance percentage. (a) Location 3, showing main effects and the largest 2- and 3-input interactions. (b) Location 4, showing main effects and the largest 2-input interactions.]

Figure C.2: FANOVA percentage contributions of main effects and higher-order input interactions for the Storm Surge model using a Std GaSP, with PowExp correlation function and a constant regression term, with 25 randomized datasets of n = 200 runs for training.


[Figure C.3 appears here: two bar-chart panels of variance percentage against main effects and the largest 2-input interactions. (a) Location 5. (b) Location 6.]

Figure C.3: FANOVA percentage contributions of main effects and 2-input interactions for the Storm Surge model using a Std GaSP, with PowExp correlation function and a constant regression term, with 25 randomized datasets of n = 200 runs for training.


[Figure C.4 appears here: four panels of the estimated main effect of $\Delta p$ on maximum storm surge. (a) Location 1. (b) Location 2. (c) Location 3. (d) Location 4.]

Figure C.4: Estimated main effect (red) of central pressure deficit ($\Delta p$) on maximum storm surge ($\eta_i^{\max}$) from FANOVA, using a Std GaSP with PowExp correlation function and a constant regression term, with 95% confidence limits (green). One randomized dataset of n = 200 runs is used for training the Std GaSP.


[Figure C.5 appears here: two panels of the estimated main effect of $\Delta p$ on maximum storm surge. (a) Location 5. (b) Location 6.]

Figure C.5: Estimated main effect (red) of central pressure deficit ($\Delta p$) on maximum storm surge ($\eta_i^{\max}$) from FANOVA, using a Std GaSP with PowExp correlation function and a constant regression term, with 95% confidence limits (green). One randomized dataset of n = 200 runs is used for training the Std GaSP.


[Figure C.6 appears here: two boxplot panels of prediction accuracy ($e_{\text{N-RMSE}}$) for the Baseline, Log-Output, Log-Input, and Full-Log approaches, grouped as non-DA and DA. (a) Location 1. (b) Location 2.]

Figure C.6: Prediction accuracy for the Storm Surge model, with PowExp correlation function and a constant regression term, with 25 randomized datasets of n = 200 runs for training and N = 124 runs for testing.


[Figure C.7 appears here: two boxplot panels of prediction accuracy ($e_{\text{N-RMSE}}$) for the Baseline, Log-Output, Log-Input, and Full-Log approaches, grouped as non-DA and DA. (a) Location 3. (b) Location 4.]

Figure C.7: Prediction accuracy for the Storm Surge model, with PowExp correlation function and a constant regression term, with 25 randomized datasets of n = 200 runs for training and N = 124 runs for testing.


[Figure C.8 appears here: two boxplot panels of prediction accuracy ($e_{\text{N-RMSE}}$) for the Baseline, Log-Output, Log-Input, and Full-Log approaches, grouped as non-DA and DA. (a) Location 5. (b) Location 6.]

Figure C.8: Prediction accuracy for the Storm Surge model, with PowExp correlation function and a constant regression term, with 25 randomized datasets of n = 200 runs for training and N = 124 runs for testing.
