
Enhancing surrogate models of engineering structures
with graph-based and physics-informed learning
by
Eamon Jasper Whalen
B.S.E. Mechanical Engineering
The University of Michigan, 2016
Submitted to the Center for Computational Science and Engineering
in partial fulfillment of the requirements for the degree of
Master of Science in Computational Science and Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2021
© Eamon Jasper Whalen 2021. All rights reserved.
The author hereby grants to MIT permission to reproduce and to
distribute publicly paper and electronic copies of this thesis document
in whole or in part in any medium now known or hereafter created.

Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Center for Computational Science and Engineering
May 19, 2021
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Caitlin Mueller
Associate Professor, Civil and Environmental Engineering
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Nicolas Hadjiconstantinou
Professor, Mechanical Engineering
Co-Director, Center for Computational Science and Engineering
Enhancing surrogate models of engineering structures with
graph-based and physics-informed learning
by
Eamon Jasper Whalen

Submitted to the Center for Computational Science and Engineering
on May 19, 2021, in partial fulfillment of the
requirements for the degree of
Master of Science in Computational Science and Engineering

Abstract
This thesis addresses several opportunities in the development of surrogate models
used for structural design. Though surrogate models have become an indispensable
tool in the design and analysis of structural systems, their scope is often limited by the
parametric design spaces on which they were built. In response, this work leverages
recent advancements in geometric deep learning to propose a graph-based surrogate
model (GSM). The GSM learns directly on the geometry of a structure and thus can
learn on designs from multiple sources without the typical restrictions of a parametric
design space.
Engineering surrogate models are often limited by data availability, since designs
and performance data can be expensive to produce. This work shows that transfer
learning, through which training data of varying topology, complexity, loads and
applications are repurposed for new predictive tasks, can be used to improve the data
efficiency of surrogates, often reducing the required amount of training data by one or
two orders of magnitude. This work also explores new potential sources for training
data, namely engineering design competitions, and presents SimJEB, a new public
dataset of simulated engineering components designed specifically for benchmarking
surrogate models. Finally, this work explores the emerging technology of physics-
informed neural networks (PINNs) for structural surrogate modeling, proposing two
new heuristics for improving the convergence and accuracy of PINNs in practice.
Combined, these contributions advance the generalizability and data efficiency of
surrogate models used in structural design.

Thesis Supervisor: Caitlin Mueller
Title: Associate Professor, Civil and Environmental Engineering

Acknowledgments
I would like to extend a major thank you to my advisor, Dr. Caitlin Mueller, for
her technical expertise, academic guidance, and moral support. Thank you, along
with the other members of the Digital Structures Group, for creating an exciting and
supportive environment in which to learn and grow.

A special thanks to Fatma Kocer, Brett Chouinard, and many others at Altair for
encouraging me to attend graduate school and helping me make that dream a reality.

Thank you to Renaud Danhaive, Yijiang Huang, Joe Pajot, Jonathan Ollar, and
David Xu for their mentorship and technical expertise.

Thank you also to Simon Ganeles and Azariah Beyene for their many technical contributions.

Last but not least, thank you to my partner, Janene, for her guidance, encouragement
and love.

This research was supported by the Engineering Data Science group at Altair Engineering Inc. and is based upon work supported by the National Science Foundation under Grant No. 1854833.

Contents

1 Introduction
  1.1 Engineering surrogate modeling
    1.1.1 Physical simulations in engineering
    1.1.2 Surrogate modeling
  1.2 Challenges
    1.2.1 Existing models are restrictive
    1.2.2 Training data is scarce
  1.3 Opportunities
    1.3.1 Graph representations
    1.3.2 Transfer learning
    1.3.3 New datasets
    1.3.4 Physics-informed learning

2 Towards Reusable Surrogate Models: Graph-Based Transfer Learning on Space Frame Structures
  2.1 Introduction
  2.2 Related work
    2.2.1 Surrogate modeling with parametric design features
    2.2.2 Surrogate modeling without parametric design features
    2.2.3 Geometric deep learning: learning on graphs
    2.2.4 Transfer learning: recycling data
  2.3 Methodology: surrogate modeling with graphs
    2.3.1 Data representation: space frames as graphs
    2.3.2 Convolutions on graphs
    2.3.3 The graph-based surrogate model (GSM)
    2.3.4 A naive alternative: the pointwise surrogate
    2.3.5 A baseline: predicting the mean
  2.4 Characterizing the GSM
    2.4.1 Data generation and filtering
    2.4.2 Training and tuning
    2.4.3 Comparing the GSM to the pointwise surrogate
    2.4.4 Studying generalizability
  2.5 Transfer learning: repurposing the GSM
    2.5.1 Effects on generalizability
    2.5.2 Studying data efficiency
  2.6 Conclusions and future work

3 SimJEB: Simulated Jet Engine Bracket Dataset
  3.1 Introduction
  3.2 Related work
    3.2.1 Synthetically generated shape datasets
    3.2.2 Collected shape datasets
    3.2.3 Design competition data
  3.3 Geometry cleaning and simulation pipeline
    3.3.1 Design competition overview
    3.3.2 CAD file acquisition
    3.3.3 Geometry cleaning
    3.3.4 Finite element structural simulation
  3.4 Dataset characterization
    3.4.1 Characterization of geometry
    3.4.2 Characterization of structural performance
  3.5 Licensing, attributions and access
  3.6 Surrogate modeling benchmark
  3.7 Conclusions and future work

4 Heuristics for improving the accuracy and convergence of physics-informed neural networks in structural mechanics
  4.1 Introduction
  4.2 Related work
  4.3 Methodology
    4.3.1 Equations of elasticity
    4.3.2 Soft boundary conditions
    4.3.3 Network and loss function
    4.3.4 Hard boundary conditions
    4.3.5 Heuristic 1: loss term normalization
    4.3.6 Heuristic 2: multi-step refinement
  4.4 Results
    4.4.1 Test problem: cantilever beam
    4.4.2 Initial training
    4.4.3 Refinement training: soft boundary conditions
    4.4.4 Refinement training: hard boundary conditions

5 Conclusion
Chapter 1

Introduction

This work presents new strategies for augmenting surrogate models used in structural design. The following sections give a broad overview of surrogate modeling as well as existing challenges and opportunities in the field. Four opportunities in particular (graph representations, transfer learning, new datasets, and physics-informed learning) are addressed in subsequent chapters with novel methods and insights. Each chapter also contains an in-depth introduction and literature review of the state of the art for each topic.

1.1 Engineering surrogate modeling

1.1.1 Physical simulations in engineering

Physical simulations have become ubiquitous in nearly every engineering field, including structural design. Computational methods such as the finite element, finite difference, and finite volume methods can be used to solve partial differential equations (PDEs) on arbitrarily complex domains, for which analytical solutions rarely exist. Simulation is an increasingly popular alternative to physical testing, which may be prohibitively expensive, time-consuming, or simply impossible. Simulations allow engineers to quickly evaluate "what if" scenarios, often resulting in faster design iterations and higher-performance outcomes. The parametric nature of engineering simulations naturally lends itself to a wide array of invaluable computational experiments, including design optimization, uncertainty quantification, main effects analysis, and parameter estimation, to name a few. In structural design, the finite element method is most commonly used to solve for displacement given the geometry, loads, supports, and material properties as inputs.
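As a concrete, heavily simplified illustration of that displacement solve, the following sketch assembles and solves a one-dimensional bar model; all values and names are hypothetical stand-ins for a real finite element code:

```python
import numpy as np

# Toy illustration of the finite element displacement solve described
# above (a sketch, not a production solver): a chain of three axial bar
# elements fixed at the left end, with a point load at the free end.

E, A, L = 210e9, 1e-4, 1.0   # Young's modulus [Pa], area [m^2], element length [m]
k = E * A / L                # axial stiffness of one element
n_nodes = 4

# Assemble the global stiffness matrix from identical 2-node elements.
K = np.zeros((n_nodes, n_nodes))
for e in range(n_nodes - 1):
    K[e:e + 2, e:e + 2] += k * np.array([[1.0, -1.0], [-1.0, 1.0]])

# Loads: 1 kN at the free end; supports: node 0 is fixed.
f = np.zeros(n_nodes)
f[-1] = 1000.0

# Eliminate the fixed degree of freedom and solve K u = f for displacement.
free = np.arange(1, n_nodes)
u = np.zeros(n_nodes)
u[free] = np.linalg.solve(K[np.ix_(free, free)], f[free])
```

Real structural models differ mainly in scale: more degrees of freedom per node, 3D element stiffness matrices, and richer material models, but the pattern of assembling a stiffness matrix from geometry, loads, supports, and material properties, then solving for displacement, is the same.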

1.1.2 Surrogate modeling

Surrogate models, also known as metamodels, response surfaces, reduced-order models, approximation models, or emulators, are mathematical approximations of a system's behavior. In contrast with physical simulations, surrogate models are almost always data-driven, meaning that they are trained on a set of observations in a supervised manner. There are many potential uses for surrogate models, including generating smooth approximations of noisy systems, studying the effects of input parameters, and sharing system behavior while protecting intellectual property details; however, the most common use is speeding up design evaluation. Surrogate models are traditionally simple regression models which can make performance predictions several orders of magnitude faster than physical simulation. They are invaluable for applications where physical simulations are prohibitively slow (e.g. simulating a full-vehicle car crash), where a prohibitively large number of simulations is required (e.g. design optimization, uncertainty quantification, generating dense visualizations), or both. In a typical workflow, observations are generated by creating a parametric simulation model, sampling the parameters several times in a computational experiment, and collecting key performance indicators (KPIs). A supervised machine learning model is then trained to predict KPIs given design parameters as inputs. The surrogate model can then replace the engineering simulation in a variety of tasks.
Generally, surrogate models take a design as input and output a performance prediction; however, the forms that these inputs and outputs take often depend on the specific algorithm being used. Traditional surrogate models rely on standard regression techniques that operate over fixed-length vectors. This framework, where each design is represented by a vector, fits conveniently with the parametric modeling strategies often used to generate training data. In a typical workflow, the engineer manually designates one or more parameters in a physical model to be studied (i.e. the "design variables"). These design variables form a design space: the theoretical space containing all possible designs. A design of experiment (DOE) is then used to efficiently sample the design space, resulting in a dataset of designs and accompanying performance labels. Since each design is already represented by a parameter vector, it is convenient to use this same vector as the input to a surrogate model. Surrogate models based on hand-crafted design parameters are simple to implement and understand because they rely on time-tested regression algorithms.
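The workflow described above can be sketched end-to-end with a toy example; the design variables, bounds, and "simulation" below are hypothetical stand-ins, and a real study would call an expensive solver instead:

```python
import numpy as np

# End-to-end sketch of the parametric surrogate modeling workflow:
# sample a design space (DOE), label each sample with a simulation,
# and fit a regression surrogate on the fixed-length parameter vectors.

rng = np.random.default_rng(0)

# 1. DOE: sample two hypothetical design variables, e.g. depth and thickness.
X = rng.uniform(low=[0.1, 0.01], high=[0.5, 0.05], size=(50, 2))

# 2. Label each design with a KPI. A cheap analytic compliance-like
#    quantity stands in for the expensive simulation output.
def simulate(x):
    depth, thickness = x
    return 1.0 / (depth**3 * thickness)

y = np.array([simulate(x) for x in X])

# 3. Fit a quadratic polynomial surrogate by least squares.
def features(X):
    d, t = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(d), d, t, d * t, d**2, t**2])

coef, *_ = np.linalg.lstsq(features(X), y, rcond=None)

# 4. The surrogate replaces the simulation for fast prediction.
def surrogate(X):
    return features(X) @ coef
```

Any vector-input regression algorithm (kriging, random forests, neural networks) could replace the polynomial in step 3; the key constraint is that every design must be expressible in the same fixed-length parameter vector.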

1.2 Challenges
Though engineering surrogate modeling has been applied successfully for decades, fundamental challenges in surrogate modeling workflows still restrict the adoption and impact of surrogate models.

1.2.1 Existing models are restrictive

While hand-crafted design parameters simplify the learning algorithm, they also limit the design process in several ways. To start, constructing a parametric design space is a nontrivial task. Knowing which design changes will result in interesting or valuable outcomes often requires experience, and yet the selection of a suitable design space is critical to the success of the subsequent design exploration. Even when an effective design space can be known a priori, a list of design parameters is far less expressive than the geometry representations typically used in Computer Aided Design (CAD), like splines and triangular meshes, because it limits the user to a fixed design space. Finally, the use of design parameters as surrogate model inputs couples the surrogate to the data generation process. A surrogate model trained on one design space is effectively useless in another, and data from different design spaces (or data created by other means) cannot be easily combined during training. Ideally, surrogate models would operate on more organic representations of geometry that do not limit design freedom or couple the surrogate model to the data source.

1.2.2 Training data is scarce

One of the primary reasons why deep learning has disrupted fields like computer vision and natural language processing is the abundance of available training data. Images, video, audio, and tabular data are collected by increasingly affordable sensors and distributed on the web at impressive rates. This is not the case in engineering design. A single CAD design may take an engineer hours to days to create by hand, and, once created, the physical simulation of its behavior can be equally expensive. Furthermore, intellectual property concerns regarding engineering designs often limit the open sharing of engineering data on the web. The result is that engineering equivalents of massive, labeled databases like ImageNet and AudioSet are few to none.

Traditionally, the surrogate modeling community has addressed data scarcity by limiting the scope. By restricting design exploration to a parametrized design space, training data can be synthetically generated rather than collected. The simple regression models used in this case also often have fewer learnable parameters than deep learning models and thus require less training data. However, as new methods emerge for learning directly from 3D shapes, the data scarcity problem in engineering is becoming more relevant. The problem can be approached from two angles: either new methods for collecting, generating, or augmenting data are needed to meet the demands of current algorithms, or more data-efficient algorithms are needed to make useful predictions from limited samples. It seems likely that the optimal approach will involve a combination of the two.

1.3 Opportunities

1.3.1 Graph representations

Recent advancements in deep learning have led to the ability to learn on more organic representations of shape, including multi-view images, voxels, point clouds, and meshes (graphs). Graphs in particular are a promising representation for engineering surrogate models. Graphs are a natural representation of both polygonal meshes and space frames, so little to no preprocessing is required to convert from a deep learning representation to the one used by designers. Graph representations avoid the lossy rasterization required for Euclidean representations like images and voxels. Unlike point clouds, graphs encode the topology as well as the geometry of a shape. The emerging field of geometric deep learning has produced a variety of algorithms for learning classification, segmentation, and regression tasks on graphical domains.

The integration of geometric deep learning techniques into structural surrogate modeling has the potential to revolutionize how surrogates are trained and deployed in practice. No longer confined to a design space, graph-based surrogate models support arbitrary shape changes, increasing design freedom. Since the geometry representation is decoupled from the design generation process, they also accommodate training on data from multiple sources, including different parametric studies and data from previous design iterations, projects, or domains. Chapter 2 presents a graph-based surrogate model (GSM) for predicting the structural performance of space frame structures. The GSM can accurately predict the deflection of space frames using only the structure's geometry, loads, and supports as inputs. It is shown that the GSM can learn on data from multiple design studies simultaneously, and that doing so is often advantageous compared to training on data from a single source.
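As an illustration of the representation described above, a small space frame might be encoded as follows; the exact feature layout (coordinates, applied load, support flag) is a hypothetical choice for illustration, not the thesis's exact encoding:

```python
# Minimal sketch of encoding a space frame as a graph: each joint becomes
# a node carrying its coordinates, applied load, and support condition;
# each frame member becomes an undirected edge.

# Node features: [x, y, z, Fx, Fy, Fz, is_support]
nodes = [
    [0.0, 0.0, 0.0, 0.0, 0.0,     0.0, 1.0],  # supported joint
    [1.0, 0.0, 0.0, 0.0, 0.0,     0.0, 1.0],  # supported joint
    [0.5, 0.0, 1.0, 0.0, 0.0, -1000.0, 0.0],  # loaded apex joint
]

# One undirected edge per frame member, as node index pairs.
edges = [(0, 2), (1, 2), (0, 1)]

# Symmetric adjacency list, the form consumed by message-passing layers.
adjacency = {i: [] for i in range(len(nodes))}
for a, b in edges:
    adjacency[a].append(b)
    adjacency[b].append(a)
```

Because loads and supports travel with the nodes, structures with entirely different topologies can sit in the same training set, which is what frees a graph-based surrogate from any fixed parametrization.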

1.3.2 Transfer learning

Besides improving design freedom, graph-based surrogate modeling also enables transfer learning. The advantages of repurposing previously trained models are well documented in the deep learning community, yet transfer learning is rarely applied to engineering surrogate models. Engineering design is a natural candidate for transfer learning. The incremental nature of engineering design results in many design variants which are all created for the same functional purpose. Furthermore, traditional surrogate modeling requires that a training set be generated for each new parametric model. The result is often several small, disjoint design studies which differ slightly in geometry, topology, or loading conditions. Transfer learning has the potential to significantly reduce the amount of new training data required to train graph-based surrogate models by leveraging this existing data. Chapter 2 presents a transfer learning methodology for the GSM and demonstrates how positive transfer (i.e. improved prediction accuracy) can be achieved while learning across designs of varying topologies, loads, complexities, and applications. The implication is that the amount of training data required to achieve a desired accuracy is reduced by one or two orders of magnitude.
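The warm-start idea behind transfer learning can be illustrated with a deliberately simple linear model; this is a conceptual sketch on synthetic data, not the GSM's actual fine-tuning procedure:

```python
import numpy as np

# Conceptual illustration of warm-start transfer: a model fit on an
# abundant "source" study initializes training on a small "target"
# study, instead of starting from scratch. All data is synthetic.

rng = np.random.default_rng(1)

def fit_gd(X, y, w0, steps, lr=0.01):
    """Fit linear weights by gradient descent from an initial guess w0."""
    w = w0.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Source task: plentiful data from a related design study.
Xs = rng.normal(size=(500, 3))
ys = Xs @ np.array([1.0, -2.0, 0.5])
w_source = fit_gd(Xs, ys, np.zeros(3), steps=2000)

# Target task: only a handful of new simulations, slightly shifted physics.
Xt = rng.normal(size=(8, 3))
yt = Xt @ np.array([1.1, -1.9, 0.6])

w_scratch = fit_gd(Xt, yt, np.zeros(3), steps=50)   # cold start
w_transfer = fit_gd(Xt, yt, w_source, steps=50)     # warm start from source

err = lambda w: np.mean((Xt @ w - yt) ** 2)
# With the same training budget, the warm start begins much closer to
# the target solution and typically ends with far lower error.
```

In the graph-based setting the same logic applies at a larger scale: network weights pretrained on one design study initialize training on another, so far fewer new simulations are needed.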

1.3.3 New datasets

Since graph-based representations decouple the data generation process from the surrogate model, training data no longer necessarily has to come from parametric models at all. An alternative potential data source is collecting engineering designs from the web (i.e. "the wild"). Though a few large collections of 3D models do exist, they rarely contain engineering components, and those that do are essentially random grab bags of CAD designs without any information about their intended load conditions or functional purpose. One notable exception is online engineering design contests. Design contest submissions are all designed for the same functional purpose and so can be more readily used by surrogate models. Since the contest submissions are created by hand by various engineers, they exhibit significantly higher geometric diversity and complexity than can be produced from a parametric design space. Design contest data is therefore a middle ground between generated data and data collected from the wild.
Chapter 3 presents SimJEB: a collection of 381 engineering brackets collected from an online design contest. The brackets have been cleaned, oriented, meshed, and simulated according to the original competition load conditions. The bracket geometry and accompanying simulation results form a non-parametric dataset for evaluating graph-based surrogate models. Chapter 3 also proposes a methodology for using SimJEB as a benchmark, including guidelines for training surrogate models and quantifying their performance in a consistent way. The SimJEB dataset has been released for public use by researchers in geometric machine learning and engineering surrogate modeling.

1.3.4 Physics-informed learning

The advent of physics-informed neural networks (PINNs) has interesting implications for engineering surrogate modeling. PINNs are neural networks that include a PDE (or system of PDEs) as well as boundary conditions in their loss function. As the network trains, the residual of the solution decreases, leveraging the fully differentiable nature of neural networks to evaluate terms involving partial derivatives. PINNs can be used as standalone forward solvers of PDEs; however, when additional data-driven terms are added to the loss function, PINNs become a type of solver/surrogate model hybrid. From a surrogate modeling perspective, one could regard the PDE in the loss function as a kind of physics-aware regularization of a regression model. The implication is that PINNs could lead to more efficient and perhaps even more trustworthy surrogates.

While PINNs show promise for improving purely data-driven surrogate models, they also come with unique challenges. The training process is sensitive to the coefficients used to weight loss terms. Furthermore, the scale of the output is often assumed to be known a priori. Chapter 4 presents two novel heuristics for addressing these challenges and demonstrates their effectiveness on a canonical plane-stress problem.
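The composite loss at the heart of a PINN can be sketched on a toy 1D problem; here finite differences stand in for the automatic differentiation a real PINN would use, and the weights w_pde and w_bc are the sensitive coefficients mentioned above:

```python
import numpy as np

# Schematic of a composite PINN loss: a toy 1D problem u''(x) = f(x)
# on [0, 1] with u(0) = u(1) = 0 stands in for the elasticity equations.
# The network output is penalized both for violating the PDE at interior
# collocation points and for violating the boundary conditions.

def pinn_loss(u, x, f, w_pde=1.0, w_bc=1.0):
    h = x[1] - x[0]
    u_xx = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2  # interior 2nd derivative
    pde_residual = u_xx - f(x[1:-1])                # PDE residual term
    bc_residual = np.array([u[0], u[-1]])           # soft boundary-condition terms
    return w_pde * np.mean(pde_residual**2) + w_bc * np.mean(bc_residual**2)

x = np.linspace(0.0, 1.0, 101)
f = lambda x: -np.pi**2 * np.sin(np.pi * x)

u_exact = np.sin(np.pi * x)   # satisfies both the PDE and the BCs
u_wrong = x * (1.0 - x)       # satisfies the BCs but not the PDE
# pinn_loss(u_exact, x, f) is near zero; pinn_loss(u_wrong, x, f) is not.
```

In a real PINN, u would be a neural network evaluated at collocation points, the derivatives would come from automatic differentiation, and minimizing this loss over the network weights solves (or, with added data terms, regularizes a fit to) the PDE.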

Chapter 2

Towards Reusable Surrogate Models:
Graph-Based Transfer Learning on
Space Frame Structures

Surrogate models are often employed to speed up engineering design optimization; however, they typically require that all training data conform to the same parametrization (e.g. design variables), limiting design freedom and prohibiting the reuse of historical data. In response, this chapter proposes Graph-based Surrogate Models (GSMs) for space frame structures. The GSM can accurately predict displacement fields from static loads given the structure's geometry as input, enabling training across multiple parametrizations. GSMs build upon recent advancements in geometric deep learning which have led to the ability to learn on undirected graphs: a natural representation for space frames. To further promote flexible surrogate models, the chapter explores transfer learning within the context of engineering design, and demonstrates positive knowledge transfer across data sets of different topologies, complexities, loads and applications, resulting in more flexible and data-efficient surrogate models for space frame structures.

2.1 Introduction

Surrogate models, also known as metamodels, response surfaces, reduced order models, approximation models, or emulators, are used extensively in engineering to approximate computationally-intensive processes. In a typical workflow, training data is produced by running a design of experiment (DOE) of physics-based simulations, after which a surrogate model is trained in a supervised manner to predict one or more of the simulated quantities. The trained surrogate model might then be used to perform fast optimizations or provide real-time performance predictions. Generally, these methods require that each design be represented as a fixed-length vector of design parameters (e.g. design variables). This requirement restricts the surrogate model to a single design space, requiring the user to train a new surrogate model every time the parametrization changes.

Ideally, surrogate models would operate on more organic representations of geometry, enabling learning across design data from multiple sources. Many design processes are incremental in nature. The result is often several small, disjoint design studies which differ slightly in geometry, topology, or loading conditions. A more flexible surrogate model could be trained across design iterations, perhaps supplemented with historical designs from previous projects, and could be continuously updated as new data becomes available. The ability to learn across related projects would not only save computational resources but might also yield powerful insights that could not have been inferred from a single design space. Such models would also grant engineers greater design freedom since design changes would not be restricted to the parametrization used to generate training data.

One challenge in developing such a model is choosing a geometry representation. An ideal representation would accommodate arbitrary changes to the geometry or topology, and encode loads and supports to enable learning across load cases. A second challenge is quantifying the extent to which such a surrogate generalizes to new designs. Unlike with traditional parametrization, the notions of interpolation and extrapolation are not well-defined for representations that span the set of all possible shapes. How might one determine which inputs are "safe" and which are not likely to yield quality predictions?
This work explores the use of graph neural networks as surrogate models for space frame structures. The proposed Graph-based Surrogate Model (GSM) learns to predict a displacement field given only the geometry, supports, and loads as inputs. It is shown that the GSM can be trained on data from multiple design models simultaneously, often outperforming GSMs trained on a single source. Transfer learning is then explored as an effective method to repurpose previously trained GSMs to new tasks. Both the generalizability and data efficiency of the GSM are improved with transfer learning, with positive transfer being observed across varying topologies, loads, complexities, and even different applications.
The key contributions of this chapter are as follows:

1. Graph-based Surrogate Models (GSMs), which operate directly on the geometry and do not require parametric design features, are proposed for the modeling of space frame structures.

2. Transfer learning is shown to improve the GSM's data efficiency and generalizability, leveraging historical data to reduce the required number of simulations by one or two orders of magnitude.

3. Various source/target pairs that arise naturally in a design context, including design data of varying topologies, loads, complexities and applications, are used to demonstrate the utility of transfer-learned GSMs in a real-world setting.

The remainder of this chapter is organized as follows: section 2.2 reviews related work, section 2.3 introduces the methodology of the GSM and a few naive alternatives used for comparison, section 2.4 outlines data generation methods and presents experimental results, section 2.5 introduces transfer learning and presents further results, and section 2.6 contains conclusions and ideas for future work.
The following terminology is used throughout the chapter: let design refer to a specific design concept of a structure (i.e. something that could be built), design model (DM) refer to a hand-parametrized design space which can be sampled to generate designs, and surrogate model refer to a data-driven predictive model that learns to predict a structure's engineering performance.

2.2 Related work

Engineering surrogate modeling is a thoroughly explored topic, with applications dating back to the 1980s. Conversely, transfer learning and geometric deep learning are relatively young research areas, with hundreds of papers published in the last few years alone. The following is a brief review of the works considered most relevant to this one; it is by no means comprehensive.

2.2.1 Surrogate modeling with parametric design features

Surrogate models have been used in engineering design for several decades (see [84, 25, 61] for a review). Some of the most common surrogate modeling algorithms include polynomial regression [70], kriging (also known as Gaussian processes) [17], radial basis functions [23], random forest [78] and neural networks [57]. [79] compared several of these algorithms for civil engineering problems. Dimensionality reduction techniques have been used to derive more suitable parametrizations [11, 21] and quantities of interest [91]. All of the aforementioned methods require that a design be represented as a fixed-length vector of parametric design features, restricting the feasible designs to some pre-determined space. This work proposes a surrogate model that operates on the geometry directly and is thus not limited to a particular parametrization.

2.2.2 Surrogate modeling without parametric design features

Recently, a few surrogate models have been proposed that do not rely on handcrafted design parameters. [92] proposed using "knowledge-based" characteristics, which are independent of design variables, as features. While this may enable the combination of training data from multiple design spaces, it still relies heavily on the user to craft useful characteristics. Other approaches have sought to learn on the geometry itself. The pursuit of deep learning methods for shape data has led to the ability to learn on several geometry representations, including shape descriptors, images, voxels, polycubes, signed distance functions, point clouds, and graphs (see [68, 4] for a review). Surrogate models have been trained on images [35, 48, 93, 45, 26, 30], voxels [94, 86] and polycubes [80, 6]. Images and voxels suffer from resolution problems and data loss due to rasterization. Polycubes solve this problem by mapping the geometry to a regular grid but are limited to fixed-topology data sets.
The advent of geometric deep learning techniques has enabled learning on non-Euclidean domains, which are generally more natural representations of geometry. [18] trained a surrogate model to predict lift and drag coefficients from 3D point clouds. While potentially useful for solid bodies, point clouds are not an adequate representation of space frames because they lack topological information. Other works have represented designs as graphs. [6] used a graph-based convolutional model to learn fluid dynamics on meshed surfaces, [20] used a similar approach to learn the structural behavior of a thin shell, and [83] learned material properties from graph-based microstructures. The closest existing work to this one is probably [15], in which graph representations of space frames were used to optimize cross-section sizes for structural loads. The structures in [15] had constant loads and geometry (apart from the cross sections), whereas this study explores the flexibility of graph-based networks to generalize across various geometries, topologies and loads.

Other notable engineering applications of graph-based learning include feature recognition on 3D CAD [13], shape correspondence for additive manufacturing [33], and generation of design decision sequences [63]; however, these do not directly address surrogate modeling.

2.2.3 Geometric deep learning: learning on graphs

Graph-based learning, both for shape analysis as well as other tasks, has recently
received a lot of attention. [10] introduced the term geometric deep learning to mean
learning from non-Euclidean data structures such as graphs and point clouds. See
[90, 95] for a general survey on graph neural networks (GNNs). MoNet [46] was
the first framework to apply a GNN to meshed surfaces by leveraging convolutions
over local geodesic patches. ACNN [8] defined similar patches based on anisotropic
heat kernels, while GCNN [50] generalized these patches to user-defined pseudo co-
ordinates. FeaStNet [81] introduced an attention mechanism to perform "feature
steering" which acts as dynamic filtering over neighbors. Other notable extensions
of GNNs to shapes include MeshCNN [32] which introduced learnable edge pooling
and StructureNet [49] which introduced a graph-based encoder for hierarchical part
representations. The aforementioned frameworks were applied to geometry process-
ing tasks including shape correspondence, classification, and segmentation, whereas
this work focuses on structural surrogate modeling.

2.2.4 Transfer learning: recycling data

Transfer learning, where predictive models previously trained on source data are re-
trained on target data from a different domain, task, or distribution, is a widely
applied concept in machine learning [56]. Deep learning models in particular often
benefit from transfer learning due to their data-intensive nature [77]. [42] addressed
some of the particular challenges of transfer learning in graph neural networks. A few
works have explored transfer learning in the context of engineering design. [93] trained
a convolutional autoencoder on 2D wheel designs before retraining the encoder as a
surrogate model, reducing the required number of simulations. [45] first trained a
model to predict the original parametric design features of an artery before retraining
it to predict the location of maximum stress. [43] used a clustering algorithm to
identify which designs would make for useful source data when applying transfer
learning to microprocessor performance prediction. [6] trained a surrogate model to
predict the drag coefficient of 2,000 primitive shapes before tuning the model on 54
car designs. This chapter differs from previous works in that it seeks to systematically
quantify the effects of transfer learning on data efficiency and generalizability across
several common source/target pairs in structural design.

2.3 Methodology: surrogate modeling with graphs
The following section presents a new graph-based surrogate model (GSM) for pre-
dicting the displacement of space frame structures.

2.3.1 Data representation: space frames as graphs

This chapter proposes a graph-based representation of space frames, where a set of
vertices 𝑉 = {𝑣1 , ..., 𝑣𝑛 } represents the joints and a set of edges 𝐸 ⊆ 𝑉 × 𝑉 represents the
bars. The set of vertices that share an edge with 𝑣𝑖 is referred to as its neighborhood,
and is understood to include 𝑣𝑖 itself. Each vertex 𝑣𝑖 is assigned a feature vector 𝑥𝑖 of
length 𝑟 (𝑥𝑖 ∈ R𝑟 ). The geometry of the space frame is encoded by using the joints’
spatial coordinates 𝑐𝑖 ∈ R2 as vertex features. Additional binary features indicate
the presence of a support 𝑠𝑖 ∈ {0, 1}2 or load 𝑙𝑖 ∈ {0, 1}2 for each degree of freedom.
The geometry, supports and loads are thus encoded by the graph 𝐺0 = (𝑉0 , 𝐸). The
deformed structure is represented by a topologically identical graph 𝐺𝐻 = (𝑉𝐻 , 𝐸),
where now the vertex features encode the displacements 𝑑𝑖 ∈ R2 of each joint under
static load. The proposed graph representation has three main advantages:

1. it encodes the exact spatial coordinates of the geometry

2. it facilitates arbitrary topologies

3. it does not rely on handcrafted design parameters

In contrast with Euclidean representations like images, 1. implies that there is no
information loss when converting the geometry to or from the deep learning repre-
sentation. 2. and 3. enable learning across multiple design spaces.
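To make the encoding concrete, a toy version of this representation can be assembled with NumPy. The three-joint truss below, along with its supports and load, is invented purely for illustration; a real pipeline would build these arrays from the parametric design model.

```python
import numpy as np

# A toy 2D truss with 3 joints and 3 bars (values are illustrative only).
# Vertex features: [x, y, support_x, support_y, load_x, load_y]
coords = np.array([[0.0, 0.0],   # joint 0: left support
                   [1.0, 0.0],   # joint 1: right support
                   [0.5, 0.8]])  # joint 2: loaded apex

supports = np.array([[1, 1],     # joint 0: pinned
                     [0, 1],     # joint 1: roller (vertical only)
                     [0, 0]])    # joint 2: free

loads = np.array([[0, 0],
                  [0, 0],
                  [0, 1]])       # vertical load at the apex

# Feature matrix X with n = 3 vertices and r = 6 features per vertex
X = np.hstack([coords, supports, loads]).astype(float)

# Undirected edges stored as index pairs, listed in both directions
edges = np.array([[0, 1], [1, 2], [2, 0]])
edge_index = np.vstack([edges, edges[:, ::-1]]).T  # shape (2, 2m)

print(X.shape)           # (3, 6)
print(edge_index.shape)  # (2, 6)
```

This is the same layout expected by graph learning libraries such as PyTorch Geometric, where `X` becomes the vertex feature matrix and `edge_index` the connectivity.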

2.3.2 Convolutions on graphs

The GSM’s primary mechanism is a graph-based convolutional layer. The FeaStNet
[81] convolution was selected because it extends to arbitrary graph topologies, does
not require the selection and pre-computation of pseudo coordinates, and can be
made transformation invariant in feature space. The latter implies that raw spatial
coordinates can be used directly as input features without having to learn spatial
invariance or transform all designs to a common pose. Geometric deep learning is an
active field; it is likely that other graph-based learning methods are also suitable for
this context and should be considered as future research.

Figure 2-1: The graph-based surrogate model (GSM) learns to predict nodal displace-
ments given only geometry, supports and loads as inputs. Structures are represented
as undirected graphs, where each vertex is assigned a feature vector consisting of a
joint’s spatial coordinates and binary variables indicating the presence of supports or
loads. Graph convolutional layers utilize the FeaStNet operator [81].
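For reference, the FeaStNet update for a single vertex can be sketched in NumPy. This is an illustrative re-implementation of the operator from [81] in its translation-invariant form (attention computed from feature differences x_j − x_i), not the PyTorch Geometric code used in this thesis; all parameter values below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def feastnet_vertex(x_i, neighbors, W, u, c, b):
    """One FeaStNet update for vertex i (translation-invariant form).

    x_i:       (d_in,) feature vector of vertex i
    neighbors: (k, d_in) features of N(i), which includes x_i itself
    W:         (M, d_out, d_in) one weight matrix per attention head
    u, c:      (M, d_in), (M,) attention parameters
    b:         (d_out,) bias
    """
    out = np.zeros(b.shape)
    for x_j in neighbors:
        # Soft assignment of neighbor j to the M heads; sums to 1.
        q = softmax(u @ (x_j - x_i) + c)
        out += sum(q[m] * (W[m] @ x_j) for m in range(len(W)))
    return b + out / len(neighbors)

d_in, d_out, M = 6, 8, 4
W = rng.normal(size=(M, d_out, d_in))
u = rng.normal(size=(M, d_in))
c = rng.normal(size=M)
b = np.zeros(d_out)

x_i = rng.normal(size=d_in)
neighbors = np.vstack([x_i, rng.normal(size=(2, d_in))])  # N(i) includes i
print(feastnet_vertex(x_i, neighbors, W, u, c, b).shape)  # (8,)
```

In practice this per-vertex loop is vectorized over the whole graph; PyTorch Geometric provides it as the `FeaStConv` layer.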

2.3.3 The graph-based surrogate model (GSM)

The proposed surrogate model learns to predict joint displacements given the geom-
etry, supports and loads as inputs. It does so by learning a map from an input graph
𝐺0 = (𝑉0 , 𝐸) to a topologically identical output graph 𝐺𝐻 = (𝑉𝐻 , 𝐸). The surrogate
model is implemented as a graph-based convolutional neural network built from a
single sequence of 𝐻 linear and FeaStNet convolutional layers (Fig. 2-1). All layers
except the final one are followed by a rectified linear unit (ReLU) activation function. It is
observed that batch normalization applied to the input and after each convolutional
operation significantly improves prediction accuracy. The network architecture, layer
dimensions, and number of attention heads per FeaStNet layer dictate the total num-
ber of learnable parameters.

2.3.4 A naive alternative: the pointwise surrogate

A second, simpler type of surrogate model was used to compare against the proposed
graph-based method. This pointwise surrogate consists of several simple regression
models, which each take the spatial coordinates of the structure’s joints (flattened into
a vector) as inputs and predict a single scalar quantity. For a 2D truss with 15 nodes,
this corresponds to training 30 regression models (for the x and y displacement of each
node). The random forest algorithm was selected for this study, but any regression
technique (e.g. kriging, polynomials, radial basis functions) could be used. Note
that the pointwise surrogate relies on a fixed ordering of joints and thus cannot be
extended to multi-topology data sets. Also, note that in the case where all designs are
identically loaded, there is no benefit to including support or load information in the
input, since the designs are represented by a single vector. The pointwise surrogate
was implemented using the scikit-learn [58] random forest class using default settings.
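A minimal sketch of the pointwise surrogate follows, assuming scikit-learn is available. The "truss" data here is random stand-in data (not the DM7 designs), and one forest is fit per scalar output as described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Stand-in data: 200 trusses with 15 joints each, flattened to 30 inputs
# (x, y per joint) and 30 outputs (x, y displacement per joint).
X = rng.uniform(size=(200, 30))
Y = 0.1 * X + 0.01 * rng.normal(size=(200, 30))  # fake displacements

# One random forest per scalar output, as in the pointwise surrogate.
models = []
for k in range(Y.shape[1]):
    rf = RandomForestRegressor(n_estimators=20, random_state=0)
    rf.fit(X, Y[:, k])
    models.append(rf)

# Predict all 30 displacement components for a new design.
x_new = rng.uniform(size=(1, 30))
pred = np.array([m.predict(x_new)[0] for m in models])
print(pred.shape)  # (30,)
```

Note the fixed input length: this model cannot accept a truss with a different number of joints, which is exactly the limitation the GSM removes.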

2.3.5 A baseline: predicting the mean

As an additional reference point, consider an even simpler predictive model that sim-
ply predicts the mean displacement across each joint in the training set. Throughout
the chapter, the performance of this naive model is referred to as the baseline. Models
that fail to beat the baseline effectively have no predictive value.
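The baseline amounts to a few lines of NumPy; the displacement arrays below are random placeholders standing in for simulated training and test displacements.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in displacement data: (designs, joints, 2) for x/y displacements.
train_disp = rng.normal(0.0, 0.05, size=(800, 15, 2))
test_disp = rng.normal(0.0, 0.05, size=(200, 15, 2))

# Baseline: always predict the per-joint mean over the training set.
baseline_pred = train_disp.mean(axis=0)            # shape (15, 2)

# Mean absolute error of the baseline on the test set.
baseline_mae = np.abs(test_disp - baseline_pred).mean()
print(round(float(baseline_mae), 4))
```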

2.4 Characterizing the GSM

The following section presents a series of trials designed to characterize the prediction
accuracy and generalizability of the proposed graph-based surrogate model.

2.4.1 Data generation and filtering

Surrogate modeling is most advantageous for computationally-intensive simulations;
however, this work focuses on relatively simple designs because they more effectively
depict the specific design scenarios used to evaluate the GSM (more on this in section
2.5). A set of space frame designs was generated as follows. First, a parametric
design model of a simple two-dimensional truss was built using a combination of
commercial [69] and open source [34] software. The truss is made of steel (E = 30.5
Msi) and consists of beams with constant cross section (A = 0.29 𝑚2 , I = 2.3e-3 𝑚4 ).
A vertical static load of 11.1 kN is applied to all joints on the top of the truss, and
simple supports are applied to two of the bottom joints (Fig. 2-2). The truss was
parametrized using five handcrafted design variables 𝑝1 -𝑝5 , each perturbing the truss
geometry in a particular way. Next, the design model was sampled 1,000 times using
a Latin Hypercube and the resulting designs were simulated with bar elements using
linear elastic Finite Element Analysis (FEA). Finally, the 10% of trusses with the
largest maximum displacement (i.e. the worst-performing designs) were discarded.
For the remainder of the chapter, this design model will be referred to as design model
7 (DM7).
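The filtering step can be expressed as a percentile threshold; the max-displacement values below are placeholders for the 1,000 sampled designs.

```python
import numpy as np

rng = np.random.default_rng(3)

# Max displacement per design (placeholder values for 1,000 sampled trusses).
max_disp = rng.lognormal(mean=-3.0, sigma=0.5, size=1000)

# Discard the 10% worst-performing designs (largest max displacement).
threshold = np.percentile(max_disp, 90)
keep = max_disp <= threshold
print(keep.sum())  # 900
```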

2.4.2 Training and tuning

A GSM was trained to predict joint displacements given a truss design as input.
The truss designs were randomly partitioned such that 68% were used for training,
12% were used for validation, and 20% were reserved for testing. The GSM was
implemented with Pytorch Geometric [24] and trained for 100 epochs on a Tesla
K80 GPU using the ADAM optimizer [37] and a mean squared error (MSE) loss
function. Through a series of grid searches, the optimal architecture was found to
be L16/C32/C64/C128/C256/C512/C256/C128/L64/L2, where L denotes a linear
layer, C denotes a FeaStNet convolutional layer, and the numbers represent the length
of the vertex feature vectors after passing through a given layer. Similarly, the optimal
learning rate was found to be 1e-3 and the optimal number of FeaStNet heads was
found to be 8. The resulting model has 2.7 million trainable parameters. Throughout
all trials, batch normalization was applied to the input and after each convolutional
layer, the ADAM weight decay was set to 1e-3, and the batch size was set to 256.

Figure 2-2: A parametric design model of a truss. Data sets are created by perturbing
design variables 𝑝1 -𝑝5 . Each design is loaded with a uniformly distributed vertical load
across the top and simply supported on the bottom. This particular design model is
referred to as DM7.

Four data transformation strategies were studied: standardization, log transfor-
mation, standardization followed by log transformation, and no transform. It was
found that standardization alone yields the lowest testing MSE. To study the ef-
fects of including support and load information in the feature vectors, the model was
trained once using spatial coordinates alone as features and compared to when spa-
tial coordinates are used in addition to binary support or load features. It was found
that including both the support and load features in the feature vector improves
prediction accuracy, despite the fact that all trusses were loaded identically. This is
understandable, since the convolution can be thought of as acting on one vertex at a
time.
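As a sketch of the winning transformation, standardization fits per-feature statistics on the training set and reuses them on new data. The feature arrays below are placeholders; the same treatment can be applied to the displacement targets.

```python
import numpy as np

rng = np.random.default_rng(4)

X_train = rng.uniform(0.0, 5.0, size=(800, 6))   # placeholder vertex features
X_test = rng.uniform(0.0, 5.0, size=(200, 6))

# Fit standardization statistics on the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma               # reuse training statistics

print(np.allclose(X_train_std.mean(axis=0), 0.0, atol=1e-9))  # True
```

Reusing the training-set statistics at test time avoids leaking information from the test distribution into the model.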


Figure 2-3: The GSM and pointwise surrogate achieve comparable predictive per-
formance on the test designs. Both error distributions are right-skewed, with 85% of
designs producing a mean absolute error of less than 0.1 cm on either model.

2.4.3 Comparing the GSM to the pointwise surrogate

Both the GSM and pointwise surrogate successfully learn to predict a wide range
of structural behaviors. Figure 2-3 shows the distribution of prediction errors for
both models evaluated on the test set. The predictive performance of the two models
is roughly comparable: the mean absolute error (MAE) over the entire test set is
0.049 cm for the GSM and 0.053 cm for the pointwise surrogate (30% and 33% of
the baseline respectively). The error distributions for both models are skewed right,
implying that the models perform well on most of the designs but poorly on a few.
Interestingly, it is observed that many of the designs for which prediction accuracies
are low tend to also exhibit poor structural performance (i.e. large displacements).

2.4.4 Studying generalizability

Effective surrogate models should generalize well to unseen designs. For surrogate
models that rely on bounded, handcrafted design parameters, one might assess gen-
eralizability simply by sampling the design space with sufficient density. In contrast,
graph representations span the set of all conceivable space frames and thus a bounded

design space does not exist. Developing practical intuition regarding the extent to
which graph-based surrogate models generalize to new designs is an open challenge.

Figure 2-4: Each row shows a few designs generated from one of the eight design
models used in this chapter. Loads and supports are omitted on all but the first
column for clarity. The design models were selected to test specific scenarios that
commonly arise in engineering design.

  Trial                                          Train on   Re-train on   Test on
  A. Learning on a single design model           DM9        --            DM9
     (repeated for DMs 5-8)
  B. Learning on multiple design models          DMs 5-9    --            DM9
  C. Generalization to unseen design models      DMs 5-8    --            DM9
  D. Transfer learning across design models      DMs 5-8    DM9*          DM9
  E. Transfer learning across load conditions    DM7        End Loads*    End Loads
  F. Transfer learning across domains (tower)    DM7        Tower*        Tower
  G. Transfer learning across complexities       DM7        Bridge*       Bridge
     (bridge)

  (* small dataset used for re-training)

Figure 2-5: An overview of the trials used to assess the GSM’s generalizability across
seven specific scenarios. The first three trials involve a single training, while the
remainder of the trials leverage transfer learning to repurpose a previously trained
GSM for new tasks.

Figure 2-6: The GSM can learn on data from multiple design models at once (Trial
B ), and doing so is sometimes advantageous even for cases when only a single design
model is of interest. The GSM does not seem to generalize well to unseen design
models (Trial C ); however, transfer learning is an effective remedy (Trial D ) and
requires a fraction of the data required to train a GSM from scratch.

Figure 2-7: Left: A previously trained Graph-based Surrogate Model (GSM) can be
re-trained on a new data set with differing geometry, loads or topology. Right: Pre-
training significantly increases the data efficiency of the GSM. In these results from
Trial F, a pre-trained GSM trained on 20 designs (N =20) outperforms a fresh GSM
trained on 500.

Towards this end, a series of data sets and trials were designed to test the gen-
eralizability of the GSM under a variety of conditions. The truss design model from
section 2.4.1 (DM7) was modified to create four new design models. The new design
models, named DM5, DM6, DM8 and DM9 for the number of bars along the top, have
the same outer profile as DM7 but differing topologies (Fig. 2-4).

The following trials were designed to test the generalizability of the GSM. The
reader is referred to Figure 2-5 for an overview of the trials used throughout the rest
of the chapter. Let the term target refer to the design model of interest to the user,
that is, the design model from which the test set was generated. In Trial A, a GSM
was trained and tested on designs generated from the target design model. Note that
there is no overlap between the training and testing sets. In Trial B, training data
from all of the design models was combined to train the GSM. The GSM was then
tested on designs from the target design model as in Trial A. Trial B thus quantifies
the GSM’s ability to learn on multiple design models simultaneously. Note that this
would be impossible with the pointwise surrogate which is limited to fixed-topology
data. In Trial C, designs originating from the target design model were removed from
the training set, thus testing the GSM’s ability to generalize to unseen design models.
Trials A-C were repeated with each of the five design models (DMs 5-9) as the target,
the results of which can be seen in Figure 2-6.

In Trial A, the GSM achieves an MAE of less than 0.1 cm for all design models,
confirming the previous conclusion that the GSM effectively approximates single de-
sign model data. Trial B also produced MAEs less than 0.1 cm across each design
model, indicating that the GSM can learn on data from multiple design models si-
multaneously. Interestingly, for three of the design models (DMs 6-8), the inclusion
of data from other design models actually improved predictions on the target. These
results indicate that it is sometimes beneficial to add designs to the training data
even if they are not from the design model of interest. Note that this did not hold
true for DMs 5 or 9 which might be considered the most different from the rest of the
design models in that they have the fewest and most bars, respectively. The degree to
which including off-target designs in the training data benefits training may therefore
depend on how similar those designs are to the target.
Since the GSM is able to learn on multiple design models simultaneously, one
might hope that the model generalizes well to previously unseen design models; how-
ever, this was not the case. The MAEs produced in Trial C were on average 76%
higher than those in Trial B. While the mid-range topologies (DMs 6-8) showed better
generalization than the extremes (DMs 5,9), the general trend was that removing all
target designs from the training data significantly reduces predictive performance. In
other words, the GSM does not seem to generalize well to unseen design models.

2.5 Transfer learning: repurposing the GSM


In light of the GSM’s poor generalization to unseen design models, one might con-
clude that a separate GSM must be trained for each potential target; luckily this is
not the case. This chapter proposes transfer learning as a means of repurposing pre-
viously trained GSMs to new targets, using a fraction of the training data required
to train a GSM from scratch. Transfer learned GSMs thus reduce the number of
required simulations and training epochs, and enable learning across design models,
load conditions, and even separate applications. This section demonstrates the ben-
efits of applying transfer learning to GSMs through four trials that emulate common
design scenarios, namely learning across small data sets which vary in topology,
loads, application, or complexity.

2.5.1 Effects on generalizability

Consider a GSM that has been trained to predict the performance of one or more
design models as described in sections 2.3 and 2.4. Let these design models now
be referred to as the source. The subsequent trials demonstrate the performance of
this GSM when re-trained on a small training set from a new target design model
(Fig. 2-7). Multiple strategies exist for applying transfer learning to neural networks.

34
GSM Undeformed Deformed Prediction

GSM with pre-training

25th percentile 50th 75th 100th

20
Density

10

0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
Mean Absolute Error (cm)

Figure 2-8: A comparison of prediction error distributions between a pre-trained GSM


and a GSM trained from scratch (Trial D, DM7). Both GSMs were trained on N =200
target designs. Visualizations of predictions representing each error quartile are also
shown. Transfer learning reduces both the mean and standard deviation of prediction
errors across the test designs.

This study employs what is perhaps the most basic: simply retraining all learnable
parameters on the target data set for an additional 100 epochs. The further explo-
ration of transfer learning strategies, for example those that freeze parameters or add
new ones, for engineering design encouraged as future work.
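The pre-train/re-train procedure can be illustrated on a toy problem. The sketch below uses a small linear model trained with plain gradient descent rather than a GSM with ADAM, and all data is synthetic; it only demonstrates the mechanic of initializing target training from source-trained weights.

```python
import numpy as np

rng = np.random.default_rng(5)

def train(w, X, y, lr=0.1, epochs=100):
    """Plain gradient descent on mean-squared error for y ~ X @ w."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Source task: plenty of data from a related (but not identical) model.
X_src = rng.normal(size=(1000, 5))
y_src = X_src @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# Target task: a similar, shifted relationship with little data.
w_target = np.array([1.2, -1.8, 0.4, 0.1, 2.9])
X_tgt = rng.normal(size=(20, 5))
y_tgt = X_tgt @ w_target

w0 = np.zeros(5)
w_pre = train(w0, X_src, y_src)        # pre-train on the source task
w_tuned = train(w_pre, X_tgt, y_tgt)   # re-train all parameters on target

X_eval = rng.normal(size=(500, 5))

def err(w):
    return np.abs(X_eval @ w - X_eval @ w_target).mean()

print(err(w_tuned) < err(w0))  # True: re-trained weights fit the target
```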

In Trial D, the GSMs that were previously trained (pre-trained) on all design
models but the target (Trial C ) are re-trained on a small dataset (N =200) from the
target model. The results can be seen in the final series of Figure 2-6. The re-trained
GSMs produce significantly better predictions than those in Trial C. In fact, the re-
trained GSMs on average produce 5.5% lower errors than those in Trial B and use less
than a third of the training data. To further analyze the effects of transfer learning
on prediction accuracy, the error distribution from the pre-trained GSM in Trial D
was directly compared to that of a GSM trained only on the 200 design training set
(without pre-training). The distributions can be seen in Figure 2-8. Pre-training on
related source models reduces the average MAE across the test set by 70% and the
standard deviation by 54%, resulting in a more accurate and robust surrogate.


Figure 2-9: Transfer learning consistently improved the GSM’s data efficiency, reduc-
ing the amount of training data required to achieve a given prediction accuracy. The
baseline refers to a naive model which always predicts the mean displacement from
the 1,000 design training set.


Figure 2-10: Pre-trained (prtn) GSMs converge faster and to a lower loss value than
those trained from scratch, particularly when the training size (N) is small. All curves
taken from Trial D, design model 7 (DM7).

2.5.2 Studying data efficiency

Effective surrogate models should achieve a useful level of prediction accuracy with
a minimum amount of training data. Data efficiency is particularly important in
engineering design, where quality design data is often scarce or prohibitively expensive
to generate. On the other hand, deep learning methods, with their large number of
trainable parameters, are notorious for requiring large data sets. This section explores
the effects of transfer learning on the GSM’s data efficiency.

Trial D was repeated for a variety of target data set sizes. Each data set was
generated as in section 2.4.1, and the full 1,000-design set was reserved for testing.
To ensure that all data sets were similarly distributed, any designs with maximum
displacements exceeding the 90th percentile from the test set were discarded. A
different random seed was used in sampling the 1,000-design training set to ensure
that training and testing sets did not overlap. In addition to the pre-trained GSM
from Trial D, a second (not pre-trained) GSM and a pointwise surrogate were trained
on the target sets for comparison.

The mean absolute prediction errors as a function of training set size (N ) can
be seen in the top row of Figure 2-9. For all models, prediction error correlates
negatively with training size, which is expected. In nearly all cases, the pre-trained
GSM achieves the lowest prediction errors, followed by the pointwise surrogate and
finally by the GSM trained from scratch. Transfer learning improves prediction MAEs
by 48.6%, 40.0% and 34.1% for DMs 5, 7, and 9 respectively. The implication is that
the amount of training data required to achieve a given predictive performance is
reduced by roughly one or two orders of magnitude. For DM 5, a pre-trained GSM
requires only 200 designs to achieve an MAE that is within 10% of the MAE produced
by training on 1,000 designs. For DM7, just 100 designs were sufficient to achieve a
similar result.

Interestingly, transfer learning was most beneficial for the medium-sized training
sets. It is presumed that the smallest training sets do not sufficiently represent the
differences between source and target distributions, while the largest training sets are
sufficiently large to train a GSM to its predictive limit from scratch. Positive transfer
was observed across all design models and training sizes, with the exception of the
largest training set for DM9 in which transfer learning increased MAE by 13.7%. This
was the only observed case of negative transfer throughout all trials.

The loss histories of both GSMs reveal further insights about the effects of transfer
learning. Figure 2-10 shows the evolution of training and validation losses for both
GSMs, plotted for four training set sizes. Note that the validation losses for the
transfer learned GSM at epoch zero are initially high and comparable to an untrained
model. At this point, the conditions are quite similar to those in Trial C : the model is
attempting to generalize to an unseen topology. However as training progresses, the
transfer learned validation losses converge faster and to a lower value than those of
the models trained from scratch. Roughly 30 epochs are sufficient to re-train a model,
compared to 100 epochs without transfer learning, representing further computational
savings.

Encouraged by the positive transfer observed in Trial D, one might ask “for which
source and target data sets is transfer learning useful?” The design models DM5-9
differ in topology but have the same outer profile, supports and loads. The following
trials were designed to test other source/target differences that might occur in a
design process. In Trial E, a GSM is pre-trained on 1,000 designs from DM 7 and
re-trained on identical geometry but with different loads (point-loads at the ends as
opposed to a uniform load across the top). Trial E thus tests the ability to transfer-
learn across load cases. The remaining two trials test the ability to transfer-learn
across domains. In Trial F, a GSM is again pre-trained on DM 7 and retrained on
a set of trussed towers. The towers were generated by sampling three handcrafted
design parameters. Each is pinned at the bottom and loaded horizontally on the
remaining joints. The spanning trusses (DM5-9) and towers differ in topology and
outer profile, but have a similar number of bars (27 and 26, respectively). In Trial G,
the GSM is pre-trained on DM 7 and re-trained on a set of densely trussed bridges.
The bridges each consist of 404 bars, making them significantly more complex than
the trusses. The bridges are uniformly loaded across the top and simply supported at
the bottom. The hyperparameters described in section 2.4.2 were used for all trials
with the exception of Trial G, which used a batch size of 128 and learning rate of
5e-4.

Table 2.1: The difference in mean absolute error (∆ MAE) between the pre-trained
GSM and a GSM trained from scratch, averaged across all training sizes. Transfer
learning improved prediction accuracy by 19-54%.

  Trial   Target      ∆ MAE (cm)   ∆ MAE %
  D       DM5         -0.087       -48.6%
  D       DM7         -0.0586      -40.0%
  D       DM9         -0.106       -34.1%
  E       End Loads   -0.148       -25.5%
  F       Tower       -0.0219      -54.1%
  G       Bridge      -0.0157      -19.8%
The results from Trials E, F and G are shown in the bottom row of Figure 2-
9. In Trial E, pre-training on the same geometry but different load cases improved
MAE by an average of 25.5% across all training sizes. In Trial F, pre-training on
trusses improved MAE predictions on towers by an average of 54.1%, and in Trial
G, the same process improved MAE predictions on bridges by 19.8%. The result is a
significant reduction in the amount of required training data. For example, in Trial
F, a GSM pre-trained on trusses achieves better prediction accuracy when re-trained
on 20 tower designs than a GSM trained on 500 towers from scratch. Table
2.1 summarizes the findings from Trials D-G. Positive transfer was observed across
all trials and training sizes, although to varying degrees. As before, the medium-
sized training sets generally showed the largest benefit and the smallest and largest data
sets showed the least. These results further motivate the use of transfer learning to
repurpose design data and surrogate models for new tasks.

2.6 Conclusions and future work

The proposed Graph-based Surrogate Models (GSMs) learn to predict displacement
fields given a structure’s geometry, supports and loads as inputs. Since the GSM does
not rely on handcrafted design parameters, it can be trained on data from multiple
design spaces simultaneously, and often benefits from doing so. Transfer learning
was presented as an effective method for repurposing GSMs to new tasks by leveraging
historical data. GSMs that are pre-trained on a related data set achieve 19-54%
lower prediction errors than those trained from scratch. The result is a more flexible,
general and data-efficient surrogate model for space frame structures.
Future work could consider the increasingly wide array of graph-based learning
methods and assess their suitability for space frames. A similar analysis could be
performed for surface and volumetric meshes. Though both are easily represented
as graphs, meshes differ from space frames in that the topology is not physically
meaningful. In terms of transfer learning, further work is required to be able to predict
the most effective sources for a given target. One might also explore alternative
transfer learning strategies in which learnable parameters are added or frozen during
re-training. Finally, the ability to learn across designs of varying complexity (Trial
G) might support hierarchical learning strategies in which models are progressively
trained on higher complexity designs.

Chapter 3

SimJEB: Simulated Jet Engine Bracket Dataset

Figure 3-1: Introducing SimJEB: a diverse collection of hand-designed engineering
CAD models and accompanying structural simulations. Access at
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XFUWJG

Recent advancements in geometric deep learning have enabled a new class of en-
gineering surrogate models; however, few existing shape datasets are well-suited to
evaluate them. This chapter introduces the Simulated Jet Engine Bracket Dataset
(SimJEB): a new, public collection of crowdsourced mechanical brackets and high-
fidelity structural simulations designed specifically for surrogate modeling. SimJEB
models are more complex, diverse, and realistic than the synthetically generated
datasets commonly used in parametric surrogate model evaluation. In contrast to
existing engineering shape collections, SimJEB’s models are all designed for the same
engineering function and thus have consistent structural loads and support conditions.
The models in SimJEB were collected from the original submissions to the GrabCAD
Jet Engine Bracket Challenge: an open engineering design competition with over 700
hand-designed CAD entries from 320 designers representing 56 countries. Each model
has been cleaned, categorized, meshed, and simulated with finite element analysis ac-
cording to the original competition specifications. The result is a collection of diverse,
high-quality and application-focused designs for advancing geometric deep learning
and engineering surrogate models. This chapter also appeared as a standalone paper
accessible at https://arxiv.org/abs/2105.03534.

3.1 Introduction

Physical simulation plays an important role in designing high-performance engineering components, though the number of required simulations can be computationally
prohibitive. Surrogate models, also known as metamodels or response surfaces, are
data-driven approximations of computationally-intensive physical simulations that
critically accelerate the design process [84, 25, 61]. While traditional surrogates re-
lied on simple regression models that operated over handcrafted shape parameters,
modern surrogates are beginning to leverage geometric deep learning methods for
learning directly on 3D shapes [59, 6, 20, 18]. The implications for engineering de-
sign are profound. No longer bound by a parametric design space, this new class of
surrogate models has the ability to learn on shape collections of arbitrary complexity
and diversity. Unfortunately, a lack of quality datasets means that most of these
surrogate models are still being evaluated on synthetically generated data and thus
not exploiting their full potential. Furthermore, there does not yet exist a standard
benchmark for surrogate modeling for structural engineering applications with which
different models can be compared.
Existing shape collections tend to embody one of two extremes. On one hand are
synthetically generated datasets, in which a domain expert handcrafts shape proce-
dures or parameters and then randomly samples until the desired number of shapes
has been created. While synthetic generation allows for precise control over shape
variation, it is challenging to design parameters that will produce both diverse and
realistic shapes. As a result, most synthetically generated collections suffer from ei-
ther excessive homogeneity or a lack of realism. On the other hand are shape datasets
collected from various public repositories (i.e. "the wild"). Collected shapes do not
suffer from the realism problem and have been invaluable for developing tasks like
classification and segmentation; however, the lack of control over shape variation and
function typically makes these datasets ill-suited for surrogate modeling, where each
shape should be designed for the same engineering task.

This work explores a third source of shape data: online design competitions.
Design competition entries occupy the sweet spot between generated and collected
datasets. The designs are complex, diverse, and realistic since each one is hand-
designed by a different domain expert, and yet each conforms to the functional engi-
neering requirements enforced by the competition. Furthermore, the participating en-
gineers typically design their CAD models with structural simulation in mind, resulting in cleaner, higher-quality geometry than one might encounter in the wild.

This chapter introduces the Simulated Jet Engine Bracket Dataset (SimJEB),
a new public shape collection for testing geometric machine learning methods with
an emphasis on surrogate modeling (Figure 3-1). The bracket designs in SimJEB
originate from the GE Jet Engine Bracket Challenge: an open engineering design
competition hosted in 2013 by GrabCAD.com [36] (Figure 3-2). The original compe-
tition featured over 700 entries, representing 320 designers from 56 countries whose
work is estimated to have taken 14 person-years [52]. The diversity of the entries
reflects that of their creators, employing a broad range of design strategies, styles
and structural behaviors. As mandated by the competition, each bracket has the
same four bolt holes and interface point so that they might all be used for the same
engineering task. In SimJEB, each design has been cleaned, meshed, and simulated
according to the competition’s original structural load cases by the author of this
thesis. For additional analytical potential, each design is labeled as belonging to one
of six design categories as determined by a domain expert. Although these particular
brackets were designed for use in a jet engine, their structural behavior and design
objectives are representative of most structural engineering tasks in civil, mechanical
and biomedical engineering. The primary contributions are summarized as follows:

1. Introducing SimJEB: a public collection of 381 hand-designed structural engineering CAD models, accompanied by structural simulation results and design category labels, designed for evaluating engineering surrogate models

2. Characterizing the collection in terms of geometry, meshes, and structural behavior

3. Proposing a benchmark, including suggested train/test splits, quality metrics, and results from a naive approach, to advance engineering surrogate models and geometric deep learning models

Section 3.2 describes how SimJEB differs from existing datasets, section 3.3 de-
scribes the geometry processing pipeline used to create the dataset, section 3.4 char-
acterizes the dataset in terms of shape and structural behavior, section 3.5 addresses
licensing, access and attributions, section 3.6 proposes how SimJEB might be used
as a surrogate modeling benchmark, and section 3.7 summarizes the conclusions and
offers suggestions for future work.

Figure 3-2: The GE Jet Engine Bracket Competition hosted by GrabCAD.com drew
contributions from engineers of many backgrounds and experience levels to compete
for cash prizes

3.2 Related work

3.2.1 Synthetically generated shape datasets

Many techniques exist for generating 3D shapes. The graphics community has made
extensive use of procedural models for generating content like buildings [54], space-
ships [67] and indoor scenes [60] (see [74] for a review of procedural modeling tech-
niques). While effective for digital content generation, procedural modeling can be
difficult to apply to engineering design where manufacturing constraints or package
space requirements necessitate more precise control over allowable shapes. Paramet-
ric CAD models allow engineers to precisely explore shape parameters (i.e. a design
space) [73]. Parametric models can also be constructed via mesh morphing [40],
a technique used extensively in mechanical engineering for shape optimization [76].
Both procedural and parametric generation require the user to manually codify all of
the ways in which shapes may vary. Put eloquently by Krispel et al., "Shape design
becomes rule design" [39]. This constraint limits the diversity and realism of shapes
that can be generated.
More recently, deep learning methods have been used to learn shape generation
schemes from a collection of training shapes (see [16] for a recent review). Deep shape
generation has even been applied to engineering design. [80] used an autoencoder
to learn shape parameters from collected vehicle designs. [55] trained a generative
adversarial model on samples from a parametric topology optimization model. While
learning is a promising direction for shape generation, models require large training
sets for each new application. Datasets like SimJEB can play a critical role in learning
more realistic and practical shape generation models for mechanical design.

3.2.2 Collected shape datasets

Several large shape datasets have been released in recent years, including the Prince-
ton Shape Benchmark [75], ModelNet [89], ShapeNet [14], Thingi10k [96], ABC [38]
and Fusion360 [87]. These datasets have been impactful on a wide range of geometry

Figure 3-3: The semi-automated pipeline used to filter, clean, mesh and simulate
the raw CAD contest submissions. Tasks such as orienting, meshing, and checking
cleanliness are relatively easy to automate, while tasks that require engineering intu-
ition like assessing part relevancy are best left up to a domain expert.

processing tasks, including classification, segmentation, surface normal estimation
and shape retrieval (to name a few); however, the lack of accompanying engineer-
ing simulations prohibits their use for developing surrogate models. [18] performed
fluid dynamics simulations on some of the ShapeNet aircrafts and watercrafts but the
dataset was not made public. Aside from the effort required to clean geometry, mesh
and simulate, the fundamental challenge with using shape collections in engineering
simulations is that the operating conditions for which the part was designed are un-
known, and thus it is not possible to characterize its design performance. [72] used
the method of manufactured solutions to solve the Poisson equation on Thingi10k
models. While suitable for PDE discretization studies, the method of manufactured
solutions imposes an arbitrary analytical solution and thus does not attempt to cap-
ture the physical response of the part under normal operating conditions. In contrast,
the specific load and support conditions are known and consistent across all SimJEB
models, enabling accurate engineering simulation.

3.2.3 Design competition data

Engineering design competitions have long been a source of innovation (e.g. The Lon-
gitude Act of 1714 [12] and the Tower of London competition of 1890 [82]). Though
modern design competitions, like those hosted by NASA [22] and ASME [3], con-
tinue to yield useful designs, they are a relatively untapped source for functional
shape data. Previous works have utilized data from the GE Jet Engine Bracket Chal-
lenge, including several that have used the geometry and loads for testing topology
optimization methods [27, 19, 53, 51]. These works used the competition package
space and loads but did not consider the dataset as a whole. [52] studied 10 of the
challenge entries in detail as part of a case study in sustainable design but did not
perform physical simulation or release the data. [47] trained a voxel-based surrogate
model to predict support material and print time for additive manufacturing using
300 voxelized bracket designs. The data were not made public.
SimJEB is a public collection of 381 cleaned, meshed and simulated designs from
the GE Jet Engine Bracket Challenge. The collection is a step towards advancing
geometric deep learning methods for realistic, functional engineering components.

3.3 Geometry cleaning and simulation pipeline


The following section describes the semi-automated workflow used to acquire, prepro-
cess and simulate each bracket model in SimJEB. The complete workflow is depicted
in Figure 3-3.

3.3.1 Design competition overview

The GE Jet Engine Bracket Challenge was a large engineering design competition
hosted by General Electric and GrabCAD.com in 2013 [36]. The competition at-
tracted 700 submissions from 320 designers representing 56 countries. By one esti-
mate, the total amount of human-hours required to design the brackets was 700 work
weeks or 14 human years [52]. Participants were challenged to design the lightest pos-
sible lifting bracket for a jet engine, subject to the constraint that the maximum stress
in the part did not exceed the yield stress of Ti-6Al-4V titanium over four specified
load cases. Entries were limited to designs that fit within a provided package space,
and were required to have four bolt holes and an interface hole in specific locations.
The bracket designs were also required to be manufacturable via additive manufactur-
ing. Cash prizes totaling $30,000 USD were distributed among multiple winners
selected by a panel of mechanical engineers from GE and GrabCAD.com.

Figure 3-4: The "leaky" pipeline used to filter, clean, orient, mesh, and simulate bracket models (counts at each stage: all CAD: 635; clean: 514; simulation successful: 381; meshing failed: 103; required manual repair: 43; simulation failed: 30; irrelevant/duplicate: 40; poor quality: 31; could not open: 7). Often models can be repaired manually but with diminishing returns.

3.3.2 CAD file acquisition

On the date of access (June 4th, 2020), the GE Jet Engine Bracket Competition
website had 629 entries. While most entries contained a single CAD file, some entries
contained redundant designs in different CAD file formats, some contained multiple
design variations, and still others were missing a CAD file entirely. In the first pass,
files were filtered programmatically. If entries contained multiple CAD files with dif-
ferent names (e.g. "bracket_v1.stp", "bracket_v2.stp") then both files were retained
as they were assumed to be different design variants. If entries contained multiple
CAD files with the same name save the extension (e.g. "model1.stp", "model1.igs"),
only the file with the highest priority extension was retained, where the priority was
defined as follows: .catpart, .sldasm, .sldprt, .prt, .stp, .step, .x_b, .x_t, .iges, .igs,
.ipt. Note that native formats were preferred over neutral ones. In the second pass,
the remaining entries with multiple CAD files were manually screened for obvious
redundancies (e.g. "GE_Bracket.stp", "GE_Bracket_color_changed.stp"). At the
end of the file selection process, 56 entries had more than one valid CAD file, 518
entries had exactly one, and 55 entries did not have any, resulting in a total of 635
raw CAD files.
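As an illustration, the extension-priority rule above can be sketched in a few lines of Python. The file names are hypothetical and the grouping-by-stem logic is a simplifying assumption; the actual screening also involved manual review.

```python
# Sketch of the extension-priority filtering described above. File names are
# hypothetical; the real process also involved manual screening.
from pathlib import Path

# Native formats are preferred over neutral ones, as in the text.
PRIORITY = [".catpart", ".sldasm", ".sldprt", ".prt", ".stp", ".step",
            ".x_b", ".x_t", ".iges", ".igs", ".ipt"]

def select_by_priority(filenames):
    """Keep one file per stem: the one with the highest-priority extension."""
    best = {}
    for name in filenames:
        path = Path(name)
        stem, ext = path.stem.lower(), path.suffix.lower()
        rank = PRIORITY.index(ext) if ext in PRIORITY else len(PRIORITY)
        if stem not in best or rank < best[stem][0]:
            best[stem] = (rank, name)
    return sorted(name for _, name in best.values())

# "model1.stp" beats "model1.igs"; files with different stems are both retained.
print(select_by_priority(["model1.stp", "model1.igs", "bracket_v2.stp"]))
```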

3.3.3 Geometry cleaning

Prior to performing structural analysis, each CAD model was cleaned, tagged, scaled,
and oriented to a canonical pose through the following semi-automated process. First,
an automated check was performed to get the total number of closed volumes in the
model. Next, the user was prompted to review the model and optionally assign one of
the following tags: duplicate, irrelevant, non-repairable. The model was considered to
be clean if it contained exactly one closed volume and was not assigned a tag. 447 of
the 635 models were determined to be clean after the first pass. The units of length
for each clean model were automatically inferred from the volume of its bounding box
and the appropriate scale was applied such that all models were defined in millimeters.
The user was then prompted to select three reference points on the model which were
used to translate and rotate it to a canonical pose. The models that were considered
unclean were either manually cleaned, by deleting extraneous geometry and sliver
surfaces or patching non-watertight volumes, or determined to be non-repairable in
a second pass, resulting in a total of 514 clean CAD models (Figure 3-4).
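The unit-inference step can be illustrated as follows. The candidate units and the expected package-space diagonal (200 mm) are assumptions for illustration, not the exact values used in the pipeline.

```python
# Illustrative sketch of inferring a model's length unit from its bounding
# box. The expected diagonal and candidate units are assumed values.
EXPECTED_DIAG_MM = 200.0  # rough size of the competition package space (assumed)
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4, "m": 1000.0}

def infer_scale(bbox_diagonal):
    """Return the mm-per-unit factor whose scaled diagonal best matches
    the expected package-space size."""
    return min(UNIT_TO_MM.values(),
               key=lambda s: abs(bbox_diagonal * s - EXPECTED_DIAG_MM))

print(infer_scale(8.0))    # a ~8-unit diagonal is most plausibly inches
print(infer_scale(210.0))  # a ~210-unit diagonal is most plausibly millimeters
```

Applying the returned factor converts all coordinates to millimeters before the pose-alignment step.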

3.3.4 Finite element structural simulation

A similar semi-automated process was used to build the finite element models using
commercial software [1, 2]. First, the user was prompted to select the surfaces defin-
ing all four bolt holes and the interface hole. Next, a first-order tetrahedral mesh was
generated with an average element size of 2 mm. Each bolt was modeled by a rigid
RBE2 spider element connected to each mesh node on the selected bolt surfaces and
constrained by a Single Point Constraint (SPC) at the center. An RBE3 spider ele-
ment was used to distribute each of the four loads across the surfaces of the interface
hole. As specified in the original competition, the bracket material was modeled as
Ti-6Al-4V aluminum (E=113.8 MPa, 𝜈 =0.342, 𝜌 =4.47e-2 g/mm3 ). Finally, each

49
of the four load conditions were simulated using linear-static FEA and the resulting
displacements and von Mises stresses were recorded for each node (Figure 3-5). Note
that the structural analysis performed for SimJEB may use slightly different assump-
tions than those of the original competition are are not meant to replace or correct
any simulations performed by the original designers.

Figure 3-5: Each bracket was simulated according to the four load conditions speci-
fied by the competition. Five vertex-valued scalar fields were extracted for each load
case: the displacement in the X, Y, and Z directions, the displacement magnitude, and the von
Mises stress.

3.4 Dataset characterization


Despite being designed for the same functional purpose, the SimJEB bracket models
are remarkably diverse. This section characterizes variation in geometry and struc-
tural performance across the dataset.

3.4.1 Characterization of geometry

The broad range of designer backgrounds, experience levels, and software tools behind
the bracket designs are reflected in the design diversity. While each design has the
same bolt holes and interface point mandated by the competition, the remainder of the
shape was left to the engineer’s imagination. The topology, complexity, and structural
design strategy thus vary significantly (Figure 3-6). In SimJEB, each bracket has been
manually assigned to one of six general design categories: block, flat, arch, butterfly,
beam and other. Block designs were defined as those that occupy a large portion of the
allotted package space. Flat designs were considered to be those that have mostly flat
regions between the bolt holes and interface point, while arch and butterfly designs
have positive and negative curvature in these regions, respectively. Beam designs
were considered those that have long, slender beam-like regions. Designs that did not
fit well into any of these categories were labeled as other. The above categorization
serves two purposes: 1) it provides a convenient way to partition the data into more
homogeneous subsets, and 2) it can be used as labels for classification tasks. Note
that both the definition and assignment of these categories are subjective and imperfect
but may be useful in practice for certain modeling or geometry processing tasks.
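Since the category labels ship with the dataset metadata, partitioning the models into homogeneous subsets is straightforward; the ids and in-memory representation below are hypothetical stand-ins.

```python
# Hypothetical sketch of partitioning models by design category. The category
# names match the six labels above, but the ids and storage format are assumed.
from collections import defaultdict

labels = {0: "arch", 1: "block", 2: "arch", 3: "beam"}  # model id -> category

by_category = defaultdict(list)
for model_id, category in sorted(labels.items()):
    by_category[category].append(model_id)

print(by_category["arch"])  # all model ids labeled "arch"
```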

Figure 3-6: Top left: all brackets are manually labeled as belonging to one of six gen-
eral design categories. Top right: The volume was bounded above by the competition-
specified package space (apart from three designs, which violate the rule). Bottom:
brackets range in the number of triangular and tetrahedral elements in the surface
and simulation meshes, respectively.

3.4.2 Characterization of structural performance

A common objective in structural engineering is to find lightweight shapes and materials that can withstand specified structural loads. Mechanical failure of metal parts
will occur if the maximum von Mises stress at any point inside the part exceeds the
known yield stress of the material. Note that the maximum stress does not necessarily
lie on the part boundary. A second common objective is to maximize the stiffness of
the part, which can be thought of as minimizing the maximum displacement resulting
from a given load. Minimizing mass is almost always a competing objective with min-
imizing displacement and stress, thus the challenge lies in finding (manufacturable)
shapes that provide the optimal balance of these quantities for a given application.
The geometric diversity of SimJEB models naturally leads to a wide array of
structural performance. Figure 3-7 shows the performance distribution for each design
category in terms of two competition objectives: maximum displacement (over all load
cases and vertices) and mass. Designs closer to the origin are lighter and stiffer, and
thus more desirable from a structural engineering standpoint. Note that block designs
tend to be heavier and stiffer, while more minimalist beam designs are generally
the opposite. Arch and butterfly designs seem to exhibit the tightest clustering of
desirable behaviors. Interestingly, the Pareto front, that is, the set of designs that are
optimal for at least one relative weighting of objectives, contains at least one bracket
from each design category.
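For reference, the Pareto front over the two objectives can be extracted with a simple dominance check; the (mass, max displacement) pairs below are made-up stand-ins for the real data.

```python
# Minimal sketch of extracting the Pareto front for two minimization
# objectives (mass, max displacement), as discussed above. Data are made up.
def pareto_front(points):
    """Return points not dominated by any other (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

designs = [(1.0, 3.0), (2.0, 1.0), (2.5, 2.5), (0.8, 4.0)]  # (mass, max disp)
print(pareto_front(designs))  # (2.5, 2.5) is dominated by (2.0, 1.0)
```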

3.5 Licensing, attributions and access


The Simulated Jet Engine Bracket Dataset (SimJEB) is made available under the
Open Data Commons Attribution License. CAD files within the database are from the
“GE jet engine bracket challenge”, and licensed for non-commercial use by GrabCAD.
All other rights in individual contents of the database are licensed under the Open
Data Commons Attribution License.
SimJEB is hosted through Harvard Dataverse at the following link: dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JYJ094 (Figure 3-8). Access requires making a free Harvard Dataverse account.

Figure 3-7: Multi-objective plots for each of the design categories and for the Pareto optimal designs. Maximum displacements are taken over all vertices and load cases. Optimal designs, which are lightweight and stiff, are located closer to the origin. Note the variety of structural performance within each design category and across the dataset.

The following data are available for each bracket design: clean CAD (.stp), finite element model (.fem), tetrahedral mesh (.vtk), triangular surface mesh (.obj) and simulation results (.csv).
Each file type is packaged into a separate zip file to facilitate use cases requiring
only a subset of file types. A single metadata file provides summary statistics for
each bracket and three train/test splits for a benchmark (further explained in section
3.6). Additionally, a sample zip file containing one of each file type is provided for
convenience. Models are identified by an integer; the files 0.stp, 0.fem, 0.vtk, 0.obj
and 0.csv thus all belong to model 0. A global README and local README in
each zip file attribute each model to its original designer and provide a link to the
original submission on GrabCAD.com. Any questions regarding the dataset should
be directed to ewhalen@mit.edu.
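Under the naming convention above, collecting the five files for a given model id is straightforward; the root directory name below is a hypothetical assumption (the files actually ship in separate zip archives).

```python
# Sketch of assembling per-model file paths from an integer id, following the
# naming convention described above. The directory layout is an assumption.
from pathlib import Path

def model_files(model_id, root="simjeb"):
    """Return the five per-model files for a given integer id."""
    root = Path(root)
    return {ext: root / f"{model_id}.{ext}"
            for ext in ("stp", "fem", "vtk", "obj", "csv")}

paths = model_files(0)
print(paths["obj"])  # e.g. the surface mesh for model 0
```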

Figure 3-8: The SimJEB dataset is available for public use and hosted through
Harvard Dataverse. See section 3.5 for access instructions.

3.6 Surrogate modeling benchmark

Though SimJEB is applicable to a wide range of geometry processing tasks, it was
designed primarily for engineering surrogate modeling, that is, learning to predict
the displacement (or stress) fields on a 3D shape. The diversity and complexity of
the models in SimJEB both exceed those of traditional surrogate modeling datasets,
which are almost always synthetically generated. Therefore, SimJEB can be seen as
a challenge problem for the learning and structural modeling communities.

A naive surrogate model is presented to demonstrate how SimJEB might be used
as a benchmark. This naive model is simply a degree-three polynomial in the spatial
coordinates, with the resulting function approximating the average scalar field across
all designs in the training set. Besides demonstrating the benchmark process, this
naive model serves as a reference point for predictive performance. Surrogate models
that fail to beat this naive model effectively have no predictive value. To standardize
training and testing data, three 80/20 train/test splits are provided with the SimJEB
metadata. A naive surrogate was trained to predict each of the five scalar fields (the
x, y, and z components of the displacement, the displacement magnitude, and the von
Mises stress) for each of the four load cases (vertical, horizontal, diagonal, torsional) and for each of
the three train/test splits, resulting in 60 surrogate models total. The mean absolute
errors (MAEs) in prediction averaged over the three test sets can be seen in Table 3.1.

Table 3.1: The Mean Absolute Error (MAE) of the naive surrogate model averaged
over three train/test splits. These values can be used as a reference point for bench-
marking future surrogate models.

                 Vert.     Horiz.    Diag.     Tor.
Disp-X (mm)      6.27e-2   1.62e-1   3.17e-2   1.27e-1
Disp-Y (mm)      4.17e-2   1.97e-2   1.66e-2   4.87e-2
Disp-Z (mm)      1.46e-1   1.62e-1   2.80e-2   2.37e-1
Disp-Mag (mm)    1.69e-1   2.51e-1   4.21e-2   2.87e-1
VM Stress (MPa)  6.01e+1   8.93e+1   3.61e+1   8.44e+1

Note that MAE is typically preferred over other potential quality metrics because the
units are easily interpretable.
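To make the benchmark procedure concrete, the following sketch fits a degree-three polynomial in the spatial coordinates by least squares and reports test MAE. The data are synthetic stand-ins, and the monomial feature construction is one reasonable implementation, not necessarily the thesis's.

```python
# Illustrative sketch of the naive baseline: a degree-three polynomial in the
# spatial coordinates, fit by least squares. Synthetic data stand in for the
# real SimJEB fields.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))   # vertex coordinates (x, y, z)
y = X[:, 0]**2 - 0.5 * X[:, 1] + 0.1    # stand-in scalar field

def poly_features(X, degree=3):
    """All monomials x^i * y^j * z^k with i + j + k <= degree."""
    feats = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            for k in range(degree + 1 - i - j):
                feats.append(X[:, 0]**i * X[:, 1]**j * X[:, 2]**k)
    return np.stack(feats, axis=1)

# 80/20 split, least-squares fit, MAE on the held-out vertices
n_train = 400
A = poly_features(X)
coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
mae = np.abs(A[n_train:] @ coef - y[n_train:]).mean()
print(f"test MAE: {mae:.2e}")  # near zero here, since the target is polynomial
```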

3.7 Conclusions and future work


SimJEB is a new collection of realistic, hand-designed engineering models for advancing
geometric deep learning methods. The designs have the same boundary conditions
and are accompanied by high-fidelity structural simulation results which makes them
ideal for evaluating engineering surrogate models, though the dataset is applicable to
a wide range of geometry processing tasks. As the bracket models are hand-designed
by structural engineers, they are more realistic and diverse than typical synthetically
generated datasets. The dataset is characterized in terms of geometry and structural
performance, and a benchmark is proposed for surrogate model evaluation. Future
work could include improving the robustness of the geometry cleaning and simulation
pipeline to increase the total number of bracket designs. Other future projects may
include providing RGB-D or multi-view images to support a wider range of learning
representations.

Chapter 4

Heuristics for improving the accuracy and convergence of physics-informed neural networks in structural mechanics

Physics-informed neural networks (PINNs) have the potential to improve the data-
efficiency of structural surrogate models by leveraging governing equations; however,
the training process is more complex and less robust than that of purely data-driven
methods. This work proposes two heuristics that aid in the training process. The first
concerns the normalization of each term in the loss function. The second concerns
a multi-step refinement strategy in which the magnitudes of the predictions in one
step are used to scale the outputs of the next. Both heuristics are demonstrated on
a canonical linear elastic problem, for both hard and soft boundary conditions. The
proposed methods have the potential to improve the accuracy and convergence of
PINNs used for structural applications.

4.1 Introduction

Physics-informed neural networks (PINNs) were popularized by [64] as a way of solving
partial differential equations (PDEs) with a neural network. By including the
residuals of a PDE and boundary conditions into the loss function, a neural network
can learn a smooth approximation of the solution as it is trained, essentially acting as
an iterative solver. PINNs can also be trained with a combination of physics-informed
and data-driven loss terms, resulting in a hybrid model between data-driven surro-
gates and physical solvers. While PINNs have potential to be more accurate and
data-efficient than data-driven surrogate models because they incorporate a system’s
governing equations, they are also significantly more difficult to train. This limits
their accessibility and impact as engineering tools.

The challenges associated with training PINNs are well-documented [85]. Since
the loss function is typically formulated as a weighted sum of residuals, one challenge
concerns choosing effective coefficients for each term. Improper weighting of the loss
terms can result in poor accuracy and even a failure to converge to the correct solution.
Though some heuristics exist for choosing loss weights, they are often chosen through
trial and error. The second challenge concerns output scaling. Previous works have
noted that scaling the PINNs outputs to be the same order of magnitude as the
desired solution aids training, but this requires knowing the solution magnitudes
a priori. While one might argue that the magnitude of the solution in structural
problems can often be inferred from engineering judgement, this is not always the
case. For new types of materials, geometries, or loads, guessing the magnitude of the
solution is nontrivial.

This chapter presents new heuristics for addressing two existing challenges in
training PINNs. The first challenge concerns choosing effective coefficients for each
term in the loss function. By noting that training performance improves when all loss
terms are roughly of equal magnitude, this work proposes normalizing each term in
the loss function by dividing it by its value at the first training step. The
second challenge concerns output scaling. Rather than guessing solution magnitudes,
this work suggests a multi-step training process, where the outputs of an un-scaled
network become the scales of the second network. The result is shown to improve
prediction accuracy and convergence on a canonical linear elastic problem. Finally,
this work concludes with a quantitative comparison between hard and soft implemen-
tations of the boundary conditions in the refinement step. The key contributions of
this chapter are as follows:

1. A multi-step training procedure for heuristically selecting loss weights and out-
put scales for physics-informed neural networks that results in improved con-
vergence and accuracy

2. A quantitative comparison of the heuristics for both hard and soft constraint
enforcement
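The first heuristic amounts to freezing the loss values from the first training step and using them as fixed normalizers; a minimal sketch with placeholder loss values:

```python
# Minimal sketch of the loss-normalization heuristic: each loss term is divided
# by its value at the first training step, so all terms start at comparable
# magnitude. The numeric loss values below are placeholders.

def normalized_total_loss(raw_terms, first_step_terms):
    """Divide each raw loss term by its (fixed) first-step value and sum."""
    return sum(t / t0 for t, t0 in zip(raw_terms, first_step_terms))

first_step = [1e4, 2.0, 5e-3]   # e.g. PDE, BC, and data losses at step 0
later_step = [5e3, 1.0, 2.5e-3] # each term has halved since then
print(normalized_total_loss(later_step, first_step))  # each term contributes 0.5
```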

The remainder of the chapter is organized as follows: section 4.2 briefly reviews
previous works, section 4.3 proposes the heuristics for training PINNs, section 4.4
presents experimental results, and section 4.5 concludes the chapter and offers sug-
gestions for future work.

4.2 Related work

The use of neural networks for approximating the solution to PDEs dates back to the
1990s [41]; however, the PINN was developed and popularized by [64], who suggested
using the network’s automatic differentiation [7] capabilities for solving PDEs in the strong
form. Since then, PINNs have been proposed for a wide range of PDEs and appli-
cations, including fluid dynamics [65, 5, 88], fracture mechanics [29], and linear and
nonlinear structural mechanics [66, 31, 71]. Rao et al. [66] introduced the concept
of hard boundary conditions for elastic body problems, guaranteeing that Dirichlet
boundary conditions are exactly satisfied. This work utilizes a similar PINN formu-
lation to that in Rao et al. but applies the new heuristics for weighting loss terms
and scaling outputs.

This is not the first work to propose heuristics for training PINNs. Wang et al.
[85] explored the numerical instabilities that occur when training PINNs, drawing
on intuition from forward Euler integrators. Wang et al. propose a heuristic for
dynamically annealing the loss terms based on the concept of momentum. In contrast,
the method proposed in this work is simpler to implement and only needs to be
calculated at the beginning of training.

4.3 Methodology

The following section briefly outlines PINNs for static, plane-stress, linear elastic
problems and then introduces two heuristics for improving the convergence and ac-
curacy of PINNs in practice.

4.3.1 Equations of elasticity

The behavior of elastic bodies under load is governed by the following system of
PDEs, known as the elasticity equations:

∇ · 𝜎 + 𝐹 = 𝜌𝑢𝑡𝑡 (4.1)

𝜀 = (1/2)(∇𝑢 + (∇𝑢)ᵀ)   (4.2)

𝜎 = 𝐶𝜀 (4.3)

where 𝜎 is the Cauchy stress tensor, 𝐹 is the body force, 𝜌 is the density, 𝑢𝑡𝑡 is
the acceleration vector, 𝜀 is the strain tensor, 𝑢 is the displacement vector, and
𝐶 is the fourth-order constitutive tensor. Equation 4.1 is known as the equation of
motion, equation 4.2 is the strain-displacement relationship, and 4.3 is the constitutive
equation. Under the assumptions of equilibrium, plane stress, and linear isotropic
materials, and in the absence of body forces, the problem can be rewritten as follows:

∂σ_xx/∂x + ∂τ_xy/∂y = 0   (4.4)

∂σ_yy/∂y + ∂τ_xy/∂x = 0   (4.5)

E/(1 − ν²) · (∂u_x/∂x + ν ∂u_y/∂y) − σ_xx = 0   (4.6)

E/(1 − ν²) · (ν ∂u_x/∂x + ∂u_y/∂y) − σ_yy = 0   (4.7)

E/(2(1 + ν)) · (∂u_x/∂y + ∂u_y/∂x) − τ_xy = 0   (4.8)
where 𝐸 is the Young’s modulus, 𝜈 is the Poisson’s ratio, 𝜎𝑥𝑥 and 𝜎𝑦𝑦 are the normal
stresses in the 𝑥 and 𝑦 directions, and 𝜏𝑥𝑦 is the shear stress.
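To make the formulation concrete, the residuals of equations (4.4)–(4.8) can be evaluated with automatic differentiation. The following PyTorch sketch uses a placeholder network and assumed material constants; it is an illustration, not the thesis's implementation.

```python
# Sketch of evaluating the plane-stress residuals (4.4)-(4.8) with automatic
# differentiation, assuming a network mapping (x, y) -> (ux, uy, sxx, syy, txy).
import torch

E, nu = 113.8e3, 0.342  # MPa and dimensionless; assumed material constants

# Placeholder network standing in for the PINN
net = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 5))

def grad(f, xy):
    """d(f)/d(xy) for a batched scalar field f, keeping the graph for training."""
    return torch.autograd.grad(f, xy, torch.ones_like(f), create_graph=True)[0]

xy = torch.rand(8, 2, requires_grad=True)  # collocation points
ux, uy, sxx, syy, txy = net(xy).unbind(dim=1)
dux, duy = grad(ux, xy), grad(uy, xy)
dsxx, dsyy, dtxy = grad(sxx, xy), grad(syy, xy), grad(txy, xy)

r1 = dsxx[:, 0] + dtxy[:, 1]                               # eq. (4.4)
r2 = dsyy[:, 1] + dtxy[:, 0]                               # eq. (4.5)
r3 = E / (1 - nu**2) * (dux[:, 0] + nu * duy[:, 1]) - sxx  # eq. (4.6)
r4 = E / (1 - nu**2) * (nu * dux[:, 0] + duy[:, 1]) - syy  # eq. (4.7)
r5 = E / (2 * (1 + nu)) * (dux[:, 1] + duy[:, 0]) - txy    # eq. (4.8)

pde_loss = sum((r**2).mean() for r in (r1, r2, r3, r4, r5))
print(float(pde_loss))  # nonnegative scalar, minimized during training
```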

4.3.2 Soft boundary conditions

The loads and supports in a structural analysis problem define boundary conditions,
which in turn specify a unique solution to the elasticity equations. The two most
common types of boundary conditions in structural problems are enforced displace-
ments and enforced tractions. Supports are a special case of enforced displacement
where the displacement is zero.
Enforced displacements can be described by the following Dirichlet boundary con-
ditions:
𝑢𝑥 − 𝑑𝑥 = 0 on 𝜕Ω (4.9)

𝑢𝑦 − 𝑑𝑦 = 0 on 𝜕Ω (4.10)

where 𝑑 is the imposed displacement and 𝜕Ω is some domain boundary or subset of
a boundary. External forces that are distributed over a boundary can be represented
as a traction:
𝜎𝑥𝑥 𝑛𝑥 + 𝜏𝑥𝑦 𝑛𝑦 − 𝑇𝑥 = 0 on 𝜕Ω (4.11)

𝜏𝑥𝑦 𝑛𝑥 + 𝜎𝑦𝑦 𝑛𝑦 − 𝑇𝑦 = 0 on 𝜕Ω (4.12)

where 𝑇 is the traction in units of force per area, and 𝑛𝑥 and 𝑛𝑦 are the 𝑥 and 𝑦
components of the surface normal. Note that when the applied load is normal to
the surface the traction conditions impose a normal stress. Similarly, loads that are
tangent to the surface impose a shear stress.
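This last point can be made concrete with a small numeric sketch (the `traction` function below is a hypothetical helper, not part of any library): on a face whose outward normal is +𝑥, equations 4.11–4.12 reduce to 𝑇𝑥 = 𝜎𝑥𝑥 and 𝑇𝑦 = 𝜏𝑥𝑦.

```python
def traction(sig_xx, sig_yy, tau_xy, nx, ny):
    """Traction components (Tx, Ty) on a surface with outward unit normal
    (nx, ny), i.e. the load implied by Eqs. 4.11-4.12."""
    return sig_xx * nx + tau_xy * ny, tau_xy * nx + sig_yy * ny

# A purely normal stress on a +x face produces a purely normal traction...
print(traction(sig_xx=5.0, sig_yy=0.0, tau_xy=0.0, nx=1.0, ny=0.0))  # (5.0, 0.0)

# ...while a shear stress on the same face produces a tangential traction.
print(traction(sig_xx=0.0, sig_yy=0.0, tau_xy=3.0, nx=1.0, ny=0.0))  # (0.0, 3.0)
```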

4.3.3 Network and loss function

A neural network can approximate the solution to a PDE by learning a map between
spatial or temporal variables and state variables (Figure 4-1). For the case of a static,
plane stress problem, the network can take the following form:

𝒩𝜃 ([𝑥, 𝑦]𝑇 ) = [𝑢𝑥 , 𝑢𝑦 , 𝜎𝑥𝑥 , 𝜎𝑦𝑦 , 𝜏𝑥𝑦 ]𝑇 = 𝑠 (4.13)

where 𝒩𝜃 is a fully connected neural network with learnable parameters 𝜃. Note that
while the displacement vector alone is enough to fully define the state of an elastic
body problem, this work additionally includes stresses in the output. This mixed
variable formulation was shown by [66] to improve accuracy and convergence.
Training a neural network equates to solving an optimization problem, where
the optimal network parameter values are those that minimize a loss function. In
a traditional, purely data-driven neural network, the most common loss function for
regression tasks is the mean squared error (MSE) between the predictions and the
ground truth:
ℒ𝑑𝑎𝑡𝑎 = (1/𝑁) ∑ᵢ₌₁ᴺ (𝑠𝑖 − 𝑠̂𝑖)² = ‖𝑠𝑖 − 𝑠̂𝑖‖² (4.14)

where 𝑠̂𝑖 denotes the ground truth value for observation 𝑖 in a training set of size 𝑁 .
All that is required to convert a data-driven network into a physics-informed network


Figure 4-1: The PINN learns a map between 2D spatial coordinates and the displace-
ment and stress field by minimizing a loss function containing PDE residuals. Partial
derivatives are computed via auto differentiation.

is to add terms to the loss function that enforce the PDE and boundary conditions.
Note that equations 4.4 - 4.12 evaluate to zero for the desired solution, so a natural
loss function is the MSE of these terms:

ℒ𝑝𝑖𝑛𝑛 = ℒ𝑑𝑎𝑡𝑎
+ 𝜆𝑚𝑥 ‖𝜕𝜎𝑥𝑥/𝜕𝑥 + 𝜕𝜏𝑥𝑦/𝜕𝑦‖²Ω
+ 𝜆𝑚𝑦 ‖𝜕𝜎𝑦𝑦/𝜕𝑦 + 𝜕𝜏𝑥𝑦/𝜕𝑥‖²Ω
+ 𝜆𝜀𝑥 ‖(𝐸/(1 − 𝜈²))(𝜕𝑢𝑥/𝜕𝑥 + 𝜈 𝜕𝑢𝑦/𝜕𝑦) − 𝜎𝑥𝑥‖²Ω
+ 𝜆𝜀𝑦 ‖(𝐸/(1 − 𝜈²))(𝜈 𝜕𝑢𝑥/𝜕𝑥 + 𝜕𝑢𝑦/𝜕𝑦) − 𝜎𝑦𝑦‖²Ω
+ 𝜆𝜀𝑥𝑦 ‖(𝐸/(2(1 + 𝜈)))(𝜕𝑢𝑥/𝜕𝑦 + 𝜕𝑢𝑦/𝜕𝑥) − 𝜏𝑥𝑦‖²Ω
+ 𝜆𝑑𝑥𝑖 ‖𝑢𝑥 − 𝑑𝑥‖²𝜕Ω𝑖
+ 𝜆𝑑𝑦𝑖 ‖𝑢𝑦 − 𝑑𝑦‖²𝜕Ω𝑖
+ 𝜆𝑇𝑥𝑖 ‖𝜎𝑥𝑥𝑛𝑥 + 𝜏𝑥𝑦𝑛𝑦 − 𝑇𝑥‖²𝜕Ω𝑖
+ 𝜆𝑇𝑦𝑖 ‖𝜏𝑥𝑦𝑛𝑥 + 𝜎𝑦𝑦𝑛𝑦 − 𝑇𝑦‖²𝜕Ω𝑖 (4.15)

where 𝜕Ω𝑖 is the 𝑖th boundary and the 𝜆s are scalar weights that determine the
priority of each term. Note that the data loss ℒ𝑑𝑎𝑡𝑎 is optional and that without it,

the PINN effectively becomes a solver.
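Structurally, the loss in equation 4.15 is just a weighted sum of mean-squared residual terms. A minimal sketch of the bookkeeping (the helper names and toy values are hypothetical; in practice the residuals come from automatic differentiation of the network):

```python
def mse(residuals):
    """Mean of squared residuals sampled at a set of collocation points."""
    return sum(r * r for r in residuals) / len(residuals)

def pinn_loss(residuals, weights):
    """Weighted sum of mean-squared residual terms, as in Eq. 4.15.

    residuals: dict of term name -> pointwise residual values (PDE terms
               sampled in the domain, BC terms sampled on boundaries)
    weights:   dict of term name -> scalar coefficient lambda
    """
    return sum(weights[name] * mse(vals) for name, vals in residuals.items())

# Toy example: two PDE terms and one boundary term, values hypothetical.
residuals = {"mx": [0.1, -0.2], "my": [0.0, 0.1], "dx_wall": [0.05]}
weights = {"mx": 1.0, "my": 1.0, "dx_wall": 10.0}
loss = pinn_loss(residuals, weights)
```

Dropping the data entries from `residuals` corresponds to the solver mode described above, where the network is trained on the physics alone.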

4.3.4 Hard boundary conditions

With soft boundary conditions, boundary condition enforcement is weighted against
the other residuals in the loss function; thus, there is no guarantee that the boundary
conditions will be satisfied exactly. This may be undesirable in a structural engineering context,
where the solution is assumed to be in static equilibrium. Rao et al. [66] introduced
a method for enforcing boundary conditions exactly (i.e. in a "hard" manner) as
follows:
𝒩𝜃𝐻𝐴𝑅𝐷 (𝑥, 𝑦) = 𝒩𝜃 (𝑥, 𝑦) × 𝐷𝑖 (𝑥, 𝑦) + 𝑠𝑖 (𝑥, 𝑦) (4.16)

where 𝐷𝑖 (𝑥, 𝑦) is the shortest distance from a point (𝑥, 𝑦) to the boundary 𝜕Ω𝑖 on
which the value 𝑠𝑖 (𝑥, 𝑦) is imposed. Note that this formulation guarantees that pre-
dictions on the boundary will match the prescribed value. Note also that boundary
conditions imposed in a hard manner no longer need to be included in the loss func-
tion.
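A one-dimensional analogue of equation 4.16 shows why the enforcement is exact: multiplying the raw output by the distance to the boundary zeroes the prediction there regardless of what the network outputs (`raw_net` below is a hypothetical stand-in for 𝒩𝜃):

```python
def hard_bc_output(raw_net, x, boundary_x=0.0, boundary_value=0.0):
    """Hard enforcement of a Dirichlet condition (1D analogue of Eq. 4.16):
    scale the raw prediction by the distance to the boundary, then add the
    prescribed boundary value."""
    distance = abs(x - boundary_x)
    return raw_net(x) * distance + boundary_value

# Hypothetical "network" that predicts a wrong, nonzero value everywhere:
raw_net = lambda x: 42.0

print(hard_bc_output(raw_net, 0.0))  # 0.0: the support condition holds exactly
print(hard_bc_output(raw_net, 0.5))  # 21.0: interior values remain free
```

Because the boundary value is satisfied by construction, the corresponding loss term can simply be dropped, which is why the hard formulation shortens the loss function.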

4.3.5 Heuristic 1: loss term normalization

This work proposes an automated method for selecting the coefficients 𝜆 in the
weighted loss function (Eq. 4.15). Based on the observation that PINNs converge
faster when the loss terms are roughly of equal magnitude, this work proposes nor-
malizing each loss term by selecting 𝜆s such that the terms are scaled to one. In
practice, this can easily be achieved by training for a single step, recording the value
of each term in the loss function, and setting the coefficients equal to the reciprocal:

𝜆𝑛 = 1/ℒ⁰𝑛 (4.17)

where 𝜆𝑛 is the coefficient for the 𝑛th term in the loss function and ℒ⁰𝑛 is the value
of that term after the 0th training step. In practice, this loss term normalization
seems to work well, although there are some cases where a term needs to be further


Figure 4-2: The loss term normalization is applied before training to ensure that each
objective is weighted equally. After the initial training, additional refinement trainings
can be used to improve the prediction. Refinement trainings scale the outputs by the
magnitude of the predictions from the previous training.

manually scaled up or down by a factor of 10.
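The normalization amounts to one line of bookkeeping after the first training step. A sketch (term names and loss values are hypothetical; the `eps` guard against a term that starts at exactly zero is an added assumption):

```python
def normalization_weights(first_step_losses, eps=1e-12):
    """Set lambda_n = 1 / L_n^0 (Eq. 4.17) from the loss values recorded
    after the first training step, so each weighted term starts near one.
    eps guards against a term that happens to start at exactly zero."""
    return {name: 1.0 / max(value, eps)
            for name, value in first_step_losses.items()}

# Hypothetical first-step losses spanning several orders of magnitude:
first_losses = {"mx": 3.0e1, "eX": 5.0e-5, "sigX": 1.0e3}
lams = normalization_weights(first_losses)

# Every weighted term is now approximately one at the start of training.
scaled = {name: lams[name] * first_losses[name] for name in first_losses}
```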

4.3.6 Heuristic 2: multi-step refinement

This work also proposes a method for selecting the values by which the outputs of
the network are scaled. In the absence of output scaling, PINNs generally achieve
poor accuracy; however, the magnitude of the predicted solution is frequently correct.
This work thus proposes that training occur in two steps:

1. an initial training, in which the outputs are not scaled

2. a refinement training, in which the maximum absolute values of the predictions
from the first training are used as output scales

The proposed training procedure is depicted in Figure 4-2. Note that, since the initial
training only needs to produce a good initial guess of the solution, it does not have
to be trained for as long as the refinement step.
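The refinement step only needs the magnitude of the initial predictions. A sketch of the scale computation (field names and values are hypothetical):

```python
def output_scales(initial_predictions):
    """Per-field output scales for the refinement training: the maximum
    absolute value of each field predicted by the initial training."""
    return {field: max(abs(v) for v in values)
            for field, values in initial_predictions.items()}

# Hypothetical predictions from an initial (unscaled) training:
initial = {"uX": [2.0e-3, -1.5e-3], "sigX": [1.0e3, -8.0e2]}
scales = output_scales(initial)

# In the refinement training, the network's raw outputs are multiplied by
# these scales, so it only needs to learn fields of order one.
```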

4.4 Results
The proposed heuristics were tested on a simple linear elastic problem. The PINN
was implemented using the DeepXDE Python library [44]. For each trial, the PINN
was a fully connected network with four hidden layers, each with 64 neurons, hyperbolic
tangent activation functions, and Glorot uniform parameter initialization [28]. Unless
otherwise noted, all trainings used the following optimization sequence: 10,000 steps
of the ADAM optimizer [37] with a learning rate of 1e-2,
5,000 steps of the L-BFGS optimizer [97], 1,000 steps of ADAM with a learning rate
of 1e-3, and finally 1,000 steps of L-BFGS. The L-BFGS steps used an early stop-
ping mechanism with default hyperparameters. All models were trained without any
labeled data.

4.4.1 Test problem: cantilever beam

The proposed heuristics are demonstrated on a canonical linear elastic problem,
namely the cantilever beam, for which the closed-form solution is known [62]. The
beam is assumed to have a length of 1 m, a height of 0.1 m and a width of 0.05 m.
One end is fully constrained, and the other is loaded with a force of 1e5 N which is as-
sumed to be distributed quadratically over the tip of the beam (Figure 4-3). A point
cloud of training points was generated via Sobol pseudo-random sampling [9], with
1,000 points inside the domain Ω and 200 points distributed along the boundaries
𝜕Ω𝑖 . Additionally, a uniform grid of 2,544 points was reserved for testing.

4.4.2 Initial training

The initial training was performed using the hard boundary conditions described in
section 4.3.4. Loss term normalization was applied after the first step. Figure 4-4
depicts the weighted value of each term in the loss function over time. Note that loss
values are only approximately equal to one due to the random initialization of the
network. Interestingly, all terms except the equation of motion in the x-direction,
(𝑚𝑥), decrease during the first phase of ADAM optimization. The 𝑚𝑥 term then


Figure 4-3: The cantilever beam problem used for testing. Left: the beam is fully
constrained at the wall and loaded on the end with a distributed shear load. Two
point clouds are sampled: one for testing and one for training. Right: the analytical
solution.


Figure 4-4: Results from the initial training with all loss terms normalized. The
displacement predictions are off by one or two orders of magnitude.

decreases during L-BFGS, after which the training converges. Figure 4-4 also shows
the predicted solution. While the heat maps appear to have the correct patterns,
they are off by one or two orders of magnitude.

Since the 𝑚𝑥 term was clearly the limiting factor, the initial training was repeated
by scaling 𝜆𝑚𝑥 by 0.1, the results of which can be seen in Figure 4-5. Though the loss
history looks similar, the predicted results are now on the correct order of magnitude.
The mean absolute error (MAE) of the displacement terms are 5e-4 m and 8e-3 m
for 𝑢𝑥 and 𝑢𝑦 , respectively.


Figure 4-5: The initial training repeated with 𝜆𝑚𝑥 scaled by 0.1. The predictions are
now close to the analytical solution and can be used in future refinement steps.


Figure 4-6: Results from the refinement step with soft boundary conditions. Predic-
tions show good agreement with the analytical solution.

4.4.3 Refinement training: soft boundary conditions

In the refinement training, the outputs are scaled by the maximum absolute value of
the predictions from the initial training. The predictions and loss histories for the soft
refinement step can be seen in Figure 4-6. Note that the loss function now has ten
terms since boundary conditions are included. Following similar logic to the previous
section, the 𝜆𝑇 𝑦𝑒𝑛𝑑 coefficient was scaled by 10. The predictions have improved by
almost an order of magnitude over the initial training: the MAE in displacement is
now 7e-5 m and 1e-3 m for 𝑢𝑥 and 𝑢𝑦 , respectively.


Figure 4-7: Results from the refinement step with hard boundary conditions. The
results are slightly better than those produced by the soft boundary conditions, sug-
gesting that hard boundary conditions should be used in the refinement step.


Figure 4-8: Prediction errors compared across the initial training and the two refine-
ment trainings. Both refinements improved over the initial training, with the hard
enforcement slightly outperforming soft enforcement in displacement prediction.

4.4.4 Refinement training: hard boundary conditions

The same refinement step was repeated using hard boundary conditions to directly
compare the multi-step refinement process on both boundary condition types. The
results from the refinement step with hard boundary conditions can be seen in Figure
4-7. A slight improvement is observed in the prediction: the MAE in displacement
is now 5e-5 m and 7e-4 m for 𝑢𝑥 and 𝑢𝑦 , respectively. A comparison of the prediction
errors from all three trainings is shown in Figure 4-8. These results suggest that a
refinement step with hard boundary conditions is an effective strategy for improving
PINN prediction accuracy.

4.5 Conclusions and future work
This work proposed two heuristics for improving the accuracy and convergence of
PINNs trained on linear elastic problems. The first heuristic normalizes each term
in the loss function, while the second uses a multi-step refinement technique to scale
the network outputs. Both heuristics can be implemented in a few lines of code and
have been demonstrated to improve the training performance on a canonical elastic
problem. More research is required to verify whether these heuristics are effective
on other PDEs and domains. Future work may also explore whether more than two
refinement steps are advantageous, possibly using shorter training cycles in each step.
While PINNs show promise for improving the data efficiency of structural surrogate
models, more work is required to improve the ease and robustness of the training
process.

Chapter 5

Conclusion

This work makes four main contributions towards advancing the state of surrogate
modeling for structural engineering applications:

1. A graph-based surrogate model (GSM) is proposed which can predict the struc-
tural behavior of space frames given only their geometry, loads, and supports
as inputs. Since the GSM does not rely on hand-crafted design parameters to
make predictions, it can be trained on designs from multiple sources, often with
a performance advantage.

2. Transfer learning is proposed as an effective technique for improving the data
efficiency of GSMs by repurposing data and models from other applications.
Positive transfer is observed across varying topology, loads, and complexity,
resulting in a reduction in the amount of training data required by one or two
orders of magnitude.

3. SimJEB: a new public dataset of engineering brackets and structural simulations
is presented which was designed specifically for benchmarking surrogate models.
The designs in SimJEB are more diverse and complex than those generated
synthetically and are thus ideal for advancing the frontier in generalizable shape-
based surrogate models and geometric deep learning algorithms.

4. Two heuristics are presented for improving the accuracy and convergence of

physics-informed neural networks (PINNs) for structural applications. These
methods, demonstrated on a canonical problem, represent an important step
towards making PINNs easier to train and use in practice.

Combined, the proposed methods have potential to improve the generalizability and
data efficiency of surrogate models used to design engineering structures.

References

[1] Altair HyperMesh. https://www.altair.com/hypermesh/.

[2] Altair OptiStruct. https://www.altair.com/optistruct/.

[3] ASME Competitions - https://www.asme.org/conferences-events/competitions.

[4] Eman Ahmed, Alexandre Saint, Abd El Rahman Shabayek, Kseniya Cherenkova,
Rig Das, Gleb Gusev, Djamila Aouada, and Bjorn Ottersten. A survey on Deep
Learning Advances on Different 3D Data Representations. arXiv:1808.01462 [cs],
April 2019. arXiv: 1808.01462.

[5] Christopher J. Arthurs and Andrew P. King. Active Training of Physics-Informed
Neural Networks to Aggregate and Interpolate Parametric Solutions to the
Navier-Stokes Equations. arXiv:2005.05092 [physics, stat], May 2020. arXiv:
2005.05092.

[6] Pierre Baque, Edoardo Remelli, Francois Fleuret, and Pascal Fua. Geodesic
Convolutional Shape Optimization. In Jennifer Dy and Andreas Krause, ed-
itors, Proceedings of the 35th International Conference on Machine Learning,
volume 80 of Proceedings of Machine Learning Research, pages 472–481, Stock-
holmsmässan, Stockholm Sweden, July 2018. PMLR.

[7] Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and
Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey.
Journal of machine learning research, 18, 2018. Publisher: Journal of Machine
Learning Research.

[8] Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael Bronstein.
Learning Shape Correspondence with Anisotropic Convolutional Neural Net-
works. In Proceedings of the 30th International Conference on Neural Infor-
mation Processing Systems, NIPS’16, pages 3197–3205, Red Hook, NY, USA,
2016. Curran Associates Inc. event-place: Barcelona, Spain.

[9] Paul Bratley and Bennett L Fox. Algorithm 659: Implementing Sobol’s quasiran-
dom sequence generator. ACM Transactions on Mathematical Software (TOMS),
14(1):88–100, 1988. Publisher: ACM New York, NY, USA.

[10] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre
Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE
Signal Processing Magazine, 34(4):18–42, July 2017. arXiv: 1611.08097.

[11] Nathan C Brown and Caitlin T Mueller. Design variable analysis and genera-
tion for performance-based parametric modeling in architecture. International
Journal of Architectural Computing, 17(1):36–52, March 2019.

[12] M Diane Burton and Tom Nicholas. Prizes, patents and the search for longitude.
Explorations in Economic History, 64:21–36, 2017. Publisher: Elsevier.

[13] Weijuan Cao, Trevor Robinson, Yang Hua, Flavien Boussuge, Andrew R. Colli-
gan, and Wanbin Pan. Graph Representation of 3D CAD Models for Machining
Feature Recognition With Deep Learning. In Volume 11A: 46th Design Au-
tomation Conference (DAC), page V11AT11A003, Virtual, Online, August 2020.
American Society of Mechanical Engineers.

[14] Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing
Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianx-
iong Xiao, Li Yi, and Fisher Yu. ShapeNet: An Information-Rich 3D Model
Repository. arXiv:1512.03012 [cs], December 2015. arXiv: 1512.03012.

[15] Kai-Hung Chang and Chin-Yi Cheng. Learning to simulate and design for struc-
tural engineering. arXiv:2003.09103 [cs, stat], August 2020. arXiv: 2003.09103.

[16] Siddhartha Chaudhuri, Daniel Ritchie, Jiajun Wu, Kai Xu, and Hao Zhang.
Learning Generative Models of 3D Structures. Computer Graphics Forum,
39(2):643–666, May 2020.

[17] Noel Cressie. Spatial prediction and ordinary kriging. Mathematical Geology,
20(4):405–421, May 1988.

[18] James D. Cunningham, Timothy W. Simpson, and Conrad S. Tucker. An In-
vestigation of Surrogate Models for Efficient Performance-Based Decoding of 3D
Point Clouds. Journal of Mechanical Design, 141(12):121401, December 2019.

[19] Asmaa Ibrahem Dallash and Amr Ali Abdelmonaem. Optimal design of jet
engine bracket. Military Technical College, Cairo, Egypt, July 2017.

[20] Renaud Danhaive. Structural Design Synthesis Using Machine Learning. PhD
thesis, Massachusetts Institute of Technology, September 2020.

[21] Renaud Danhaive and Caitlin Mueller. Design subspace learning: Structural de-
sign space exploration using performance-conditioned generative modeling. Au-
tomation in Construction (in press), 2021.

[22] Brian Dunbar and Lillian Gipson. NASA Design Challenges and Competitions,
November 2019.

[23] Nira Dyn, David Levin, and Samuel Rippa. Numerical Procedures for Surface
Fitting of Scattered Data by Radial Functions. SIAM Journal on Scientific and
Statistical Computing, 7(2):639–659, April 1986.

[24] Matthias Fey and Jan E. Lenssen. Fast Graph Representation Learning with
PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs
and Manifolds, 2019.

[25] Alexander I. J. Forrester, András Sóbester, and A. J. Keane. Engineering design
via surrogate modelling: a practical guide. J. Wiley, Chichester, West Sussex,
England ; Hoboken, NJ, 2008.

[26] Anthony P. Garland, Benjamin C. White, Scott C. Jensen, and Brad L. Boyce.
Pragmatic generative optimization of novel structural lattice metamaterials with
machine learning. Materials & Design, page 109632, March 2021.

[27] Aboma Wagari Gebisa and Hirpa G Lemu. A case study on topology optimized
design for additive manufacturing. In IOP Conference Series: Materials Science
and Engineering, volume 276, page 012026. IOP Publishing, 2017. Issue: 1.

[28] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep
feedforward neural networks. In Yee Whye Teh and Mike Titterington, editors,
Proceedings of the Thirteenth International Conference on Artificial Intelligence
and Statistics, volume 9 of Proceedings of Machine Learning Research, pages
249–256, Chia Laguna Resort, Sardinia, Italy, May 2010. PMLR.

[29] Somdatta Goswami, Cosmin Anitescu, and Timon Rabczuk. Adaptive fourth-
order phase field analysis using deep energy minimization. Theoretical and Ap-
plied Fracture Mechanics, 107:102527, June 2020.

[30] Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional Neural Networks
for Steady Flow Approximation. In Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages 481–
490, San Francisco California USA, August 2016. ACM.

[31] Ehsan Haghighat, Maziar Raissi, Adrian Moure, Hector Gomez, and Ruben
Juanes. A physics-informed deep learning framework for inversion and surro-
gate modeling in solid mechanics. Computer Methods in Applied Mechanics and
Engineering, 379:113741, June 2021.

[32] Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and
Daniel Cohen-Or. MeshCNN: A Network with an Edge. ACM Transactions on
Graphics, 38(4):1–12, July 2019. arXiv: 1809.05910.

[33] Jida Huang, Hongyue Sun, Tsz-Ho Kwok, Chi Zhou, and Wenyao Xu. Ge-
ometric Deep Learning for Shape Correspondence in Mass Customization by
Three-Dimensional Printing. Journal of Manufacturing Science and Engineer-
ing, 142(6):061003, June 2020.

[34] Yijiang Huang. pyconmech - https://pypi.org/project/pyconmech/, 2020.

[35] Haoliang Jiang, Zhenguo Nie, Roselyn Yeo, Amir Barati Farimani, and Lev-
ent Burak Kara. StressGAN: A Generative Deep Learning Model for 2D Stress
Distribution Prediction. In ASME 2020 International Design Engineering Tech-
nical Conferences and Computers and Information in Engineering Conference.
American Society of Mechanical Engineers Digital Collection, 2020.

[36] Kaspar Kiis, Jared Wolfe, Gregg Wilson, David Abbott, and William Carter.
GE Jet Engine Bracket Challenge, 2013. https://grabcad.com/challenges/ge-
jet-engine-bracket-challenge.

[37] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimiza-
tion. arXiv:1412.6980 [cs], January 2017. arXiv: 1412.6980.

[38] Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Arte-
mov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. ABC: A
Big CAD Model Dataset for Geometric Deep Learning. In 2019 IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition (CVPR), pages 9593–9603,
Long Beach, CA, USA, June 2019. IEEE.

[39] Ulrich Krispel, Christoph Schinko, and Torsten Ullrich. The Rules Behind –
Tutorial on Generative Modeling. Proceedings of Symposium on Geometry Pro-
cessing / Graduate School, 12:2:1–2:49, 2014.

[40] Aaron W. F. Lee, David Dobkin, Wim Sweldens, and Peter Schröder. Mul-
tiresolution mesh morphing. In Proceedings of the 26th annual conference on
Computer graphics and interactive techniques - SIGGRAPH ’99, pages 343–350,
Not Known, 1999. ACM Press.

[41] Hyuk Lee and In Seok Kang. Neural algorithm for solving differential equations.
Journal of Computational Physics, 91(1):110–131, 1990. Publisher: Elsevier.

[42] Jaekoo Lee, Hyunjae Kim, Jongsun Lee, and Sungroh Yoon. Transfer Learning
for Deep Learning on Graph-Structured Data. page 7.

[43] Dandan Li, Senzhang Wang, Shuzhen Yao, Yu-Hang Liu, Yuanqi Cheng, and
Xian-He Sun. Efficient Design Space Exploration by Knowledge Transfer.
page 10.

[44] Lu Lu, Xuhui Meng, Zhiping Mao, and George E. Karniadakis. DeepXDE:
A deep learning library for solving differential equations. arXiv:1907.04502
[physics, stat], February 2020. arXiv: 1907.04502.

[45] Ali Madani, Ahmed Bakhaty, Jiwon Kim, Yara Mubarak, and Mohammad R. K.
Mofrad. Bridging Finite Element and Machine Learning Modeling: Stress Predic-
tion of Arterial Walls in Atherosclerosis. Journal of Biomechanical Engineering,
141(8):084502, August 2019.

[46] Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Van-
dergheynst. Geodesic convolutional neural networks on Riemannian manifolds.
arXiv:1501.06297 [cs], June 2018. arXiv: 1501.06297.

[47] Christopher McComb, Nicholas Meisel, T. W. Simpson, and Christian Murphy.
Predicting Part Mass, Required Support Material, and Build Time via Autoen-
coded Voxel Patterns. Preprint, engrXiv, July 2018.

[48] Mark C. Messner. Convolutional Neural Network Surrogate Models for the
Mechanical Properties of Periodic Structures. Journal of Mechanical Design,
142(2):024503, February 2020.

[49] Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, and
Leonidas J. Guibas. StructureNet: Hierarchical Graph Networks for 3D Shape
Generation. arXiv:1908.00575 [cs], August 2019. arXiv: 1908.00575.

[50] Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svo-
boda, and Michael M. Bronstein. Geometric deep learning on graphs and mani-
folds using mixture model CNNs. arXiv:1611.08402 [cs], December 2016. arXiv:
1611.08402.

[51] Sourena Moosavi, Dominique Chamoret, S. Tie Bi, Naoual Sabkhi, and
Yannick Culnard. Topology Optimization Considering Additive Manufacturing
Constraints In An Industrial Context. In 14th WCCM-ECCOMAS Congress
2020, volume 1000, 2021.

[52] H. D. Morgan, H. U. Levatti, J. Sienz, A. J. Gil, and D. C. Bould. GE Jet Engine
Bracket Challenge: A Case Study in Sustainable Design. Sustainable Design and
Manufacturing, page 14, 2014.

[53] W. M. Wan Muhamad, K. A. Abdul Wahid, and M. N. Reshid. Mass Reduction of
a Jet Engine Bracket using Topology Optimisation for Additive Manufacturing
Application. 2020.

[54] Pascal Müller, Peter Wonka, Simon Haegler, Andreas Ulmer, and Luc Van Gool.
Procedural modeling of buildings. In ACM SIGGRAPH 2006 Papers, pages
614–623. 2006.

[55] Sangeun Oh, Yongsu Jung, Seongsin Kim, Ikjin Lee, and Namwoo Kang. Deep
Generative Design: Integration of Topology Optimization and Generative Mod-
els. arXiv:1903.01548 [cs], May 2019. arXiv: 1903.01548.

[56] Sinno Jialin Pan and Qiang Yang. A Survey on Transfer Learning. IEEE Trans-
actions on Knowledge And Data Engineering, 22(10):15, 2010.

[57] Manolis Papadrakakis, Nikos D. Lagaros, and Yiannis Tsompanakis. Structural
optimization using evolution strategies and neural networks. Computer Methods
in Applied Mechanics and Engineering, 156(1-4):309–333, April 1998.

[58] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,
M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos,
D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine
Learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[59] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia.
Learning Mesh-Based Simulation with Graph Networks. arXiv preprint
arXiv:2010.03409, 2020.

[60] Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, and Song-Chun Zhu.
Human-Centric Indoor Scene Synthesis Using Stochastic Grammar. In Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), June 2018.

[61] Nestor V. Queipo, Raphael T. Haftka, Wei Shyy, Tushar Goel, Rajkumar
Vaidyanathan, and P. Kevin Tucker. Surrogate-based analysis and optimization.
Progress in Aerospace Sciences, 41(1):1–28, January 2005.

[62] Abdel-Rahman A Ragab and Salah Eldin Ahm Bayoumi. Engineering solid
mechanics: fundamentals and applications. Routledge, 2018.

[63] Ayush Raina, Christopher McComb, and Jonathan Cagan. Learning to Design
From Humans: Imitating Human Designers Through Deep Learning. In Volume
2A: 45th Design Automation Conference, Anaheim, California, USA, August
2019. American Society of Mechanical Engineers.

[64] M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural net-
works: A deep learning framework for solving forward and inverse problems
involving nonlinear partial differential equations. Journal of Computational
Physics, 378:686–707, February 2019.

[65] Maziar Raissi, Alireza Yazdani, and George Em Karniadakis. Hidden fluid me-
chanics: Learning velocity and pressure fields from flow visualizations. Science,
367(6481):1026–1030, 2020. Publisher: American Association for the Advance-
ment of Science.

[66] Chengping Rao, Hao Sun, and Yang Liu. Physics informed deep learning for com-
putational elastodynamics without labeled data. arXiv:2006.08472 [cs, math],
June 2020. arXiv: 2006.08472.

[67] Daniel Ritchie, Ben Mildenhall, Noah D. Goodman, and Pat Hanrahan. Control-
ling procedural modeling programs with stochastically-ordered sequential Monte
Carlo. ACM Transactions on Graphics, 34(4):1–11, July 2015.

[68] Charles Ruizhongtai Qi. Deep Learning on 3D Data. In Yonghuai Liu, Nick
Pears, Paul L. Rosin, and Patrik Huber, editors, 3D Imaging, Analysis and
Applications, pages 513–566. Springer International Publishing, Cham, 2020.

[69] David Rutten. Grasshopper 3D. v6. Robert McNeel & Associates.

[70] Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. De-
sign and Analysis of Computer Experiments. Statistical Science, 4(4):409–423,
November 1989.

[71] Esteban Samaniego, Cosmin Anitescu, Somdatta Goswami, Vien Minh Nguyen-
Thanh, Hongwei Guo, Khader Hamdia, Timon Rabczuk, and Xiaoying Zhuang.
An Energy Approach to the Solution of Partial Differential Equations in Compu-
tational Mechanics via Machine Learning: Concepts, Implementation and Appli-
cations. Computer Methods in Applied Mechanics and Engineering, 362:112790,
April 2020. arXiv: 1908.10407.

[72] Teseo Schneider, Yixin Hu, Xifeng Gao, Jeremie Dumas, Denis Zorin, and
Daniele Panozzo. A Large Scale Comparison of Tetrahedral and Hexahedral
Elements for Finite Element Analysis. arXiv preprint arXiv:1903.09332, 2019.

[73] Adriana Schulz, Jie Xu, Bo Zhu, Changxi Zheng, Eitan Grinspun, and Wojciech
Matusik. Interactive design space exploration and optimization for CAD models.
ACM Transactions on Graphics, 36(4):1–14, July 2017.

[74] Noor Shaker, Julian Togelius, and Mark J Nelson. Procedural content generation
in games. Springer, 2016.

[75] Philip Shilane, Patrick Min, Michael Kazhdan, and Thomas Funkhouser. The
princeton shape benchmark. In Proceedings Shape Modeling Applications, 2004.,
pages 167–178. IEEE, 2004.

[76] Matthew L. Staten, Steven J. Owen, Suzanne M. Shontz, Andrew G. Salinger,
and Todd S. Coffey. A Comparison of Mesh Morphing Methods for 3D Shape
Optimization. In William Roshan Quadros, editor, Proceedings of the 20th Inter-
national Meshing Roundtable, pages 293–311, Berlin, Heidelberg, 2012. Springer
Berlin Heidelberg.

[77] Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chun-
fang Liu. A Survey on Deep Transfer Learning. arXiv:1808.01974 [cs, stat],
August 2018. arXiv: 1808.01974.

[78] Tin Kam Ho. Random decision forests. In Proceedings of 3rd International
Conference on Document Analysis and Recognition, volume 1, pages 278–282,
Montreal, Que., Canada, 1995. IEEE Comput. Soc. Press.

[79] Stavros Tseranidis, Nathan C. Brown, and Caitlin T. Mueller. Data-driven ap-
proximation algorithms for rapid performance evaluation and optimization of
civil structures. Automation in Construction, 72:279–293, 2016.

[80] Nobuyuki Umetani. Exploring generative 3D shapes using autoencoder networks.
In SIGGRAPH Asia 2017 Technical Briefs on - SA ’17, pages 1–4, Bangkok,
Thailand, 2017. ACM Press.

[81] Nitika Verma, Edmond Boyer, and Jakob Verbeek. FeaStNet: Feature-Steered
Graph Convolutions for 3D Shape Analysis. In 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pages 2598–2606, Salt Lake City, UT,
June 2018. IEEE.

[82] Irina Vinnitskaya. Tower of London Competition 1890, April 2019.

[83] Nikolaos Vlassis, Ran Ma, and WaiChing Sun. Geometric deep learning for
computational mechanics Part I: Anisotropic Hyperelasticity. Computer Meth-
ods in Applied Mechanics and Engineering, 371:113299, November 2020. arXiv:
2001.04292.

[84] G. Gary Wang and S. Shan. Review of Metamodeling Techniques in Support
of Engineering Design Optimization. Journal of Mechanical Design, 129(4):370–
380, April 2007.

[85] Sifan Wang, Yujun Teng, and Paris Perdikaris. Understanding and mitigating
gradient pathologies in physics-informed neural networks. arXiv preprint
arXiv:2001.04536, January 2020.

[86] Glen Williams, Nicholas A. Meisel, Timothy W. Simpson, and Christopher
McComb. Design Repository Effectiveness for 3D Convolutional Neural Net-
works: Application to Additive Manufacturing. Journal of Mechanical Design,
141(11):111701, November 2019.

[87] Karl DD Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G Lam-
bourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 Gallery:
A Dataset and Environment for Programmatic CAD Reconstruction. arXiv
preprint arXiv:2010.02392, 2020.

[88] Jin-Long Wu, Heng Xiao, and Eric Paterson. Physics-Informed Machine Learning
Approach for Augmenting Turbulence Models: A Comprehensive Framework.
Physical Review Fluids, 3(7):074602, July 2018. arXiv: 1801.02762.

[89] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou
Tang, and Jianxiong Xiao. 3D ShapeNets: A Deep Representation for Volumetric
Shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), June 2015.

[90] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and
Philip S. Yu. A Comprehensive Survey on Graph Neural Networks. IEEE Trans-
actions on Neural Networks and Learning Systems, pages 1–21, 2020. arXiv:
1901.00596.

[91] Jiayang Xu and Karthik Duraisamy. Multi-level convolutional autoencoder net-
works for parametric prediction of spatio-temporal dynamics. Computer Methods
in Applied Mechanics and Engineering, 372:113379, 2020.

[92] Zack Xuereb Conti and Sawako Kaijima. A flexible simulation metamodel for
exploring multiple design spaces. In Proceedings of IASS Annual Symposia, vol-
ume 2018, issue 2, pages 1–8. International Association for Shell and Spatial
Structures (IASS), 2018.

[93] Soyoung Yoo, Sunghee Lee, Seongsin Kim, Kwang Hyeon Hwang, Jong Ho Park,
and Namwoo Kang. Integrating Deep Learning into CAD/CAE System: Gen-
erative Design and Evaluation of 3D Conceptual Wheel. arXiv preprint
arXiv:2006.02138, February 2021.

[94] Zhibo Zhang, Prakhar Jaiswal, and Rahul Rai. FeatureNet: Machining feature
recognition based on 3D Convolution Neural Network. Computer-Aided Design,
101:12–22, August 2018.

[95] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang,
Changcheng Li, and Maosong Sun. Graph Neural Networks: A Review of Meth-
ods and Applications. arXiv preprint arXiv:1812.08434, July 2019.

[96] Qingnan Zhou and Alec Jacobson. Thingi10k: A dataset of 10,000 3d-printing
models. arXiv preprint arXiv:1605.04797, 2016.

[97] Ciyou Zhu, Richard H. Byrd, Peihuang Lu, and Jorge Nocedal. Algorithm 778:
L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization.
ACM Transactions on Mathematical Software (TOMS), 23(4):550–560, 1997.

