Tan et al. 2020. Exploration and Validation: Making Sense of Generated Data in Large Option Sets

Exploration & Validation
Making sense of generated data in large option sets
Rachel Tan1 , Trevor Patt2 , Seow Jin Koh3 , Edmund Chen4

1,3,4 Sembcorp Architects & Engineers
2 Singapore University of Technology and Design
1,2,3 {rae.twx94|trpatt|kohseowjin}@gmail.com
4 chen_mingyie@yahoo.com
The project is a real-world case study where we advised our client in the selection
of a viable and well-performing design from a set of computationally generated
options. This process was undertaken while validating the algorithmic generative
process and user-defined evaluation criteria through scrutinizing the other
alternative options to ensure ample variability was considered. Optimisation
algorithms were not ideal as low performing options were not visible to validate
variability. We established variability by extracting the different groups of
options, proving to the client that various operational behaviours were present
and accounted for. In order to sieve through the noise and derive meaningful
results, we employed methods to filter through thousands of options, including:
k-means clustering, archetypal labelling and analysis, pareto front analysis and
visualisation overlays. We present a sense-making and decision-making process
that utilizes principles of genetic algorithms and analysis of multi-dimensional
user-derived evaluation scores. To enable the client's confidence in the
computational model, we proved the effectiveness of the generative model
through communicating and visualizing the impact of different criteria. This
ensured that operational needs were considered. The visualization methods we
employed, including pareto front extraction and analysis, eventually helped our
clients to arrive at a decision.
Keywords: generative design, validation, multi-objective optimisation, k-means,
pareto front, decision-making
D1.T4.S2. MAKING THROUGH CODE –BUILT BY DATA AND THE ARCHITECTURAL ILLUSTRATION - Volume 1 - eCAADe 38 | 653

INTRODUCTION
One of the main benefits of computational design is the ability to generate large option sets, with the upper bound limited only by available computing power.

Generating a large amount of information invites a fair amount of noise in the resultant data. Managing this data and controlling the limits of the algorithm is extremely important when faced with clients that are not design-trained or data-inclined. When dealing with domain-specific, operational or performance-driven concerns, helping the client to understand and validate the logic of a computational model is a necessary step.

Clients and stakeholders are also very likely to provide a long list of domain-specific requirements and criteria, without fully comprehending the impact of these metrics on a computational model and without being able to confidently establish priorities or weightage. This eventually translates into a multi-objective problem whose complexity and noise increase with every dimension added.

Multi-objective optimization is currently very popular in computational design (Ashour and Kolarevic 2015). As objectives compete with each other, designers study the tradeoffs by optimizing and visualizing the set of solutions along the pareto front (Evins 2013). A 2-dimensional pareto front curve has optimised solutions that move from a high score in one criterion to a high score in another to determine appropriate tradeoffs and compromises (Gero and Balachandran 1986). In genetic optimisation, according to the principle of “survival of the fittest”, the low-performing solutions or options not in the pareto front are usually weeded out behind the scenes (Belem and Leitao 2019). This approach usually visualises only a small number of top-performing solutions and is not as effective in a noisy dataset or a generated dataset still undergoing validation. Furthermore, when the pareto front is in more than 2 or 3 dimensions, the visualisation of trade-offs becomes problematic to analyse. In our case, the client was keen to establish variability and make sure that all possible options were evaluated. Options needed to be visible, and optimisation algorithms were not ideal as the whole spectrum of options was not shown.

At the same time, traditionally, clients rely on professional consultants to present a handful of design recommendations and are not used to making decisions when presented with large amounts of data. Urban planners and designers have begun to use computer simulation and models to influence decision-making by presenting the impacts of each recommended option in the form of evaluation metrics (Sevtsuk and Mekonnen 2012, Wilson et al 2019, Hemmersam et al 2015, Al-Douri 2018). When presented with generative design and the potential of limitless options (Figure 1), our clients were interested in understanding the rationale behind the algorithmic generation of options before agreeing on the appropriate evaluation metrics and how to apply them. This became a precarious participatory process to define cardinal requirements, variables and acceptable compromises upfront instead of the typical iterative design process with pen and paper. Due to the multi-objective complexity of the project, the clients were not only interested in the “best” options, but also the reasons why certain options were deemed low-performing by the algorithm.

Figure 1: Examples of generated options

OVERVIEW OF CASE STUDY
Computational design need not revolve around “architecture”, but can expand into realms of urban planning and design. Our project pushes into uncommon territory in a case of site-planning, focusing on a small port facility, comprising 60% sea and 40% land. At this early stage, the clients were less concerned with what the facility looked like, the building massing, or even area distribution. Instead they were more focused on space planning: where critical operations were located and the overall impact on output, connectivity and flexibility. To gather initial design concepts and possible land configurations, we engaged the users and maritime experts in participatory design workshops. At later stages, we worked with a smaller group to provide updates and clarify queries. The overall computational process was inherently performance-driven, and the contrast with the conventional design process is illustrated in Figure 2. The conventional design process engages in archetype-centred specification, where different concepts (archetypes) are developed and iterated through design reviews (Akin and Ozkaya 2002). To manage the client’s comfort level with the computational process, we re-introduced archetypal analysis on top of generated data to tie in with the initial design concepts.

Figure 2: Adapted process from Akin and Ozkaya, 2002

GENERATION OF OPTION SETS
The algorithm for option generation was built in the C# programming language for Rhinoceros and Grasshopper. We approached the problem with a grid-based approach, taking the perspective of a basin-centric facility. The general geometrical operations are as follows:

• Starting with a blank area of “water” within the site boundary
• Inserting a few required spaces as “obstacles” (areas such as berths and zones)
• Determining a feasible, manoeuvrable path through the obstacles
• Forming and joining up resultant land pixels from the site boundary and berth areas

We used this grid-based approach because it could handle fewer constraints, as different types of land configurations needed to be reflected and generated by the model (Figure 3). This is in contrast with a more parametric approach where the land or basin is determined by setting out dimensions and positions.

Figure 3: Examples of configurations generated by grid-based algorithm

We engaged the users through interviews and working sessions to derive the different levels of information needed and the different evaluation metrics at each level (Figure 4). Many of these criteria were specific to the client’s operations and could not be benchmarked against architectural standards, hence even the logic of calculating these metrics had to be carefully validated. The users observed that raw data from the land profile configurations was insufficient for decision-making, thus they proposed additional information for a more comprehensive evaluation and selection process. Additional levels of information were generated on top of the 1st level (Land): the 2nd level being Piers, Wharves and Vessels, and the 3rd level being Land-side Activities. Each level of information was evaluated by a set of metrics and compressed into a one-dimensional score through normalization and weighting.

Figure 4: Evaluation metrics for different levels of information and corresponding R2 correlation
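
The compression of each level's metrics into a one-dimensional score through normalization and weighting can be sketched as follows. This is a minimal illustration and not the project's implementation: the paper does not state which normalization was used, so min-max scaling is an assumption, and the metric names, weights and values are hypothetical.

```python
from typing import Dict, List


def min_max_normalise(values: List[float]) -> List[float]:
    """Rescale raw metric values to the range [0, 1] (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every option scores the same, so this metric cannot separate them.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def level_scores(
    metrics: Dict[str, List[float]],
    weights: Dict[str, float],
    higher_is_better: Dict[str, bool],
) -> List[float]:
    """Compress several metrics into one weighted score per option."""
    n_options = len(next(iter(metrics.values())))
    total_weight = sum(weights[name] for name in metrics)
    scores = [0.0] * n_options
    for name, raw in metrics.items():
        normalised = min_max_normalise(raw)
        if not higher_is_better[name]:
            # Flip metrics such as distances, where a lower raw value is preferred.
            normalised = [1.0 - v for v in normalised]
        for i, v in enumerate(normalised):
            scores[i] += weights[name] / total_weight * v
    return scores


# Hypothetical metrics for three options at one level of information.
metrics = {
    "connectivity": [2.0, 4.0, 6.0],
    "travel_distance": [300.0, 100.0, 200.0],
}
weights = {"connectivity": 0.6, "travel_distance": 0.4}
higher_is_better = {"connectivity": True, "travel_distance": False}

scores = level_scores(metrics, weights, higher_is_better)
print([round(s, 2) for s in scores])
```

Because the weights are user-defined, a small change in priorities can reorder the options, which is one reason a single fitness score on its own may not inspire confidence.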
SENSE-MAKING
The objective of sense-making was not to arrive at a single decision from the output but, rather, to make sure that the eventual solution is not a myopic one. As the options were completely generated from scratch and evaluation metrics were all user-defined, there was no benchmarking data readily available for this small scale of port facility to validate the output. The evaluation criteria were expected to play a big part in decision-making, therefore the clients wanted to be sure of the geometry and measurements going into it. It was a challenge to judge options simply by a reduced one-dimensional fitness score, because it did not lend much confidence in decision-making despite being weighted according to user-defined priorities.

Our key struggle was dissecting the relationship between evaluation metrics. Within all 3 levels of evaluation criteria, the R2 correlation (Figure 4), which quantifies the correlation between pairs of axes, was found to be relatively low. The different evaluation criteria did not appear to achieve an R2 of more than 0.6, hence we were unable to make confident statements about the impact of one user-defined criterion on another.

K-means clustering
To further analyse the individual criteria, we approached the issue from the angle of variability. We identified methods to study variability at different stages, using data generated from multi-dimensional evaluation metrics. The first method, at a high level, was simple unsupervised machine learning in the form of k-means clustering. K-means clustering is a centroid-based method to group data points into “k” groups or clusters according to their “distance” to each other (Everitt and Hothorn 2011, Han et al 2011, Chen et al 2015). The silhouette score, ranging from -1 to +1, is used to measure a data point’s similarity to its own cluster in contrast with other clusters (Rousseeuw 1987).

We applied k-means clustering for various k values. To validate, we visualised the data after reduction to 2 dimensions using Principal Component Analysis (PCA) and retrieved the cluster centroids and their nearest neighbours to try and visually analyse the similarities (Figure 5). However, as with the correlation data, the overall average silhouette scores fared poorly and failed to exceed 0.2. Moving forward, we recognized that k-means was not as effective due to high noise and imbalance within the dataset. Simply clustering the existing unfiltered and unlabelled data may not have been the best approach.

Figure 5: Visualising k-means cluster centroids & neighbours with 2-dimensional PCA

Given the amount of noise overall and the challenges faced in clustering data numerically, we believe that there is still a large amount of tacit information that was not captured in the 1st, 2nd and 3rd levels, which k-means clustering and analysis would need in order to show distinct results and recommendations.

Archetypal analysis
The second method was to manually introduce more information through imposing archetypes and features based on visually similar characteristics. This was derived based on known preferences in discussion with the clients and combined with several observations of patterns that we made. We manually tagged a sample set of 1000 options according to 4 characteristics to identify trends in evaluation metrics. By comparing and overlaying the similarities across options with the same unique combinations of 4 characteristics, we were able to identify the trends in the location of certain spaces and configuration of the basin. This effectively compressed 1000 individual options into 89 unique combinations. Upon further analysis of this additional data, we found that the Archetypes and “H” visual characteristics were shown to be rather distinct on a PCA plot from evaluation data (Figure 6). This supports the idea that the visual characteristics were in fact related to the output from evaluation criteria. Furthermore, plotting the same PCA data labelled with the unique combinations from 4 characteristics (stylized in Figure 6 as w | x | y | z, where the letters each correspond to a characteristic) demonstrated that the clustering of visually similar options (derived from unique combinations) was also reflected in evaluation data.

This method proved to be more useful in communicating the outcome of the generative algorithm with the clients, as this divided the options into a categorical grouping and provided a more sensible yardstick for comparison besides the genetic fitness score.

Pareto front analysis
At a lower level between individual metrics obtained from all the three levels of information, we proceeded to visualize 2-dimensional pareto fronts and continued to make observations based on performance and imposed archetypes. Selection of axes for each 2-D pareto front was scoped around questions. For example, for a question such as “How do the archetypes fare for flexibility and connectivity?”, we selected metrics from Level 1 (Land) that corresponded to flexibility and connectivity. For a question such as “How do the archetypes fare in providing tucked in wharfage and compact activities?”, we selected metrics from Level 2 (Pier) and Level 3 (Activities) for analysis. The data points were labeled with corresponding archetypes, and the visual representation of each option was pulled up (Figure 6). Through this process, we had hoped to persuade the client that the perfect option did not exist and could only be a combination of characteristics. The observations at the sense-making stage were later developed into recommended design principles.

Figure 6: Archetypal analysis (top), the effectiveness of categorical grouping (center), pareto front visualisation (bottom)

Visualisation of operations
The evaluation criteria at the 3rd level of information measured the movement between 28 land-side activities. However, it also lacked the clients’ in-depth prioritisation due to lack of time to develop elaborate logic based on the large number of options already generated. As such, we proposed to group the movement paths by user groups to try obtaining direct feedback from the users (Figure 7).

Figure 7: Visualisation of movement between activities according to user groups

While the impact was initially not as clear to the designers, the relevant users involved were able to quickly identify possible trade-offs, such as accepting long distances between logistics nodes with the possible investment into autonomous vehicles. As information in the 2nd and 3rd levels was closer to the users’ expertise, they were able to observe and comment on relationships between the two. For example, they swiftly drew a relationship between having an east-west pier configuration (2nd level: Pier) and the movement of personnel (3rd level: Activities), where options with east-west piers tended to force the activities to the east border and west border. Even though noise in the 3rd level (travel distances) and limitations in categorical data in the 2nd level (pier configuration) prevented this relationship from becoming apparent in data analysis, we believe that feedback is equally useful. Numerical data may not appear to show patterns but visualizing them using different angles can result in new insights.

DECISION MAKING
Through the various steps of sense-making, the clients were assured that adequate variation and “bad options” existed in the back-end of the algorithm. Thus we were able to proceed to engage them in the selection of the final design. As we recognized that the clients were unable to visually process a large number of options, we proposed filters to reduce the large data set down to a small number of viable options for selection and analysis.

We ran the grid-based algorithm at the 1st level (Land) through a genetic algorithm. The 300 top performing options from 3 different optimization runs (with different initial seeds) were selected to continue with the generation of additional levels of information. After the 2nd level (Pier, Wharves and Vessels), a total of 772 pier configurations were generated. After encountering many options with inadequate berthing space, the clients decided to introduce a cardinal requirement: that the design must be able to fit in all the specified vessels. This drastically reduced the option set by about 90% to 72.

Based on these 72 options, we performed a basic pareto front visualization to understand what options were determined “best-performing” by the algorithm (Figure 8). The axes were a three-way comparison between the weighted scores of the three levels (Land; Pier, Wharves and Vessels; and Activities). This further reduced the number of options to examine visually to 18. Based on these 18 options, we were able to recombine the design concepts and principles identified and make recommendations to the clients.

Figure 8: Three-way pareto front comparison between scores from 3 levels
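
The two filtering steps used in decision-making, a cardinal requirement followed by a three-way pareto front over the level scores, can be sketched as below. This is an illustrative sketch under stated assumptions rather than the project's code: the option records, the `fits_all_vessels` flag and the score values are hypothetical, and higher scores are assumed to be better on every axis.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(options: List[Dict]) -> List[Dict]:
    """Keep only the options that no other option dominates."""
    return [
        o
        for o in options
        if not any(dominates(p["scores"], o["scores"]) for p in options if p is not o)
    ]


# Hypothetical options; scores are (Land, Pier/Wharves/Vessels, Activities).
options = [
    {"id": "A", "fits_all_vessels": True, "scores": (0.9, 0.2, 0.5)},
    {"id": "B", "fits_all_vessels": True, "scores": (0.5, 0.5, 0.5)},
    {"id": "C", "fits_all_vessels": True, "scores": (0.4, 0.4, 0.4)},
    {"id": "D", "fits_all_vessels": True, "scores": (0.2, 0.9, 0.1)},
    {"id": "E", "fits_all_vessels": False, "scores": (1.0, 1.0, 1.0)},
]

# Step 1: the cardinal requirement removes infeasible options regardless of score.
feasible = [o for o in options if o["fits_all_vessels"]]

# Step 2: the pareto front removes options that are beaten on all three levels.
shortlist = pareto_front(feasible)
print([o["id"] for o in shortlist])
```

Applying the cardinal requirement first matters: option E scores highest everywhere in this toy data, yet it never reaches the pareto comparison because it cannot berth all vessels.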
CHALLENGES
Through this project, we acknowledge that obtaining consensus from a multi-user group was indeed a challenge. Prioritisation scoring was extremely hard to lock in. Algorithmic logic was also heavily scrutinized to ensure all user concerns were accounted for. At times, due to preconceived notions and ideas, there was a degree of disbelief in the outcome as expectations were not reflected in the top 10 options generated via genetic optimization. Hence, we saw the need for making sense of the options generated.

We also recognized the value of employing visual options to a non-expert user group. Despite the process being output- and performance-driven, we found that the most useful method to obtain useful comments from the users was to overlay operational information (generated from data) on top of design options.

FUTURE DEVELOPMENTS
This paper presented a preliminary approach to engage the clients in the computational design process and outlined the challenges and methods used to convince them that the generative algorithm encompassed all their needs and requirements. This process will continue to be refined in later parts of this project, where we will explore other means of communicating with user groups.

For the analysis of data, a future development is the possible use of pre-trained convolutional neural networks as a feature extraction tool, to make up for tacit information that existed in the visual representation but was not apparent in numerical data. This process may also validate the visual patterns we observed. As generated datasets only get larger, we must explore more complex ways to analyse data.

ACKNOWLEDGEMENTS
The authors would like to express gratitude to Sembcorp Architects & Engineers and their clients for supporting this project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the aforementioned organizations.

REFERENCES
Akin, O and Özkaya, I 2002 ’Models of Design Requirement’, Sixth Design and Decision Support Systems in Architecture and Urban Planning
Al-Douri, F 2018 ’The Employment of Digital Simulation in the Planning Departments in US Cities’, Proceedings of eCAADe 2018
Ashour, Y and Kolarevic, B 2015 ’Heuristic Optimization in Design’, Proceedings of ACADIA 2015
Belém, C and Leitão, A 2019 ’Conflicting Goals In Architecture: A study on Multi-Objective Optimisation’, Proceedings of CAADRIA 2019
Chen, KW, Janssen, P and Schlueter, A 2015 ’Analysing Populations of Design Variants Using Clustering and Archetypal Analysis’, Proceedings of eCAADe 2015
Evins, R 2013, ’A review of computational optimisation methods applied to sustainable building design’, Renewable and Sustainable Energy Reviews, 22, pp. 230-245
Gero, JS and Balachandran, MB 1986 ’Knowledge and Design Processes’, Applications of Artificial Intelligence in Engineering Problems
Han, JW, Kamber, MB and Pei, J 2011, Data mining: concepts and techniques, Elsevier
Hemmersam, P, Martin, N, Westvang, E, Aspen, J and Morrison, A 2015, ’Exploring Urban Data Visualisation and Public Participation in Planning’, Journal of Urban Technology, 22, pp. 45-64
Rousseeuw, P 1987, ’Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis’, Journal of Computational and Applied Mathematics, 20, pp. 53-65
Sevtsuk, A and Mekonnen, M 2012, ’Urban Network Analysis Toolbox’, International Journal of Geomatics and Spatial Analysis, 22, pp. 287–305
Wilson, L, Danforth, J, Davila, CC and Harvey, D 2019 ’How to Generate a Thousand Master Plans: A Framework for Computational Urban Design’, Proceedings of SimAUD 2019
