
Building Confidence in the Model

Verification and Validation


Ruang L301-302
Gedung Laboratorium
Departemen Teknik Industri
Fakultas Teknik
Universitas Indonesia
systems.ie.ui.ac.id
Agenda

• Principles of Verification and Validation


• DEM Validation
• SD Validation
All Models are Wrong
..but some are useful

• So “true” validation and verification are impossible to achieve


• No model has ever been, or ever will be, thoroughly validated
• “Useful”, “illuminating”, “convincing”, or “inspiring confidence” are more apt
descriptors applying to models than “valid” (Greenberger, Crenson and Crissey,
1976)
• Falsity (falsification)
• Hypothesis: “All swans are white, because we have never seen a black swan.” It
holds only until we find one black swan (falsifying evidence)
• Only theories that have not yet been refuted should be accepted; however, that
acceptance is always conditional
• Therefore, if we wait for the ultimate falsification-proof theory, we will
never have a theory at all

“No model can claim absolute objectivity, for every model carries in it the modeler’s worldview.
Models are not true or false, but lie on a continuum of usefulness”
(from Barlas & Carpenter, 1990)

The model will not tell you what to do; it only tells you what will
happen if you do it
• The rationale of model results
Challenges of Verification and Validation

• There is no such thing as general validity


• There may be no single real world to compare against (which real world? Each
person sees the world differently)
• Often the real-world data are inaccurate
• There is not enough time to verify and validate everything
• We build confidence, not validity; but modeler confidence can support the
model’s “subjective” validity
Both should be conducted simultaneously

Validation Definition: The assurance that a product, service, or system
meets the needs of the customer and other identified stakeholders. It often
involves acceptance and suitability with external customers.
→ Are you building the right things?

Verification Definition: The evaluation of whether or not a product, service,
or system complies with evaluation criteria (a regulation, requirement,
specification, or imposed condition) for a good simulation model.
→ Are you building things right?

[Diagram: the Real World, the Conceptual Model, and the Simulation Model.
Validation links the Real World to each model; verification links the
Conceptual Model to the Simulation Model. This is the complete picture of
verification and validation interaction in models.]
Data Validation
• Comparison of model output with real-system data

Validating the Existing System

• In this approach, we feed real-world inputs into the model and compare its
output with the output of the real system under those same inputs. This
process of validation is straightforward; however, it may present some
difficulties when carried out, such as when the output to be compared is an
average queue length, waiting time, idle time, etc. These can be compared
using statistical tests and hypothesis testing. Some of the applicable
statistical tests are the chi-square test, the Kolmogorov-Smirnov test, the
Cramer-von Mises test, and the moments test (see the sketch below).

Validating the “First Time” Model

• Consider that we have to describe a proposed system which does not exist at
present and has not existed in the past. Therefore, there is no historical
data available to compare its performance with. Hence, we have to use a
hypothetical system based on assumptions.
• Subsystem Validity − A model itself may not have any existing system to
compare it with, but it may consist of known subsystems, each of whose
validity can be tested separately.
• Internal Validity − A model with a high degree of internal variance will be
rejected, since a stochastic system whose variance arises from its internal
processes will hide the changes in output caused by changes in input.
• Sensitivity Analysis − It provides information about the sensitive
parameters in the system, to which we need to pay closer attention.
• Face Validity − When the model operates on contradictory logic, it should
be rejected even if its behavior matches the real system.
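A minimal Python sketch of such a model-data comparison, using SciPy's two-sample Kolmogorov-Smirnov test; the waiting-time arrays are hypothetical placeholders rather than data from any real system.

```python
# Compare simulated waiting times against observed waiting times.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
real_waiting_times = rng.exponential(scale=5.0, size=200)   # observed system data
model_waiting_times = rng.exponential(scale=5.3, size=200)  # simulation output

# H0: both samples come from the same distribution
statistic, p_value = stats.ks_2samp(real_waiting_times, model_waiting_times)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject H0: model output differs significantly from the real data")
else:
    print("No significant difference detected at the 5% level")
```

The same pattern works with `stats.chisquare` on binned counts or `stats.cramervonmises_2samp`, depending on which of the listed tests fits the output measure.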
Model Testing in Practice can be divided into 2 major groups…

Protective - explains (Predictive Modeling)

• Prove a point
• Keep assumptions hidden
• Use data selectively
• Support preconceptions and buttress preselected answers
• ..and cover up the preselection
• Promote the authority of the modeler

Reflective - explores (Exploratory Modeling)

• Promote inquiry
• Expose hidden assumptions
• Motivate the widest range of empirical tests
• Challenge preconceptions and support multiple viewpoints
• …and involve the widest community
• Promote the empowerment of the clients


Purpose, Suitability, and Boundary

• What is the purpose of the model?


• What is the boundary of the model?
• Is the level of aggregation consistent with the purpose?
• Are the issues important to the purpose treated endogenously?
• What important variables and issues are exogenous, or excluded?
• Are important variables excluded because there are no numerical data to
quantify them?
• What is the time horizon relevant to the problem?
• Does the model include the factors that may change significantly over the time
horizon as endogenous elements?
Physical and Decision-Making Structure

• Does the model conform to basic physical laws?


• Are equations dimensionally consistent without fudging?
• Is the stock-flow structure explicit/consistent with purpose?
• Does the model represent disequilibrium dynamics or does it assume the
system is in or near equilibrium all the time?
• Are time delays, constraints, and bottlenecks included?

Robustness and Sensitivity to Alternative Assumptions


• Is the model robust in the face of extreme variations in input conditions or
policies?
• Are the policy recommendations robust or sensitive to plausible variations in
assumptions, including assumptions about parameters, aggregation, and the
model boundary?
Policy Models: Pragmatics and Politics of Model Use

• Is the model documented? Is the documentation publicly available?
• Can you run the model on your own computer?
• What types of data were used to develop and test the model (e.g., aggregate
statistics collected by third parties, primary data sources, observation and
field-based qualitative data, archival materials, interviews)?
• Are the processes used to test and build confidence described?
• Did independent critics and third parties review the model?
• Are the results of the model reproducible?
• Are the results fudged by the modeler?
• How much does it cost to run the model? Does the budget permit adequate
sensitivity testing?
• How long does it take to revise and update the model?
• Is the model being operated by its designers or by third parties?
• What are the biases, ideologies, and political agendas of the modelers and
clients?
• How might these biases affect the results, both deliberately and
inadvertently?
Policy Models: Model Test Categorization
• Since there are so many techniques and approaches to verification and validation, you can
group them into three generic categories

[Diagram: model test questions of purpose/suitability, consistency, and
utility and effectiveness are applied to three targets: the model structure,
the model output behavior, and the policy impacts/implications]
Measuring Model Success

• Quality of the Content:
• the extent to which the technical work within the modelling process
conforms to the requirements of the study.
• Quality of the Process:
• the extent to which the process of the delivery of the work conforms to
the clients’ expectations.
• Quality of the Outcome:
• the extent to which the simulation study is useful within the wider
context for which it is intended.
Example Table for Validation

| Category             | Technique Used              | Justification for Technique Use | Reference to Supporting Report/Paper | Validation Results/Conclusions |
|----------------------|-----------------------------|---------------------------------|--------------------------------------|--------------------------------|
| Theories             | Accepted Approach           |                                 |                                      |                                |
| Theories             | Theoretical Deviation       |                                 |                                      |                                |
| Assumptions          | Derived from Empirical Data |                                 |                                      |                                |
| Model Representation | Face Validity               |                                 |                                      |                                |
| Model Representation | Historical Data             |                                 |                                      |                                |
The Importance of Model Documentation

• Create standards for model documentation so that different people can
understand each other’s notes
• Document so that other people can replicate the same experiment
• Write reports of any problems and of model results that are not what you or
the client wanted
• Build a single repository so that all relevant people can access it and
comment
Agenda

• Principles of Verification and Validation


• DEM Validation
• SD Validation
Verification of the Model Logic

• Objective:
• To avoid logic errors that may arise
• Verification of the computer model → a debugging process
• Correct model coding
• Equation by equation
• Isolate model subsections & test them with controlled inputs
• Dimensional analysis
• Go beyond the automated testing built into the software
• Numerical errors
• ‘Numerical method’-dependent errors
• Appropriate numerical method? Appropriate step size?
• ‘Model’-dependent errors
The Model Logic Verification Process

• Re-study the programming language used
• Check that the output is logical
• This can be done by creating dummy input data whose output has been
verified mathematically (Excel, MATLAB, etc.)
• Observe the animation for correct behavior
• Use the ‘trace and debug’ facilities provided by the software
• Trace command
• Objective: to check for errors and inconsistencies in behavior
• Static or dynamic verification
• Static = trace, walkthrough of the logic
• Dynamic = run with changes
Variable Verification

• The model may include variables that are of little importance
• The model may fail to include a significant variable
• The model may fail to capture accurately the relationships between the
variables and the measures of effectiveness used
• The numerical values used may be incorrect and can therefore reduce the
model’s ability to reflect the behavior of the system
Overcoming Data Unavailability for Verification

• Run the computerized model for a short period of time so that the results
can be compared with hand calculations (see the sketch below)
• Run each section of a complex model separately so that the results can be
verified
• Replace complex probability distributions with simpler ones so that the
results are easier to verify
• Where possible, construct simple trial situations and test several
combinations of conditions in the model
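A minimal sketch of this idea in Python: a short single-server queue simulation (via the Lindley recursion) is checked against the analytic M/M/1 mean waiting time Wq = ρ/(μ − λ). The arrival and service rates are illustrative values, not parameters from any real study.

```python
# Verify a simple queue simulation against a hand-computable analytic result.
import numpy as np

rng = np.random.default_rng(0)
lam, mu, n = 0.8, 1.0, 200_000          # arrival rate, service rate, customers

interarrival = rng.exponential(1 / lam, n)
service = rng.exponential(1 / mu, n)

# Lindley recursion: W[i] = max(0, W[i-1] + S[i-1] - A[i])
wait = np.zeros(n)
for i in range(1, n):
    wait[i] = max(0.0, wait[i - 1] + service[i - 1] - interarrival[i])

rho = lam / mu
analytic_wq = rho / (mu - lam)          # = 4.0 for these illustrative rates
print(f"simulated mean wait = {wait.mean():.3f}, analytic = {analytic_wq:.3f}")
```

A large gap between the simulated and analytic values would point to a coding error rather than a modeling choice, which is exactly what verification is after.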
Discrete Model Validation

• Face validity
• Review the correctness of the model flow diagram or of the model’s logical
mechanism
• Trace validity
• Trace the correctness of the model logic and the computer program manually
(debugging)
• Rational validity
• Test the correctness of the assumptions used in the model structure
• Animation validity
• Event validity
• Events in the model match those in the real world
• Comparisons to other models
• Compare the simulation with analytic models
• Compare with other models that have already been validated
• Extreme condition test
• Historical data validation
• Internal validity
• Test the stochastic behavior by running many replications; variability that
is too high casts doubt on the model (see the sketch below)
• Operational graphics (results)
• Parameter variability (sensitivity analysis)
• Predictive validation
• Validate against the currently accepted forecast or prediction of system
behavior
• Turing test
• Ask people knowledgeable about the operations of the system whether they
can discriminate the outputs of the model from those of the real system
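A minimal sketch of the internal-validity check described above: run many replications of a stochastic model and inspect the spread of the output. `run_model` here is a hypothetical stand-in for an actual simulation.

```python
# Internal validity: replicate the model and examine output variability.
import numpy as np

def run_model(seed: int) -> float:
    """Placeholder model: returns one output measure per replication."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=100.0, scale=8.0)

outputs = np.array([run_model(seed) for seed in range(30)])
mean, std = outputs.mean(), outputs.std(ddof=1)
cv = std / mean                          # coefficient of variation
print(f"mean = {mean:.2f}, std = {std:.2f}, CV = {cv:.2%}")
# A very large CV suggests internal noise may swamp input-driven changes,
# which is grounds for doubting the model.
```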
Agenda

• Principles of Verification and Validation


• DEM Validation
• SD Validation
Main validation steps in continuous (system dynamics) modeling

Structural validity tests:


• Is the structure of the model an adequate representation of the real structure?
• Comparison of model equations;
• With real system relationships (empirical structural validation)
• With available theory (theoretical structural validation)

Behavior validity tests:


• Is the model capable of producing an acceptable output behavior?
• Comparison of model generated behavior pattern with the real system
behavior
• Are the model-generated patterns close enough to the real ones?
Tests for Assessment of Dynamic Models

Structure Oriented

• Boundary Adequacy
• Structure Assessment
• Dimensional Consistency
• Parameter Assessment
• Extreme Conditions
• Integration Error

Behavior Oriented

• Behavior Reproduction
• Behavior Anomaly
• Family Member
• Surprise Behavior
• Sensitivity Analysis
• System Improvement
Boundary Adequacy Test

What
• Are the important concepts for addressing the problem endogenous to the
model?
• Does the behavior of the model change significantly when boundary
assumptions are relaxed?
• Do the policy recommendations change when the model boundary is extended?

How
• Use model boundary charts, subsystem diagrams, causal diagrams, stock and
flow maps, and direct inspection of the model equations
• Use interviews and workshops to solicit expert opinion, archival materials,
review of the literature, direct inspection of or participation in system
processes, etc.
• Modify the model to include plausible additional structure; make constants
and exogenous variables endogenous, then repeat the sensitivity and policy
analysis

Showing the model’s assumptions and boundaries up front increases trust in
the modeling effort
• Bull’s eye diagram (endogenous, exogenous, and excluded variables)
Structure Assessment Test

What
• Is the model structure consistent with relevant descriptive knowledge of the
system?
• Is the level of aggregation appropriate?
• Does the model conform to basic physical laws such as conservation laws?
• Do the decision rules capture the behavior of the actors in the system?

How
• Use policy structure diagrams, causal diagrams, stock and flow maps, and
direct inspection of the model equations
• Use interviews and workshops to solicit expert opinion, archival materials,
review of the literature, direct inspection of or participation in system
processes, etc.
• Conduct partial model tests of the intended rationality of the decision rules
• Conduct laboratory experiments to elicit the mental models and decision
rules of system participants
• Develop aggregate submodels and compare their behavior to the aggregate
formulations
• Disaggregate suspect structures, then repeat the sensitivity and policy
analysis
Dimensional Consistency Test
• Check that each variable has the right dimensions, so that every equation is
dimensionally correct

What
• Is each equation dimensionally consistent without the use of parameters
having no real-world meaning?

How
• Use dimensional analysis software (in PowerSIM this check is built in); see
the sketch below
• Inspect the model equations for suspect parameters
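A minimal sketch of the same check in Python using the pint units library (an assumption; any units-aware library would do, and dedicated SD tools perform this automatically). The custom "widget" unit is defined purely for illustration.

```python
# A stock-and-flow update checked for dimensional consistency with pint.
import pint

ureg = pint.UnitRegistry()
ureg.define("widget = [widget]")                 # illustrative custom base unit

inventory = 500 * ureg.widget                    # stock: widgets
production_rate = 40 * ureg.widget / ureg.day    # flow: widgets/day
dt = 1 * ureg.day                                # time step

# Dimensionally consistent: widgets + (widgets/day) * day = widgets
inventory_next = inventory + production_rate * dt
print(inventory_next)

try:
    inventory + production_rate                  # stock + flow: inconsistent
except pint.DimensionalityError as err:
    print(f"caught dimensional inconsistency: {err}")
```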
Parameter Assessment Test

What
• Are the parameter values consistent with the relevant descriptive and
numerical knowledge of the system?
• Do all parameters have real-world counterparts?

How
• Use statistical methods to estimate parameters (a wide range of methods is
available; see the sketch below)
• Use partial model tests to calibrate subsystems
• Use judgmental methods based on interviews, expert opinion, focus groups,
archival materials, direct experience, etc.
• Develop disaggregate submodels to estimate relationships for use in more
aggregate models
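A minimal sketch of one such statistical method, assuming a logistic growth structure: scipy.optimize.curve_fit recovers the growth rate and carrying capacity from synthetic, illustrative observations.

```python
# Estimate model parameters by least-squares fitting to observed data.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, r, k):
    """Logistic growth with rate r and carrying capacity k, x(0) = 10."""
    x0 = 10.0
    return k / (1 + (k / x0 - 1) * np.exp(-r * t))

t_data = np.arange(0, 20)
rng = np.random.default_rng(1)
# Synthetic "observations": true r = 0.5, K = 1000, plus noise
x_data = logistic(t_data, r=0.5, k=1000.0) + rng.normal(0, 15, t_data.size)

(r_hat, k_hat), _ = curve_fit(logistic, t_data, x_data, p0=[0.1, 500.0])
print(f"estimated r = {r_hat:.3f}, estimated K = {k_hat:.1f}")
# The estimates should then be checked against descriptive knowledge:
# do r and K have plausible real-world counterparts?
```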
Extreme Conditions Test

What
• Does each equation make sense when its inputs take on extreme values?
• Does the model respond plausibly when subjected to extreme policies, shocks,
and parameters?

How
• Inspect each equation
• Test the response to extreme values of each input, alone and in combination
• Subject the model to large shocks and extreme conditions. Implement tests
that examine conformance to basic laws (e.g., no inventory, no shipments, no
labor, no production); see the sketch below
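A minimal sketch of an automated extreme-conditions check on a toy inventory equation; `simulate_inventory` is an illustrative stand-in, and the test asserts conformance to a basic law (stock can never go negative when nothing is produced).

```python
# Extreme condition: with zero production, inventory must never go negative.
def simulate_inventory(production_rate: float, demand_rate: float,
                       initial_stock: float, steps: int) -> list[float]:
    stock, history = initial_stock, []
    for _ in range(steps):
        shipments = min(demand_rate, stock)   # cannot ship what you don't have
        stock += production_rate - shipments
        history.append(stock)
    return history

trajectory = simulate_inventory(production_rate=0.0, demand_rate=10.0,
                                initial_stock=50.0, steps=20)
assert all(s >= 0 for s in trajectory), "inventory went negative!"
print("extreme-condition check passed:"
      f" final stock = {trajectory[-1]}")
```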
Integration Error Test

What
• Are the results sensitive to the choice of time step or numerical
integration method?

How
• Cut the time step in half and test for changes in behavior (see the sketch
below)
• Use different integration methods and test for changes in behavior
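A minimal sketch of the time-step halving test, assuming simple Euler integration of a first-order decay model; the rates and horizon are illustrative.

```python
# Simulate the same model at dt and dt/2, then compare the trajectories.
import numpy as np

def euler_decay(x0: float, rate: float, dt: float, horizon: float) -> np.ndarray:
    steps = int(horizon / dt)
    x = np.empty(steps + 1)
    x[0] = x0
    for i in range(steps):
        x[i + 1] = x[i] + dt * (-rate * x[i])   # dx/dt = -rate * x
    return x

coarse = euler_decay(x0=100.0, rate=0.5, dt=0.25, horizon=10.0)
fine = euler_decay(x0=100.0, rate=0.5, dt=0.125, horizon=10.0)

# Compare the runs at the coarse grid points (every 2nd fine point)
max_diff = np.max(np.abs(coarse - fine[::2]))
print(f"max difference between dt and dt/2 runs: {max_diff:.4f}")
# If the difference is material, shrink dt further or switch to a
# higher-order integration method (e.g., Runge-Kutta).
```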
Behavior Reproduction Test

What
• Does the model reproduce the behavior of interest in the system
(qualitatively and quantitatively)?
• Does it endogenously generate the symptoms of difficulty motivating the
study?
• Does the model generate the various modes of behavior observed in the real
system?
• Do the frequencies and phase relationships among the variables match the
data?

How
• Compute statistical measures of the correspondence between model and data:
descriptive statistics (R², MAE), time domain methods (autocorrelation
functions), frequency domain methods (spectral analysis), and many others
(see the sketch below)
• Compare model output and data qualitatively, including modes of behavior,
shapes of variables, asymmetries, relative amplitudes and phasing, and
unusual events
• Examine the response of the model to test inputs, shocks, and noise
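A minimal sketch of the quantitative side: computing MAE and R² between a model run and historical data. Both series are hypothetical placeholders.

```python
# Descriptive correspondence statistics between model output and data.
import numpy as np

historical = np.array([100, 112, 126, 140, 151, 166, 178, 195])
model_run = np.array([ 98, 115, 124, 143, 155, 163, 181, 192])

mae = np.mean(np.abs(model_run - historical))       # mean absolute error
ss_res = np.sum((historical - model_run) ** 2)
ss_tot = np.sum((historical - historical.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"MAE = {mae:.2f}, R^2 = {r_squared:.4f}")
# Good point-by-point fit is necessary but not sufficient: the modes of
# behavior, phasing, and amplitudes should also be compared qualitatively.
```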
Behavior Anomaly Test

What
• Does anomalous behavior result when assumptions of the model are changed or
deleted?

How
• Zero out key effects (loop knockout analysis)
• Replace equilibrium assumptions with disequilibrium structures
Family Member Test

What
• Can the model generate the behavior observed in other instances of the same
system?

How
• Calibrate the model to the widest possible range of related systems
Surprise Behavior Test

What
• Does the model generate previously unobserved or unrecognized behavior?
• Does the model successfully anticipate the response of the system to novel
conditions?

How
• Keep accurate, complete, and dated records of model simulations. Use the
model to simulate the likely future behavior of the system
• Resolve all discrepancies between the model behavior and your understanding
of the real system
• Document participant and client mental models prior to the start of the
modeling effort
Sensitivity Analysis Test

What
• Numerical sensitivity: do the numerical values change significantly..
• Behavior sensitivity: do the modes of behavior generated by the model change
significantly..
• Policy sensitivity: do the policy implications change significantly..
• ..when assumptions about parameters, boundary, and aggregation are varied
over the plausible range of uncertainty?

How
• Perform univariate and multivariate sensitivity analysis (see the sketch
below)
• Use analytic methods (linearization, local and global stability analysis,
etc.)
• Conduct the model boundary and aggregation tests above
• Use optimization methods to find parameter combinations that generate
implausible results or reverse policy outcomes
System Improvement Test

What
• Did the modeling process help to change the system for the better?

How
• Design instruments in advance to assess the impact of the modeling process
on mental models, behavior, and outcomes
• Design controlled experiments with treatment and control groups, random
assignment, and pre-intervention and post-intervention assessment
