CSL COINS Finse Select Topic3
Representational Development Data
• The structure in the data reflects the structure in the
system under study.
– If we have collected the right data
• Measure wood brightness and grain prominence
ANN-8
Radial Basis Function (RBF) Nets
Complex Model
1 RBF Neuron
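As a rough illustration of what a single RBF neuron computes, the sketch below assumes the common Gaussian form, where the response depends only on the distance between the input and the neuron's center. The names `rbf_neuron`, `center`, and `width`, and the two-feature input (brightness, grain prominence), are assumptions made for illustration and do not come from the slides.

```python
import math

def rbf_neuron(x, center, width):
    """Gaussian RBF activation: exp(-||x - center||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

# Hypothetical 2-D feature space: (brightness, grain prominence)
center = (0.5, 0.5)

# The response peaks when the input sits exactly on the center...
at_center = rbf_neuron((0.5, 0.5), center, width=0.2)

# ...and decays toward zero as the input moves away from it.
far_away = rbf_neuron((1.5, 1.5), center, width=0.2)
```

The `width` parameter controls how quickly the response falls off, so one neuron covers one localized region of the feature space.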
Proof of Principle Testing
• Create data that adheres to your model structure
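One possible way to build such proof-of-principle data is sketched below: two well-separated Gaussian clusters, checked with a nearest-center rule. The cluster centers, the spread, the sample counts, and the function names (`make_pop_data`, `predict`) are all hypothetical choices, not values from the presentation.

```python
import math
import random

def make_pop_data(n_per_class, centers, spread, seed=0):
    """Draw n_per_class 2-D Gaussian points around each class center."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            point = (rng.gauss(cx, spread), rng.gauss(cy, spread))
            data.append((point, label))
    return data

def predict(point, centers):
    """Assign the label of the nearest class center."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

# Assumed structure: two classes, far apart relative to their spread.
centers = [(0.0, 0.0), (5.0, 5.0)]
pop_data = make_pop_data(100, centers, spread=0.5)

correct = sum(predict(p, centers) == label for p, label in pop_data)
pop_accuracy = correct / len(pop_data)
```

Because the data is constructed to match the model's assumptions, a model that fails here most likely has an implementation problem rather than a data problem.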
Publicly Available Data Sets
• ML Research
– “Toy” Data Sets
• The Iris data set is widely used in general ML studies.
Synthetic Data is a Solution
• The synthetic dataset can be generated according to
the requirements for different test scenarios.
– Even if your model passes all of the tests you set before it, you
may still discover new aspects of your model that you haven’t
even considered.
Data Simulation: A Well-Established and Accepted Method in Research
The Daubert Standard!
Learning to Synthesize a Fraud Model
Recall The Data Sim Presentation
But there is analysis of real data to extract the “aggregated
information” about the fraud scenario. This serves to model
the behaviour of the essential components of the internal
structure of the system under study: the financial fraud.
• Daubert Implications?
Enables Incremental Modeling
• A synthetic dataset has the benefit that it can be
generated according to the researcher's needs, to study
how a certain fraud might affect a specific scenario.
• Start with simplest scenario (POP)
• Incrementally add complexity
•
•
• Target complexity
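The incremental approach can be sketched as a sweep over a single complexity knob. In this minimal example the knob is class overlap (the `spread` of each synthetic cluster), which is an assumption made for illustration; the slides do not say which dimension of complexity is increased. The function name `accuracy_at_spread` and all numeric values are likewise hypothetical.

```python
import math
import random

def accuracy_at_spread(spread, n_per_class=200, seed=0):
    """Generate two Gaussian classes and score a nearest-center classifier."""
    centers = [(0.0, 0.0), (5.0, 5.0)]
    rng = random.Random(seed)  # fixed seed for repeatable runs
    correct = 0
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            point = (rng.gauss(cx, spread), rng.gauss(cy, spread))
            dists = [math.dist(point, c) for c in centers]
            correct += dists.index(min(dists)) == label
    return correct / (2 * n_per_class)

# Start with the simplest (POP) scenario, then add overlap step by step.
spreads = [0.5, 1.5, 3.0]
accuracies = [accuracy_at_spread(s) for s in spreads]

for s, a in zip(spreads, accuracies):
    print(f"spread={s}: accuracy={a:.3f}")
```

Stepping the knob one level at a time shows where performance starts to degrade, which is hard to see if the experiment jumps straight to the target complexity.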
Enables Increasing Model Complexity
Avoid “Pushbutton _______”!
Brain Computer Interface Research
• Electroencephalogram Data (EEG)
– Brainwaves
https://upload.wikimedia.org/wikipedia/commons/thumb/b/bf/EEG_cap.jpg/220px-EEG_cap.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Spike-waves.png/420px-Spike-waves.png
POP: Detecting Changes in Brain State
Training Data
Non-Alcoholic Data, Alcoholic Data
Testing Data
Alcoholic vs Non-alcoholic EEG
64 Channels of Data
Generalizing Beyond POP
Training Data
Testing Data
New Results Are Terrible
Successful POP Experiment
Unsuccessful extension
My Expectation: Generally Recognize Alcoholic vs Non-Alcoholic
Non-Alcoholic Data, Alcoholic Data
Thank You!
• Questions
• Comments
• Feedback
• Improvements
carl.leichter@ntnu.no