Professional Documents
Culture Documents
Ishida Ou Funkel
Ishida Ou Funkel
Active Learning in
Astronomy
Artificial Intelligence in Astronomy - ESO, Germany
22 June 2019
Emille E. O. Ishida
Laboratoire de Physique de Clermont - Université Clermont-Auvergne
Clermont Ferrand, France
Supervised Learning
Ideal data scenario
r n
Lea l y
Training sample
A pp
Features Target
sample
+ Features
Labels
2
Supervised Learning
Learn by example
r n
Lea l y
Training sample
App
Features Target
sample
+ Features
Labels
Why is this
allowed?
3
Supervised ML model
data training, target Hyp
othe
Trai sis:
χ set of all samples, x repr n ing
esen is
Y set of possible labels, y of ta tative
rget
htrain learner: yest;i = htrain(xi)
L Loss function
Data generation model:
xi ~ Pχ
f → true labeling function, yi = f(xi)
Ldata,f (h) ≡ Px~data (htrain (x)≠ f(x))
Shai and Shai, Understanding ML: From Theory to Algorithms, 2014, CUP
Supervised ML model
data training, target Hyp
χ
Machine Learning
set of all samples, x
algorithm
Trai
n
othe
sis:
repr ing
esen is
Y set of possible labels, y of ta tative
rget
htrain learner: yest;i = htrain(xi)
L Loss function
Data generation model:
xi ~ Pχ
f → true labeling function, yi = f(xi)
Ldata,f (h) ≡ Px~data (htrain (x)≠ f(x))
Shai and Shai, Understanding ML: From Theory to Algorithms, 2014, CUP
Supervised ML model
data training, target Hyp
othe
Trai sis:
χ set of all samples, x repr n ing
esen is
Y set of possible labels, y of ta tative
rget
htrain learner: yest;i = htrain(xi)
L Loss function
Data generation model:
xi ~ Pχ
f → true labeling function, yi = f(xi)
Ldata,f (h) ≡ Px~data (htrain (x)≠ f(x))
Shai and Shai, Understanding ML: From Theory to Algorithms, 2014, CUP
How often
does your
data fulfill
these
requirements?
7
Ideal Supervised learning
situation
In astronomy
r n
Lea Training sample l y
App
Images, colors, Target
light curves, etc. sample
Images, colors,
+ light curves,
etc.
Classes
(spectra)
8
Astro: supervised learning
situation
In astronomy, labels ⇒ spectra
Ap
ply
Target sample
Learn
Training
sample
9
Similar examples
Labels are often far too expensive!
Similar examples
Labels are often far too expensive!
Given limited
resources, we need
recommendation
systems!
Active Learning
Optimal classification, minimum training
12
Optimal Experiment Design
In Statistics literature
● Pool based
● Generative
● Sequential
Active Learning in Astronomy:
Supernova photometric
classification
17
From COIN Residence Program #4, Ishida et al., 2019, MNRAS, 483 (1), 2–18
AL for Supernova classification
A strategy
Parametric Fit
5 parameters
LC+labels
Random
LC
Forest
Active Learning
Passive Learning
Canonical
strategy
19
From COIN Residence Program #4, Ishida et al., 2019, MNRAS, 483 (1), 2–18
Complications: SNe are transients
Not everything is available for labelling
1. Feature extraction
done daily with available
observed epochs until
then.
20
From COIN Residence Program #4, Ishida et al., 2019, MNRAS, 483 (1), 2–18
Do we even need a training set?
Active Learning
Canonical strategy
Passive Learning
The arrow
shows
traditional
Full light-curve
For more on The Cosmostatistics Initiative see results with full
Rafael de Souza’s talk on Friday! SNPCC spec
1103 spectra21
From COIN Residence Program #4, Ishida et al., 2019, MNRAS, 483 (1), 2–18
What comes next?
The Large Synoptic Survey Telescope
Photometric obs:
~minute
Spectroscopic obs:
>= 1 hour (e.g. SDSS)
Multi-fiber spec.
Pointing is not trivial
Camera: 3.2 Giga pixels and 1.65m
Primary mirror: 8.4m
Field of view: 3.5 deg, 40x full moon
Data production :15 TB/night
(3yr LSST=internet today)
~10 million alerts/night
30.000 type Ia SN/yr (today ~1000)
Expected ~ 1000 spectra/yr (~ 3%)
22
https://www.kaggle.com/c/PLAsTiCC-2018
Photometric redshift
AL for Photo-Z
Vilalta, Ishida et al., 2017 IEEE Symposium Series on Computational Intelligence (SSCI)
AL for Photo-Z
s!
r ic error
e t
photom
s s i n
ive ne
t at
r e s en
r ep
re is
th e
ell I F
k s w
W or
Vilalta, Ishida et al., 2017 IEEE Symposium Series on Computational Intelligence (SSCI)
Take home message 1:
Astronomy needs
optimized samples and
algorithms for
Machine Learning
applications
This means
Target sample
Adaptive
Learn Learning
Training
sample
designed for
astronomical
data
Unsupervised Learning
Clustering and Anomaly Detection
anomalies
“An anomaly is an observation which deviates so much from the other observations
as to arouse suspicions that it was generated by a different mechanism”
For more on unsupervised methods see Alberto Krone-Martins’ talk on Tuesday
and Dalya Baron on Wednesday!
Hawkins, 1980
Anomaly Detection:
Isolation Forest
https://donghwa-kim.github.io/iforest.html
Anomaly Detection:
Isolation Forest
Spectroscopic
Observations 4
Organise follow-up
Cross-Match
3 Correlate to other experiments
and catalogs
Process Alerts
2
Broker core algorithms
1
LSST raw alert data
Difference image analysis
~1TB/night expected.
Cross-Match 3
Confirmed
5
sources
Process Alerts 2
LSST Raw
1
alert data
https://fink-broker.readthedocs.io/en/latest/
Take home message 2:
This means
40
From COIN Residence Program #4, Ishida et al., 2019, MNRAS, 483 (1), 2–18
Batch Mode
Partial LC, no initial training, time domain
The arrow
shows
traditional
Full light-curve
results with full
SNPCC spec
41
From COIN Residence Program #4, Ishida et al., 2019, MNRAS, 483 (1), 2–18
Photometric redshifts
Sample A → training
Beck et al.,
astro-ph:1701.08748,
MNRAS 2017
Beck et al., astro-ph:1701.08748, MNRAS 2017