Professional Documents
Culture Documents
Fooled by Statistics & Machine Learning by Dr. Tanujit (1)
Fooled by Statistics & Machine Learning by Dr. Tanujit (1)
Fooled by Statistics & Machine Learning by Dr. Tanujit (1)
2/50
N ORMALITY IS A MYTH !
3/50
Normality & Beyond Normality
4/50
Few Famous Quotations
Normality is a myth; there never was, and there never will be a normal
distribution — Roy C. Geary (1947; Biometrika, vol. 34, 248).
... the statisticians knows ... that in nature there never was a normal
distribution, there never was a straight line, yet with normal and linear
assumptions, known to be false he can often derive results which match
to a useful approximation, those found in real world — George W. Box
(1976, Journal of American Statistical Association, vol. 71, 791-799).
5/50
Normal Distribution
1 (x−µ)2
−
f (x; µ, σ) = √ e 2σ2 ; −∞ < x < ∞
2πσ
6/50
Galton Board
7/50
How it has started?
j
X n
pk (1 − p)n−k
k
k=i
8/50
A Brief History
9/50
Solution
10/50
Solution
11/50
The Breakthrough
12/50
Normal Approximation
j ! !
X n j − np i − np
pk (1 − p)k ≈ Φ −Φ
k
p p
k=i
np(1 − p) np(1 − p)
where Z z
1 2 /2
Φ(z) = √ e−x dx
2π −∞
13/50
Error Modeling
To read more about the evolution of normal distribution: Saul Stahl (2006), “The
evolution of normal distribution”, Mathematics Magazine, vol. 79, no. 2, 96 - 113.
14/50
Central Limit Theorem
Lindeberg-Levy CLT:
CLT in Practice
15/50
Some Drawbacks
16/50
Possible Remedy
In Distribution Theory:
1 Skew Normal Distribution (A Azzalini, Scandinavian
Journal of Statistics 1985)
2 Power Normal Distribution (RD Gupta, Test 2008)
3 Geometric Skew-Normal Distribution (D Kundu, Sankhya
2014), etc.
In Regression Theory:
1 Box-Cox Transformation (Box, Cox, JRSS Series-B 1964)
2 Generalized linear model (Nelder, Wedderburn, JRSS
Series-A 1972)
3 Semiparametric and Nonparametric Approaches (see
ESLR/ISLR Book), etc.
17/50
C ORRELATION DOES NOT IMPLY CAUSATION !
18/50
Correlation
Correlation may indicate any type of association. Correlation implies
association, but not causation. Conversely, causation implies
association, but not correlation1
19/50 1
Altman, Krzywinski, "Association, correlation and causation", Nature Methods (2015)
Causality: What is it?
20/50
Causality in Philosophy
21/50
Causality in Statistics
22/50
Causality in Statistics
23/50
A Paradigm Shift: Basic Contributions
24/50
ML techniques are impacting our life
25/50
Now we are stepping into risk-sensitive
areas
Shifting from Performance Driven to Risk Sensitive...
26/50
Problems of today’s ML - Explainability
27/50
Problems of today’s ML - Stability
28/50
Problems of today’s ML - Stability
29/50
Problems of today’s ML - Stability
30/50
A plausible reason: Correlation
31/50
Correlation is not explainable
32/50
Correlation is "unstable"
33/50
Correlation Vs. Causation
It’s not the fault of correlation, but the way we use it...
34/50
A Practical Definition of Causality
35/50
Benefits of bringing causality into learning
36/50
Causality everywhere
37/50
Correlation does not imply causation
38/50
“Correlation = Causation” is a cognitive bias
39/50
Then, what does imply causation?
Source: https://www.bradyneal.com/causal-inference-course
40/50
Then, what does imply causation?
Source: https://www.bradyneal.com/causal-inference-course
41/50
Languages for Causality
Using potential
Using structural
outcomes /
Using graphs: equations:
counterfactuals:
• 1921 Wright • 1921 Wright • 1923 Neyman
(genetics); (genetics); (statistics);
• 1988 Pearl (computer • 1943 Haavelmo • 1973 Lewis
science “AI”); (econometrics); (philosophy);
• 1993 Spirtes, • 1975 Duncan (social • 1974 Rubin (statistics);
Glymour, Scheines sciences);
(philosophy).
• 2000 Pearl (computer • 1986 Robins
science). (epidemiology);
Reference: The Book of Why: The New Science of Cause and Effect by Judea Pearl and
Dana Mackenzie (2019).
42/50
A LL MODELS ARE WRONG , BUT SOME ARE USEFUL !
43/50
The Science of Forecasting
44/50
Random futures
45/50
Which is easiest to forecast? (Easy to Tough)
46/50
Forecastability factors
• The forecasts cannot affect the thing we are trying to forecast (say,
Warren Buffett, CEO of Berkshire Hathaway, make some comment that
stock price may change!).
• When should we give up? When there is insufficient data? When the
models give implausible forecasts?.
47/50
Various Forecasting Models
49/50
Textbook and References
50/50