Lecture 3
CHE F315
Recap
26 January 2024
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
Descriptive statistics (univariate)
Central tendency
Mean, median
Dispersion
Range, variance, standard deviation, quartiles and interquartile range
Distribution
Skewness, kurtosis
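The univariate measures listed above can be computed with a short sketch that uses only Python's standard `statistics` and `math` modules. It is illustrative, not a reference implementation: the sample data are hypothetical, the quartiles use the "inclusive" (linear interpolation) method, which is only one of several quartile conventions, and skewness and kurtosis are the moment-based (population) estimates, with kurtosis reported as excess kurtosis (0 for a normal distribution).

```python
import math
import statistics


def describe(x):
    """Univariate descriptive statistics for a list of numbers."""
    n = len(x)
    mean = statistics.mean(x)
    # 'inclusive' quartiles interpolate linearly between order statistics
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    var = statistics.variance(x)  # sample variance, n - 1 denominator
    # Central moments about the mean (population form) for the shape measures
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    return {
        # central tendency
        "mean": mean,
        "median": statistics.median(x),
        # dispersion
        "range": max(x) - min(x),
        "variance": var,
        "std": math.sqrt(var),
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        # distribution shape
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical sample
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name:>16}: {value:.4f}")
```

Other libraries may default to different quartile or skewness conventions, so small numerical differences between tools are expected.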
Data preprocessing
Quartile-based identifier and boxplots:
Uses the interquartile distance Q as the scale parameter
$Q = Q_3 - Q_1$
where $Q_1 = x_{0.25}$ is the lower quartile and $Q_3 = x_{0.75}$ is the upper quartile.
For a symmetric data distribution, the median lies midway between the quartiles, $\mathrm{med} = (Q_1 + Q_3)/2$, and the following condition is used to detect outliers:
$|x_k - \mathrm{med}| > 2Q$
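The symmetric quartile rule can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the function name `quartile_outliers` and the sensor readings are hypothetical, quartiles use the "inclusive" interpolation method, and `med` is taken as the sample median, which coincides with $(Q_1 + Q_3)/2$ only when the data are symmetric.

```python
import statistics


def quartile_outliers(x):
    """Return indices k with |x_k - med| > 2Q, where Q = Q3 - Q1.

    Intended for roughly symmetric data.
    """
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    Q = q3 - q1                      # interquartile distance (scale parameter)
    med = statistics.median(x)
    return [k for k, xk in enumerate(x) if abs(xk - med) > 2 * Q]


# Hypothetical sensor readings with one gross error at the end
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.5]
print(quartile_outliers(readings))   # [6]
```

Because $Q$ and the median are both insensitive to a few extreme values, the single gross error does not inflate the threshold the way it would inflate a standard deviation.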
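A boxplot provides a graphical representation of the quartile-based detector.
In the plot, any point lying outside the lower or upper fence is considered an outlier. With the conventional fences $Q_1 - 1.5Q$ and $Q_3 + 1.5Q$, this matches the $|x_k - \mathrm{med}| > 2Q$ rule for symmetric data, since then $Q_3 + 1.5Q = \mathrm{med} + 2Q$ and $Q_1 - 1.5Q = \mathrm{med} - 2Q$.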
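The boxplot fences can be computed directly, as a sketch assuming the conventional whisker length of $1.5Q$. The function name `boxplot_fences` and the sample data are hypothetical; the parameter name `whis` is borrowed from matplotlib's `boxplot`, whose default value is also 1.5.

```python
import statistics


def boxplot_fences(x, whis=1.5):
    """Return (lower fence, upper fence, points outside the fences)."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1                             # Q = Q3 - Q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    outliers = [xk for xk in x if xk < lower or xk > upper]
    return lower, upper, outliers


data = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical sample
print(boxplot_fences(data))                   # (1.75, 7.75, [9])
```

To draw the plot itself, `matplotlib.pyplot.boxplot(data)` should place its whiskers within these fences and mark points beyond them as individual fliers.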
Statistics (bivariate/multivariate)
Scatter plot
Covariance
Correlation
Heatmap
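Covariance and correlation matrices are what a correlation heatmap displays. The following pure-Python sketch computes both from data stored as rows of observations; the function names and the two-variable data set are hypothetical. With NumPy, `np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)` should give the same matrices.

```python
import math


def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator); data is a list of rows."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)]
            for i in range(p)]


def correlation_matrix(data):
    """Pearson correlation matrix: r_ij = s_ij / sqrt(s_ii * s_jj)."""
    S = covariance_matrix(data)
    p = len(S)
    return [[S[i][j] / math.sqrt(S[i][i] * S[j][j]) for j in range(p)]
            for i in range(p)]


# Hypothetical paired measurements of two process variables, one row per sample
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
S = covariance_matrix(X)   # off-diagonal covariance s_12 = 1.0
R = correlation_matrix(X)  # off-diagonal correlation r_12 is approximately 0.6
```

A heatmap of `R` can then be drawn, for example, with `matplotlib.pyplot.imshow(R)` followed by `plt.colorbar()`. Correlation is scale-free, which is why heatmaps usually show `R` rather than `S`.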
Data preprocessing
Multivariate outlier detection
Mahalanobis distance
Minimum covariance determinant (MCD) estimator
Minimum volume ellipsoid (MVE) estimator
Smallest half volume
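The classical Mahalanobis distance $D_k^2 = (x_k - \bar{x})^T S^{-1} (x_k - \bar{x})$ uses the sample mean and covariance, which are themselves pulled toward outliers. The MCD estimator instead uses the mean and covariance of the $h$ observations whose covariance matrix has the smallest determinant. The sketch below is a simplified pure-Python version, assuming nonsingular covariance matrices: it searches for the MCD subset from random $h$-subsets refined by "C-steps" (the concentration step of Rousseeuw and Van Driessen's FAST-MCD algorithm), and it omits the consistency correction and reweighting that library implementations apply. All function names and the example data are hypothetical. A full implementation is available as scikit-learn's `sklearn.covariance.MinCovDet`. For $p = 2$ variables, a common cutoff on $D_k^2$ is the 97.5% chi-square quantile, about 7.38.

```python
import random


def mean_vec(rows):
    """Column means of a list of p-dimensional observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def cov_mat(rows, mu):
    """Sample covariance matrix (n - 1 denominator) about the centre mu."""
    n, p = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve_and_det(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting.

    Returns (y, det(A)). Assumes A is nonsingular.
    """
    p = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    det = 1.0
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det                            # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for k in range(c, p + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * p
    for r in range(p - 1, -1, -1):                # back substitution
        y[r] = (M[r][p] - sum(M[r][k] * y[k] for k in range(r + 1, p))) / M[r][r]
    return y, det


def mahalanobis_sq(rows, mu, S):
    """Squared Mahalanobis distance (x - mu)^T S^-1 (x - mu) for every row."""
    out = []
    for r in rows:
        d = [ri - mi for ri, mi in zip(r, mu)]
        y, _ = solve_and_det(S, d)                # y = S^-1 d without forming S^-1
        out.append(sum(di * yi for di, yi in zip(d, y)))
    return out


def mcd(rows, h=None, n_starts=20, n_csteps=10, seed=0):
    """Raw MCD estimate of (centre, scatter) from random h-subsets and C-steps."""
    n, p = len(rows), len(rows[0])
    h = h or (n + p + 1) // 2                     # default: about half the data
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        subset = rng.sample(rows, h)
        for _ in range(n_csteps):
            # C-step: refit on the subset, keep the h points closest to the fit
            mu = mean_vec(subset)
            d2 = mahalanobis_sq(rows, mu, cov_mat(subset, mu))
            order = sorted(range(n), key=lambda i: d2[i])
            subset = [rows[i] for i in order[:h]]
        mu = mean_vec(subset)
        S = cov_mat(subset, mu)
        det = solve_and_det(S, [0.0] * p)[1]
        if best is None or det < best[0]:
            best = (det, mu, S)
    return best[1], best[2]


# Hypothetical data: a 5 x 5 grid of normal operating points plus one far point
X = [[float(a), float(b)] for a in range(5) for b in range(5)] + [[30.0, 30.0]]
mu_c = mean_vec(X)
d2_classical = mahalanobis_sq(X, mu_c, cov_mat(X, mu_c))
mu_r, S_r = mcd(X)
d2_robust = mahalanobis_sq(X, mu_r, S_r)
```

Here both distances flag the far point, but the robust distance is much larger because the MCD centre and scatter are fitted to the clean half of the data. The difference matters most when several outliers cluster together and mask one another in the classical estimates.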
Useful References
https://www.machinelearningplus.com/statistics/mahalanobis-distance/
Chiang, L. H., Pell, R. J., & Seasholtz, M. B. (2003). Exploring process data with the use of robust outlier detection algorithms. Journal of Process Control, 13(5), 437-449.