Lecture 3


Machine Learning for Chemical Engineers

CHE F315

Ajaya Kumar Pani


BITS Pilani Department of Chemical Engineering
B.I.T.S-Pilani, Pilani Campus
Lecture-3
17-01-2024
Data Preprocessing
CHE F315 Machine Learning for Chemical Engineers

Recap

What is machine learning


Scope of ML in Chemical Engineering
Data Preprocessing
missing value
outlier detection
univariate methods

26 January 2024 4
BITS Pilani, Pilani Campus

Descriptive statistics
(Univariate)
Central tendency
Mean, median
Dispersion
Range, variance, standard deviation, quartiles and interquartile range
Distribution
Skewness, kurtosis
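
The measures listed above can be computed with a few lines of Python. The following is a minimal sketch using only the standard library; the function name `describe`, the sample readings, the "inclusive" quartile method, and the moment-based (population) formulas for skewness and kurtosis are assumptions chosen for illustration, since other conventions exist.

```python
import statistics

def describe(values):
    """Univariate descriptive statistics for a list of numbers (illustrative helper)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartiles with the "inclusive" method (an assumed convention)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Central moments about the mean, divided by n (population form)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std": statistics.stdev(values),          # sample standard deviation
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,   # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2,     # 3 for a normal distribution
    }

# Hypothetical sensor readings, for illustration only
readings = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(readings)
```

A positive skewness, as in this sample, indicates a longer tail on the high side of the mean.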


Data preprocessing
Quartile-based identifier and boxplots:
Uses the interquartile range Q as the scale parameter:
Q = Q3 - Q1
where Q1 is the lower quartile (x0.25) and Q3 is the upper quartile (x0.75).
For a symmetric data distribution, the median lies midway between the quartiles:
med = (Q1 + Q3)/2
In that case the boxplot fences Q1 - 1.5Q and Q3 + 1.5Q coincide with med ± 2Q, so a point xk is flagged as an outlier when
|xk - med| > 2Q
A boxplot is a graphical demonstration of the quartile-based detector.
In the plot, any point that lies outside the upper or lower fence is considered an outlier.
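
The quartile-based detector above can be sketched in Python as follows. This is an illustrative example using only the standard library, not a reference implementation: the function names, the sample data, the "inclusive" quartile method, and the 1.5 multiplier for the boxplot fences (the common boxplot convention) are assumptions. The sample median is used for med; for symmetric data it is close to (Q1 + Q3)/2.

```python
import statistics

def quartile_outliers(values, threshold=2.0):
    """Flag points with |x - med| > threshold * Q, where Q = Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    q = q3 - q1                       # interquartile range (scale parameter)
    med = statistics.median(values)   # sample median
    return [x for x in values if abs(x - med) > threshold * q]

def boxplot_fences(values, k=1.5):
    """Lower and upper fences Q1 - k*Q and Q3 + k*Q (k = 1.5 is assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    q = q3 - q1
    return q1 - k * q, q3 + k * q

# Hypothetical measurements with one suspicious value
data = [10, 11, 12, 12, 13, 13, 14, 15, 30]
print(quartile_outliers(data))   # [30]
print(boxplot_fences(data))      # (9.0, 17.0)
```

Here Q1 = 12, Q3 = 14 and med = 13, so both the 2Q rule and the fences flag only the value 30.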


Statistics
(Bivariate/Multivariate)
Scatter plot
Covariance
Correlation
Heatmap
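
Covariance and correlation between two variables can be illustrated with a short example. The sketch below uses only the standard library and assumes the sample (n - 1) form of the covariance and the Pearson correlation coefficient; the variable names and data are hypothetical. A heatmap is typically a color-coded display of the resulting correlation matrix.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical process variables, for illustration only
temperature = [1, 2, 3, 4, 5]
conversion = [2, 4, 6, 8, 10]

cov_tc = covariance(temperature, conversion)
corr_tc = correlation(temperature, conversion)

# Correlation matrix of the two variables (the values a heatmap would color)
variables = [temperature, conversion]
corr_matrix = [[correlation(a, b) for b in variables] for a in variables]
```

Unlike covariance, the correlation coefficient is dimensionless and lies between -1 and 1, which makes it easier to compare across variable pairs with different units.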


Data preprocessing
Multivariate outlier detection

Mahalanobis distance
Minimum covariance determinant (MCD) estimator
Minimum volume ellipsoid (MVE) estimator
Smallest half volume
Pani, A. K., & Mohanta, H. K. (2016). Online monitoring of cement clinker quality using multivariate statistics and Takagi-Sugeno fuzzy-inference technique. Control Engineering Practice, 57, 1-17.
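
As an illustration of the first method on this slide, the Mahalanobis distance of a point x from the data mean is D = sqrt((x - mean)^T S^-1 (x - mean)), where S is the covariance matrix. The sketch below is restricted to two variables so that the 2x2 inverse can be written out with the standard library alone; the function names and data are hypothetical. It uses the classical sample mean and covariance, whereas the MCD and MVE estimators replace these with robust estimates.

```python
import math

def mean_and_cov_2d(points):
    """Sample mean and covariance terms (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of one 2-D point from the mean."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2          # determinant of the covariance matrix
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical two-variable data set
points = [(0, 0), (2, 0), (0, 2), (2, 2)]
mean, cov = mean_and_cov_2d(points)
distance = mahalanobis_2d((3, 1), mean, cov)
```

A point is then treated as a multivariate outlier when its distance exceeds a chosen cutoff; the cutoff itself is a design choice not specified on the slide.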

Useful References
https://www.machinelearningplus.com/statistics/mahalanobis-distance/
Chiang, L. H., Pell, R. J., & Seasholtz, M. B. (2003).
Exploring process data with the use of robust outlier
detection algorithms. Journal of Process Control, 13(5),
437-449.


