Professional Documents
Culture Documents
Notes PDF ML Day 15
Notes PDF ML Day 15
Feature
Engineering
Missing Values
Univariate Multivariate
Imputation Imputation
Integrative
- Mean/Median Imputer
- Mode (Most Frequent)
- Arbitrary Values
- Missing Value
- End of Distribution
- Missing Indicator
- Automatic select value
sklearn.metrics.pairwise.nan_euclidean_distances
https://scikit-
learn.org/stable/modules/generated/sklearn.metrics.pairwise.nan_euclidean_distances.html#sklear
n.metrics.pairwise.nan_euclidean_distances
2 Step We follow:
2 Step We follow:
#Import Library
import numpy as np
import pandas as pd
#Import Dataset
df =
pd.read_csv('train.csv')[['Age','Pclass','Fare','Survi
ved']]
----
df.sample(10)
df.isnull().mean() * 100
#Define X & Y
X = df.drop(columns=['Survived'])
y = df['Survived']
X_train,X_test,y_train,y_test =
train_test_split(X,y,test_size=0.2,random_state=2)
----
X_train
knn = KNNImputer(n_neighbors=1,weights='distance')
X_train_trf = knn.fit_transform(X_train)
X_test_trf = knn.transform(X_test)
pd.DataFrame(X_train_trf,columns=X_train.columns)
lr = LogisticRegression()
lr.fit(X_train_trf,y_train)
y_pred = lr.predict(X_test_trf)
accuracy_score(y_test,y_pred)
https://www.bluebash.co/blog/ai-healthcare-future-trends-2024/