This document is a cheat sheet summarizing key machine learning concepts in scikit-learn: data preprocessing techniques, supervised and unsupervised learning algorithms, model evaluation metrics, and model tuning. It lists common classification and regression algorithms such as linear regression, support vector machines, and Naive Bayes. It also covers preprocessing steps such as standardization, normalization, encoding, and imputation, plus dimensionality reduction using PCA. The model evaluation metrics include accuracy, the classification report, mean squared error (MSE), and the R2 score. Model tuning is demonstrated using GridSearchCV.
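As a rough illustration of how the pieces summarized above fit together for classification, the sketch below loads the wine data, splits it, tunes an SVC with GridSearchCV, and reports accuracy and the classification report. The use of `make_pipeline` with `StandardScaler` is an assumption added here so that scaling is refit inside each cross-validation fold; the parameter grid prefix `svc__` follows from that pipeline and is not part of the cheat sheet itself.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a small classification dataset and hold out a test set.
X, y = datasets.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumption: standardize features before the SVC by chaining both in a pipeline.
pipeline = make_pipeline(StandardScaler(), SVC())

# Same grid as the cheat sheet, prefixed with the pipeline step name "svc".
parameters = {"svc__kernel": ("linear", "rbf"), "svc__C": [1, 10]}
model = GridSearchCV(pipeline, parameters, cv=5)
model.fit(X_train, y_train)

# Evaluate the best estimator on the held-out test set.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(model.best_params_)
print(test_accuracy)
print(classification_report(y_test, y_pred))
```

Wrapping the scaler and the classifier in one estimator is one reasonable design choice, not the only one; fitting the scaler on `X_train` and calling `transform` on `X_test` by hand works as well.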
Scikit-learn is an open-source Python library for all kinds of predictive data analysis. You can perform classification, regression, clustering, dimensionality reduction, model tuning, and data preprocessing tasks.

Loading the Data

  Classification
    from sklearn import datasets
    X, y = datasets.load_wine(return_X_y=True)

  Regression
    diabetes = datasets.load_diabetes()
    X, y = diabetes.data, diabetes.target

Training And Test Data

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=0
    )

Preprocessing the Data

  Normalization
    from sklearn.preprocessing import Normalizer
    norm = Normalizer()
    norm_X_train = norm.fit_transform(X_train)
    norm_X_test = norm.transform(X_test)

  Binarization
    from sklearn.preprocessing import Binarizer
    binary = Binarizer(threshold=0.0)
    binary_X = binary.fit_transform(X)

  Encoding Categorical Features
    from sklearn.preprocessing import LabelEncoder
    lab_enc = LabelEncoder()
    y = lab_enc.fit_transform(y)

  Imputer
    from sklearn.impute import SimpleImputer
    # Treats zeros as missing values and replaces them with the column mean
    imp_mean = SimpleImputer(missing_values=0, strategy='mean')
    imp_mean.fit_transform(X_train)

Supervised Learning Model

  Linear Regression
    from sklearn.linear_model import LinearRegression
    lr = LinearRegression()

  Support Vector Machines
    from sklearn.svm import SVC
    svm_svc = SVC(kernel='linear')

  Naive Bayes
    from sklearn.naive_bayes import GaussianNB
    gnb = GaussianNB()

Unsupervised Learning Model

  Principal Component Analysis (PCA)
    from sklearn.decomposition import PCA
    pca = PCA(n_components=2)

  K Means
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=5, random_state=0)

Model Fitting

  Supervised Learning
    lr.fit(X_train, y_train)
    svm_svc.fit(X_train, y_train)

  Unsupervised Learning
    model = pca.fit_transform(X_train)
    kmeans.fit(X_train)

Prediction

  Supervised Learning
    # LinearRegression exposes predict(); it has no predict_proba()
    y_pred = lr.predict(X_test)
    y_pred = svm_svc.predict(X_test)

  Unsupervised Learning
    y_pred = kmeans.predict(X_test)

Evaluation

  Accuracy Score
    from sklearn.metrics import accuracy_score
    accuracy_score(y_test, y_pred)

  Classification Report
    from sklearn.metrics import classification_report
    print(classification_report(y_test, y_pred))

  Mean Squared Error
    from sklearn.metrics import mean_squared_error
    mean_squared_error(y_test, y_pred)

  R2 Score
    from sklearn.metrics import r2_score
    r2_score(y_test, y_pred)

  Adjusted Rand Index
    from sklearn.metrics import adjusted_rand_score
    adjusted_rand_score(y_test, y_pred)

Cross-Validation

    from sklearn.model_selection import cross_val_score
    # 'f1_macro' is a classification metric, so it is paired with the classifier
    cross_val_score(svm_svc, X, y, cv=5, scoring='f1_macro')

Model Tuning

    from sklearn.model_selection import GridSearchCV
    parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
    model = GridSearchCV(svm_svc, parameters)
    model.fit(X_train, y_train)
    print(model.best_score_)
    print(model.best_estimator_)
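The regression and clustering snippets in this cheat sheet can be strung together into a short runnable script. The following is a minimal sketch under stated assumptions: it scores a LinearRegression on the diabetes data with MSE and R2, then clusters the wine data with KMeans and compares the clusters to the true labels using the adjusted Rand index. The choices `n_clusters=3` (wine has three classes), `n_init=10`, and the standardization step before clustering are assumptions made for this illustration, not settings taken from the cheat sheet.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import adjusted_rand_score, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# --- Regression: diabetes data, linear model, MSE and R2 on a held-out set ---
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)  # regressors use predict(), not predict_proba()

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MSE:", mse)
print("R2:", r2)

# --- Clustering: wine data, KMeans, adjusted Rand index against true labels ---
X_wine, y_wine = datasets.load_wine(return_X_y=True)

# Assumption: scale features first, since KMeans is distance-based.
X_wine_scaled = StandardScaler().fit_transform(X_wine)

# Assumption: three clusters to match the three wine classes.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_wine_scaled)

ari = adjusted_rand_score(y_wine, cluster_labels)
print("Adjusted Rand index:", ari)
```

The adjusted Rand index ignores how cluster ids are numbered, so it can be computed directly from the raw KMeans labels without first mapping clusters to classes.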