Ch01 06 Exercise
Machine Learning + Deep Learning for Self-Study (혼자 공부하는 머신러닝+딥러닝)
Chapter 01 My First Machine Learning
01-1 Artificial Intelligence, Machine Learning, and Deep Learning
https://www.youtube.com/watch?v=J6wehCO_c58&list=PLJN246lAkhQjoU0C4v8FgtbjOIXxSs_4Q&index=1
Strong AI
http://muntermag.com/2016/09/her-y-el-amor-en-la-era-tecnologica/
Weak AI
Machine learning is a subfield of artificial intelligence.
https://scikit-learn.org/
Deep learning (machine learning with artificial neural networks) is a subfield of machine learning.
Development Environment
Each student needs to create their own Google account to use Google Drive.
https://www.youtube.com/watch?v=0l0g7wk9wv4&list=PLJN246lAkhQjoU0C4v8FgtbjOIXxSs_4Q&index=2
01-3 The Market and Machine Learning
Your First Machine Learning Program
Conventional programming
Rule-based Programming
Bream vs. smelt
Two classes
Classification
Binary classification
Bream data
Features
Scatter plot
Smelt data
Combining the bream and smelt data
(each row is a sample, each column is a feature)
List comprehension
Preparing the ground truth (targets)
bream = 1, smelt = 0
k-nearest neighbors (k-NN)
Model
Training
Evaluation
Predicting a new fish
bream
smelt
Ground Truth
Always bream (if k is as large as the whole training set, the majority class wins every vote)
Hyperparameter settings are important.
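A minimal from-scratch sketch of the k-nearest-neighbors vote, assuming hypothetical toy [length, weight] values (not the book's dataset) with bream = 1 and smelt = 0. It also shows where "always bream" comes from: when k equals the number of training samples, every query gets the same majority vote. scikit-learn's KNeighborsClassifier does this for you (its n_neighbors defaults to 5); the helper name `knn_predict` is hypothetical.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training samples closest to `query`."""
    dists = np.linalg.norm(train_x - query, axis=1)      # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]                      # indices of the k closest samples
    return int(np.bincount(train_y[nearest]).argmax())   # majority vote (ties -> label 0)

# Hypothetical toy data: [length (cm), weight (g)]; 1 = bream, 0 = smelt
fish = np.array([[25, 250], [27, 300], [30, 450], [32, 500], [35, 700],
                 [10, 7], [11, 8], [12, 9]], dtype=float)
target = np.array([1, 1, 1, 1, 1, 0, 0, 0])

small_fish = np.array([10.0, 8.0])
print(knn_predict(fish, target, small_fish, k=3))          # 0: smelt
print(knn_predict(fish, target, small_fish, k=len(fish)))  # 1: every query becomes bream
```

With k = 8 the five bream always outvote the three smelt, which is why the number of neighbors has to be chosen with care.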
Task
• Run all the examples and submit the results.
Chapter 02 Working with Data
02-1 Training Set and Test Set
A perfect report
Supervised and unsupervised learning
k-nearest neighbors
Supervised learning
Training set and test set
Features
Separating data using Python slicing techniques
Samples
Training set
Test set
Evaluating on the test set
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html?highlight=kneighborsclassifier#sklearn.neighbors.KNeighborsClassifier
https://numpy.org/
Self-study
https://numpy.org/learn/
https://ml-ko.kr/homl2/tools_numpy.html
Shuffling the data
(fix the random seed to ensure reproducibility)
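If the samples are sorted by class (all bream first, then all smelt), slicing off the last rows puts only one class in the test set. A minimal NumPy sketch, assuming hypothetical arrays: shuffle an index array with a fixed seed and slice inputs and targets with the same indices so they stay aligned.

```python
import numpy as np

# Hypothetical data: 10 samples x 2 features, where row i is [2i, 2i + 1]
inputs = np.arange(20).reshape(10, 2)
targets = np.array([1] * 6 + [0] * 4)     # sorted by class, like concatenated bream + smelt

np.random.seed(42)                        # fixed seed -> the same shuffle on every run
index = np.arange(len(inputs))
np.random.shuffle(index)                  # shuffle the indices, not the arrays themselves

# The same index slice is applied to inputs and targets, so pairs stay matched
train_input, train_target = inputs[index[:7]], targets[index[:7]]
test_input, test_target = inputs[index[7:]], targets[index[7:]]
```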
Splitting and checking the data
Who am I?
Am I a bream or a smelt? (a fish weighing 150 g)
Preparing data with NumPy
See the train_test_split API documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html?highlight=train_test_split#sklearn.model_selection.train_test_split
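A short usage sketch of `train_test_split`, assuming hypothetical toy arrays. Passing `stratify=y` keeps the class ratio the same in the training and test parts, which matters when one class is much smaller (as smelt is here). When neither `test_size` nor `train_size` is given, the test part defaults to 25%.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples x 2 features, 15 of class 1 and 5 of class 0
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([1] * 15 + [0] * 5)

# stratify=y keeps the 3:1 class ratio in both parts; random_state fixes the shuffle
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```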
A suspicious bream
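Why the bream looks suspicious: weight is measured in hundreds of grams while length is tens of centimeters, so weight dominates the Euclidean distance. A minimal sketch with hypothetical toy values (the 25 cm length of the new fish is an assumption for illustration): standardize each feature with the training set's mean and standard deviation, and apply the same statistics to the new sample.

```python
import numpy as np

def knn_vote(X, y, query, k=3):
    """Majority label among the k samples of X closest to `query` (Euclidean distance)."""
    nearest = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    return int(np.bincount(y[nearest]).argmax())

# Hypothetical training data: [length (cm), weight (g)]; 1 = bream, 0 = smelt
train = np.array([[25.0, 250.0], [30.0, 450.0], [35.0, 700.0],
                  [10.0, 7.0], [12.0, 9.0]])
target = np.array([1, 1, 1, 0, 0])
new_fish = np.array([25.0, 150.0])            # hypothetical 25 cm, 150 g fish

raw_pred = knn_vote(train, target, new_fish)  # 0 (smelt): weight dominates the distance

mean = train.mean(axis=0)                     # per-feature mean of the TRAINING set
std = train.std(axis=0)                       # per-feature standard deviation
train_scaled = (train - mean) / std           # standard scores (z-scores)
new_scaled = (new_fish - mean) / std          # the new sample reuses the training statistics
scaled_pred = knn_vote(train_scaled, target, new_scaled)  # 1 (bream)
```

Computing the mean and standard deviation from the new sample (or from the test set) instead would place it on a different scale from the training data.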
Predict the weight of perch
Regression
k-nearest neighbors regression
k-nearest neighbors classification vs. k-nearest neighbors regression
Use only the perch length as a feature
Preparing the training set
reshape(-1, 1): turns a 1-D array into the 2-D (samples x features) shape scikit-learn expects
Training a regression model
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html?highlight=kneighborsregressor#sklearn.neighbors.KNeighborsRegressor
Number of neighbors: fewer neighbors tend toward overfitting, more neighbors toward underfitting
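A from-scratch sketch of k-nearest-neighbors regression, assuming hypothetical length/weight values: the prediction is the mean target of the k nearest training samples, scored with R² (the score KNeighborsRegressor.score reports). k = 1 reproduces the training set exactly (overfitting), while k = all samples predicts the overall mean everywhere (underfitting). It also shows that a 60 cm fish gets the same prediction as the largest training fish, since k-NN cannot extrapolate beyond the training range.

```python
import numpy as np

def knn_regress(train_x, train_y, query_x, k):
    """Predict each query as the mean target of its k nearest training samples (1 feature)."""
    dists = np.abs(query_x - train_x.T)             # (n_query, n_train) distances
    nearest = np.argsort(dists, axis=1)[:, :k]      # k closest training indices per query
    return train_y[nearest].mean(axis=1)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Hypothetical perch-like data: one feature (length), 1-D to start with
length = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
weight = np.array([10.0, 40.0, 100.0, 200.0, 350.0, 550.0, 800.0])
X = length.reshape(-1, 1)                           # 2-D input: 7 rows x 1 column

train_r2_k1 = r2(weight, knn_regress(X, weight, X, 1))   # 1.0: memorizes training data
train_r2_k7 = r2(weight, knn_regress(X, weight, X, 7))   # ~0.0: always predicts the mean
far_pred = knn_regress(X, weight, np.array([[60.0]]), 3) # mean of the 3 largest fish
```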
03-2 Linear Regression
박해선
Last time… (recap)
Find a line
LinearRegression
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=linearregression#sklearn.linear_model.LinearRegression
$\text{perch weight} = \text{slope} \times \text{perch length} + \text{y-intercept}$
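A least-squares sketch of fitting that line with NumPy, assuming hypothetical points that lie exactly on weight = 40 x length - 500. The column of ones lets the solver learn the y-intercept; LinearRegression exposes the same two quantities as `coef_` and `intercept_`. Unlike k-NN, the fitted line can extrapolate beyond the training lengths.

```python
import numpy as np

# Hypothetical data lying exactly on weight = 40 * length - 500
length = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
weight = 40.0 * length - 500.0

# Design matrix [length, 1]: the column of ones carries the intercept
A = np.column_stack((length, np.ones_like(length)))
(slope, intercept), *_ = np.linalg.lstsq(A, weight, rcond=None)

pred_50cm = slope * 50 + intercept   # extrapolation outside the training range
```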
Plotting the learned line
Polynomial regression
Retraining the model
Plotting the learned curve
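Polynomial regression still uses a linear model: adding a length² column makes the fit a parabola while the model stays linear in its coefficients. A minimal NumPy sketch, assuming hypothetical data generated from weight = 0.5 x length² - 3 x length + 4.

```python
import numpy as np

# Hypothetical data generated from a known quadratic
length = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
weight = 0.5 * length ** 2 - 3.0 * length + 4.0

# Columns: length^2, length, 1 (the last one is the intercept)
X_poly = np.column_stack((length ** 2, length, np.ones_like(length)))
(a, b, c), *_ = np.linalg.lstsq(X_poly, weight, rcond=None)
```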
03-3 Feature Engineering and Regularization
Last time… (recap)
Multiple regression (regression with several features)
P151
Preparing data with pandas
Tutorials
• NumPy tutorial: http://ml-ko.kr/homl2/tools_numpy.html
• pandas tutorial: http://ml-ko.kr/homl2/tools_pandas.html
P152
Creating polynomial features
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html?highlight=polynomialfeatures#sklearn.preprocessing.PolynomialFeatures
P154
LinearRegression
P156
Creating more features
Overfitting
Regularization
P159
Ridge regression (L2 regularization)
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html?highlight=ridge#sklearn.linear_model.Ridge
P160
Finding the right regularization strength (alpha)
Train score
Test score
P161
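A closed-form sketch of ridge regression and of sweeping alpha on a log scale, assuming hypothetical standardized data and no intercept term (scikit-learn's Ridge fits an intercept by default). It minimizes ||y - Xw||² + alpha ||w||², which is the objective scikit-learn documents for Ridge. Larger alpha shrinks the weights; the chosen alpha is the one with the best held-out R².

```python
import numpy as np

def ridge_weights(X, y, alpha):
    """Closed-form solution of min ||y - Xw||^2 + alpha * ||w||^2 (no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))                       # hypothetical standardized features
true_w = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=1.0, size=40)
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]

alphas = [0.001, 0.01, 0.1, 1, 10, 100]            # log-spaced candidates
norms, test_r2 = [], []
for alpha in alphas:
    w = ridge_weights(X_train, y_train, alpha)
    norms.append(np.linalg.norm(w))                # larger alpha -> smaller weights
    resid = y_test - X_test @ w
    test_r2.append(1 - (resid ** 2).sum() / ((y_test - y_test.mean()) ** 2).sum())
best_alpha = alphas[int(np.argmax(test_r2))]
```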
Lasso regression (L1 regularization)
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html?highlight=lasso#sklearn.linear_model.Lasso
Train score
Test score
P163
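Why lasso can set coefficients exactly to zero while ridge only shrinks them: in the special case of orthonormal features (XᵀX = I), both problems have closed forms in terms of the least-squares weights z = Xᵀy. The sketch assumes the objectives ½||y - Xw||² + alpha ||w||₁ (lasso) and ½||y - Xw||² + ½ alpha ||w||² (ridge). scikit-learn's Lasso scales the data term by 1/(2 n_samples), so its alpha values are not directly comparable.

```python
import numpy as np

def lasso_orthonormal(z, alpha):
    """Lasso weights when X^T X = I: soft-threshold each least-squares weight."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def ridge_orthonormal(z, alpha):
    """Ridge weights when X^T X = I: scale every least-squares weight by 1 / (1 + alpha)."""
    return z / (1.0 + alpha)

z = np.array([3.0, -0.4, 0.2, -1.5])   # hypothetical least-squares weights
lasso_w = lasso_orthonormal(z, 0.5)    # small weights become exactly 0
ridge_w = ridge_orthonormal(z, 0.5)    # every weight shrinks by 2/3, none becomes 0
```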
Chapter 04 Various Classification Algorithms
04-1 Logistic Regression
Last time… (recap)
The lucky bag
Probability of bream
probability of smelt
P176
Calculating probabilities
10 Samples of Neighbors
Target Features
P178-179
Multiclass classification with k-nearest neighbors
Classes are sorted in alphabetical order
First sample
7 predicted probabilities (one per class)
P180-182
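A k-NN class "probability" is just the fraction of the k neighbors in each class, so with k = 3 the only possible values are 0, 1/3, 2/3, and 1. A minimal sketch; the class list assumes the seven species of the book's fish dataset (sorted alphabetically, as scikit-learn orders `classes_`), and the neighbor labels are hypothetical.

```python
import numpy as np

def knn_proba(neighbor_labels, classes):
    """Probability of each class = fraction of the k neighbors carrying that label."""
    labels = np.asarray(neighbor_labels)
    return np.array([np.mean(labels == c) for c in classes])

# Assumed class list, sorted alphabetically
classes = sorted(['Bream', 'Roach', 'Whitefish', 'Parkki', 'Perch', 'Pike', 'Smelt'])
proba = knn_proba(['Perch', 'Perch', 'Roach'], classes)   # hypothetical 3 neighbors
```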
Logistic regression
Sigmoid function (also called the logistic function)
Negative Positive
P183
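The sigmoid function φ(z) = 1 / (1 + e^(-z)) squashes any real z into the interval (0, 1), so its output can be read as the probability of the positive class. A NumPy sketch of the definition; scipy.special.expit computes the same function.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
phi = sigmoid(z)   # increasing, with phi = 0.5 exactly at z = 0
```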
Logistic regression (binary classification)
5 samples
Negative / Positive
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logisticregression#sklearn.linear_model.LogisticRegression
P185-186
Checking the logistic regression coefficients
Computing z
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.expit.html?highlight=expit#scipy.special.expit
P187
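Computing z by hand: z is the weighted sum of the features plus the intercept (what `decision_function` returns), and the sigmoid of z is the probability of the positive class. In scikit-learn's binary case, the positive class is the second entry of `classes_`. The coefficients and samples below are hypothetical stand-ins for `coef_`, `intercept_`, and scaled inputs.

```python
import numpy as np

# Hypothetical learned parameters for 5 features (stand-ins for coef_ and intercept_)
coef = np.array([-0.4, -0.6, 2.7, 1.4, -1.3])
intercept = -2.2
X = np.array([[1.0, 0.5, 0.2, -0.3, 0.0],
              [0.0, 0.0, 1.0, 1.0, 0.0]])    # two hypothetical scaled samples

z = X @ coef + intercept                      # weighted sum + intercept
proba_positive = 1.0 / (1.0 + np.exp(-z))     # sigmoid; scipy.special.expit(z) is equivalent
predicted = (z > 0).astype(int)               # z > 0 exactly when the probability > 0.5
```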
Logistic regression (multiclass classification)
5 samples
7 predicted probabilities (one per class)
P189-190
Softmax function
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.softmax.html?highlight=softmax#scipy.special.softmax
P191
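In multiclass logistic regression, each class gets its own z, and softmax turns the z values into probabilities that sum to 1: softmax(z)_i = e^(z_i) / Σ_j e^(z_j). A NumPy sketch; subtracting the maximum first avoids overflow without changing the result (scipy.special.softmax computes the same function).

```python
import numpy as np

def softmax(z):
    """Softmax along the last axis, shifted by the max for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax([1.0, 2.0, 3.0])   # approx. [0.090, 0.245, 0.665]
```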
04-2 Stochastic Gradient Descent
Last time… (recap)
$z = a \times \text{weight} + b \times \text{length} + c \times \text{diagonal} + d \times \text{height} + e \times \text{thickness} + f$
The lucky bag is a huge hit!
P199
Optimization method
Take samples out of the training set one at a time (stochastic gradient descent)
Move down the slope little by little
Are your legs too long? Take out several samples at a time (mini-batch gradient descent)
Step size (learning rate)
Iteration
Can accuracy be used as the loss function? No: the loss function must be differentiable.
P203-204
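A from-scratch sketch of (mini-)batch stochastic gradient descent. For simplicity it fits a line with squared error rather than the logistic loss used in the slides; the data, learning rate, and function name are hypothetical. `batch_size=1` is stochastic gradient descent, a larger batch is mini-batch gradient descent, and one epoch is one full pass over the shuffled training set.

```python
import numpy as np

def sgd_linear(x, y, lr=0.1, epochs=200, batch_size=1, seed=42):
    """Fit y ~ w*x + b by (mini-batch) stochastic gradient descent on 0.5 * squared error."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):                      # one epoch = one pass over the training set
        order = rng.permutation(n)               # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]        # prediction error on this (mini-)batch
            w -= lr * np.mean(err * x[idx])      # step against the gradient, scaled by lr
            b -= lr * np.mean(err)
    return w, b

x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 2.0                                # hypothetical noise-free data
w_sgd, b_sgd = sgd_linear(x, y)                  # one sample per step
w_mb, b_mb = sgd_linear(x, y, epochs=1000, batch_size=5)   # mini-batches of 5
```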
Logistic loss function (binary cross-entropy loss function)
P205-206
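The logistic (binary cross-entropy) loss is -mean(y log p + (1 - y) log(1 - p)), where p is the predicted probability of the positive class. It is small when the model is confidently right and large when it is confidently wrong, and unlike accuracy it is differentiable. A NumPy sketch; the clipping constant `eps` is an assumption to avoid log(0).

```python
import numpy as np

def logistic_loss(y_true, p_positive, eps=1e-15):
    """Binary cross-entropy averaged over samples."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_positive, dtype=float), eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

right = logistic_loss([1], [0.9])   # -log(0.9): confident and correct, small loss
wrong = logistic_loss([0], [0.9])   # -log(0.1): confident and wrong, large loss
```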
Data preprocessing
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html?highlight=standardscaler#sklearn.preprocessing.StandardScaler
P207
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html?highlight=sgdclassifier#sklearn.linear_model.SGDClassifier
SGDClassifier
P208
Epochs and overfitting/underfitting
Figure: training-set and test-set accuracy vs. epochs (underfitting with too few epochs, an optimal point, then overfitting)
P209
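A sketch of recording accuracy per epoch with SGDClassifier, assuming hypothetical two-class data. Each `partial_fit` call runs one more epoch; the first call needs `classes` so the model knows every label up front. The scaler is fit on the training set only. The sketch uses the default hinge loss; in scikit-learn 1.1 and later the logistic loss is spelled `loss='log_loss'` (older releases used `'log'`).

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical two-class data: 200 samples x 2 features (not the fish dataset)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
order = rng.permutation(len(X))
train_idx, test_idx = order[:150], order[150:]

scaler = StandardScaler().fit(X[train_idx])         # statistics from the training set only
train_scaled = scaler.transform(X[train_idx])
test_scaled = scaler.transform(X[test_idx])

clf = SGDClassifier(loss='hinge', random_state=42)
classes = np.unique(y)
train_score, test_score = [], []
for epoch in range(100):
    # each partial_fit call is one more pass (epoch) over the training set
    clf.partial_fit(train_scaled, y[train_idx], classes=classes)
    train_score.append(clf.score(train_scaled, y[train_idx]))
    test_score.append(clf.score(test_scaled, y[test_idx]))

best_epoch = int(np.argmax(test_score)) + 1          # epoch with the highest test accuracy
```

Plotting `train_score` and `test_score` against the epoch number gives the kind of curve sketched in the figure above.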
Early stopping
P210-211
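The idea behind early stopping, sketched in plain Python: keep the epoch with the best validation score and stop once it has not improved for `patience` consecutive epochs. The function name and the `patience` and `tol` values are hypothetical; SGDClassifier offers a related built-in mechanism through its `early_stopping`, `n_iter_no_change`, and `tol` parameters, whose exact rules differ from this sketch.

```python
def early_stopping_epoch(val_scores, patience=5, tol=1e-3):
    """Return the 1-based epoch with the best score, stopping after `patience`
    consecutive epochs without an improvement larger than `tol`."""
    best, best_epoch, waited = float('-inf'), 0, 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best + tol:
            best, best_epoch, waited = score, epoch, 0   # new best: reset the counter
        else:
            waited += 1
            if waited >= patience:                       # ran out of patience: stop training
                break
    return best_epoch

scores = [0.60, 0.70, 0.80, 0.85, 0.84, 0.85, 0.83, 0.82, 0.81, 0.90]
stop_at = early_stopping_epoch(scores, patience=5)       # stops before epoch 10 is seen
```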
Chapter 05 Tree Algorithms
05-1 Decision Trees
Last time… (recap)
Red wine and white wine
P220
Preparing the data
P222-223
Logistic regression
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logisticregression#sklearn.linear_model.LogisticRegression
P224-225
Decision tree
Root node
Leaf node
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html?highlight=decisiontreeclassifier#sklearn.tree.DecisionTreeClassifier
P227
Analyzing the decision tree
Root node
P228-229
Gini impurity
Pure node
P230
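Gini impurity is 1 minus the sum of squared class proportions in a node: 0 for a pure node and 0.5 for an even two-class split. A split is chosen to maximize the information gain, i.e. the parent's impurity minus the sample-weighted impurities of the children (`criterion='gini'` is DecisionTreeClassifier's default). A plain-Python sketch with hypothetical class counts.

```python
def gini(counts):
    """Gini impurity from per-class sample counts in one node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def information_gain(parent, left, right):
    """Parent impurity minus the sample-weighted impurity of the two children."""
    n = sum(parent)
    return (gini(parent)
            - sum(left) / n * gini(left)
            - sum(right) / n * gini(right))

gain = information_gain([50, 50], [40, 10], [10, 40])   # 0.5 - 0.32 = 0.18
```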
Pruning
P232
Using unscaled features
P234
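A sketch of both ideas on this slide, assuming hypothetical wine-like data. `max_depth` prunes the tree by limiting its growth, and standardizing the features does not change a decision tree's predictions, because each split only compares values within one feature and a positive rescaling preserves their order. The data is drawn on a discrete grid so no two distinct values are nearly equal.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical wine-like features on very different scales
X = np.column_stack([rng.integers(80, 140, 300) / 10,     # "alcohol": 8.0 to 13.9
                     rng.integers(0, 200, 300) / 10,      # "sugar": 0.0 to 19.9
                     rng.integers(290, 370, 300) / 100])  # "pH": 2.90 to 3.69
y = (X[:, 1] + 2 * X[:, 0] > 32).astype(int)

# max_depth=3 prunes the tree by limiting its growth
raw_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
X_scaled = StandardScaler().fit_transform(X)
scaled_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_scaled, y)

same_predictions = (raw_tree.predict(X) == scaled_tree.predict(X_scaled)).all()
```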
05-2 Cross-Validation and Grid Search
Last time… (recap)
Validation set
test set
training set
parameter tuning
P243-244
model training
Cross-validation
model evaluation
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html?highlight=cross_validate#sklearn.model_selection.cross_validate
P245-246
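A from-scratch sketch of how k-fold cross-validation splits the data: shuffle the indices, cut them into k folds, and let each fold serve as the validation set exactly once while the other folds form the training set. The averaged validation score is the cross-validation score. The function name and seed are hypothetical; scikit-learn's KFold and StratifiedKFold produce such splits for `cross_validate`.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[i] for i in range(n_splits) if i != k])
        yield train_idx, val_idx

splits = list(kfold_indices(23, n_splits=5))
```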
Cross-validation with splitters
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html?highlight=stratifiedkfold#sklearn.model_selection.StratifiedKFold
P247
Grid search
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html?highlight=gridsearchcv#sklearn.model_selection.GridSearchCV
P249-250
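A usage sketch combining a splitter with grid search, assuming hypothetical data from `make_classification`. GridSearchCV tries every parameter combination (3 x 3 = 9 here), scores each with 5-fold stratified cross-validation, and by default refits the best combination on all of the data as `best_estimator_`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=42)  # hypothetical data
params = {'min_impurity_decrease': [0.0001, 0.0005, 0.001],
          'max_depth': [3, 5, 7]}
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

gs = GridSearchCV(DecisionTreeClassifier(random_state=42), params, cv=splitter)
gs.fit(X, y)
best_tree = gs.best_estimator_   # refit on all of X, y with gs.best_params_
```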
Choosing probability distributions for the parameters
P253
Random search
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html?highlight=randomizedsearchcv#sklearn.model_selection.RandomizedSearchCV
P254-255
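Random search samples parameter values from distributions instead of a fixed grid; RandomizedSearchCV accepts objects with an `rvs` method (such as SciPy's distributions) in `param_distributions` and draws `n_iter` combinations. A sketch of the two SciPy distributions, with one easy-to-miss detail: `uniform(loc, scale)` covers [loc, loc + scale], not [loc, scale], and `randint`'s upper bound is exclusive. The bounds used here are hypothetical.

```python
from scipy.stats import randint, uniform

rgen = randint(2, 25)           # integers 2..24 (upper bound is exclusive)
ugen = uniform(0.0001, 0.001)   # loc=0.0001, scale=0.001 -> values in [0.0001, 0.0011]

ints = rgen.rvs(size=1000, random_state=42)
floats = ugen.rvs(size=1000, random_state=42)
```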
05-3 Tree Ensembles
Last time… (recap)
Structured and unstructured data
P264
Random forest
(an ensemble of decision trees)
P265
How a random forest is trained
Bootstrap sample (random sampling) → decision tree training
P266
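A bootstrap sample draws n samples with replacement from n training samples, so some samples repeat and roughly 1 - 1/e ≈ 63% of the distinct samples appear; the rest are "out-of-bag" and can serve for evaluation. A NumPy sketch with hypothetical sizes. In scikit-learn, RandomForestClassifier uses bootstrap samples by default (`bootstrap=True`), can score the out-of-bag samples with `oob_score=True`, and also considers a random subset of features at each split (`max_features`).

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)   # n draws with replacement
in_bag = np.unique(bootstrap_idx)                            # distinct samples used
oob = np.setdiff1d(np.arange(n_samples), in_bag)             # out-of-bag samples
in_bag_fraction = len(in_bag) / n_samples                    # roughly 0.632
```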
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html?highlight=randomforestclassifier#sklearn.ensemble.RandomForestClassifier
P268-269
Extra trees
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html?highlight=extratreesclassifier#sklearn.ensemble.ExtraTreesClassifier
P270
Gradient boosting
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html?highlight=gradientboostingclassifier#sklearn.ensemble.GradientBoostingClassifier
P271-272
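The core idea of gradient boosting is that each new shallow tree fits what the current ensemble still gets wrong (the negative gradient of the loss), and its output is added with a small learning rate. A from-scratch sketch for regression with squared error, where the negative gradient is simply the residual, using depth-1 trees (stumps) on one hypothetical feature. GradientBoostingClassifier applies the same scheme with a classification loss and, by default, depth-3 trees and `learning_rate=0.1`.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression tree (depth 1) on a 1-D feature."""
    best = None
    for t in np.unique(x)[:-1]:                   # candidate thresholds
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return lambda q: np.where(q <= t, left_value, right_value)

def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_trees):
        residual = y - pred                       # negative gradient of 0.5 * (y - pred)**2
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)    # small step toward the residuals
        stumps.append(stump)
    return lambda q: base + learning_rate * sum(s(q) for s in stumps)

x = np.linspace(0, 10, 50)
y = np.sin(x)                                     # hypothetical target
model_10 = gradient_boost(x, y, n_trees=10)
model_100 = gradient_boost(x, y, n_trees=100)
```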
Histogram-based gradient boosting
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html?highlight=histgradientboostingclassifier#sklearn.ensemble.HistGradientBoostingClassifier
P273
Assessing feature importance
Permutation importance
https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html?highlight=permutation_importance#sklearn.inspection.permutation_importance
P274
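Permutation importance measures how much a fitted model's score drops when one feature's values are shuffled, which breaks that feature's link to the target. A from-scratch sketch with a hypothetical "already trained" model and made-up data in which the third feature is irrelevant by construction; `sklearn.inspection.permutation_importance` follows the same idea and reports the averaged drops as `importances_mean`.

```python
import numpy as np

def r2(y, pred):
    return 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = r2(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break feature j's link to y
            drops.append(baseline - r2(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 5 * X[:, 0] + 1 * X[:, 1]                    # feature 2 is irrelevant by construction
predict = lambda X: 5 * X[:, 0] + 1 * X[:, 1]    # hypothetical fitted model
imp = permutation_importance_sketch(predict, X, y)
```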
XGBoost vs LightGBM
P275
Ensemble report
P254-255
Chapter 06 Unsupervised Learning
06-1 Clustering Algorithms
Last time… (recap)
Unsupervised learning
P286
Preparing the fruit data
300 samples, each a 100 x 100 image
P288
Inspecting the samples
P289-290
Reshaping the samples
1D Array
2D Array
P292
Histogram of sample means
100 apple samples x 10,000 pixels
P293-294
Histogram of pixel means
mean
pixels
P295
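The difference between the two histograms is only the axis of the mean. After reshaping each 100 x 100 image into a row of 10,000 pixels, `axis=1` averages within each image (one value per sample) and `axis=0` averages each pixel across images (one value per pixel). A NumPy sketch using random numbers as a hypothetical stand-in for the fruit images; treating the first 100 rows as apples is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the fruit images: 300 grayscale 100 x 100 arrays
fruits = rng.integers(0, 256, size=(300, 100, 100)).astype(float)

fruits_2d = fruits.reshape(-1, 100 * 100)        # (300, 10000): one row per image
apple = fruits_2d[:100]                          # first 100 rows, assumed to be apples
sample_mean = apple.mean(axis=1)                 # one mean per image  -> (100,)
pixel_mean = apple.mean(axis=0)                  # one mean per pixel  -> (10000,)
pixel_mean_image = pixel_mean.reshape(100, 100)  # back to image shape for plotting
```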
Drawing the average image
P296
Picking the photos closest to the average
P297
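Picking the photos closest to the average image: broadcast the difference between every image and the mean image, average the absolute differences over both pixel axes, and sort. A NumPy sketch with a hypothetical stand-in dataset in which the first 100 images are darker than the rest, so they should be the closest to their own average.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in: 100 darker "apple" images followed by 200 brighter images
fruits = np.concatenate([rng.normal(50, 10, (100, 100, 100)),
                         rng.normal(200, 10, (200, 100, 100))])

mean_image = fruits[:100].mean(axis=0)           # (100, 100) average "apple" image
abs_diff = np.abs(fruits - mean_image)           # broadcast over all 300 images
abs_mean = abs_diff.mean(axis=(1, 2))            # one distance per image -> (300,)
closest = np.argsort(abs_mean)[:100]             # indices of the 100 closest images
```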
06-2 k-Means
Last time… (recap)
Clustering
P303
k-means
k=3
P304
Training the model
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html?highlight=kmeans#sklearn.cluster.KMeans
P305-306
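A from-scratch sketch of the k-means loop: assign each sample to its nearest center, move each center to the mean of its samples, and repeat until the centers stop moving. Initialization here uses farthest-point selection, a simplified stand-in for the k-means++ scheme scikit-learn's KMeans uses by default; the blob data is hypothetical. The returned inertia (sum of squared distances to the assigned centers) is what KMeans reports as `inertia_`.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=42):
    """Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next initial center: the sample farthest from all centers chosen so far
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # centers stopped moving: converged
            break
        centers = new_centers
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, inertia

rng = np.random.default_rng(0)
# Hypothetical data: three well-separated blobs of 50 points each
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in [(0, 0), (10, 10), (0, 10)]])
labels, centers, inertia = kmeans(X, k=3)
```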
First cluster
91 pictures
P307
Second and third clusters
P309-310
Finding the best k
Elbow method
P312
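The elbow method fits k-means for several k and looks for the point where inertia stops dropping sharply. A usage sketch with scikit-learn's KMeans on hypothetical data with three blobs; `n_init=10` is set explicitly because its default changed across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: three well-separated blobs of 50 points each
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in [(0, 0), (10, 10), (0, 10)]])

ks = list(range(2, 7))
inertia = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertia.append(km.inertia_)   # sum of squared distances to the nearest center
```

Plotting `inertia` against `ks` should show a sharp bend (the "elbow") at k = 3 for this data.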
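06-3 Principal Component Analysis
Last time… (recap)
Dimensionality reduction
Note: for a multi-dimensional array, "dimension" means an axis (axis 0, axis 1, axis 2); for a 1-D feature vector, it means the number of elements (e.g., 5 features = dimension 5).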
P318-319
Principal components
P320-321
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html?highlight=pca#sklearn.decomposition.PCA
PCA (principal component analysis)
P322-323
Reconstruction
P325
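A minimal PCA sketch via the singular value decomposition of the centered data, assuming hypothetical 5-feature data that mostly varies along two directions. The top rows of Vt are the principal components (scikit-learn's PCA stores equivalent directions, up to sign, in `components_`). Projecting onto them reduces the dimension; multiplying back and adding the mean reconstructs an approximation of the original samples.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples x 5 features that mostly vary along 2 directions
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]                          # top-2 principal components
X_reduced = (X - mean) @ components.T        # (200, 2): dimensionality reduction
X_restored = X_reduced @ components + mean   # (200, 5): reconstruction
```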
Explained variance
P326
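Explained variance says how much of the data's total variance each principal component captures. From the SVD, component i explains S_i² / (n - 1), and dividing by the total gives the explained variance ratio; the running sum tells how many components are needed to keep a given share. scikit-learn's PCA also accepts a float `n_components` between 0 and 1 to pick that count automatically. The data below is hypothetical, and 50% is just an example threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))   # hypothetical

U, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
explained_variance = S ** 2 / (len(X) - 1)
ratio = explained_variance / explained_variance.sum()
# smallest number of components whose cumulative ratio reaches 50%
n_components = int(np.searchsorted(np.cumsum(ratio), 0.5) + 1)
```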
Using PCA with a classifier
P327-328
Using PCA with clustering
P329-330
Visualization
Axes: PCA 1, PCA 2
P331
End (Ch01 to Ch06)