Supervised Learning cheatsheet
Afshine Amidi and Shervine Amidi
Translated by Fares Al-Qunaieer. Reviewed by Zaid Alyafeai.
14 Rabi' al-Thani 1441

Introduction to Supervised Learning

Given a set of data points {x^(1), ..., x^(m)} associated with a set of outcomes {y^(1), ..., y^(m)}, we want to build a classifier that learns how to predict y from x.

Type of prediction – The different types of prediction models are summed up in the table below:

              Regression            Classification
Examples      Linear regression     Logistic regression, SVM, neural networks

Type of model – Examples of discriminative models are regressions and SVMs; examples of generative models are GDA and Naive Bayes.
Hypothesis – The hypothesis, noted hθ, is the model that we choose. For a given input x^(i), the model's predicted output is hθ(x^(i)).

Loss function – A loss function is a function L : (z, y) ∈ R × Y → L(z, y) ∈ R that takes as inputs the predicted value z and the corresponding real value y, and outputs how different they are. The common loss functions are summed up below:

Least squared error:   (1/2) (y − z)^2
Logistic loss:         log(1 + exp(−yz))
Hinge loss:            max(0, 1 − yz)
Cross-entropy:         −[ y log(z) + (1 − y) log(1 − z) ]
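To make the four loss formulas concrete, here is a minimal NumPy sketch (the function names are illustrative, not from the cheatsheet); z is the predicted value and y the real value, with y ∈ {−1, +1} for the logistic and hinge losses and y ∈ {0, 1}, z ∈ (0, 1) for the cross-entropy:

```python
import numpy as np

def least_squared_error(z, y):
    # (1/2) (y - z)^2
    return 0.5 * (y - z) ** 2

def logistic_loss(z, y):
    # log(1 + exp(-y z)), with y in {-1, +1}
    return np.log1p(np.exp(-y * z))

def hinge_loss(z, y):
    # max(0, 1 - y z), with y in {-1, +1}
    return np.maximum(0.0, 1.0 - y * z)

def cross_entropy(z, y):
    # -[y log(z) + (1 - y) log(1 - z)], with y in {0, 1} and z in (0, 1)
    return -(y * np.log(z) + (1 - y) * np.log(1 - z))
```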
Remark: when the cost function is minimized with gradient descent, the parameters can be updated in two ways. In stochastic gradient descent (SGD) the parameters are updated based on each training example taken separately, while in batch gradient descent they are updated using whole batches of training examples.
Classification and logistic regression

Sigmoid function – The sigmoid function g, also known as the logistic function, is defined as follows:

∀z ∈ R,   g(z) = 1 / (1 + e^(−z))  ∈ ]0,1[

Likelihood – The likelihood of a model L(θ), where θ denotes the model parameters, is used to find the optimal parameters θ by maximizing the likelihood. In practice, the log-likelihood ℓ(θ) = log(L(θ)) is used instead, since it is easier to optimize. We then have:

θ_opt = arg max_θ L(θ)
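A small NumPy sketch of the two definitions above, assuming a Bernoulli model with parameter g(θᵀx) and labels y^(i) ∈ {0, 1} (names and shapes are illustrative):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), with values in ]0, 1[
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # l(theta) = log L(theta) for y(i) in {0, 1} under a Bernoulli(g(theta^T x)) model
    phi = sigmoid(X @ theta)
    return np.sum(y * np.log(phi) + (1 - y) * np.log(1 - phi))
```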
Logistic regression – We assume here that y|x; θ ∼ Bernoulli(φ). We then have:

φ = hθ(x) = 1 / (1 + exp(−θ^T x)) = g(θ^T x)

Remark: there is no closed-form solution for logistic regression.

Newton's algorithm – Newton's algorithm is a numerical method that finds θ such that ℓ′(θ) = 0. Its update rule is as follows:

θ ← θ − ℓ′(θ) / ℓ″(θ)

Remark: a more general, multidimensional version, also known as the Newton-Raphson method, has the following update rule:

θ ← θ − ( ∇²_θ ℓ(θ) )^(−1) ∇_θ ℓ(θ)

Softmax regression – Softmax regression, also called multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θ_K = 0, which makes the Bernoulli parameter φ_i of each class i equal to:

φ_i = exp(θ_i^T x) / Σ_{j=1..K} exp(θ_j^T x)
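An illustrative NumPy sketch of the softmax probabilities φ_i, using the θ_K = 0 convention mentioned above; the shift by the maximum score is only for numerical stability and does not change the ratios:

```python
import numpy as np

def softmax_probabilities(theta, x):
    # theta: (K, n) matrix of class parameters whose last row is zero (theta_K = 0),
    # x: (n,) feature vector. Returns phi_i = exp(theta_i^T x) / sum_j exp(theta_j^T x).
    scores = theta @ x
    scores = scores - scores.max()   # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()
```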
Linear regression

We assume here that y|x; θ ∼ N(µ, σ²).

Normal equations – Given a design matrix X, the value of θ that minimizes the cost function has a closed-form solution:

θ = (X^T X)^(−1) X^T y

LMS algorithm – Given a learning rate α, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m examples, also known as the Widrow-Hoff learning rule, is as follows:

∀j,   θ_j ← θ_j + α Σ_{i=1..m} [ y^(i) − hθ(x^(i)) ] x_j^(i)

Remark: this update rule is a particular case of gradient descent.

LWR – Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w^(i)(x), which is defined with a bandwidth parameter τ ∈ R as:

w^(i)(x) = exp( −(x^(i) − x)² / (2τ²) )
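A minimal NumPy sketch of the two fitting procedures above (function names, data shapes and the learning rate value are illustrative assumptions):

```python
import numpy as np

def normal_equations(X, y):
    # theta = (X^T X)^(-1) X^T y ; solving the linear system avoids an explicit inverse
    return np.linalg.solve(X.T @ X, X.T @ y)

def lms_step(theta, X, y, alpha=0.01):
    # One LMS (Widrow-Hoff) update over the m training examples:
    # theta_j <- theta_j + alpha * sum_i [y(i) - h_theta(x(i))] * x_j(i)
    residuals = y - X @ theta
    return theta + alpha * (X.T @ residuals)
```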
Generalized Linear Models (GLM)

Exponential family – A class of distributions is said to be in the exponential family if it can be written in terms of a canonical parameter η, a sufficient statistic T(y) and a log-partition function a(η), as follows:

p(y; η) = b(y) exp( η T(y) − a(η) )

Remark: we will often have T(y) = y. Also, exp(−a(η)) can be seen as a normalization parameter that makes sure the probabilities sum to one.

The most commonly used exponential distributions are summed up in the following table:

Distribution     η                     T(y)    a(η)                  b(y)
Bernoulli        log( φ / (1 − φ) )    y       log(1 + exp(η))       1
Gaussian         µ                     y       η² / 2                (1/√(2π)) exp(−y²/2)

Assumptions of GLMs – Generalized linear models rely on the following three assumptions:

(1) y|x; θ ∼ ExpFamily(η)      (2) hθ(x) = E[y|x; θ]      (3) η = θ^T x

Remark: ordinary least squares and logistic regression are special cases of generalized linear models.
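As a quick check of the Bernoulli row of the table, the distribution p(y; φ) = φ^y (1 − φ)^(1−y) can be rewritten in the exponential-family form above (a short derivation, not part of the original cheatsheet):

```latex
p(y;\phi) = \phi^{y}(1-\phi)^{1-y}
          = \exp\big(y\log\phi + (1-y)\log(1-\phi)\big)
          = \underbrace{1}_{b(y)}\,
            \exp\Big(\underbrace{\log\tfrac{\phi}{1-\phi}}_{\eta}\,
                     \underbrace{y}_{T(y)}
                     - \underbrace{\log(1+e^{\eta})}_{a(\eta)}\Big)
```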
Support Vector Machines

The goal of support vector machines (SVM) is to find the line that maximizes the minimum distance to the line:

h(x) = sign(w^T x − b)

where (w, b) ∈ R^n × R is the solution of the following optimization problem:

min (1/2) ||w||²    such that    y^(i) (w^T x^(i) − b) ⩾ 1

Remark: the line is defined as w^T x − b = 0.

Hinge loss – The hinge loss is used in the setting of SVMs and is defined as follows:

L(z, y) = [1 − yz]_+ = max(0, 1 − yz)

Kernel – Given a feature mapping φ, we define the kernel K as follows:

K(x, z) = φ(x)^T φ(z)

In practice, the kernel K defined by K(x, z) = exp( −||x − z||² / (2σ²) ) is called the Gaussian kernel and is commonly used.

Remark: we say that we use the "kernel trick" to compute the cost function with the kernel because we do not actually need to know the explicit mapping φ, which is often very complicated; we only need to compute the values K(x, z).

Lagrangian – We define the Lagrangian L(w, b) as follows:

L(w, b) = f(w) + Σ_{i=1..l} β_i h_i(w)

Remark: the coefficients β_i are called the Lagrange multipliers.
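A small sketch of the Gaussian kernel mentioned above; in line with the kernel trick, only the value K(x, z) is ever evaluated and the mapping φ is never formed (the function name and default σ are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
```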
Generative Learning

A generative model first tries to learn how the data was generated by estimating P(x|y), which can then be used to estimate P(y|x) with Bayes' rule.

Setting – Gaussian Discriminant Analysis assumes that y, x|y = 0 and x|y = 1 are such that:

y ∼ Bernoulli(φ),    x|y = 0 ∼ N(µ_0, Σ),    x|y = 1 ∼ N(µ_1, Σ)

Estimation – The following table sums up the estimates that we find when maximizing the likelihood:

φ̂    = (1/m) Σ_{i=1..m} 1{y^(i) = 1}
µ̂_j  = Σ_{i=1..m} 1{y^(i) = j} x^(i)  /  Σ_{i=1..m} 1{y^(i) = j}        (j = 0, 1)
Σ̂    = (1/m) Σ_{i=1..m} (x^(i) − µ_{y^(i)}) (x^(i) − µ_{y^(i)})^T
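A minimal NumPy sketch of the maximum-likelihood estimates in the table above, where X is an m×n data matrix and y a vector of 0/1 labels (names are illustrative assumptions):

```python
import numpy as np

def gda_estimates(X, y):
    m = len(y)
    phi = np.mean(y == 1)                              # hat(phi)
    mu = {j: X[y == j].mean(axis=0) for j in (0, 1)}   # hat(mu_0), hat(mu_1)
    # Subtract the class mean of each example, then form the shared covariance hat(Sigma)
    centered = X - np.where((y == 1)[:, None], mu[1], mu[0])
    sigma = centered.T @ centered / m
    return phi, mu[0], mu[1], sigma
```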
Naive Bayes

Assumption – The Naive Bayes model assumes that the features of every data point are all independent:

P(x|y) = P(x_1, x_2, ...|y) = P(x_1|y) P(x_2|y) ... = Π_{i=1..n} P(x_i|y)

Solutions – Maximizing the log-likelihood gives the following solutions, with k ∈ {0,1} and l ∈ [[1, L]]:

P(y = k) = (1/m) × #{ j : y^(j) = k }
P(x_i = l | y = k) = #{ j : y^(j) = k and x_i^(j) = l } / #{ j : y^(j) = k }

Remark: Naive Bayes is widely used for text classification and spam detection.
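A counting-based sketch of the estimates above for binary labels and discrete features (smoothing is omitted to stay close to the formulas; names and shapes are illustrative assumptions):

```python
import numpy as np

def naive_bayes_estimates(X, y, n_values):
    # X: (m, n) matrix of discrete features taking values in {0, ..., n_values - 1},
    # y: vector of 0/1 labels. Returns class priors P(y = k) and tables P(x_i = l | y = k).
    priors = {k: np.mean(y == k) for k in (0, 1)}
    cond = {k: np.stack([(X[y == k] == l).mean(axis=0) for l in range(n_values)], axis=0)
            for k in (0, 1)}
    # cond[k][l, i] = P(x_i = l | y = k), estimated as a fraction of class-k examples
    return priors, cond
```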
Random forest – It is a tree-based technique that uses a large number of decision trees built out of randomly selected sets of features. Contrary to a simple decision tree, the model is hard to interpret, but its generally good performance makes it a popular algorithm.
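A hedged usage sketch with scikit-learn (scikit-learn and the synthetic dataset are assumptions, not part of the cheatsheet): each tree is grown on a bootstrap sample and considers only a random subset of the features at every split, as described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# max_features="sqrt" makes each split look at a random subset of the features
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```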
Boosting – The idea of boosting methods is to combine several weak learners to form a stronger one; the main approaches are adaptive boosting and gradient boosting.

Remark (k-nearest neighbours): the higher the parameter k, the higher the bias; the lower the parameter k, the higher the variance.

Assumption: the training examples are drawn independently.

Hoeffding inequality – Let Z_1, ..., Z_m be m iid variables drawn from a Bernoulli distribution of parameter φ, let φ̂ be their sample mean and let γ > 0 be fixed. We then have:

P( |φ − φ̂| > γ ) ⩽ 2 exp(−2γ²m)

Training error – For a given classifier h, the training error ε̂(h) is defined as follows:

ε̂(h) = (1/m) Σ_{i=1..m} 1{ h(x^(i)) ≠ y^(i) }
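A small simulation sketch of Hoeffding's inequality: for iid Bernoulli(φ) draws, the empirical frequency of |φ − φ̂| > γ should stay below the 2·exp(−2γ²m) bound (the chosen constants are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
phi, gamma, m, trials = 0.3, 0.05, 500, 10000

samples = rng.binomial(1, phi, size=(trials, m))
phi_hat = samples.mean(axis=1)                       # sample mean of each trial
empirical = np.mean(np.abs(phi - phi_hat) > gamma)   # observed deviation probability
bound = 2 * np.exp(-2 * gamma ** 2 * m)              # Hoeffding bound

print(empirical, bound)  # the empirical frequency should not exceed the bound
```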
Upper bound theorem – Let H be a finite hypothesis class with |H| = k, and let δ and the sample size m be fixed. Then, with probability at least 1 − δ, we have:

ε(ĥ) ⩽ ( min_{h∈H} ε(h) ) + 2 √( (1/(2m)) log(2k/δ) )

VC dimension – The Vapnik-Chervonenkis (VC) dimension of an infinite hypothesis class H, noted VC(H), is the size of the largest set that is shattered by H.

Remark: the VC dimension of H = {set of linear classifiers in 2 dimensions} is 3.
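A quick numeric illustration of the bound above (the values of k, m and δ are arbitrary examples, not from the cheatsheet):

```python
import numpy as np

k, m, delta = 1000, 10000, 0.05   # hypothesis class size, sample size, confidence level
gap = 2 * np.sqrt(np.log(2 * k / delta) / (2 * m))
print(gap)  # ~0.046: with probability >= 95%, the learned hypothesis is within
            # this margin of the best achievable error in the class
```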