AI&ML UNIT-4

ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING

4.1 ENSEMBLE TECHNIQUES

Ensemble techniques combine multiple base learning models to produce one optimal predictive model. In all learning models, noise, bias and variance contribute to the error. Some ensemble techniques apply a single algorithm in all base learners, which results in homogeneity; these are called homogeneous ensembles. Other methods apply different base learners, giving rise to heterogeneous ensembles.

Classification of ensemble techniques:
• Sequential ensemble techniques: The base learners process the data in a sequential manner. The sequential generation of base learners fosters dependence between them, and the performance of the model is improved by assigning higher weights to previously misclassified examples. Example: AdaBoost.
• Parallel ensemble techniques: The base learners process the data in parallel. This encourages independence between the base learners, and the application of averages across independent learners significantly reduces the error. Example: Random forest.

4.2 COMBINING MULTIPLE LEARNERS

The rationale for combining multiple learners arises from the No Free Lunch theorem of machine learning.
• No Free Lunch theorem: It states that there is no single learning algorithm that is the most accurate learner in every domain.
• The most common approach is to try many algorithms and choose the one that performs best on a separate validation set. Each learning algorithm dictates a certain model that comes with a set of assumptions. This inductive bias leads to error if the assumptions do not hold for the given data. Learning such an ill-posed problem with finite data, each algorithm converges to a different solution and fails under different circumstances.
• The performance of a learner may be fine-tuned to get the highest possible accuracy on a validation set, but this is a more complex task and there is no guarantee of improving the performance. To combat this challenge, multiple base learners are combined so that the domain space is explored from different perspectives. Combining base learners, however, poses two design challenges:
  1. How to generate base learners that complement each other?
  2. How to integrate the outputs of the base learners for maximum accuracy?

4.2.1 Generating Diverse Learners

Find a set of diverse learners who differ in their decisions so that they complement each other. Performance improvement cannot be achieved unless the learners are reasonably accurate. A few methods to generate diverse learners are:
• Different algorithms: Use different learning algorithms to train different base learners. These algorithms make different assumptions about the data, thus leading to different classifiers.
• Different hyperparameters: Use the same learning algorithm but with different hyperparameters, for example a different number of hidden units in a multilayer perceptron or different initial weights for the individual learners.
• Different input representations: The base learners may use different representations of the input, for example data coming from different types of sensors or sources of information. Different representations make different characteristics explicit, making it possible to integrate them and achieve better accuracy in prediction.
  > Sensor fusion: In speech recognition, to recognize the uttered words, the video of the speaker's lips and the shape of the mouth as the words are spoken can be used in addition to the acoustic input.
  > Multiview learning / vector concatenation: Concatenate all the data vectors from the different sources and feed one large vector to a single learner. However, large input dimensionalities make the systems more complex and require more data to be accurate.
  > Random subspace method: Choose random subsets of the input features as the representation for each learner. This gives a chance for different learners to look at the same problem from different points of view, makes the ensemble more robust, and reduces the curse of dimensionality to a great extent.
• Different training sets: Train different base learners on different subsets of the training set. This is done by drawing random training sets from the given sample, or the learners can be trained serially so that instances on which the preceding base learners are not accurate are given more emphasis when training later base learners.

The major design decision in generating base learners is that the multiple base learners must be reasonably accurate, but they need not be individually very accurate. The main property required of base learners is simplicity with diversity: they must explore the given problem domain space from different perspectives, each one specializing in a subdomain of the problem.

4.2.2 Model Combination Schemes

When multiple base learners are used, the final prediction is obtained by combining their outputs. The base learner M_j acts on the (arbitrary-dimensional) input x, or on its own input representation x_j, and produces the output d_j. The function f(·), with parameters Φ, is the combining function that yields the final output:

  y = f(d_1, d_2, ..., d_L | Φ)

In classification with K classes, each learner produces outputs d_ji(x), i = 1, ..., K; combining them gives y_i, i = 1, ..., K, and the class with the maximum combined value is chosen:

  Choose C_i if y_i = max_k y_k

Fig 4.1: Combining base learners

The outputs of the base learners can be combined in many ways. Some of the important methods are:
• Multi-expert combination methods: The base learners work in parallel. These methods can be divided into:
  > Global approach (learner fusion): Given an input, all base learners generate an output and all these outputs are used. Examples: voting, stacking.
  > Local approach (learner selection): From a mixture of experts, a gating model looks at the input and chooses one (or very few) of the learners as responsible for generating the output.
• Multistage combination methods: These use a serial approach where the next base learner is trained with, or tested on, only the instances where the previous base learners are not sufficiently accurate. The base learners are sorted in increasing complexity so that a complex base learner is not used unless the preceding simpler base learners are not good enough. Example: cascading.

Voting

The simplest way to combine multiple classifiers is voting, which corresponds to taking a linear combination of the learner outputs:

  y_i = Σ_j w_j d_ji, where w_j ≥ 0 and Σ_j w_j = 1

• This is also known as ensembles and linear opinion pools.
• When all learners are given equal weight, simple voting corresponds to taking an average.
• Alternatively, a weighted sum of the outputs of the classifiers can be computed.

Table 4.1 shows the most commonly used voting functions.

Table 4.1: Voting functions
| Fusion function f(·) | Rule |
| Sum | y_i = (1/L) Σ_j d_ji |
| Weighted sum | y_i = Σ_j w_j d_ji, w_j ≥ 0, Σ_j w_j = 1 |
| Median | y_i = median_j d_ji |
| Minimum | y_i = min_j d_ji |
| Maximum | y_i = max_j d_ji |
| Product | y_i = Π_j d_ji |
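The fusion rules in Table 4.1 can be computed directly from the base learners' per-class outputs. The short sketch below is illustrative only; the three learners' probability outputs and the weights are assumed values, not taken from the text.

    import numpy as np

    # d[j, i] = output of base learner j for class i (e.g. posterior probabilities)
    d = np.array([[0.7, 0.3],    # learner 1
                  [0.6, 0.4],    # learner 2
                  [0.2, 0.8]])   # learner 3
    w = np.array([0.5, 0.3, 0.2])          # weights: w_j >= 0 and sum to 1

    y = w @ d                              # weighted-sum rule: y_i = sum_j w_j * d_ji
    print("weighted sum:", y, "-> choose class", y.argmax())
    print("sum/average :", d.mean(axis=0))
    print("median      :", np.median(d, axis=0))
    print("product     :", d.prod(axis=0))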
• The sum rule is the most widely used in practice; with equal weights it simply averages the outputs of the learners.
• The minimum and maximum rules are pessimistic and optimistic, respectively. With the product rule, each learner has veto power: regardless of the other learners, if one learner has an output of 0, the overall output goes to 0. Note that after these combination rules are applied, the y_i do not necessarily sum up to 1.
• In the weighted sum, d_ji is the vote of learner j for class C_i and w_j is the weight of its vote. Simple voting is the special case where all voters have equal weight, namely w_j = 1/L. In classification this is called plurality voting, where the class having the maximum number of votes is the winner. When there are two classes, this is majority voting, where the winning class gets more than half of the votes.
• If the learners can also supply additional information on how much they vote for each class (such as the posterior probability), then after normalization these values can be used as weights in a weighted voting scheme.
• In the case of regression, simple or weighted averaging or the median can be used to fuse the outputs of the base regressors. The median is more robust to noise than the average.
• Alternatively, the accuracies of the learners can be assessed on a separate validation set and that information used to compute the weights, so that more weight is assigned to more accurate learners. Sometimes the weights can also be learned from data.
• Voting schemes can be viewed as approximations under a Bayesian framework, with the weights approximating prior model probabilities and the model decisions approximating model-conditional likelihoods:

  P(C_i | x) = Σ_{all models M_j} P(C_i | x, M_j) P(M_j)

• The voting ensemble is not guaranteed to provide better performance than any single model used in the ensemble. If any given model in the ensemble performs better than the voting ensemble, that model should probably be used instead.
• A voting ensemble can, however, offer lower variance in the predictions than the individual models, which is desirable in addition to the higher mean performance of the ensemble, since lower variance improves the stability or confidence of the model.

Fig 4.2: Average voting classifier — the class probabilities predicted by the individual classifiers (for example 90%, 80% and 85% for class 1) are averaged to produce the ensemble prediction.

Another particularly useful case for voting ensembles is combining multiple fits of the same machine learning algorithm with slightly different hyperparameters. The most effective use cases are:
> Combining multiple fits of a model trained using stochastic learning algorithms.
> Combining multiple fits of a model with different hyperparameters.

A limitation of the voting ensemble is that all models are assumed to contribute equally to the prediction. This is a problem if some models are good in some situations and poor in others.

Types of Voting

Hard Voting:
• In hard voting, also known as majority voting, every individual classifier votes for a class and the majority wins.
• In statistical terms, the predicted target label of the ensemble is the mode of the distribution of the individually predicted labels.
• Hard voting is for models that predict class labels.

Fig 4.3: Hard voting — each classifier is trained on the training data, predicts a label for the new sample, and the most frequent label is chosen as the final prediction.

Soft Voting:
• In soft voting, every individual classifier provides a probability value that a specific data point belongs to a particular target class.
• The predictions are weighted by each classifier's importance and summed up.
• The target label with the greatest sum of weighted probabilities wins the vote.
• Soft voting is for models that predict class membership probabilities.
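Both schemes are available off the shelf in scikit-learn's VotingClassifier. The snippet below is a minimal sketch; the toy dataset and the choice of base estimators are assumptions for illustration, not prescribed by the text.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    base = [("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("dt", DecisionTreeClassifier(max_depth=4, random_state=0))]

    hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)   # majority of predicted labels
    soft = VotingClassifier(estimators=base, voting="soft",             # weighted sum of predicted
                            weights=[2, 1, 1]).fit(X, y)                # class probabilities
    print(hard.predict(X[:3]), soft.predict(X[:3]))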
Stacking (Stacked Generalization)

• Stacked generalization is a technique that extends voting in that the way the outputs of the base learners are combined need not be fixed; the combination is itself learned through a combiner system, which is another learner whose parameters are also trained.
• The final model, which is learned using the intermediate models' predictions, can be considered as stacked on top of the intermediate models.
• The combiner learns what the correct output is when the base learners give a certain output combination.

Fig 4.6: Stacking — the base classifiers (level 0) are trained on the training data; their output values form the input of the combiner classifier (level 1), which produces the final output.

• The combiner function cannot be trained on the same data used to train the base learners, because the base learners may be memorizing the training set; the combiner system should actually learn how the base learners make errors.
• Stacking is therefore a means of estimating and correcting for the biases of the base learners, and the combiner should be trained on data unused in training the base learners.

Comparison of bagging, boosting and stacking:
| | Bagging | Boosting | Stacking |
| Data sampling | Random resampling with replacement | Resampling that emphasizes previously misclassified instances | Held-out predictions of the base learners are used to train a second-level learning algorithm |
| Base learners | Homogeneous weak learners | Homogeneous weak learners | Heterogeneous (ideally uncorrelated) learners |
| Aggregation of results | Voting or averaging | Weighted majority voting | Meta-learner that learns the optimal combination of the base learners |
| Main effect | Reduces variance by decorrelating the learners | Reduces bias and increases accuracy, but is not robust to outliers | Increases accuracy by combining diverse learners |
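scikit-learn's StackingClassifier follows this recipe: the combiner (final_estimator) is trained on out-of-fold predictions of the base learners, i.e. on data the base learners did not see when fitting. The dataset and estimator choices below are assumptions for illustration.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    base = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("svc", SVC(probability=True, random_state=0))]

    stack = StackingClassifier(estimators=base,
                               final_estimator=LogisticRegression(max_iter=1000),
                               cv=5)            # 5-fold out-of-fold predictions feed the combiner
    stack.fit(X, y)
    print(stack.predict(X[:5]))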
4.4 UNSUPERVISED LEARNING

The aim of supervised learning is to learn a mapping from the input to an output whose correct values are provided by a supervisor. In unsupervised learning there is no such supervisor; the learning algorithm is provided with only the input data. The aim is to find the regularities in the input. There is a structure to the input space such that certain patterns occur more often than others, and the algorithm wants to see what generally happens and what does not. This is called density estimation. Clustering is a popular density estimation technique used to find the clusters or groupings of the input.

Differences between supervised and unsupervised learning:
| Supervised learning | Unsupervised learning |
| Involves building a model to estimate or predict an output based on one or more inputs. | Involves finding the structure and relationships in the inputs; there is no supervising output variable. |
| Both explanatory (input) and response (output) variables are used; models predict new values by learning the relationship between them. | Only the input variables are available; models look for groupings and associations between variables in unlabelled data. |
| Types: Classification and Regression. | Types: Clustering and Association. |
| These algorithms are generally more accurate, as predictions are based on both input and output values. | These algorithms are generally less accurate, as predictions are based on unlabelled data. |

4.4.1 K-Means Clustering

Clustering is the most common exploratory data analysis technique used to get an intuition about the structure of the data. Clustering can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different, where similarity is measured with a measure such as the Euclidean distance. The aim is to find homogeneous subgroups by grouping the data points into distinct clusters.

K-means is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group having similar properties.
• K-means tries to partition the dataset into k pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group.
• The algorithm takes the unlabelled dataset as input, divides it into k clusters, and repeats the process until it finds the best clusters. The value of k has to be predetermined.
• It tries to make the intra-cluster data points as similar as possible while keeping the clusters as different as possible. It assigns data points to clusters such that the sum of the squared distances between the data points and their cluster's centroid is at a minimum.
• The centroid is calculated as the arithmetic mean of all the data points that belong to that cluster.
• The less variation there is within clusters, the more homogeneous the data points are within the same cluster.
• The k-means clustering algorithm mainly performs two tasks:
  > Determines the best values for the K centre points (centroids) by an iterative process.
  > Assigns each data point to its closest k-centre; the data points near a particular k-centre create a cluster.
• Hence each cluster has data points with some commonalities, and it is away from the other clusters.

Steps in the K-means algorithm:
1. Provide the number of clusters, K, that need to be generated by this algorithm.
2. Choose K data points at random and assign each to a cluster; briefly, categorize the data based on the number of clusters.
3. Compute the cluster centroids.
4. Iterate the steps below until the ideal centroids are found, that is, until the assignment of data points to clusters does not vary:
   i) Compute the sum of squared distances between the data points and the centroids.
   ii) Allocate each data point to the cluster whose centroid is closest to it.
   iii) Compute the new centroids of the clusters by averaging all of their data points.

K-means implements the Expectation-Maximization strategy: the Expectation step assigns each data point to its closest cluster, and the Maximization step computes the centroid of each cluster.

Algorithm for K-means:
1. Specify the number of clusters k.
2. Randomly initialize k centroids.
3. Repeat:
4.   Expectation: assign each point to its closest centroid.
5.   Maximization: compute the new centroid (mean) of each cluster.
6. Until the centroids do not change.
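A compact sketch of this algorithm in NumPy, using Euclidean distance; the synthetic data and k = 3 are assumptions for illustration.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]      # random initial centroids
        for _ in range(n_iter):
            # Expectation: assign every point to its closest centroid
            dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dist.argmin(axis=1)
            # Maximization: recompute each centroid as the mean of its assigned points
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):                 # stop when centroids do not change
                break
            centroids = new_centroids
        return centroids, labels

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(c, 1.0, size=(30, 2)) for c in (0, 5, 10)])   # three synthetic blobs
    print(kmeans(X, k=3)[0])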
Example 1: Cluster the following eight points into three clusters: A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9). The initial cluster centres are A1(2, 10), A4(5, 8) and A7(1, 2). The distance between two points a = (x1, y1) and b = (x2, y2) is defined as ρ(a, b) = |x2 − x1| + |y2 − y1|. Use the K-means algorithm to find the three cluster centres after the second iteration.

Solution: k = 3 (number of clusters). Assume C1 = A1(2, 10), C2 = A4(5, 8), C3 = A7(1, 2).

Iteration 1:
| Point | Distance from C1(2, 10) | Distance from C2(5, 8) | Distance from C3(1, 2) | Belongs to cluster |
| A1(2, 10) | 0 | 5 | 9 | C1 |
| A2(2, 5) | 5 | 6 | 4 | C3 |
| A3(8, 4) | 12 | 7 | 9 | C2 |
| A4(5, 8) | 5 | 0 | 10 | C2 |
| A5(7, 5) | 10 | 5 | 9 | C2 |
| A6(6, 4) | 10 | 5 | 7 | C2 |
| A7(1, 2) | 9 | 10 | 0 | C3 |
| A8(4, 9) | 3 | 2 | 10 | C2 |

New cluster centre C1 = A1(2, 10), since there is only one point in C1.
New cluster centre C2 = mean of all points in cluster C2 = ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5) = (6, 6).
New cluster centre C3 = mean of all points in cluster C3 = ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5).

Iteration 2:
| Point | Distance from C1(2, 10) | Distance from C2(6, 6) | Distance from C3(1.5, 3.5) | Belongs to cluster |
| A1(2, 10) | 0 | 8 | 7 | C1 |
| A2(2, 5) | 5 | 5 | 2 | C3 |
| A3(8, 4) | 12 | 4 | 7 | C2 |
| A4(5, 8) | 5 | 3 | 8 | C2 |
| A5(7, 5) | 10 | 2 | 7 | C2 |
| A6(6, 4) | 10 | 2 | 5 | C2 |
| A7(1, 2) | 9 | 9 | 2 | C3 |
| A8(4, 9) | 3 | 5 | 8 | C1 |

New cluster centre C1 = mean of all points in C1 = ((2 + 4)/2, (10 + 9)/2) = (3, 9.5).
New cluster centre C2 = mean of all points in C2 = ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4) = (6.5, 5.25).
New cluster centre C3 = mean of all points in C3 = ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5).

Hence, after the second iteration the cluster centres are C1(3, 9.5), C2(6.5, 5.25) and C3(1.5, 3.5).
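The two iterations of Example 1 can be checked with a few lines of Python; note that the example uses the Manhattan distance for assignment while the centres are still updated as arithmetic means.

    import numpy as np

    X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
    C = np.array([[2, 10], [5, 8], [1, 2]], dtype=float)              # initial centres A1, A4, A7

    for it in range(2):                                               # two iterations, as asked
        dist = np.abs(X[:, None, :] - C[None, :, :]).sum(axis=2)      # Manhattan distance
        labels = dist.argmin(axis=1)                                  # assign to the closest centre
        C = np.array([X[labels == j].mean(axis=0) for j in range(3)]) # recompute centres as means
        print("after iteration", it + 1, ":", C.round(2))
    # final centres: (3, 9.5), (6.5, 5.25), (1.5, 3.5)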
Advantages of K-means:
• If the number of variables is huge, K-means is most of the time computationally faster than hierarchical clustering.
• K-means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.

Disadvantages of K-means:
• It is difficult to predict the value of K.
• It does not work well with global clusters.
• Different initial partitions can result in different final clusters.
• It does not work well with clusters of different size and different density.

4.5 INSTANCE-BASED LEARNING

Instance-based learning, or nonparametric estimation, assumes that similar inputs have similar outputs. The algorithm consists of finding the similar past instances from the training set, using a suitable distance measure, and interpolating from them to find the right output. Different nonparametric methods differ in the way they define similarity or interpolate from the similar training instances.

Differences between parametric and nonparametric models:
| Parametric models | Nonparametric models |
| All the training instances are used to build a single global estimate of the model. | There is no global model; local models are estimated when needed and are affected only by the nearby training instances. |
| Strong assumptions are made: the parameters and the form of the densities are assumed in advance. | No assumption of prior parameters or of the form of the densities. |
| Fast (eager) learning algorithms. | Lazy learning algorithms: the model postpones the computation until it is given a test instance. |
| The model is simple, with a fixed number of parameters, so the memory requirement is low. | Effectively more parameters are needed and more memory is required. |
| Computationally faster and requires less data. | Computationally slower and requires more data. |

Advantages of nonparametric machine learning algorithms:
• Flexibility: capable of fitting a large number of functional forms.
• Power: no assumptions about the underlying function.
• Performance: can result in higher-performance models for prediction.

Disadvantages of nonparametric machine learning algorithms:
• More data: they require a lot more training data to estimate the mapping function.
• Slower: they are a lot slower to train, as they often have far more parameters to train.
• Overfitting: there is more risk of overfitting the training data, and it is harder to explain why specific predictions are made.

4.5.1 K-Nearest Neighbours

The nearest neighbour class of estimators adapts the amount of smoothing to the local density of the data. The degree of smoothing is controlled by k, the number of neighbours taken into account, which is much smaller than N, the number of training instances. With d_k(x) denoting the distance from x to its k-th nearest training instance, the k-nearest-neighbour density estimate is

  p̂(x) = k / (2 N d_k(x))

• KNN can be used for either regression or classification problems, but it is more often used as a classification algorithm, working from the assumption that similar points can be found near one another.
• For classification problems, a class label is assigned on the basis of a majority vote, i.e. the label that is most frequently represented around a given data point is used. This is otherwise termed plurality voting.
• Strictly speaking, majority voting requires a majority of greater than 50%, which primarily works when there are only two categories. When there are multiple classes, it is not necessary to have 50% of the votes to reach a conclusion about a class; the class label can be assigned with the largest share of the votes.
• In regression problems, the average of the k nearest neighbours is taken to make a prediction. The main distinction is that classification is used for discrete values, whereas regression is used with continuous ones. However, before a prediction can be made, the distance must be defined.
• The KNN algorithm is a lazy learning model: it only stores the training dataset during the training stage, and all computation occurs when a classification or prediction is being made. Because it relies heavily on memorizing the training data, it is also referred to as an instance-based or memory-based learning method.

Steps in KNN:
• Choose the value of K, i.e. the number of nearest neighbours; K can be any integer.
• For each point in the test data, do the following:
  > Calculate the distance between the test point and each row of the training data with the help of any of the methods, namely Euclidean, Manhattan or Hamming distance. The most commonly used method to calculate distance is the Euclidean distance.
  > Sort the points in ascending order of the distance value.
  > Choose the top K rows from the sorted array.
  > Assign a class to the test point based on the most frequent class of these rows.

Advantages of KNN:
• Easy to implement: given the algorithm's simplicity and accuracy, it is one of the first classifiers that a new data scientist will learn.
• Adapts easily: as new training samples are added, the algorithm adjusts to account for the new data.
• Few hyperparameters: KNN only requires a k value and a distance metric, which is low compared with other machine learning algorithms.

Disadvantages of KNN:
• Does not scale well: since KNN is a lazy algorithm, it takes up more memory and data storage compared with other classifiers.
• Curse of dimensionality: the KNN algorithm does not perform well with high-dimensional data inputs.
• Prone to overfitting: due to the curse of dimensionality, KNN is also more prone to overfitting. While feature selection and dimensionality reduction techniques can be leveraged to prevent this, the value of k also impacts the model's behaviour: lower values of k can overfit the data, whereas higher values of k tend to "smooth out" the prediction values since they average over more neighbours; if k is too high, it can underfit the data.

Example: A survey with two attributes, acid durability (X1, in seconds) and strength (X2, in kg/sq m), is used to classify whether a paper tissue is good or not. The training samples are:
| X1 (acid durability, seconds) | X2 (strength, kg/sq m) | Y (classification) |
| 7 | 7 | Bad |
| 7 | 4 | Bad |
| 3 | 4 | Good |
| 1 | 4 | Good |
The factory produces a new paper tissue with X1 = 3 and X2 = 7. Classify the new product using K-nearest neighbours with K = 3.

Solution:
1. Determine K = 3, the number of nearest neighbours.
2. Calculate the (Euclidean) distance between the query instance (3, 7) and all the training samples.
3. Sort the distances and determine the nearest neighbours based on the K-th minimum distance; also find the Y label of those instances.
| X1 | X2 | Distance to query (3, 7) | Included in the 3 nearest neighbours? | Category Y |
| 7 | 7 | 4 | Yes | Bad |
| 7 | 4 | 5 | No | - |
| 3 | 4 | 3 | Yes | Good |
| 1 | 4 | 3.6 | Yes | Good |
4. Since the majority decision among the 3 nearest neighbours is "good" (two Good versus one Bad), the given sample can be classified as good.
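The same result can be reproduced with scikit-learn's KNeighborsClassifier, reusing the training samples from the example above (a minimal sketch; the label strings are arbitrary).

    from sklearn.neighbors import KNeighborsClassifier

    # Training data from the example: [acid durability, strength] -> class
    X_train = [[7, 7], [7, 4], [3, 4], [1, 4]]
    y_train = ["bad", "bad", "good", "good"]

    knn = KNeighborsClassifier(n_neighbors=3)   # K = 3, Euclidean distance by default
    knn.fit(X_train, y_train)                   # lazy learner: fit() essentially stores the data
    print(knn.predict([[3, 7]]))                # -> ['good'], the majority of the 3 nearest neighbours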
4.6 GAUSSIAN MIXTURE MODELS

• A Gaussian mixture model (GMM) is a probabilistic model for representing normally distributed subpopulations within an overall population.
• Mixture models in general do not require knowing which subpopulation a data point belongs to; they learn the subpopulations automatically. Since the subpopulation assignment is not known, this is a form of unsupervised learning.
• Estimating the parameters of the individual normal distribution components is a canonical problem in modelling data with a GMM.
• GMMs have been used for feature extraction from speech data, and have also been used extensively in tracking multiple objects, where the number of mixture components and their means predict object locations at each frame of a video sequence.
• Mixture models are generally used on multimodal data, i.e. data with more than one peak in its distribution. They model multimodal data as a mixture of many unimodal Gaussian distributions, retaining many of the theoretical and computational benefits of Gaussian models and making them practical for efficiently modelling very large datasets.

The Expectation-Maximization (EM) algorithm:
• The EM algorithm is an iterative method for finding a local maximum of the likelihood. It is typically used when closed-form expressions for the maximum-likelihood parameters cannot be obtained directly, as is the case for mixture models, where the component assignments are unobserved.
• EM derives expressions for updating the model parameters at each iteration and has the convenient property that the likelihood never decreases, so it is guaranteed to approach a local maximum (or saddle point) of the likelihood.
• EM for mixture models consists of two steps:
  > Expectation step: calculate the expectation of the component assignment for each data point, given the current model parameters.
  > Maximization step: maximize the expectations calculated in the E-step with respect to the model parameters.
• This process repeats until the algorithm converges, giving a maximum likelihood estimate. Intuitively, estimating the parameters is easy once the component assignments are known, and inferring the component assignments is easy once the parameters are known; the expectation step corresponds to the latter case while the maximization step corresponds to the former. Thus, by alternating between which values are assumed fixed, maximum likelihood estimates of the non-fixed values can be calculated in an efficient manner.

Pseudocode for the EM algorithm for a one-dimensional Gaussian mixture model:

  Input:  data samples x_1, ..., x_n; number of clusters k
  Output: p_i1, ..., p_ik, the probability distribution over the k clusters for each sample i

  Sample k random initial means mu_1, ..., mu_k from x_1, ..., x_n
  for j = 1 to k do
      sigma_j^2 <- sample variance of the data;  pi_j <- 1/k     // uniform prior over the k clusters
  while not converged do
      // Expectation step
      for i = 1 to n do
          s <- 0
          for j = 1 to k do
              p_ij <- pi_j * N(x_i | mu_j, sigma_j^2);  s <- s + p_ij
          for j = 1 to k do
              p_ij <- p_ij / s
      // Maximization step
      for j = 1 to k do
          pi_j      <- (1/n) * sum_i p_ij
          mu_j      <- sum_i p_ij * x_i / sum_i p_ij
          sigma_j^2 <- sum_i p_ij * (x_i - mu_j)^2 / sum_i p_ij
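The pseudocode translates almost line by line into NumPy. The sketch below is a minimal one-dimensional, two-component EM loop; the synthetic data, the number of components and the fixed iteration count are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 0.8, 150), rng.normal(3.0, 1.2, 100)])   # 1-D data
    N, K = len(x), 2

    # Initialization: means from random samples, variances = sample variance, uniform priors
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)

    def normal_pdf(x, mu, var):
        return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    for _ in range(50):
        # Expectation: responsibilities gamma[i, j] = P(component j | x_i)
        dens = pi * normal_pdf(x[:, None], mu, var)        # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # Maximization: re-estimate priors, means and variances
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

    print("pi:", pi, "mu:", mu, "var:", var)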
The model has:
• a mean μ that defines its centre,
• a covariance Σ that defines its width, and
• a mixing probability π that defines how big or small the Gaussian component will be.

Fig 4.7: Gaussian mixture model — two overlapping Gaussian components (Cluster 1 and Cluster 2).

The outcome is a soft clustering of the data samples, with p_ij the probability that sample i belongs to cluster j. A regular (hard) clustering can be obtained by selecting, for each sample i, the cluster j maximizing p_ij.

Initialization step:
• Randomly assign samples, without replacement, from the dataset X = {x_1, ..., x_N} to the component mean estimates μ_1, ..., μ_K.
• Set all component variance estimates to the sample variance:
  σ_1², ..., σ_K² = (1/N) Σ_{i=1}^{N} (x_i − m)², where m = (1/N) Σ_{i=1}^{N} x_i is the sample mean.
• Set all component distribution prior estimates to the uniform distribution: π_1, ..., π_K = 1/K.

Expectation step: calculate, for all i and k, the probability γ_ik that x_i is generated from component C_k:

  γ_ik = π_k N(x_i | μ_k, σ_k) / Σ_{j=1}^{K} π_j N(x_i | μ_j, σ_j)

Maximization step: using the γ_ik, update the parameter estimates:

  π_k = (1/N) Σ_{i=1}^{N} γ_ik
  μ_k = Σ_{i=1}^{N} γ_ik x_i / Σ_{i=1}^{N} γ_ik
  σ_k² = Σ_{i=1}^{N} γ_ik (x_i − μ_k)² / Σ_{i=1}^{N} γ_ik

Advantages of Gaussian mixture models:
> They do not assume clusters to be of any particular geometry; they work well with non-linear geometric distributions as well.
> They do not bias the cluster sizes to have specific structures, as is done by K-means.

Disadvantages of Gaussian mixture models:
> They use all the components they have access to, so initialization of the clusters is difficult when the dimensionality of the data is high.
> They are difficult to interpret.
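For multivariate data the same EM procedure is available as scikit-learn's GaussianMixture; a brief sketch on assumed synthetic data, showing both the soft and the hard clustering:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
                   rng.normal([5, 5], 1.5, size=(80, 2))])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    print(gmm.weights_)              # mixing probabilities pi_k
    print(gmm.means_)                # component means mu_k
    print(gmm.predict_proba(X[:3]))  # soft clustering: responsibilities gamma_ik
    print(gmm.predict(X[:3]))        # hard assignment: cluster with maximum responsibility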
