Exercises 1: Bayesian Decision Theory

University of Crete - Computer Science Department
CS-473: Pattern Recognition
Bayesian Decision Theory

Exercise 1: Detecting the Hepatitis C virus based on an antibody test

In hospitals, patients that may have the Hepatitis C virus are submitted to tests in order to detect the virus in their blood. Let:
- H denote the event that a patient is infected with the virus,
- nH denote the event that a patient is not infected with the virus,
- Pos denote the event that a patient's test for the virus is positive,
- Neg denote the event that a patient's test for the virus is negative.

In practice, the following probabilities have been determined:

    P(H) = 0.15,   P(Pos|H) = 0.95,   P(Pos|nH) = 0.02

Compute the probability that a patient is infected with the virus, given that his test is positive.

Solution

We want to compute the probability P(H|Pos). According to Bayes' rule:

    P(H|Pos) = P(Pos|H) P(H) / P(Pos)

Moreover,

    P(H) = 0.15  ⇒  P(nH) = 1 − P(H) = 0.85
    P(Pos|H) = 0.95
    P(Pos|nH) = 0.02
    P(Pos) = P(Pos|H) P(H) + P(Pos|nH) P(nH) = 0.95 · 0.15 + 0.02 · 0.85 = 0.1595

So,

    P(H|Pos) = P(Pos|H) P(H) / P(Pos) = (0.95 · 0.15) / 0.1595 = 0.8934

Exercise 2: Classification of candidate students based on their FUT score

According to the University of Tahiti, the probability that a student will graduate within 5 years of beginning his studies is 0.8. The FUT (French University Testing) score, which takes values in [0, 35], of the students who graduate within 5 years follows a normal distribution with mean 26 and standard deviation 2. On the contrary, the score of the students who do not graduate within 5 years also follows a normal distribution, with mean 22 and standard deviation 3. Let G denote the event that a student graduates within 5 years and let nG denote the event that a student does not graduate within 5 years. Finally, let x denote the FUT score of a student.

(a) If we know that 80% of the students that enter the University of Tahiti graduate within 5 years, what is the probability that a student will graduate within 5 years, given that his FUT score is 21?

(b) What is the minimum FUT score that a student must have, so as to maximize the probability of graduating within 5 years?

Solution

(a) We know the following:

    P(G) = 0.8  ⇒  P(nG) = 1 − P(G) = 0.2
    p(x|G) ~ N(26, 4)
    p(x|nG) ~ N(22, 9)

We want to compute the posterior probability P(G|x) for a FUT score of x = 21. By Bayes' rule:

    P(G|x = 21) = p(x = 21|G) P(G) / p(x = 21)                                  (1)

where

    p(x = 21|G)  = (1 / (2 √(2π))) exp(−(21 − 26)² / (2 · 2²)) = 0.0088          (2)
    p(x = 21|nG) = (1 / (3 √(2π))) exp(−(21 − 22)² / (2 · 3²)) = 0.1258          (3)
    p(x = 21) = p(x = 21|G) P(G) + p(x = 21|nG) P(nG)                           (4)

So, we have

    (2) & (3) & (4)  ⇒  p(x = 21) = 0.0088 · 0.8 + 0.1258 · 0.2 = 0.0322        (5)
    (1) & (2) & (5)  ⇒  P(G|x = 21) = (0.0088 · 0.8) / 0.0322 = 0.2186

(b) We have to compute the minimum value of x for which the following relationship between the posterior probabilities holds:

    P(G|x) > P(nG|x)

The minimum value of x corresponds to the case where the two probabilities are equal. So:

    P(G|x) = P(nG|x)
    p(x|G) P(G) / p(x) = p(x|nG) P(nG) / p(x)
    p(x|G) P(G) = p(x|nG) P(nG)
    (0.8 / (2 √(2π))) exp(−(x − 26)² / 8) = (0.2 / (3 √(2π))) exp(−(x − 22)² / 18)
    ln((0.8 · 3) / (0.2 · 2)) = (x − 26)² / 8 − (x − 22)² / 18
    ln 6 = (x − 26)² / 8 − (x − 22)² / 18
    9 (x − 26)² − 4 (x − 22)² = 72 ln 6
    5x² − 292x + 4019 = 0

By solving this second-order equation, we get x = 22.21 and x = 36.19. The solution that we seek is x = 22.21, since we want the minimum FUT score. Furthermore, the second solution is not in the range of acceptable values for the FUT score, which was defined to be [0, 35]. Graphically, the solution is shown in Fig. 1.

Figure 1: The class-conditional densities p(x|G) and p(x|nG) scaled by their corresponding prior probabilities P(G) and P(nG), and the point x = 22.21 where they become equal.
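The arithmetic of Exercises 1 and 2 is easy to double-check numerically. The short Python sketch below is not part of the original solution; the variable names are illustrative. It reproduces the posterior of Exercise 1, the posterior of Exercise 2(a), and the roots of the quadratic of Exercise 2(b).

```python
import numpy as np
from scipy.stats import norm

# Exercise 1: Bayes' rule for the antibody test
P_H, P_pos_H, P_pos_nH = 0.15, 0.95, 0.02
P_pos = P_pos_H * P_H + P_pos_nH * (1 - P_H)      # total probability
print(P_pos_H * P_H / P_pos)                      # ~0.8934

# Exercise 2(a): posterior P(G | x = 21)
P_G = 0.8
lik_G = norm.pdf(21, loc=26, scale=2)             # p(x = 21 | G)
lik_nG = norm.pdf(21, loc=22, scale=3)            # p(x = 21 | nG)
evidence = lik_G * P_G + lik_nG * (1 - P_G)
print(lik_G * P_G / evidence)                     # ~0.218

# Exercise 2(b): score where p(x|G)P(G) = p(x|nG)P(nG),
# i.e. the roots of 5x^2 - 292x + 4019 = 0
print(np.roots([5, -292, 4019]))                  # ~36.19 and ~22.21
```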
Exercise 3: Classification of candidate students based on their FUT score and class ranking

Last year, the University of Tahiti decided to use another metric in addition to the FUT score. This second metric is the class ranking y of the student during the last class of High School. This ranking is expressed as the percentage of classmates with a grade less than or equal to that of the candidate student. Assume that the probability density functions p(x, y|G) and p(x, y|nG) are bi-variate Gaussian:

    p(x, y|G)  ~ N(μ_G, Σ_G)
    p(x, y|nG) ~ N(μ_nG, Σ_nG)

with

    μ_G = [26, 85]ᵀ,    Σ_G = [[4, 6], [6, 40]]

and

    μ_nG = [22, 70]ᵀ,   Σ_nG = [[9, 12], [12, 58]]

(a) What is the probability that a student graduates within 5 years, given that his FUT score is 21.5 and his class ranking is 72%?

(b) Determine the optimal decision boundary between the two classes G and nG. If x ∈ [0, 35] and y ∈ [50, 100], draw contours of constant probability density for the two classes. Also, draw the optimal decision boundary.

Note - Inverse of a square matrix: the inverse of a 2×2 matrix can be very easily calculated with the following formula:

    A = [[a, b], [c, d]]  ⇒  A⁻¹ = (1 / det(A)) [[d, −b], [−c, a]]

where det(A) = ad − cb is the determinant of A.

Solution

(a) For x = [21.5, 72]ᵀ we want the posterior probability of class G. By Bayes' rule:

    P(G|x) = p(x|G) P(G) / p(x)                                                 (6)
    p(x) = p(x|G) P(G) + p(x|nG) P(nG)                                          (7)

For class G:

    p(x|G) = (1 / (2π √|Σ_G|)) exp(−½ (x − μ_G)ᵀ Σ_G⁻¹ (x − μ_G))                (8)
    |Σ_G| = 4 · 40 − 6 · 6 = 124                                                (9)
    (x − μ_G)ᵀ Σ_G⁻¹ (x − μ_G)
      = (1/124) [21.5 − 26, 72 − 85] [[40, −6], [−6, 4]] [21.5 − 26, 72 − 85]ᵀ
      = 6.33                                                                    (10)

From (8), (9) and (10):

    p(21.5, 72|G) = 6.0558e−04                                                  (11)

Similarly, for class nG:

    p(x|nG) = (1 / (2π √|Σ_nG|)) exp(−½ (x − μ_nG)ᵀ Σ_nG⁻¹ (x − μ_nG))           (12)
    |Σ_nG| = 9 · 58 − 12 · 12 = 378                                             (13)
    (x − μ_nG)ᵀ Σ_nG⁻¹ (x − μ_nG)
      = (1/378) [21.5 − 22, 72 − 70] [[58, −12], [−12, 9]] [21.5 − 22, 72 − 70]ᵀ
      = 0.1971                                                                  (14)

From (12), (13) and (14):

    p(21.5, 72|nG) = 0.0074                                                     (15)

Finally, applying (6) and (7), we get:

    P(G|21.5, 72) = (6.0558e−04 · 0.8) / (6.0558e−04 · 0.8 + 0.0074 · 0.2) = 0.2466

(b) We first compute the discriminant function for each class. Since the probability density functions are Gaussian with different covariances, the discriminant function takes the quadratic form:

    g_i(x) = xᵀ W_i x + w_iᵀ x + w_i0

where

    W_i  = −½ Σ_i⁻¹
    w_i  = Σ_i⁻¹ μ_i
    w_i0 = −½ μ_iᵀ Σ_i⁻¹ μ_i − ½ ln|Σ_i| + ln P(ω_i)

For class G:

    W_G = −½ (1/124) [[40, −6], [−6, 4]] = [[−0.1613, 0.0242], [0.0242, −0.0161]]
    w_G = (1/124) [[40, −6], [−6, 4]] [26, 85]ᵀ = [4.2742, 1.4839]ᵀ
    w_G0 = −½ [26, 85] Σ_G⁻¹ [26, 85]ᵀ − ½ ln(124) + ln(0.8) = −121.2623

For class nG:

    W_nG = −½ (1/378) [[58, −12], [−12, 9]] = [[−0.0767, 0.0159], [0.0159, −0.0119]]
    w_nG = (1/378) [[58, −12], [−12, 9]] [22, 70]ᵀ = [1.1534, 0.9683]ᵀ
    w_nG0 = −½ [22, 70] Σ_nG⁻¹ [22, 70]ᵀ − ½ ln(378) + ln(0.2) = −51.1536

Hence,

    g_G(x, y)  = −0.1613x² + 0.0484xy − 0.0161y² + 4.2742x + 1.4839y − 121.2623
    g_nG(x, y) = −0.0767x² + 0.0318xy − 0.0119y² + 1.1534x + 0.9683y − 51.1536

The equation of the optimal decision boundary is given by g_G(x, y) = g_nG(x, y), i.e.

    −0.0846x² + 0.0166xy − 0.0042y² + 3.1208x + 0.5156y − 70.1087 = 0

Fig. 2 depicts the contours of constant probability density for the two classes and the optimal decision boundary that we have found.

Figure 2: Contours of constant probability density for the two classes and the optimal decision boundary (FUT score on the horizontal axis, percentile rank on the vertical axis).
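The matrix algebra in Exercise 3 is error-prone by hand, so a numerical check is useful. The following Python sketch is illustrative only (it is not part of the original handout; the helper function names are mine) and recomputes the posterior of part (a) and the quadratic-discriminant coefficients of part (b) with NumPy.

```python
import numpy as np

mu_G, S_G = np.array([26.0, 85.0]), np.array([[4.0, 6.0], [6.0, 40.0]])
mu_nG, S_nG = np.array([22.0, 70.0]), np.array([[9.0, 12.0], [12.0, 58.0]])
P_G, P_nG = 0.8, 0.2

def gauss2(v, mu, S):
    """Bivariate Gaussian density evaluated at the point v."""
    d = v - mu
    return np.exp(-0.5 * d @ np.linalg.inv(S) @ d) / (2 * np.pi * np.sqrt(np.linalg.det(S)))

v = np.array([21.5, 72.0])
num = gauss2(v, mu_G, S_G) * P_G
den = num + gauss2(v, mu_nG, S_nG) * P_nG
print(num / den)                                  # part (a): ~0.246

def quad_discriminant(mu, S, prior):
    """Coefficients (W, w, w0) of g(x) = x^T W x + w^T x + w0."""
    Si = np.linalg.inv(S)
    W = -0.5 * Si
    w = Si @ mu
    w0 = -0.5 * mu @ Si @ mu - 0.5 * np.log(np.linalg.det(S)) + np.log(prior)
    return W, w, w0

for mu, S, p in [(mu_G, S_G, P_G), (mu_nG, S_nG, P_nG)]:
    print(quad_discriminant(mu, S, p))            # part (b): W_i, w_i, w_i0
```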
Exercise 4: Classification with rejection option

In many pattern classification problems one has the option either to assign the pattern to one of c classes, or to reject it as being unrecognizable. If the cost of rejection is not too high, rejection may be a desirable action. Let:

    λ(α_i|ω_j) = 0     if i = j,  i, j = 1, ..., c
               = λ_r   if i = c + 1
               = λ_s   otherwise

where λ_r is the loss incurred for choosing the (c+1)-th action, rejection, and λ_s is the loss incurred for making a substitution error. Show that the minimum risk is obtained:

- if we decide ω_i, when P(ω_i|x) ≥ P(ω_j|x) for all j and P(ω_i|x) ≥ 1 − λ_r/λ_s,
- if we reject, otherwise.

What happens if λ_r = 0? What happens if λ_r = λ_s, and what happens if λ_r > λ_s?

Solution

We have a classification problem with a rejection option, with the loss function defined above. The risk of taking action α_i, i = 1, ..., c, is:

    R(α_i|x) = Σ_j λ(α_i|ω_j) P(ω_j|x)
             = λ_s Σ_{j ≠ i} P(ω_j|x)
             = λ_s (1 − P(ω_i|x))

For the rejection decision we have:

    R(α_{c+1}|x) = Σ_j λ(α_{c+1}|ω_j) P(ω_j|x) = λ_r Σ_j P(ω_j|x) = λ_r

Action α_i is taken if its risk is smaller than the risk of taking any other action α_j, j ≠ i:

    R(α_i|x) ≤ R(α_j|x),  ∀ j = 1, ..., c
    ⇔ λ_s (1 − P(ω_i|x)) ≤ λ_s (1 − P(ω_j|x)),  ∀ j = 1, ..., c
    ⇔ P(ω_i|x) ≥ P(ω_j|x),  ∀ j = 1, ..., c                                     (16)

Also, the risk of action α_i has to be less than the risk of rejection. That is:

    R(α_i|x) ≤ R(α_{c+1}|x)
    ⇔ λ_s (1 − P(ω_i|x)) ≤ λ_r
    ⇔ P(ω_i|x) ≥ 1 − λ_r/λ_s                                                    (17)

If λ_r = 0, then for all values of x the system always rejects, since the rejection action has zero cost: eq. (17) can only hold if P(ω_i|x) = 1.

If λ_r = λ_s, then eq. (17) is always true and we simply choose the class with the maximum posterior probability, according to eq. (16); the pattern is never rejected.

More generally, the larger λ_r is compared to λ_s, the smaller the chance that a pattern is rejected by the system. In particular, if λ_r > λ_s, the threshold 1 − λ_r/λ_s is negative, eq. (17) always holds, and the rejection decision is never taken.
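The decision rule just derived is easy to state in code. Here is a minimal Python sketch (illustrative only, not from the original handout; the function name and example posteriors are mine) of the classify-or-reject rule for given losses λ_s and λ_r.

```python
import numpy as np

def decide_with_rejection(posteriors, lambda_s, lambda_r):
    """Return the index of the chosen class, or None to reject.

    Decide the class i with the largest posterior if
    P(w_i|x) >= 1 - lambda_r / lambda_s, otherwise reject
    (the (c+1)-th action).
    """
    posteriors = np.asarray(posteriors)
    i = int(np.argmax(posteriors))                       # eq. (16): best class
    if posteriors[i] >= 1.0 - lambda_r / lambda_s:       # eq. (17): classifying beats rejecting
        return i
    return None

print(decide_with_rejection([0.55, 0.30, 0.15], lambda_s=1.0, lambda_r=0.5))  # 0
print(decide_with_rejection([0.40, 0.35, 0.25], lambda_s=1.0, lambda_r=0.5))  # None (reject)
```

Note how the threshold 1 − λ_r/λ_s reproduces the limiting cases discussed above: it equals 1 when λ_r = 0 (always reject), 0 when λ_r = λ_s, and becomes negative when λ_r > λ_s (never reject).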
Exercise 5: Classification with multiple classes

Consider a classification problem with two-dimensional patterns, d = 2, and three classes, c = 3, where:

    p(x|ω_i) ~ N(μ_i, Σ_i),  i = 1, 2, 3

with

    μ_1 = [0, 2]ᵀ,   μ_2 = [3, 1]ᵀ,   μ_3 = [1, 0]ᵀ
    Σ_1 = Σ_2 = Σ_3 = Σ = [[1, 0], [0, 1/3]]

and suppose that the prior probabilities are equal, P(ω_1) = P(ω_2) = P(ω_3).

(a) Draw contours of constant probability density p(x|ω_i) in R² for each class.
(b) Calculate the discriminant function g_i(x) for each class.
(c) Express g_i(x) as a linear function, for each class.
(d) Calculate and draw the decision boundaries. How many are there? What is their form?

Solution

(a) The contours of constant probability density are shown in Fig. 3. The samples fall in ellipsoidal clusters of equal size and shape, the cluster for the i-th class being centered around the mean vector μ_i. Since Σ is a diagonal matrix, the axes of the ellipsoids are parallel to the Cartesian coordinate axes.

Figure 3: Contours of constant probability density for the three classes.

(b) In this exercise, we start from the general expression for a discriminant function, which in our case is:

    g_i(x) = ln p(x|ω_i) + ln P(ω_i)

Moreover,

    p(x|ω_i) = (1 / (2π √|Σ_i|)) exp(−½ (x − μ_i)ᵀ Σ_i⁻¹ (x − μ_i))

and thus the discriminant function becomes:

    g_i(x) = −½ (x − μ_i)ᵀ Σ_i⁻¹ (x − μ_i) − (d/2) ln(2π) − ½ ln|Σ_i| + ln P(ω_i)

where the terms (d/2) ln(2π), ½ ln|Σ_i| and ln P(ω_i) can be ignored, since they are common to all three classes. Hence, the discriminant function becomes:

    g_i(x) = −½ (x − μ_i)ᵀ Σ⁻¹ (x − μ_i)

and we have:

    g_1(x) = −½ [x_1² + 3 (x_2 − 2)²]
    g_2(x) = −½ [(x_1 − 3)² + 3 (x_2 − 1)²]
    g_3(x) = −½ [(x_1 − 1)² + 3 x_2²]

(c) Expanding the quadratic form (x − μ_i)ᵀ Σ⁻¹ (x − μ_i) gives a sum that includes the term xᵀ Σ⁻¹ x, which is common to the three classes since they have the same covariance matrix. It can thus be ignored, and the discriminant functions become linear:

    g_i(x) = μ_iᵀ Σ⁻¹ x − ½ μ_iᵀ Σ⁻¹ μ_i

and we have:

    g_1(x) = [0, 2] [[1, 0], [0, 3]] [x_1, x_2]ᵀ − ½ [0, 2] [[1, 0], [0, 3]] [0, 2]ᵀ = 6x_2 − 6
    g_2(x) = [3, 1] [[1, 0], [0, 3]] [x_1, x_2]ᵀ − ½ [3, 1] [[1, 0], [0, 3]] [3, 1]ᵀ = 3x_1 + 3x_2 − 6
    g_3(x) = [1, 0] [[1, 0], [0, 3]] [x_1, x_2]ᵀ − ½ [1, 0] [[1, 0], [0, 3]] [1, 0]ᵀ = x_1 − 1/2

(d) As the discriminant functions are linear, the decision boundaries are half-lines. The directions and origins of these half-lines can be found from the solution of the three equations that give the decision boundary for each of the three pairs of classes. The three half-lines have a common intersection point:

    g_1(x) = g_2(x)  ⇒  6x_2 − 6 = 3x_1 + 3x_2 − 6  ⇒  x_2 = x_1                (18)
    g_1(x) = g_3(x)  ⇒  6x_2 − 6 = x_1 − 1/2                                     (19)
    (18) & (19)      ⇒  6x_1 − 6 = x_1 − 1/2  ⇒  x_1 = 1.1                       (20)
    g_2(x) = g_3(x)  ⇒  3x_1 + 3x_2 − 6 = x_1 − 1/2                              (21)

For the point [1.1, 1.1]ᵀ eq. (21) also holds; hence the intersection point of the three lines defines their common origin. The decision boundaries are shown in Fig. 4.

Figure 4: Contours of constant probability density and decision boundaries for the three classes.
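As a sanity check on parts (c) and (d), the linear discriminants and the intersection point of the boundaries can be recomputed numerically. The Python sketch below is illustrative only (not part of the original handout) and uses the means and covariance given in the exercise statement above.

```python
import numpy as np

Sigma_inv = np.array([[1.0, 0.0], [0.0, 3.0]])            # inverse of Sigma = diag(1, 1/3)
mus = [np.array([0.0, 2.0]), np.array([3.0, 1.0]), np.array([1.0, 0.0])]

# Linear discriminants g_i(x) = w_i^T x + b_i (equal priors, common covariance)
ws = [Sigma_inv @ mu for mu in mus]
bs = [-0.5 * mu @ Sigma_inv @ mu for mu in mus]
for i, (w, b) in enumerate(zip(ws, bs), start=1):
    print(f"g_{i}(x) = {w[0]:.0f}*x1 + {w[1]:.0f}*x2 + ({b:.1f})")

# Intersection of the boundaries g_1 = g_2 and g_1 = g_3:
# (w_1 - w_2)^T x = b_2 - b_1  and  (w_1 - w_3)^T x = b_3 - b_1
A = np.array([ws[0] - ws[1], ws[0] - ws[2]])
rhs = np.array([bs[1] - bs[0], bs[2] - bs[0]])
print(np.linalg.solve(A, rhs))                             # ~[1.1, 1.1]
```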
