CS212 Unit 5
Contents
1 CS212 Unit 5
1.1 1. 01 Welcome Back
1.2 2. 02 Porcine Probability
1.3 3. 03 q The State of Pig
1.4 4. 03 s The State of Pig
1.5 5. 04 l Concept Inventory
1.6 6. 05 p Hold and Roll
1.7 7. 05 s Hold and Roll
1.8 8. 06 l Named Tuples
1.9 9. 07 p Clueless
1.10 10. 07 s Clueless
1.11 11. 08 p Hold At Strategy
1.12 12. 08 s Hold At Strategy
1.13 13. 09 p Play Pig
1.14 14. 09 s Play Pig
1.15 15. 10 l Dependency Injection
1.16 16. 11 p Loading the Dice
1.17 17. 11 s Loading the Dice
1.18 18. 12 q Optimizing Strategy
1.19 19. 12 s Optimizing Strategy
1.20 20. 13 l Utility
1.21 21. 14 q Game Theory
1.22 22. 14 s Game Theory
1.23 23. 15 q Break Even Point
1.24 24. 15 s Break Even Point
1.25 25. 16 q What's your Crossover
1.26 26. 17 l Optimal Pig
1.27 27. 18 l Pwin
1.28 28. 19 p Maxwins
1.29 29. 19 s Maxwins
1.30 30. 20 l Impressing Pig Scouts
1.31 31. 21 p Maximizing Differential
1.32 32. 21 s Maximizing Differential
1.33 33. 22 l Being Careful
1.34 34. 23 p Legal Actions
1.35 35. 23 s Legal Actions
1.36 36. 24 l Using Tools
1.37 37. 25 l Telling A Story
1.38 38. 26 q Simulation vs Enumeration
1.39 39. 26 s Simulation vs Enumeration
1.40 40. 27 l Conditional Probability
1.41 41. 28 q Tuesday
1.42 42. 28 s Tuesday
1.43 43. 29 l Summary
1 CS212 Unit 5
06/05/12 11:49:53
problem. So, we want to know the current state of the game. If we're thinking of search problems, then we also have to know about the actions we can take. We know that there are two actions: roll and hold. So here are some candidates for what's in the current state. First, the things that were on the scoreboard. The scoreboard, remember, had three things. Then the player whose turn it is; we might want that to be part of the state. The previous roll of the dice, whether I just rolled a five or something else, might be part of the state. The previous turn score, how much the other player just made on their turn. All of these are possibilities, and you might be able to think of others. I want you to tell me which of these are necessary to describe the state of the game. I should say that we're assuming the goal of the game, the number of points you need to win, is constant and doesn't need to be represented in each individual state; we just represent it once for the whole game. Which of these are necessary for the current state?
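For reference, here is one way to give names to the parts of a state. This is a minimal sketch, assuming the four-part state the unit settles on later (player to move, the mover's score, the opponent's score, and the pending score); the names `State`, `GOAL`, and the value 50 are illustrative assumptions, not taken from this excerpt. Note that the goal lives outside the state, stored once for the whole game.

```python
from collections import namedtuple

# Hypothetical sketch: one way to name the four parts of a Pig state.
#   p:       the player to move (0 or 1)
#   me:      the score of the player to move
#   you:     the opponent's score
#   pending: points accumulated so far on the current turn
State = namedtuple('State', ['p', 'me', 'you', 'pending'])

# The goal is the same for every state, so it is stored once for the game.
# (50 is an assumed value for illustration.)
GOAL = 50

start = State(p=0, me=0, you=0, pending=0)
s = State(0, 10, 20, 5)
print(s.pending)             # fields can be read by name: 5
(p, me, you, pending) = s    # or unpacked like a plain tuple
print(s == (0, 10, 20, 5))   # and it compares equal to a plain tuple: True
```

Because a namedtuple is still a tuple, code written against plain four-tuples keeps working unchanged.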
1.9 9. 07 p Clueless
Now I'm going to talk about strategies for a minute. Remember, a strategy is a function that takes a state as input, and its output is one of the action names, roll or hold. I want you to write a strategy function, which we're calling clueless. It takes a state as input and returns one of the possible moves, roll or hold. It does that by ignoring the state and just choosing one of the possible moves at random. So go ahead and write that.
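A minimal sketch of such a strategy, assuming the moves are the strings 'roll' and 'hold'; the list name `possible_moves` is an illustrative assumption. The state argument is accepted but deliberately never looked at.

```python
import random

possible_moves = ['roll', 'hold']  # assumed names for the two actions

def clueless(state):
    """A strategy that ignores the state and picks one of the moves at random."""
    return random.choice(possible_moves)
```

Calling `clueless((0, 0, 0, 0))` returns 'roll' or 'hold' with equal probability.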
strategy function. Rather, it's going to return a strategy function. So I've given you this outline: we define a strategy function, then we fix up its name a little bit to describe it better, and then we return it. You have to write the code within the strategy function. I should say that we're going to stick with the representation of states where a state is a four-tuple: the player to move (0 or 1), the "me" and "you" scores, and the pending score.
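A minimal sketch of the outline just described: define an inner strategy, give it a descriptive name, and return it. The holding rule used here (hold once pending reaches x, or once holding would already reach the goal) and the `goal` value of 50 are assumptions for illustration.

```python
goal = 50  # assumed target score, stored once for the whole game

def hold_at(x):
    """Return a strategy that holds when the pending score reaches x
    (or when holding would already reach the goal); otherwise it rolls."""
    def strategy(state):
        (p, me, you, pending) = state
        return 'hold' if (pending >= x or me + pending >= goal) else 'roll'
    strategy.__name__ = 'hold_at(%d)' % x  # more descriptive than 'strategy'
    return strategy
```

Renaming the inner function pays off when strategies are compared later: printing a winner shows `hold_at(20)` rather than the anonymous-looking `strategy`.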
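```python
import random

def play_pig(A, B):
    """Play a game of Pig between two players, represented by their strategies.
    Returns the winning strategy. Relies on goal, other, hold, and roll,
    which are defined elsewhere in the unit."""
    strategies = [A, B]        # strategies[p] is the strategy of player p
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, random.randint(1, 6))
```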
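The play_pig loop calls `hold`, `roll`, and `other`, which come from the Hold and Roll section and are not part of this excerpt. Here is a minimal sketch of them, assuming that the state tuple (p, me, you, pending) is always seen from the point of view of the player to move, and that rolling a 1 ("pigging out") scores just 1 point and passes the turn, consistent with the remark about pigging out in the pig-scouts discussion.

```python
other = {1: 0, 0: 1}  # maps each player index to the opponent's index

def hold(state):
    """Bank the pending points and pass the turn.
    The new state is seen from the other player's point of view."""
    (p, me, you, pending) = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    """Apply a die roll d. A 1 pigs out: the roller scores only 1 point,
    loses the pending points, and the turn passes. Any other value is
    added to the pending points and the same player continues."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)
    else:
        return (p, me, you, pending + d)
```

Swapping "me" and "you" on every change of turn is what lets play_pig check `me >= goal` and `you >= goal` without caring which player is which.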
worst path? Well, obviously, we're looking for the best path, and we can describe that, and once we've got that description we've got to search for it. Now we've gone beyond search in two ways. The most obvious is that we're dealing with probability, so we've got dice or whatever other random element there is. In addition to that, for the big game, we introduced another complication, which is our opponent. So now the question is what each of these three is trying to do. I want you to tell me: is our opponent trying to get the best score for "me," or the worst score for "me," assuming we're diametrically opposed, so that the worst score for "me" is the best score for the opponent? Or is the opponent trying to come up with the outcome that is average? And tell me the same for the dice. Are the dice with "me," trying to get the best result for "me"? Are the dice plotting against "me," trying to get the worst result for "me"? Or are the dice going to average out? Go ahead and click the appropriate boxes there.
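To make the three roles concrete, here is a small sketch of the idea usually called expectimax, applied to a toy game tree. It assumes a diametrically opposed opponent who picks the worst outcome for "me," and dice that simply average over equally likely results. The tree encoding (numbers for leaves, `(kind, children)` tuples for interior nodes) and the function name `value` are illustrative assumptions.

```python
def value(node):
    """Evaluate a toy game tree from "my" point of view.
    Leaves are numbers (my score). Interior nodes are (kind, children):
      'max':    my move, I pick the child that is best for me;
      'min':    the opponent's move, assumed to pick the worst child for me;
      'chance': the dice, which average over equally likely children."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    values = [value(child) for child in children]
    if kind == 'max':
        return max(values)
    if kind == 'min':
        return min(values)
    if kind == 'chance':
        return sum(values) / len(values)
    raise ValueError('unknown node kind: %r' % kind)

tree = ('max', [
    ('chance', [10, 0]),               # a pure gamble: averages to 5.0
    ('min', [8, ('chance', [9, 5])]),  # opponent picks min(8, 7.0) = 7.0
])
print(value(tree))  # 7.0
```

The opponent's choice drags the second option down from 8 to 7.0, but that still beats the gamble's average of 5.0.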
with abbreviation Q. So I'm going to define here a quality function that says, given a state, an action, and a utility, what that state-action pair is worth to me. The actions available to me are holding and gambling. Let's make that explicit: in any state, the actions available are holding and gambling. We're only going to deal with one state here, but we make this perfectly general. The state I start with is however many dollars I have in my pocket, which could be anything. Given that state, if I hold, my state increases by $1 million, and there's some utility for that: how much do I value having what I have now plus $1 million? If I gamble, there's a 50% chance that I get $3 million more than I have now, with some utility for that, and a 50% chance that I get nothing more than I have now, with some utility for that. So that describes the quality of the state, but only if I have a utility function; I have to know how much I like money. The simplest choice for a utility function is the identity function, which takes any input x and returns x itself. So we could say that if I start with nothing, the value of the state of having nothing is 0, and the value of the state of having a million is a million. Now here's the amazing thing: I can just write out what the optimal strategy is, the best action for this state. It's the maximum over all the possible actions from the state (here just hold and gamble), where what we maximize is EU, which stands for Expected Utility. Expected means average. So what's the average utility of each of the actions? I've defined it as the quality of the state-action pair under the utility function. That means Q had to deal with the averaging, and it did: it said 50% this, 50% that.
That's the value of gambling. Now, this best_action function solves this particular problem. But the amazing thing is that we can completely generalize it. If we just add parameters, we're asking what the best_action is in a particular state: if you tell me what the available actions are, what the quality of each state-action pair is, and what the utility is over states, then I can tell you what the best_action is. That works for any domain you can define. It's an amazing thing that we solved all the problems at once, similar to the way that in search we had one best_search algorithm that could solve all of the search problems. That doesn't mean we're done and never have to program anything again, because programming can be difficult. Some problems don't fit into this type of formulation, and there are many, many problems that you can describe but can't solve in a feasible amount of time. So we haven't solved everything, but it is amazing how much we can do with just this one short function. Let's go ahead and solve this problem. Say I start off with $100: what's my best_action? When I run that, it tells me the best_action is gamble. Now, that doesn't sound quite right to me. If you were faced with that problem, assuming you had $100 to your name, would you take the gamble and go for the 3 million, or would you hold with 1 million? There's no right or wrong answer to this, despite what the interface does: it has to mark one answer right or wrong, but you can ignore that. I just want to collect some data on how many people think they would gamble in that situation and how many think they would hold.
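The Q function, the identity utility, and the generalized best_action described above can be sketched as follows. This is a minimal version, assuming the state is simply a dollar amount; the helper names `million`, `actions`, and `identity` are illustrative assumptions.

```python
million = 1000000

def Q(state, action, U):
    """Expected utility of taking action in state (a dollar amount), under utility U."""
    if action == 'hold':
        return U(state + 1 * million)
    if action == 'gamble':
        # 50% chance of 3 million more, 50% chance of nothing more
        return U(state + 3 * million) * .5 + U(state) * .5
    raise ValueError('unknown action: %r' % action)

def actions(state):
    return ['hold', 'gamble']

def identity(x):
    return x

def best_action(state, actions, Q, U):
    """Return the action with the highest expected utility (EU) in state."""
    def EU(action):
        return Q(state, action, U)
    return max(actions(state), key=EU)

print(best_action(100, actions, Q, identity))  # gamble
```

With the identity utility, holding is worth 1,000,100 while gambling averages 1,500,100, which is why the function picks gamble. Nothing in best_action is specific to money; only Q, actions, and U encode the domain.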
state, what's the utility function going to return? Well, it's going to return a number, and play_pig just asks, "Is that number equal to 'hold'?" No, it's not; no number is equal to 'hold'. So play_pig assumes you meant roll. The fact that I passed in a completely wrong function, one that has nothing to do with strategy and returns a number rather than an action, went completely unnoticed. Instead, the utility function acted as if it were a strategy function that always says roll. In general, that's one of the complaints people have about Python: it's too easy to make that mistake, because you don't have to declare the inputs and outputs of each function. In other languages you would, and the program where you accidentally used a utility function where you expected a strategy function wouldn't even compile; you'd get an error message before you ran it. In Python you don't have that protection, so you've got to build it in yourself.
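One way to build that protection in yourself is to wrap a strategy so that any return value other than a legal action raises an error immediately. This is a minimal sketch; the names `LEGAL_ACTIONS`, `checked`, and `score_difference` are hypothetical, chosen for illustration.

```python
LEGAL_ACTIONS = ('roll', 'hold')

def checked(strategy):
    """Wrap a strategy so that a return value other than 'roll' or 'hold'
    raises ValueError instead of silently being treated as 'roll'."""
    name = getattr(strategy, '__name__', 'strategy')
    def wrapper(state):
        action = strategy(state)
        if action not in LEGAL_ACTIONS:
            raise ValueError('%s returned %r, expected one of %r'
                             % (name, action, LEGAL_ACTIONS))
        return action
    wrapper.__name__ = name
    return wrapper

def score_difference(state):
    """A hypothetical utility-style function: it returns a number, not an action."""
    (p, me, you, pending) = state
    return me - you
```

Passing `checked(score_difference)` where a strategy is expected now fails on the first move with a ValueError, rather than quietly behaving like a strategy that always rolls.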
behind 39-0 in a game to 40, and say he's accumulated 30 points. If he's trying to maximize the probability of winning, he would keep on rolling. He says, well, I don't have that good a chance of winning, but all that counts is winning. If I stop now, the opponent is going to win on the next move, so I've got to keep rolling. Probably I'll pig out and only get 1 point, but it's worth it for that small chance of winning. That's what the maximize-win-probability strategy would do. The maximize-differential strategy would say, hey, if I can get 30 points rather than 1, that cuts the differential way down, so that's worth doing; I'll sacrifice winning in order to maximize the differential. Now, that's a suggestion of a story, but I don't know yet whether it's the right story. Let's find out.
Tuesday is 13/27. And here's the reason: the event "at least 1 boy, born on Tuesday" has 27 elements, and there they are, and of these, 13 are 2 boys, and there they are. So you can't really argue with that. You can go through and make sure that it's correct, and you can look at the other elements of the sample space and say no, we didn't miss any, so that's got to be the right answer. It's not quite intuitive yet. I'd like to define my report function so that it gives me that intuition, but right now I don't have the right visualization, so I've got to do some of the work myself. Here's what I came up with. We still have the four possibilities that we showed before, but now we're interested not just in boys; we're interested in boys born on Tuesday. So there are going to be some others over here where there's, say, a boy born on Wednesday along with some other partner, maybe a boy born on Saturday. But we're not even considering them; we're throwing all those out. We're just considering the ones that match here. And like before, we draw 2 circles, one of them for the right-hand side of the conditional probability, the event we're conditioning on. How many elements are in it? Well, there are 7 possibilities here, because the boy has to be born on Tuesday (there's only 1 way to do that), but there are 7 ways for the girl to be born. So there are 7 elements of the sample space there, and likewise 7 elements over here. Now how many elements over here? Here, either one of the 2 can be a boy born on Tuesday. So really we should draw this as either a boy born on Tuesday followed by another boy, or a boy followed by a boy born on Tuesday. How many of those are there? There are 7 of the first kind by the same argument we used in the other case, and also 7 of the second kind, but now I've double-counted, because one of these 14 cases is a boy born on Tuesday followed by a boy born on Tuesday. So I'll count just 6 here. And so now it should be clear: 7 + 7 + 7 + 6 = 27.
There are 27 on the right-hand side. Then what's the probability of 2 boys, given this event of at least 1 boy born on Tuesday? Well, the 2-boy cases are here, 7 + 6 = 13 of them, so it's 13 out of the 27. So that's the result. It seems hard to argue with: both drawing it out with a pen and computing it worked out to the same answer. Now, why do we have a strong intuition that knowing about the boy born on Tuesday shouldn't make any difference? I think the answer is that we're associating that fact with an individual boy. We're taking that fact and nailing it onto him, and it's true that if we did that, it wouldn't make any difference; the computation wouldn't change. But in this situation, that's not what we're doing. We're not saying anything about any individual boy. Rather, we're asserting that at least one was born on Tuesday, which is a statement not about boys but about pairs. We just don't have very good intuitions about what it means to say something about a pair of people rather than about an individual person, and that's what we did here. That's why the answer comes out to 13/27.
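The counting argument can be checked by brute-force enumeration. This is a minimal sketch, assuming every child is equally likely to be a boy or a girl and equally likely to be born on each of the 7 days, independently; the names `DAYS`, `CHILDREN`, `FAMILIES`, and `conditional` are illustrative. For comparison it also computes the plain "at least one boy" case.

```python
from fractions import Fraction
from itertools import product

DAYS = ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat')
CHILDREN = [(sex, day) for sex in 'BG' for day in DAYS]  # 14 equally likely kinds of child
FAMILIES = list(product(CHILDREN, repeat=2))             # 196 ordered (first, second) pairs

def two_boys(family):
    return all(sex == 'B' for (sex, day) in family)

def at_least_one_boy(family):
    return any(sex == 'B' for (sex, day) in family)

def boy_born_tuesday(family):
    return ('B', 'Tue') in family

def conditional(event, given, space):
    """P(event | given): keep only the outcomes where `given` holds,
    then count what fraction of those also satisfy `event`."""
    restricted = [outcome for outcome in space if given(outcome)]
    favorable = [outcome for outcome in restricted if event(outcome)]
    return Fraction(len(favorable), len(restricted))

print(conditional(two_boys, at_least_one_boy, FAMILIES))  # 1/3
print(conditional(two_boys, boy_born_tuesday, FAMILIES))  # 13/27
```

The condition is a predicate on the whole pair, not on a particular child, which is exactly the distinction the explanation above turns on.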