The Beer Game Slides
Goal
Minimize system-wide (chain) long-run average cost.
Timing
1. New shipments delivered.
2. Orders arrive.
3. Fill orders plus backlog.
4. Decide how much to order.
5. Calculate inventory costs.
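The five timing steps can be sketched for a single stage of the chain as follows. This is a minimal illustration, not the game's official rules: the `Stage` class, the `play_week` function, the starting inventory, and the cost rates are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    inventory: int = 12          # assumed starting stock
    backlog: int = 0             # unfilled orders carried over
    holding_cost: float = 1.0    # assumed cost per unit held per week
    penalty_cost: float = 2.0    # assumed cost per unit backlogged per week


def play_week(stage, delivered, demand, order_rule):
    """Run the five timing steps once; return (shipped, order, cost)."""
    # 1. New shipments delivered.
    stage.inventory += delivered
    # 2. Orders arrive, and 3. fill orders plus backlog.
    owed = demand + stage.backlog
    shipped = min(owed, stage.inventory)
    stage.inventory -= shipped
    stage.backlog = owed - shipped
    # 4. Decide how much to order (the rule is supplied by the player).
    order = order_rule(demand)
    # 5. Calculate inventory costs for the week.
    cost = (stage.holding_cost * stage.inventory
            + stage.penalty_cost * stage.backlog)
    return shipped, order, cost
```

Calling `play_week(Stage(inventory=5), delivered=4, demand=12, order_rule=lambda d: d)` plays one week with a pass-through ordering rule.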
Game Board
[Figure: weekly orders placed by the Wholesaler, Distributor, and Factory; x-axis: Week, y-axis: Order]
The 1-1 policy is optimal: order whatever amount your customer orders from you.
Additional assumptions:
Only the Retailer incurs penalty cost. Demand distribution is common knowledge. Fixed information lead time. Decreasing holding costs upstream in the chain.
Agent-Based Approach
Agents work as a team. No agent has knowledge of the demand distribution. No information sharing among agents. Agents learn via genetic algorithms. Fixed or stochastic lead time.
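How an agent might learn an ordering rule with a genetic algorithm can be sketched as below. Everything specific here is an assumption made for illustration and is not taken from the slides: the rule form "order demand plus an offset", the 4-bit encoding (one sign bit, three magnitude bits), the one-week delivery delay, the cost rates, and the names `decode`, `stage_cost`, and `evolve`.

```python
import random


def decode(bits):
    """Map a bit list to an integer offset: first bit is the sign."""
    magnitude = int("".join(map(str, bits[1:])), 2)
    return -magnitude if bits[0] else magnitude


def stage_cost(offset, demands, start_inventory=4):
    """Total cost for one stage that orders demand + offset each week.

    Assumes orders arrive one week later, holding cost 1.0 and
    backlog penalty 2.0 per unit per week (illustrative values).
    """
    inventory, backlog, pipeline, cost = start_inventory, 0, 0, 0.0
    for d in demands:
        inventory += pipeline                 # last week's order arrives
        owed = d + backlog
        shipped = min(owed, inventory)
        inventory -= shipped
        backlog = owed - shipped
        pipeline = max(0, d + offset)         # place this week's order
        cost += 1.0 * inventory + 2.0 * backlog
    return cost


def evolve(demands, pop_size=20, generations=40, n_bits=4, seed=0):
    """Evolve an offset by selection, one-point crossover, and mutation."""
    rng = random.Random(seed)

    def fitness(bits):
        return stage_cost(decode(bits), demands)

    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]   # keep the cheaper half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:              # occasional bit-flip mutation
                child[rng.randrange(n_bits)] ^= 1
            children.append(child)
        population = parents + children
    return decode(min(population, key=fitness))
```

Under constant demand, an offset of 0 reproduces the 1-1 rule and incurs zero cost in this toy model, which gives the search a clear target.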
Research Questions
Can the agents track the demand? Can the agents eliminate the Bullwhip effect? Can the agents discover the optimal policies if they exist? Can the agents discover reasonably good policies under complex scenarios where analytical solutions are not available?
Flowchart
Experiment 1b
All four agents learn under the environment of Experiment 1a. Über rule for the team. All four agents find 1-1.
Result of Experiment 1b
All four agents can find the optimal 1-1 policy
[Figure: weekly series for the Retailer, Wholesaler, Distributor, and Factory; x-axis: Week]
Artificial Agents Whip the MBAs and Undergraduates in Playing the MIT Beer Game
Accumulated Cost Comparison of MBAs and our agents
[Figure: accumulated cost by week; series: MBA Group1, MBA Group2, MBA Group3, Agent, UnderGradGroup1, UnderGradGroup2, UnderGradGroup3; x-axis: Week, y-axis: Accumulated Cost]
Agents eliminate the Bullwhip effect. Agents find better policies than 1-1.
Artificial agents discover a better policy than 1-1 when facing stochastic demand with penalty costs for all players.
[Figure: Accumulated Cost vs. Week; series: Agent Cost, 1-1 Cost; x-axis: Week, y-axis: Accumulated Cost]
Agents find better policies than 1-1. No Bullwhip effect. The policies discovered by agents are Nash.
Artificial agents discover stable policies that outperform 1-1 when facing stochastic demand and stochastic lead time.
[Figure: plot by week, weeks 1 to 35; x-axis: Week]
Artificial agents are able to eliminate the Bullwhip effect when facing stochastic demand with stochastic lead time.
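One common way to check whether the Bullwhip effect is present is to compare the variance of a stage's orders with the variance of customer demand. The slides do not say which measure was used, so the variance ratio and the name `bullwhip_ratio` below are assumptions for illustration.

```python
from statistics import pvariance


def bullwhip_ratio(orders, demand):
    """Variance of a stage's orders divided by variance of customer demand.

    A ratio near 1 suggests no amplification; a ratio well above 1
    suggests order swings are growing as they move upstream.
    """
    return pvariance(orders) / pvariance(demand)
```

For example, with demand `[4, 8, 4, 8]` and upstream orders `[2, 10, 2, 10]`, the ratio is 4.0, while orders that simply repeat the demand give a ratio of 1.0.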
[Figure: plot by week, weeks 1 to 35; x-axis: Week]
Agents learning
[Table: columns are Generation, Winner Strategies (Retailer, Wholesaler, Distributor, Manufacturer), and Total Cost; the strategy entries are missing.]
Generation  Total Cost
0           7380
1           7856
2           6987
3           6137
4           6129
5           3886
6           3071
7           2694
8           2555
9           2555
10          2555
Agents find the optimal policy: order whatever is ordered, with a time shift, i.e., $Q_1 = D(t-1)$, $Q_i = Q_{i-1}(t - l_{i-1})$.
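The time-shifted rule can be illustrated with a short sketch. It reads the rule as: the first stage orders last week's demand, and each upstream stage repeats the order of the stage below it, delayed by a lead time. The function name `one_to_one_orders`, the `lead_times` list, and the `initial` value used before week 0 are assumptions introduced for this example.

```python
def one_to_one_orders(demand, lead_times, initial=0):
    """Return orders[i][t] for each stage i under the time-shifted 1-1 rule.

    demand[t] is customer demand in week t. lead_times[i] is the assumed
    delay between stage i and the stage above it. Weeks before the start
    of the series are filled with `initial`.
    """
    def at(series, t):
        return series[t] if t >= 0 else initial

    weeks = len(demand)
    # First stage: Q1(t) = D(t - 1)
    orders = [[at(demand, t - 1) for t in range(weeks)]]
    # Upstream stages: Qi(t) = Q(i-1)(t - l(i-1))
    for lag in lead_times:
        prev = orders[-1]
        orders.append([at(prev, t - lag) for t in range(weeks)])
    return orders
```

With `demand = [4, 8, 8, 8, 8]` and `lead_times = [2]`, the first stage orders `[0, 4, 8, 8, 8]` and the second stage orders `[0, 0, 0, 4, 8]`: the same sequence, shifted and never amplified.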
Summary
Agents are capable of playing the Beer Game
Track demand. Eliminate the Bullwhip effect. Discover the optimal policies if they exist. Discover good policies under complex scenarios where analytical solutions are not available.