The Beer Game Slides

Artificial Agents Play the Beer Game, Eliminate the Bullwhip Effect, and Whip the MBAs
Steven O. Kimbrough, D.-J. Wu, Fang Zhong

FMEC, Philadelphia, June 2000; file: beergameslides.ppt
The MIT Beer Game

Players
Retailer, Wholesaler, Distributor and Manufacturer.
Goal
Minimize system-wide (chain) long-run average cost.
Information sharing: Mail. Demand: Deterministic.
Costs
Holding cost: $1.00/case/week. Penalty cost: $2.00/case/week.
Leadtime: 2 weeks physical delay.
Timing
1. New shipments delivered.
2. Orders arrive.
3. Fill orders plus backlog.
4. Decide how much to order.
5. Calculate inventory costs.
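The five timing steps can be sketched for a single player as follows. This is a minimal illustration, not the authors' simulator: the `Player` class, the starting inventory of 12, the initial pipeline of two shipments of 4 cases, and the choice to charge costs on end-of-week inventory and backlog are all assumptions made for the example. Only the cost rates and the 2-week delay come from the slide.

```python
from collections import deque
from dataclasses import dataclass, field

HOLDING_COST = 1.00  # $/case/week, from the slide
PENALTY_COST = 2.00  # $/case/week, from the slide


@dataclass
class Player:
    inventory: int = 12  # hypothetical starting stock
    backlog: int = 0
    # Shipments in transit; the leftmost entry arrives this week.
    # Two entries model the 2-week physical delay (values are hypothetical).
    pipeline: deque = field(default_factory=lambda: deque([4, 4]))
    total_cost: float = 0.0

    def play_week(self, incoming_order, policy):
        # 1. New shipments delivered.
        self.inventory += self.pipeline.popleft()
        # 2. Orders arrive.  3. Fill orders plus backlog.
        due = incoming_order + self.backlog
        shipped = min(due, self.inventory)
        self.inventory -= shipped
        self.backlog = due - shipped
        # 4. Decide how much to order (never a negative amount).
        order = max(0, policy(incoming_order))
        # 5. Calculate inventory costs.
        self.total_cost += (HOLDING_COST * self.inventory
                            + PENALTY_COST * self.backlog)
        return shipped, order


def one_to_one(demand):
    """Order exactly what was just ordered from you."""
    return demand


player = Player()
for demand in [4, 4, 8, 8]:  # hypothetical demand series
    shipped, order = player.play_week(demand, one_to_one)
    player.pipeline.append(order)  # assumes the supplier ships in full
```

Appending the order to the pipeline at the end of each week is what produces the 2-week delay here: an order placed in week t is received in week t+2.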
Game Board
The Bullwhip Effect

Order variability is amplified upstream in the supply chain. Industry examples (P&G, HP).
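One common way to quantify this amplification is the ratio of order variance at an upstream stage to order variance at the stage below it. The slides do not state which measure was used, so the function below is only an illustrative sketch, and the two order series are made-up numbers.

```python
from statistics import pvariance


def amplification_ratio(downstream_orders, upstream_orders):
    """Variance of upstream orders divided by variance of downstream orders.

    A value above 1 indicates that variability grew going upstream.
    """
    base = pvariance(downstream_orders)
    if base == 0:
        raise ValueError("downstream orders have zero variance")
    return pvariance(upstream_orders) / base


# Hypothetical weekly orders, for illustration only.
retailer_orders = [4, 4, 8, 8, 6, 6]
wholesaler_orders = [4, 4, 12, 12, 4, 8]
ratio = amplification_ratio(retailer_orders, wholesaler_orders)
```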
Observed Bullwhip effect from undergraduates game playing

[Figure: four panels (Retailer's Order, Wholesaler's Order, Distributor's Order, Factory's Order), each plotting Order against Week.]
Bullwhip Effect Example (P & G)

Lee et al., 1997, Sloan Management Review
Analytic Results: Deterministic Demand

Assumptions:
Fixed lead time. Players work as a team. Manufacturer has unlimited capacity.
1-1 policy is optimal -- order whatever amount is ordered from your customer.
Analytic Results: Stochastic Demand

(Chen, 1999, Management Science)
Additional assumptions:
Only the Retailer incurs penalty cost. Demand distribution is common knowledge. Fixed information lead time. Decreasing holding costs upstream in the chain.
Order-up-to (base stock installation) policy is optimal.
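An order-up-to rule can be sketched as below, assuming the usual local ("installation") definition of inventory position: stock on hand plus stock on order minus backlog. The function name, its parameters, and the base-stock level in the example are hypothetical. The slide does not give the optimal levels.

```python
def order_up_to(base_stock, on_hand, on_order, backlog):
    """Order enough to raise the inventory position to base_stock."""
    inventory_position = on_hand + on_order - backlog
    return max(0, base_stock - inventory_position)


# Example with a hypothetical base-stock level of 20 cases.
quantity = order_up_to(base_stock=20, on_hand=5, on_order=8, backlog=3)
```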
Agent-Based Approach
Agents work as a team. No agent has knowledge of the demand distribution. No information sharing among agents. Agents learn via genetic algorithms. Fixed or stochastic leadtime.
Research Questions
Can the agents track the demand? Can the agents eliminate the Bullwhip effect? Can the agents discover the optimal policies if they exist? Can the agents discover reasonably good policies under complex scenarios where analytical solutions are not available?
Flowchart
Agents Coding Strategy

Bit-string representation with fixed length n. Leftmost bit represents the sign (+ or -). The remaining bits represent how much to order. Rule x+1 means: if demand is x, then order x+1. Rule search space is 2^(n-1) - 1.
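A minimal sketch of decoding such a bit string into an ordering rule follows. Two details are assumptions, since the slide does not specify them: a leading 1 is taken to mean "+", and orders are clamped at zero so a negative offset never produces a negative order.

```python
def decode_rule(bits):
    """Turn a bit string such as '10011' into a signed offset such as +3."""
    if len(bits) < 2 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of at least two 0/1 characters")
    sign = 1 if bits[0] == "1" else -1  # assumption: leading 1 means '+'
    return sign * int(bits[1:], 2)      # remaining bits: how much to add


def apply_rule(bits, demand):
    """Order demand plus the decoded offset, never less than zero."""
    return max(0, demand + decode_rule(bits))


# Under these assumptions, '10001' is rule x+1: demand 4 gives order 5.
order = apply_rule("10001", 4)
```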
Experiment 1a: First Cup

Environment:
Deterministic demand with fixed leadtime. Fix the policy of Wholesaler, Distributor and Manufacturer to be 1-1. Only the Retailer agent learns.
Result: Retailer Agent finds 1-1.
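To make the single-learner setting concrete, here is a small genetic algorithm searching over bit-string rules for one retailer. It is a sketch under stated assumptions, not the authors' implementation. The demand series, starting inventory, population size, mutation rate, truncation selection with elitism, and the simplification that the upstream supplier always ships in full are all hypothetical choices. A leading 1 is taken to mean "+".

```python
import random

HOLDING, PENALTY, LEAD = 1.0, 2.0, 2   # cost rates and delay from the slides
DEMAND = [4] * 4 + [8] * 31            # hypothetical deterministic demand


def offset(bits):
    sign = 1 if bits[0] == "1" else -1  # assumption: leading 1 means '+'
    return sign * int(bits[1:], 2)


def retailer_cost(bits):
    """Total cost of the rule 'order demand + offset' over DEMAND."""
    inventory, backlog, cost = 12, 0, 0.0  # hypothetical starting state
    pipeline = [4] * LEAD                  # shipments already in transit
    for d in DEMAND:
        inventory += pipeline.pop(0)       # shipment arrives
        due = d + backlog
        shipped = min(due, inventory)
        inventory, backlog = inventory - shipped, due - shipped
        # Supplier is assumed to ship every order in full.
        pipeline.append(max(0, d + offset(bits)))
        cost += HOLDING * inventory + PENALTY * backlog
    return cost


def evolve(n_bits=5, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=retailer_cost)
        parents = pop[: pop_size // 2]      # truncation selection
        children = [pop[0]]                 # elitism: keep the best rule
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = "".join(                # bit-flip mutation
                ("1" if c == "0" else "0") if rng.random() < p_mut else c
                for c in child
            )
            children.append(child)
        pop = children
    return min(pop, key=retailer_cost)


best = evolve()
print(best, offset(best), retailer_cost(best))
```

Because the best rule of each generation is carried over unchanged, the cost of the returned rule can never be worse than the best rule in the initial random population.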
Experiment 1b
All four Agents learn under the environment of experiment 1a. Über rule for the team. All four agents find 1-1.
Result of Experiment 1b
All four agents can find the optimal 1-1 policy.
[Figure: weekly series for Retailer, Wholesaler, Distributor and Factory, plotted against Week.]
Artificial Agents Whip the MBAs and Undergraduates in Playing the MIT Beer Game
Accumulated Cost Comparison of MBAs and our agents
[Figure: Accumulated Cost against Week for MBA Groups 1-3, Undergraduate Groups 1-3, and the Agent.]
Stability (Experiment 1b)

Fix any three agents to be 1-1, and allow the fourth agent to learn. The fourth agent minimizes its own long-run average cost rather than the team cost. No agent has any incentive to deviate once the others are playing 1-1. Therefore 1-1 appears to be a Nash equilibrium.
Experiment 2: Second Cup

Environment:
Demand uniformly distributed on [0, 15]. Fixed lead time. All four Agents make their own decisions as in experiment 1b.
Agents eliminate the Bullwhip effect. Agents find better policies than 1-1.
Artificial agents eliminate the Bullwhip effect.

[Figure: Order against Week for Retailer, Wholesaler, Distributor and Factory.]
Artificial agents discover a better policy than 1-1 when facing stochastic demand with penalty costs for all players.
[Figure: Accumulated Cost vs. Week, comparing Agent Cost with 1-1 Cost.]
Experiment 3: Third Cup

Environment:
Lead time uniformly distributed on [0, 4]. The rest as in experiment 2.
Agents find better policies than 1-1. No Bullwhip effect. The policies discovered by agents are Nash.
Artificial agents discover better and stable policies than 1-1 when facing stochastic demand and stochastic lead-time.
[Figure: cost against Week, comparing 1-1 cost with Agent cost.]
Artificial Agents are able to eliminate the Bullwhip effect when facing stochastic demand with stochastic leadtime.
[Figure: Retailer Order, Wholesaler Order, Distributor Order and Factory Order against Week.]
Agents learning
Winner strategies by generation:

Generation  Retailer  Wholesaler  Distributor  Manufacturer  Total Cost
0           x-0       x-1         x+4          x+2           7380
1           x+3       x-2         x+2          x+5           7856
2           x-0       x+5         x+6          x+3           6987
3           x-1       x+5         x+2          x+3           6137
4           x+0       x+5         x-0          x-2           6129
5           x+3       x+1         x+2          x+3           3886
6           x-0       x+1         x+2          x+0           3071
7           x+2       x+1         x+2          x+1           2694
8           x+1       x+1         x+2          x+1           2555
9           x+1       x+1         x+2          x+1           2555
10          x+1       x+1         x+2          x+1           2555
The Columbia Beer Game

Environment:
Information lead time: (2, 2, 2, 0). Physical lead time: (2, 2, 2, 3). Initial conditions set as Chen (1999).
Agents find the optimal policy: order whatever is ordered, with a time shift, i.e., Q_1 = D(t-1), Q_i = Q_{i-1}(t - l_{i-1}).
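The time-shifted rule, Q_1 = D(t-1) and Q_i = Q_{i-1}(t - l_{i-1}), can be sketched as below. Two assumptions are made for the example: l_i is read as the information lead time of stage i, and any value referenced before the start of the series is replaced by a hypothetical `initial` order of 0.

```python
INFO_LEAD = [2, 2, 2, 0]  # information lead times from the slide, stages 1..4


def time_shifted_orders(demand, info_lead=INFO_LEAD, initial=0):
    """Return orders[i][t] for each stage i under the time-shifted rule."""
    weeks = len(demand)

    def lagged(series, t, lag):
        # Value of the series `lag` weeks earlier, or `initial` before week 0.
        return series[t - lag] if t - lag >= 0 else initial

    # Stage 1: Q_1(t) = D(t-1)
    orders = [[lagged(demand, t, 1) for t in range(weeks)]]
    # Stages 2..4: Q_i(t) = Q_{i-1}(t - l_{i-1})
    for i in range(1, len(info_lead)):
        previous = orders[i - 1]
        lag = info_lead[i - 1]
        orders.append([lagged(previous, t, lag) for t in range(weeks)])
    return orders


orders = time_shifted_orders([1, 2, 3, 4, 5, 6, 7, 8])  # hypothetical demand
```

Each stage's order series is a delayed copy of the demand series, so no variability is added going upstream.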
Ongoing Research: More Beer

Value of information sharing. Coordination and cooperation. Bargaining and negotiation. Alternative learning mechanisms: Classifier systems.
Summary
Agents are capable of playing the Beer Game
Track demand. Eliminate the Bullwhip effect. Discover the optimal policies if they exist. Discover good policies under complex scenarios where analytical solutions are not available.
Intelligent and agile supply chain. Multi-agent enterprise modeling.
A framework for multi-agent intelligent enterprise modeling

Pricing Agent, Investment Agent
Executive Community (StrategyFinder)
Production Community (LivingFactory)
Supply Chain Community (DragonChain)
E-Marketplace Community (eBCA)
Factory Agent, Distributor Agent
Retailer Agent, Wholesaler Agent, Contracting Agent
Bidding Agent, Auction Agent
