
PlotMap: Automated Layout Design for Building Game Worlds

YI WANG, Autodesk Research, USA


JIELIANG LUO, Autodesk Research, USA
ADAM GAIER, Autodesk Research, Germany
EVAN ATHERTON, Autodesk Research, USA
HILMAR KOCH, Autodesk Research, USA

Fig. 1. We derive spatial constraints from a story and use Reinforcement Learning to lay out the locations mentioned in the story on a map so that the constraints are satisfied.

World-building, the process of developing both the narrative and physical world of a game, plays a vital role in the game's experience. Critically-acclaimed independent and AAA video games are praised for strong world-building, with game maps that masterfully intertwine with and elevate the narrative, captivating players and leaving a lasting impression. However, designing game maps that support a desired narrative is challenging, as it requires satisfying complex constraints from various considerations. Most existing map generation methods focus on considerations about gameplay mechanics or map topography, while the need to support the story is typically neglected. As a result, extensive manual adjustment is still required to design a game world that facilitates particular stories. In this work, we approach this problem by introducing an extra layer of plot facility layout design that is independent of the underlying map generation method in a world-building pipeline. Concretely, we present a system that leverages Reinforcement Learning (RL) to automatically assign concrete locations on a game map to abstract locations mentioned in a given story (plot facilities), following spatial constraints derived from the story. A decision-making agent moves the plot facilities around, considering their relationship to the map and each other, to locations on the map that best satisfy the constraints of the story. Our system considers input from multiple modalities: map images as pixels, facility locations as real values, and story constraints expressed in natural language. We develop a method of generating datasets of facility layout tasks, create an RL environment to train and evaluate RL models, and further analyze the behaviors of the agents through a group of comprehensive experiments and ablation studies, aiming to provide insights for RL-based plot facility layout design. We will release a dataset containing 10,000 tasks, the RL environment, and the RL models.

Additional Key Words and Phrases: Reinforcement Learning, World Building, Plot Facility Layout Design

1 INTRODUCTION
Landscapes in video games serve as more than just scenic backdrops; they interact intimately with the unfolding narrative, defining and shaping the player's experience. This interaction is pivotal, underpinning the player's journey. By giving game designers tools for better supporting narratives with game map design, they can create games that are more cohesive and immersive for the player.
Designing game maps is difficult, as it requires designers to consider varied qualities such as realistic topography [Kelly and McCabe 2017; Smelik et al. 2009] and game playability [van der Linden et al. 2014] at the same time. Designing a map that supports a given story adds more constraints, making the problem even more challenging.
While the need to support an underlying story is typically neglected in most existing map generation methods, some efforts have been made to develop the story first, then to codify it as plot points and graphs so that maps can be generated based on their relations [Hartsook et al. 2011; Valls-Vargas et al. 2013]. However, as [Dormans and Bakkes 2011] pointed out, the principles that govern the design of the spatial and the narrative side of the game are different, and thus these two processes should be independent. Methods for generating game maps from stories can appear artificially contrived to fit a narrative, and it is not straightforward to combine these methods with those that also take into account game design and geographical considerations.
As a result, designing a game world that facilitates a story requires extensive manual modification; and as the number of constraints scales, the challenge of designing a map that satisfies all of the constraints of the story can become intractable, if not impossible, for a designer to do by hand [Matsumoto 2022].

In this work, we approach this problem by introducing an extra layer of plot facility layout design that is independent of the underlying map generation method in a world-building pipeline. Our work is inspired by the philosophy behind [Dormans and Bakkes 2011], which distinguishes the abstract space defined by the story (referred to as missions) and the concrete space defined by the actual geometric layout of a game map. The story is accommodated by mapping the former into the latter. While [Dormans and Bakkes 2011] focuses on action adventure games with discrete "scenes" connected by "passages", we impose very few assumptions on the methods used for story and map generation, and in particular target workflows for modern open world games.
We introduce the concept of plot facilities, which are abstract locations mentioned in the given story. A set of constraints are derived from the story in terms of the spatial relationships between plot facilities and elements in the underlying map. Given an underlying map, we arrange the layout of the plot facilities on top of the map to satisfy the constraints. Our method is compatible with any map generation technique in the sense that the underlying map can be hand-crafted, procedurally generated, or even from a Geographic Information System (GIS) such as Google Maps.
Figure 2 illustrates our world building approach. The focus of this work is on the plot facility layout design task. However, to demonstrate a concrete pipeline, we work with a specific procedural map generation method described in Section 4.1, and in an end-to-end example we show in Section 5.3, we extract story constraints from a free-text story description using a large language model.
We present a system that leverages Reinforcement Learning (RL) to automatically assign geometric locations on a game map to plot facilities following geographic and spatial constraints derived from the story – such as being in a forest and far from another plot facility. A decision-making agent moves the plot facilities around, considering their relationship to the map and each other, to locations on the map that best satisfy the constraints of the story. Our system considers input from multiple modalities: map images as pixels, facility locations as real values, and story constraints expressed in natural language. We demonstrate that an RL approach to plot facility layout design is fast in providing solutions, and can quickly adapt to user intervention, making it suitable for a real-time human-AI co-design workflow.
The paper also presents a dataset of facility layout tasks and a Gym-like environment to train and evaluate the RL models. The dataset contains 10,000 plot facility layout design tasks involving 12 types of spatial constraints and a maximum of 10 plot facilities per task, based on a set of procedurally generated terrain maps. We report the results of applying different strategies to address the observation, with an in-depth discussion.
In summary, the paper's contribution is threefold:
• We propose plot facility layout design as a novel approach to address the problem of supporting stories with game maps, which is compatible with most story and map generation methods.
• We provide a dataset of plot facility layout tasks and a Gym-like environment to train and evaluate the RL models.
• We provide baseline results on an RL approach to plot facility layout design.
To further promote the research of facility layout design for game map generation, we will release the data, models, and code necessary for training and evaluation.

2 RELATED WORK
Story and Game Map Generation. Though intertwined, the generation of stories and maps are typically investigated in isolation. A few notable exceptions do tackle them as a single system. [Hartsook et al. 2011] proposed a story-driven procedural map generation method where each event in the plot is associated with a location of a certain environment (e.g., castle, forest, etc.) and a linear plot line is translated to a constraint of a sequential occurrence of locations with the corresponding environment type. Map generation is formulated as an optimization problem finding a topological structure of the map balancing requirements from a realistic game world and player preferences, subject to plot constraints. [Valls-Vargas et al. 2013] presents a procedural method that generates a story and a map facilitating the story at the same time. The problem is again formulated as an optimization problem to find a topological structure of the map, but based on various metrics not only from a map playability perspective but also the space of possible stories supported by the spatial structure, subject to the input plot points.
Both [Hartsook et al. 2011] and [Valls-Vargas et al. 2013] generate rectangular grid-based maps consisting of discrete "scenes" connected by "passages". This map structure is widely used in many classic games such as Rogue ([1980]) and the early Zelda series ([1986]). However, many modern RPG games feature seamless world maps with continuous terrains and very few geographical barriers for an immersive open world experience, such as Elden Ring ([2022]), Pokémon Legends: Arceus ([2022]) and The Legend of Zelda: Breath of the Wild ([2017]). [Dormans and Bakkes 2011] use generative grammar-based methods for both story (mission) generation and map (space) generation. The story elements are then mapped to spatial elements using heuristics specific to the game genre. Our work also establishes a mapping between narrative and spatial elements. However, we adopt a more general constraint satisfaction process.

Procedural Content Generation. Procedural Content Generation (PCG) has become an essential component in video games, employed for the algorithmic creation of game elements such as levels, quests, and characters. The primary objectives of PCG are to enhance the replayability of games, reduce the burden on authors, minimize storage requirements, and achieve specific aesthetics [Hendrikx et al. 2013; Smelik et al. 2009; Kelly and McCabe 2017; van der Linden et al. 2014]. Game developers and researchers alike utilize methods from machine learning, optimization, and constraint-solving to address PCG problems [Togelius et al. 2011]. The primary aim of this work is to train RL agents capable of generalizing across a wide range of environments and constraints. To achieve this goal, we employ a PCG approach to generate a diverse set of maps and constraints to train and test our RL agents.

RL in Modern Video Games. The popularity of games in AI research is largely attributed to their usefulness in the development and benchmarking of reinforcement learning (RL) algorithms [Bellemare et al. 2013; Berner et al. 2019; Jaderberg et al. 2019; Vinyals et al. 2019]. However, overfitting remains a pervasive issue in this domain, as algorithms tend to learn specific policies rather than general ones [Zhang et al. 2018].

Fig. 2. Accommodating a story on a map with a plot facility layout design process

To counteract overfitting and facilitate policy transfer between different environments, researchers have turned to PCG techniques [Baker et al. 2019; Risi and Togelius 2020; Team et al. 2021].
On the other hand, RL can also be used as a design tool for modern games, especially for assessing and testing games. [Iskander et al. 2020] develops an RL system to play in an online multiplayer game alongside human players. The historical actions from the RL agent can contribute valuable insights into game balance, such as highlighting infrequently utilized combination actions within the game's mechanics. [Bergdahl et al. 2020] uses RL as an augmenting automated tool to test game exploits and logical bugs. [Chen et al. 2023] releases a multi-agent RL environment to study collective intelligence within the context of real-time strategy game dynamics. To the best of our knowledge, this is the first instance that uses a learning-based approach to accommodate stories on game maps.

3 PROBLEM FORMULATION
3.1 Overview
Put simply, we would like to assign every location mentioned in a narrative to an appropriate location on the game map. Inspired by [Dormans and Bakkes 2011], we view the narrative and the geometric layout of a game map as independent of each other, except that the geometric layout should accommodate the narrative. We introduce the notion of plot facilities, which are conceptual "locations" mentioned in the story. These "locations" are abstract in the sense that they don't correspond to any concrete geometric locations (yet). For example, the event "the hero finds an injured dwarf in the forest" happens at some place. There can be multiple locations on the map where this "abstract location" can be "instantiated", as long as it does not contradict the story - in this example it should be inside a forest area.
A set of constraints can be derived from the story for determining whether a particular instantiation of plot facilities is valid. The set of all plot facilities and the constraints form a conceptual space defined by the story, which is at a higher abstraction level than a concrete game map. The problem is then to assign geometric locations on the map to the plot facilities such that the constraints are satisfied - we call this problem plot facility layout design. A plot facility layout is essentially a mapping between the conceptual space defined by the story and the geometric space defined by the game map.
In the following subsections, we describe our RL-based method for this problem. To demonstrate a concrete map generation pipeline, in this study we specifically work with terrain maps consisting of polygons, where each polygon is assigned a biome.

3.2 Plot Facility Layout Design
We define a (facility layout) task as a tuple ⟨F, T, C⟩ where
• F is the set of (plot) facilities. Each facility has an identifier to be referred to by the constraints.
• T is a set of polygons on the map, each associated with a biome type (e.g., OCEAN, PLAINS, etc.).
• C is a set of spatial constraints over facilities in F and terrain types, each associated with a natural language utterance, such as "Fordlen Bay and Snapfoot Forest are separated by the ocean."
An (optimal) solution to a task is an assignment of coordinates to all the facilities in F, so that a maximum number of the constraints in C are satisfied considering their relations with each other and the polygons in T. Our goal is to train a general RL agent to be able to find solutions to any arbitrary task.
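To make the task definition concrete, below is a minimal sketch of how a ⟨F, T, C⟩ task could be represented in code; the class and field names are illustrative and do not correspond to the released environment's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Constraint:
    ctype: str                   # e.g. "CloseTo", "AcrossBiomeFrom"
    biomes: List[str]            # biome arguments b_1 .. b_m (possibly empty)
    facilities: List[str]        # facility identifiers p_1 .. p_n
    utterance: str               # natural language form of the constraint

@dataclass
class Polygon:
    vertices: List[Tuple[float, float]]
    biome: str                   # e.g. "OCEAN", "PLAINS"

@dataclass
class LayoutTask:
    facilities: List[str]            # F: plot facility identifiers
    terrain: List[Polygon]           # T: map polygons, each with a biome type
    constraints: List[Constraint]    # C: spatial constraints over F and biomes

# A (candidate) solution assigns map coordinates to every facility in F.
Solution = Dict[str, Tuple[float, float]]
```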
3.3 RL Formulation
Essentially, each plot facility layout design task can be viewed as sequentially moving a set of plot facilities F on a map, thus we define plot facility layout design as a sequential Markov decision process (MDP), which consists of the following key elements:
• Each state 𝑠 ∈ 𝑆 consists of three modalities: 1) a pixel-based image representing the map, 2) a real-valued vector representing essential information of the plot facilities, and 3) a group of utterances representing the constraints. We explore three strategies to derive the embeddings from the image and the utterances, resulting in three state dimensions: 1782, 4422, and 5702. More details about the state will be described in Section 5.1.
• Each action 𝑎 ∈ 𝐴 is a 2-dimensional real-valued vector [Δ𝑥, Δ𝑦] for one plot facility. In each round, plot facilities are moved one at a time in a fixed order, with the state and rewards updated after each movement.
• The reward (for each step) 𝑟𝑡 is +1 when all the constraints are satisfied, and is the average satisfaction score over all the constraints minus 1 when only some of the constraints are satisfied:

𝑟𝑡 = 1 if all constraints are satisfied; otherwise 𝑟𝑡 = (1/𝑛) Σ_{𝑖=1..𝑛} 𝑠𝑖 − 1,

where 𝑛 is the number of constraints and 𝑠𝑖 is the satisfaction score for each constraint. The satisfaction score for each type of constraint is within [0, 1] and is defined based on a hand-crafted heuristic function determining to what extent the facility layout forms the corresponding geometric formation.¹ The range of the reward 𝑟𝑡 is [−1, 0] ∪ {1}.
• The transition function is deterministic, where 𝑠𝑡+1 = 𝑓(𝑠𝑡, 𝑎𝑡).
• Each episode is terminated when all the constraints are satisfied or at 200 timesteps.
We train an RL agent to learn an optimal policy 𝜋𝜃 in order to maximize the expected discounted rewards:

max_{𝜋𝜃} E_{𝜏∼𝜋𝜃} [ Σ_{𝑡=0..𝑇} 𝛾^𝑡 𝑟(𝑠𝑡, 𝑎𝑡) ],   (1)

where trajectory 𝜏 = (𝑠0, 𝑎0, 𝑠1, 𝑎1, ..., 𝑠𝑇, 𝑎𝑇), 𝜃 is the parameterization of policy 𝜋, and 𝛾 is the discount factor.

¹ E.g., closeTo(x, y) is negatively correlated to the distance between x and y, and reaches 1 when their distance is less than a certain threshold.
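As a concrete reading of the reward definition above, the sketch below computes 𝑟𝑡 from per-constraint satisfaction scores; the scores themselves come from the hand-crafted heuristics described in Section 4.2, and this is an illustrative re-implementation rather than the released code.

```python
from typing import List

def step_reward(scores: List[float], eps: float = 1e-9) -> float:
    """Step reward r_t from per-constraint satisfaction scores in [0, 1].

    Returns +1 when every constraint is fully satisfied; otherwise the
    mean satisfaction score minus 1, which lies in [-1, 0].
    """
    assert scores, "a task always has at least one constraint"
    if all(s >= 1.0 - eps for s in scores):
        return 1.0
    return sum(scores) / len(scores) - 1.0
```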
4 TASK DATASET GENERATION
For training the RL model, we generated a dataset of 10,000 facility layout tasks. Each task requires arranging the layout of a maximum of 10 plot facilities on top of a procedurally generated map, w.r.t. a set of at most 10 spatial constraints. Maps consist of 9 biome types and the constraints are generated based on 12 constraint types. On average, a random agent has around a 30% success rate on a single task.

4.1 Map Generation
We employ a procedural map generation approach adapted from the work of Patel [2010] and its implementation by Dieu et al. [2023]. Rather than beginning with basic elements, such as elevation and moisture levels, and subsequently deducing coastlines and river locations, this method begins by creating rivers and coastlines and then adapts elevations and moisture to render them more plausible. The procedure is divided into several steps:
(1) A grid of Voronoi polygons is generated from random points with varying density in [0, 1]² space. We then replace each point with the centroid of its polygon region and use Lloyd relaxation [Lloyd 1982] to ensure an even distribution.
(2) Coastline generation employs a flooding simulation. We initially mark edges touching the map borders and those in random proximate areas as water edges. Flooding continues until a desired number of water edges are generated. Then, we select random inland edges as lake edges and continue flooding. Tiles are designated as water if their borders exceed a predefined minimum water ratio. Four terrain types are assigned: ocean, coast, lake, and land, based on their relation to water edges and neighboring tiles.
(3) Elevation assignment is determined by the distance from the coast, with elevation normalized to a given height distribution, resulting in fewer high points and smoother terrain.
(4) Rivers are created along the downhill path from randomly selected mountain corners to the nearest lake or ocean.
(5) Moisture levels are assigned according to the distance from freshwater sources, such as lakes and rivers.
(6) Biome assignment for each polygon depends on the combination of moisture and elevation, as illustrated in the Whittaker diagram in Figure 8.
For dataset generation, we produce 100 maps with 1,000 cells each. These maps are then converted to RGB images, suitable for input to neural network-based reinforcement learning agents.
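The sketch below illustrates step (1), Lloyd relaxation of random seed points in the unit square; it approximates the Voronoi cells with a dense sample grid instead of computing them exactly, and is meant as an illustration rather than the generator adapted from Patel [2010].

```python
import numpy as np

def lloyd_relax(points: np.ndarray, iterations: int = 5, grid: int = 128) -> np.ndarray:
    """Evenly distribute seed points in [0, 1]^2 by repeatedly moving each
    point to the centroid of its (approximate) Voronoi cell."""
    xs, ys = np.meshgrid(np.linspace(0, 1, grid), np.linspace(0, 1, grid))
    samples = np.stack([xs.ravel(), ys.ravel()], axis=1)      # (grid*grid, 2)
    for _ in range(iterations):
        # Assign every sample location to its nearest seed point.
        d2 = ((samples[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        owner = d2.argmin(axis=1)
        # Move each seed to the centroid of the samples it owns.
        for i in range(len(points)):
            cell = samples[owner == i]
            if len(cell) > 0:
                points[i] = cell.mean(axis=0)
    return points

rng = np.random.default_rng(0)
seeds = lloyd_relax(rng.random((200, 2)))   # e.g. 200 cells for a small test map
```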
4.2 Constraint Generation
Synthetic facility layout tasks are generated by associating a set of random constraints to randomly sampled maps. In this work, we consider a list of 12 constraint types, listed in Table 1, along with their number of occurrences in the 10,000-task dataset. A constraint type 𝐶𝑜𝑛𝑠𝑡𝑟𝑎𝑖𝑛𝑡𝑇𝑦𝑝𝑒(𝑏1, ..., 𝑏𝑚, 𝑝1, ..., 𝑝𝑛) is instantiated to become a constraint by substituting each of 𝑏1, ..., 𝑏𝑚 with a biome type, and each of 𝑝1, ..., 𝑝𝑛 with a plot facility id (𝑚 ≥ 0, 𝑛 ≥ 0).
For each constraint type, we define a heuristic function for evaluating an existing facility layout w.r.t. any instantiation of the constraint type. The function returns a real number in [0.0, 1.0], with 1.0 meaning fully satisfied and 0.0 completely not satisfied. These functions are used to check if the randomly generated constraints are satisfied by a random layout. They are also used in Section 3.3 for computing the reward. Tasks are then generated following Algorithm 1. Note that, as the constraints are extracted from an example layout, this procedure guarantees that the generated tasks are solvable.
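As an illustration of such a heuristic (see also footnote 1), a CloseTo score between two facilities might look like the sketch below; the threshold and falloff values are hypothetical and not the ones used to build the dataset.

```python
import math

def close_to_score(p1, p2, threshold=0.05, falloff=0.25):
    """Satisfaction score in [0, 1] for CloseTo(p1, p2).

    Returns 1.0 when the facilities are within `threshold` of each other
    and decays linearly to 0.0 as their distance approaches
    `threshold + falloff`.
    """
    d = math.dist(p1, p2)
    if d <= threshold:
        return 1.0
    return max(0.0, 1.0 - (d - threshold) / falloff)
```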

Table 1. Constraint types included in the 10,000-task dataset (each 𝑝𝑖 represents a plot facility).

Constraint Type | Meaning | Frequency
AcrossBiomeFrom(𝑏1, 𝑝1, 𝑝2) | 𝑝1 is across biome 𝑏1 from 𝑝2 | 24895
Outside(𝑏1, 𝑝1) | 𝑝1 is outside biome 𝑏1 | 5090
Inside(𝑏1, 𝑝1) | 𝑝1 is inside biome 𝑏1 | 1117
AwayFrom(𝑏1, 𝑝1) | 𝑝1 is away from biome 𝑏1 | 773
CloseTo(𝑏1, 𝑝1) | 𝑝1 is close to biome 𝑏1 | 2362
ToTheSouthOf(𝑏1, 𝑝1) | 𝑝1 is to the south of biome 𝑏1 | 739
ToTheNorthOf(𝑏1, 𝑝1) | 𝑝1 is to the north of biome 𝑏1 | 746
ToTheWestOf(𝑏1, 𝑝1) | 𝑝1 is to the west of biome 𝑏1 | 805
ToTheEastOf(𝑏1, 𝑝1) | 𝑝1 is to the east of biome 𝑏1 | 727
CloseTo(𝑝1, 𝑝2) | 𝑝1 is close to 𝑝2 | 103
AwayFrom(𝑝1, 𝑝2) | 𝑝1 is away from 𝑝2 | 4215
InBetween(𝑝1, 𝑝2, 𝑝3) | 𝑝1 is between 𝑝2 and 𝑝3 | 416
OnSouth(𝑝1) | 𝑝1 is on the south of the map | 283
OnNorth(𝑝1) | 𝑝1 is on the north of the map | 295
OnEast(𝑝1) | 𝑝1 is on the east of the map | 291
OnWest(𝑝1) | 𝑝1 is on the west of the map | 285
ToTheSouthOf(𝑝1, 𝑝2) | 𝑝1 is to the south of facility 𝑝2 | 2774
ToTheNorthOf(𝑝1, 𝑝2) | 𝑝1 is to the north of facility 𝑝2 | 2715
ToTheWestOf(𝑝1, 𝑝2) | 𝑝1 is to the west of facility 𝑝2 | 2775
ToTheEastOf(𝑝1, 𝑝2) | 𝑝1 is to the east of facility 𝑝2 | 2779
VisibleFrom(𝑝1, 𝑝2) | 𝑝2 is visible from 𝑝1 | 616

ALGORITHM 1: Facility Layout Task Generation
Input: A set of maps 𝑀𝐴𝑃, maximum number of facilities 𝑁, a set of constraint types 𝐶𝑇, minimum and maximum number of constraints 𝑀1 and 𝑀2
Output: A facility layout task ⟨𝐹, 𝑇, 𝐶⟩
1. Randomly sample a map 𝑇 from 𝑀𝐴𝑃;
2. Randomly assign a location to facilities 𝑜𝑏𝑗1 ... 𝑜𝑏𝑗𝑁 on the map 𝑇;
3. For each constraint type in 𝐶𝑇, generate all possible instantiations of it w.r.t. 𝑜𝑏𝑗1 ... 𝑜𝑏𝑗𝑁 and biome types. Evaluate each of them against the current map, adding the true ones to a set 𝐶′;
4. Sample a set of statements from 𝐶′ sized between 𝑀1 and 𝑀2, obtaining 𝐶;
5. For each statement in 𝐶, use a large language model such as GPT [Brown et al. 2020] to rephrase it with a natural language sentence, resulting in a set of NL utterances 𝐶𝑁𝐿;
6. Return the task ⟨{𝑜𝑏𝑗1 ... 𝑜𝑏𝑗𝑁}, 𝑇, 𝐶𝑁𝐿⟩.

5 EXPERIMENTS & ANALYSIS
5.1 Experiment Setup
State Space in Details. Our state space includes 3 types of inputs:
• The map is represented by a pixel-based image of size 42 x 42 x 3.
• The constraints are represented by natural language utterances or one-hot vectors, depending on the embedding strategies described in the following paragraph.
• Each plot facility's information is represented by a vector, consisting of its position [𝑥, 𝑦] on the map, a binary motion indicator signifying if it is its turn to move or not, and a unique identifier.

RL Training Details. All of our policies are trained using Proximal Policy Optimization (PPO) [Schulman et al. 2017]. The details of the training hyper-parameters are in Appendix D. To handle the multi-modal observation space, we employ pre-trained models to independently extract embeddings from the map (image) and constraint (natural language) inputs. These embeddings are subsequently concatenated with the informational vector and provided as inputs to the policy network. Specifically, we design three strategies for deriving the embeddings:
• NL-based: using ResNet [He et al. 2016] for maps and SentenceTransformer [Reimers and Gurevych 2019] for constraints.
• CLIP-based: using CLIP [Radford et al. 2021] for both maps and constraints.
• Relation-based: using ResNet for maps, and each constraint is encoded as a one-hot vector representing the constraint type, followed by three one-hot vectors indicating the specific plot facilities to instantiate the constraint with.
In all of our experiments, we have set a limit of 10 for both the number of plot facilities and the number of constraints. The breakdown of the state dimension is listed in Appendix C.
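As a minimal sketch of how the NL-based observation could be assembled, the snippet below concatenates the three modalities using the per-component dimensions reported in Table 7 (Appendix C); the embedding inputs are placeholders standing in for the pre-trained ResNet and SentenceTransformer outputs, and the zero-padding of unused slots is an assumption.

```python
import numpy as np

MAP_DIM, CONSTR_DIM, FACILITY_DIM = 512, 387, 4    # NL-based, per Table 7
MAX_CONSTRAINTS = MAX_FACILITIES = 10

def build_observation(map_embedding, constraint_embeddings, facility_vectors):
    """Concatenate map, constraint, and facility features into one flat state.

    map_embedding:         (512,) vector from a frozen image encoder
    constraint_embeddings: list of (387,) sentence embeddings
    facility_vectors:      list of (4,) vectors [x, y, move_flag, identifier]
    """
    constraints = np.zeros((MAX_CONSTRAINTS, CONSTR_DIM))
    constraints[: len(constraint_embeddings)] = constraint_embeddings
    facilities = np.zeros((MAX_FACILITIES, FACILITY_DIM))
    facilities[: len(facility_vectors)] = facility_vectors
    obs = np.concatenate([map_embedding, constraints.ravel(), facilities.ravel()])
    assert obs.shape == (MAP_DIM + 10 * CONSTR_DIM + 10 * FACILITY_DIM,)   # 4422
    return obs
```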
large task set might enhance generalization, and to examine this, we
conduct additional experiments, as reported in Table 4 2 . While the
ALGORITHM 1: Facility Layout Task Generation
success rates show improvement compared to the policies trained on
Input: A set of map 𝑀𝐴𝑃 , maximum number of facilities 𝑁 , a set of
constraint types C T , minimum and maximum number of constraints the 100-task set, the increase isn’t as significant as anticipated. No-
𝑀1 and 𝑀2 tably, the NL-based method slightly outperforms the Relation-based
Output: A facility layout task ⟨𝐹,𝑇 , 𝐶 ⟩ method, indicating that natural language embeddings’ contextual in-
1. Randomly sample a map 𝑇 from 𝑀𝐴𝑃 ;
2. Randomly assign a location to facilities 𝑜𝑏 𝑗1 . . . 𝑜𝑏 𝑗𝑁 on the map 𝑇 ; formation may offer an advantage when generalizing across diverse
3. For each constraint type in C T , generate all possible instantiations of it constraints. To investigate further the factors impeding generaliza-
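The turn-based scheme can be sketched as the rollout loop below; the environment interface shown (reset/step on a hypothetical PlotMapEnv, with a per-move observation update) is illustrative and not the released Gym-like API, and the maximum step length is an assumed value.

```python
import numpy as np

MAX_STEP_LENGTH = 0.05   # assumed cap on how far a facility may move per turn

def rollout(env, policy, max_rounds=20):
    """Move facilities one at a time; the observation and reward are refreshed
    after every individual move, so a change in constraint satisfaction can be
    attributed to that specific move."""
    obs = env.reset()
    reward = 0.0
    for _ in range(max_rounds):
        for facility_id in env.facility_order():        # fixed order each round
            delta = policy(obs, facility_id)             # proposed [dx, dy]
            delta = np.clip(delta, -MAX_STEP_LENGTH, MAX_STEP_LENGTH)
            obs, reward, done, info = env.step((facility_id, delta))
            if done:                                     # all constraints satisfied
                return reward
    return reward
```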
5.2 Quantitative Results
We carry out three groups of experiments to investigate different aspects of the problem. Firstly, we inspect the performance of our three proposed embedding strategies on four small task sets in Table 2. Secondly, we examine the generalization of the RL agents on a 10,000-task set in Table 4. Finally, we study the influence of maps and constraints on generalization in Table 3.
For all of the tables, we evaluate our trained policies in two conditions: random-initial-positions refers to the policies evaluated with the same task sets used for training but under different initial positions; 100-unseen-tasks reports success rates when the policies, each trained on their respective task set, are tested on 100 unseen tasks. We consider success only when all the constraints are satisfied, and we calculate the success rates over 1000 rollouts; each rollout takes approximately 5 seconds to finish.
Table 2 demonstrates that all three baselines have outstanding performance on tackling one single hard task, with the CLIP-based and Relation-based methods markedly outperforming the NL-based one. The success rate for each baseline declines as the task set size increases, with the Relation-based method consistently surpassing the others in all four sets. The relatively low state dimension of the Relation-based method might contribute to this success. However, when deployed on the 100-unseen-task set, all three baselines exhibit a deficiency in generalization capability. We hypothesize that a large task set might enhance generalization, and to examine this, we conduct additional experiments, as reported in Table 4.² While the success rates show improvement compared to the policies trained on the 100-task set, the increase isn't as significant as anticipated. Notably, the NL-based method slightly outperforms the Relation-based method, indicating that the contextual information in natural language embeddings may offer an advantage when generalizing across diverse constraints. To further investigate the factors impeding generalization, we study the influence of maps and constraints by training on three distinct sets of 100 tasks, as depicted in Table 3. The results indicate that constraints pose a greater challenge to generalization, regardless of the encoding method employed.

² Given the similar performance but longer training time of the CLIP-based method compared to the NL-based method in Table 2, we opt for the latter in future experiments.

Table 2. Success rate (%) comparison among three proposed baseline methods and a random agent across various task sizes. All training procedures in this table have a limit of 200,000 steps.

Success Rate (%), random initial positions:
Method | 1 task | 5 tasks | 50 tasks | 100 tasks
Random Agent | 23.9 | 30.3 | 30.5 | 30.9
NL-based | 84.1 | 48.2 | 38.5 | 38.4
CLIP-based | 99.9 | 49.0 | 38.0 | 35.6
Relation-based | 100.0 | 55.1 | 46.1 | 38.5

Success Rate (%), 100 unseen tasks:
Method | 1-task policy | 5-task policy | 50-task policy | 100-task policy
Random Agent | 32.2 (same across all columns)
NL-based | 34.8 | 33.3 | 35.1 | 38.0
CLIP-based | 32.5 | 30.2 | 36.0 | 35.5
Relation-based | 32.2 | 29.6 | 32.7 | 36.1

Table 3. Success rate (%) comparison among three 100-task datasets, each varying in map and constraint combinations. Each combination is paired with its unique set of 100 unseen tasks. All training procedures in this table have a limit of 500,000 steps.

Success Rate (%), random initial positions:
Method | Varied Map, Varied Constraints | Same Map, Varied Constraints | Varied Map, Same Constraint
Random Agent | 30.9 | 34.5 | 38.0
NL-based | 38.4 | 42.8 | 79.2
Relation-based | 38.5 | 48.1 | 77.5

Success Rate (%), 100 unseen tasks:
Method | Varied Map, Varied Constraints | Same Map, Varied Constraints | Varied Map, Same Constraint
Random Agent | 32.2 | 32.1 | 35.3
NL-based | 38.0 | 39.9 | 70.4
Relation-based | 36.1 | 40.1 | 68.9

Table 4. Success rate (%) comparison among two proposed baseline methods and a random agent on a 10,000-task set. All training procedures in this table have a limit of 2,000,000 steps.

Success Rate (%):
Method | random initial positions | 100 unseen tasks
Random Agent | 38.7 | 32.2
NL-based | 46.1 | 42.4
Relation-based | 44.8 | 36.8

5.3 Qualitative Results
In this section, we demonstrate a complete pipeline of the system, and highlight insightful agent behaviors through specific examples. Figure 1 illustrates a complete story-to-facility-layout pipeline. From the story on the left, 8 plot facilities and 6 constraints are extracted with a pre-trained large language model. We then add three additional constraints to make the task more challenging and show the complete requirements of the task in Table 5. The map on the right shows the resulting plot facility layout, along with motion trails indicating traces to the final locations.

Table 5. Plot facilities and constraints derived from the story in Figure 1.
Veilstead Kingdom and Aquafrost Garrison are on opposing shores
Pillar of Hope is away from mountains
Mirestep Swamp is located south of Forgewind Citadel
Hearthfire Hold and Veilstead Kingdom are separated by a lake
Aquafrost Garrison is situated south of Marketown
Veilstead Kingdom is across the lake from Marketown
Fountain of Solace and Forgewind Citadel are positioned across the coast
A great body of water separates Hearthfire Hold and Marketown
Hearthfire Hold and Aquafrost Garrison are on opposite coasts

In Figure 4, we demonstrate the same story accommodated on 4 different maps by rolling out a trained RL policy. We observe that the same set of plot facilities can still maintain their relative spatial relations on completely different geometric layouts, which aligns with the perspective described in [Dormans and Bakkes 2011]: "The same mission (story) can be mapped to many different spaces, and one space can support multiple different missions (stories)". This capability enables designers to envision various interpretations of unspecified story details and potential story progressions.
In Figure 5, we show that our RL policies support accurate and fast re-adaptation after human intervention. Specifically, after we manually change the location of Marketown to the northeast part of the map, Veilstead Kingdom and Hearthfire Hold can adjust their locations to continue to be across the lake from Marketown, while Aquafrost Garrison stays at the same location, so all of the constraints are still satisfied. This potentially enables an interactive design process with a mixed-initiative interface [Cook et al. 2021; Smith et al. 2010; Yannakakis et al. 2014].
We also observe cooperative behaviors, as shown in Figure 3: Marketown and Veilstead Kingdom must be across a lake from each other, while Aquafrost Garrison must be to the south of Marketown. Aquafrost Garrison was initially to the south of Marketown, but as Marketown moves south to be across the lake from Veilstead Kingdom, Aquafrost Garrison moves even further south to continue satisfying its south-of-Marketown constraint.
In many cases, we notice that the plot facilities don't stop even though all the constraints are satisfied. To investigate this phenomenon, we compare the motion trails from an RL policy and a random agent, as shown in Figure 6. The agent presents a strong preference for moving facilities to the edge of the map when there are no constraints, which likely could be attributed to our task dataset's imbalance. Table 1 shows a significantly higher proportion of the AcrossBiomeFrom constraint. During training, AcrossBiomeFrom is satisfied as long as there is a biome between two plot facilities. If the two plot facilities are on different edges of the map, it's likely that they are across several different biomes from each other, which renders going to the edge an effective strategy. On the other hand, for human designers, this type of constraint usually implies that both facilities are close to the biome (e.g., "A and B are across a lake" generally implies that A and B are on the shore of the lake). In this sense, this moving-towards-edges behavior can be seen as a form of reward hacking.
Fig. 3. Cooperative behavior to satisfy constraints.

We also visualize the motion trails from the RL policy at different training stages in Figure 7. At earlier stages, the policy tends to produce more random routes, circling back and forth, while the policy at later stages tends to show a clearer direction of progression. We notice that the tendency of going to the edge is established at a very early training stage.
In addition to the reward hacking behavior, the imbalance of the dataset also contributes to several other failure cases. Since the constraints are generated based on random facility layouts, the probability is low to sample constraints requiring rare geometric configurations. For example, mountains usually take up only a very small portion of the map, which means it's hard to sample an "X is inside mountain" type of constraint. Similarly, it's more likely for two random facilities to be spawned away from each other than close to each other, which explains why we have significantly more AwayFrom than CloseTo constraints (see Table 1).

6 LIMITATION AND FUTURE WORK
The use of handcrafted reward functions presents challenges in both accurately reflecting the preferences of human designers and providing the right signal for training. A promising solution for better understanding the kinds of solutions which designers prefer is to apply Reinforcement Learning from Human Preferences, which offers potentially more accurate reflections of human intent [Christiano et al. 2017]. Moreover, we considered the satisfaction of all constraints as the benchmark for successful task resolution. In practice, a suboptimal solution that satisfies most of the constraints might be acceptable, and situations where the constraints are unsatisfiable are completely possible. In these cases, knowledge of designer preferences, or a mixed-initiative approach which allowed editing of the map, would allow desirable solutions to be found.
The formulation and scalability of the RL approach, as well as the employed embedding strategies, emerged as important aspects in our study. The use of a single RL agent responsible for handling all global information and managing all plot facilities constituted a potential bottleneck for our approach, potentially limiting the scalability of the model in larger applications. Future research could investigate a distributed RL formulation, where each plot facility is treated as an independent RL agent. This adjustment could not only increase scalability but also enhance performance. Our embedding strategies resulted in an excessively high-dimensional state space, with redundant static information in each episode, leading to sample inefficiency and suboptimal generalization capabilities of the RL agent. These findings point to the necessity of developing more sophisticated methods that can better leverage the embedded prior knowledge, consequently improving the generalization capabilities of the RL agent. Further research in this direction could contribute to improving both the efficiency and the performance of the RL approach in narrative generation tasks.
In this work, unlike [Valls-Vargas et al. 2013] and [Hartsook et al. 2011], we do not assume a symbolic representation of the story, and instead utilize large language models to derive spatial constraints from stories in free-form text. It remains to be investigated how effective this approach is, in regards to the reasoning capability of large language models.
The generality of the RL model is crucial to our application; however, it is also extremely challenging to achieve. Comparing our setting with existing works on using RL to play games, adapting to different maps is analogous to adapting to different levels of a game, while learning for different constraints is analogous to mastering games with different rules. Automatic curriculum learning [Portelas et al. 2021] could provide a way of reaching this high level of generalization capability. Maps and constraints could be dynamically generated or selected rather than only training on a fixed set of scenarios. By systematically increasing the difficulty of scenarios, and their dissimilarity to those already encountered during training, we could gradually expand the agent's abilities to tackle diverse challenges.

7 CONCLUSION
In this work we introduced new tools to support stories with game maps through an automated plot facility layout design process. We demonstrated that by employing plot facility layout design, we can utilize existing story and map generation techniques, and help the designer visualize the narrative potential of a story domain. The RL-based approach introduced can rapidly provide solutions and adapt to user intervention, making it suitable for a real-time human-AI co-design workflow. This approach has potential in many game design applications, such as map design, playtime quest generation/adaptation and story debugging; but also potential applications in other domains involving spatial layouts subject to constraints, such as the design of large office buildings or manufacturing plants.

REFERENCES
Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. 2019. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528 (2019).
Marc G Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. 2013. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47 (2013), 253–279.
Joakim Bergdahl, Camilo Gordillo, Konrad Tollmar, and Linus Gisslén. 2020. Augmenting automated game testing with deep reinforcement learning. In 2020 IEEE Conference on Games (CoG). IEEE, 600–603.
Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, et al. 2019. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680 (2019).
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
Hanmo Chen, Stone Tao, Jiaxin Chen, Weihan Shen, Xihui Li, Sikai Cheng, Xiaolong Zhu, and Xiu Li. 2023. Emergent collective intelligence from massive-agent cooperation and competition. arXiv preprint arXiv:2301.01609 (2023).
Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. Advances in neural information processing systems 30 (2017).
Michael Cook, Jeremy Gow, Gillian Smith, and Simon Colton. 2021. Danesh: Interactive tools for understanding procedural content generators. IEEE Transactions on Games 14, 3 (2021), 329–338.
A.I. Design. 1980. Rogue.
Dawid Dieu, Mateusz Markiewicz, Kuba Grodzicki, and SWi98. accessed 2023.
Polygonal Map Generation for Games. https://github.com/TheFebrin/
Polygonal-Map-Generation-for-Games.
Joris Dormans and Sander Bakkes. 2011. Generating missions and spaces for adaptable
play experiences. IEEE Transactions on Computational Intelligence and AI in Games
3, 3 (2011), 216–228.
Game Freak. 2022. Pokémon Legends: Arceus.
FromSoftware. 2022. Elden Ring.
Ken Hartsook, Alexander Zook, Sauvik Das, and Mark O Riedl. 2011. Toward support-
ing stories with procedurally generated game worlds. In 2011 IEEE Conference on
Computational Intelligence and Games (CIG’11). IEEE, 297–304.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning
for image recognition. In Proceedings of the IEEE conference on computer vision and
pattern recognition. 770–778.
Mark Hendrikx, Sebastiaan Meijer, Joeri Van Der Velden, and Alexandru Iosup. 2013.
Procedural content generation for games: A survey. ACM Transactions on Multimedia
Computing, Communications, and Applications (TOMM) 9, 1 (2013), 1–22.
Nancy Iskander, Aurelien Simoni, Eloi Alonso, and Maxim Peter. 2020. Reinforcement
Learning Agents for Ubisoft’s Roller Champions. arXiv preprint arXiv:2012.06031
(2020).
Max Jaderberg, Wojciech M Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Anto-
nio Garcia Castaneda, Charles Beattie, Neil C Rabinowitz, Ari S Morcos, Avraham
Ruderman, et al. 2019. Human-level performance in 3D multiplayer games with
population-based reinforcement learning. Science 364, 6443 (2019), 859–865.
George Kelly and Hugh McCabe. 2017. A Survey of Procedural Techniques for City
Generation. The ITB Journal 7, 2 (May 2017). https://doi.org/10.21427/D76M9P
Stuart Lloyd. 1982. Least squares quantization in PCM. IEEE transactions on information
theory 28, 2 (1982), 129–137.
Ryu Matsumoto. 2022. Introduction of Case Studies of Engineer Efforts to Respond
to the Open Field of Elden Ring. Computer Entertainment Developers Conference
(2022).
Nintendo. 1986. The Legend of Zelda.
Nintendo. 2017. The Legend of Zelda: Breath of the Wild.
Amit Patel. 2010. Polygonal map generation for games. Red Blob Games 4 (2010).
Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, and Pierre-yves Oudeyer.
2021. Automatic Curriculum Learning For Deep RL: A Short Survey. In IJCAI
2020-International Joint Conference on Artificial Intelligence.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini
Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021.
Learning transferable visual models from natural language supervision. In Interna-
tional conference on machine learning. PMLR, 8748–8763.
Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using
siamese bert-networks. arXiv preprint arXiv:1908.10084 (2019).
Sebastian Risi and Julian Togelius. 2020. Increasing generality in machine learning
through procedural content generation. Nature Machine Intelligence 2, 8 (2020),
428–436.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017.
Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
Ruben Smelik, Klaas Jan de Kraker, Saskia Groenewegen, Tim Tutenel, and Rafael
Bidarra. 2009. A Survey of Procedural Methods for Terrain Modelling.
Gillian Smith, Jim Whitehead, and Michael Mateas. 2010. Tanagra: A mixed-initiative level design tool. In Proceedings of the Fifth International Conference on the Foundations of Digital Games. 209–216.
Open Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros, Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg, Michael Mathieu, et al. 2021. Open-ended learning leads to generally capable agents. arXiv preprint arXiv:2107.12808 (2021).
Julian Togelius, Georgios N Yannakakis, Kenneth O Stanley, and Cameron Browne. 2011. Search-based procedural content generation: A taxonomy and survey. IEEE Transactions on Computational Intelligence and AI in Games 3, 3 (2011), 172–186.
Josep Valls-Vargas, Santiago Ontanón, and Jichen Zhu. 2013. Towards story-based content generation: From plot-points to maps. In 2013 IEEE Conference on Computational Intelligence in Games (CIG). IEEE, 1–8.
Roland van der Linden, Ricardo Lopes, and Rafael Bidarra. 2014. Procedural Generation of Dungeons. IEEE Transactions on Computational Intelligence and AI in Games 6, 1 (2014), 78–89. https://doi.org/10.1109/TCIAIG.2013.2290371
Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (2019), 350–354.
Georgios N. Yannakakis, Antonios Liapis, and Constantine Alexopoulos. 2014. Mixed-initiative co-creativity. In International Conference on Foundations of Digital Games.
Chiyuan Zhang, Oriol Vinyals, Remi Munos, and Samy Bengio. 2018. A study on overfitting in deep reinforcement learning. arXiv preprint arXiv:1804.06893 (2018).

Fig. 4. Same story accommodated on 4 different maps by rolling out a trained RL policy. Arrows indicate directional relations. Note that the bottom right layout failed to fulfill "Hearthfire Hold and Veilstead Kingdom should be separated by a lake". This shows a typical failure example from our policies: the RL model only manages to satisfy a subset of the constraints.

Fig. 5. Plot facility re-adaptation after the user moves Marketown to a different location. After we manually changed the location of Marketown to the northeast part of the map, Veilstead Kingdom and Hearthfire Hold adjusted their locations to continue to be across the lake from Marketown, while Aquafrost Garrison stays at the same location, as all the constraints mentioning it are still satisfied.

Fig. 7. Motion trails at different training stages: (a) random agents; (b) after 30 iterations; (c) after 90 iterations; (d) after 133 iterations. Earlier stage models tend to have more random routes with a lot of circling back and forth, whereas later stage models tend to show a clearer direction of progression.

Fig. 6. (a) Agents with no constraints; (b) Random agents. It can be seen that
the agents have a strong preference of going towards the edge of the map
when there are no constraints provided in the environment. Note that the
map, constraints, and initial locations are all the same for the two settings.

Fig. 8. Whittaker Diagram for Biome Types



A APPENDIX OVERVIEW
We organize this appendix as follows: Section B reports a toy experiment showing better performance from simulated concurrent plot facility movement compared to actual concurrent movement (related to Section 5.1). Section C shows the state dimension for the different embedding strategies (see Section 5.1). Section D gives details on hyper-parameters (again see Section 5.1). Section E shows a set of plot facility motion trails with different initial locations, where the constraints and map are the same as in Figure 1. Section F shows a selection of sample maps used in our dataset.

B ACTUAL CONCURRENT VS. SIMULATED CONCURRENT MOVEMENT: A TOY EXPERIMENT
We perform an additional toy experiment to compare a true concurrent movement scheme and a simulated concurrent movement scheme for plot facilities. The experiment involves a single plot facility layout task with the map shown in Figure 9, two plot facilities 𝑝1 and 𝑝2, and two constraints requiring 𝑝1 and 𝑝2 to be inside the lake, respectively. The agent with the actual concurrent movement scheme moves all the plot facilities in a single step, and the observation is updated only when all the plot facilities are moved, while the agent with the simulated concurrent movement scheme only moves one plot facility for a small distance at each step.
In Table 6, we report the success rate from a random agent, an RL model trained with actual concurrent movement, and an RL model trained with simulated concurrent movement. A possible explanation for the poor performance of the actual concurrent movement setting is that in this setting it is hard to attribute a global change in constraint satisfaction to an individual facility's movement.

Table 6. Success rate (%) comparison among a random agent, an RL model trained with actual concurrent movement, and an RL model trained with simulated concurrent movement.
Random | Actual Concurrent | Simulated Concurrent
7.29% | 0.14% | 93%

Fig. 9. Map used for toy experiment

C STATE DIMENSION IN DETAILS

Table 7. The state dimension in detail for the three embedding strategies.
Embedding Strategy | Total | Terrain | Constraints | Plot Facilities
NL-based | 4422 | 512 | 10 * 387 | 10 * 4
CLIP-based | 5702 | 512 | 10 * 512 | 10 * 4
Relation-based | 1782 | 512 | 10 * 123 | 10 * 4

D HYPER-PARAMETER DETAILS
Table 8 presents the major hyper-parameters used for training the policies in the paper. For the hidden layers, we run each set of experiments on the two options in the table and choose the one with the higher reward to calculate the success rates. The rest of the hyper-parameters are the same as the default values from RLlib.³

Table 8. Major hyperparameters used for training the policies in the paper.
Hyperparameter | Default Value
lr | 1e-4
gamma | 0.99
lambda | 0.95
clip_param | 0.2
num_sgd_iter | 30
sgd_minibatch_size | 128
train_batch_size | 2000
num_workers | 7
fcnet_hiddens | [1024, 512] or [1024, 512, 512]

³ RLlib: https://docs.ray.io/en/latest/rllib/index.html
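For reference, the hyper-parameters in Table 8 map onto an RLlib PPO configuration roughly as sketched below; this assumes the pre-2.0 RLlib trainer API and a hypothetical registered environment name ("PlotMapEnv-v0"), so it should be read as an illustration rather than the exact training script.

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

config = {
    "env": "PlotMapEnv-v0",          # hypothetical registered environment name
    "lr": 1e-4,
    "gamma": 0.99,
    "lambda": 0.95,
    "clip_param": 0.2,
    "num_sgd_iter": 30,
    "sgd_minibatch_size": 128,
    "train_batch_size": 2000,
    "num_workers": 7,
    "model": {"fcnet_hiddens": [1024, 512]},   # or [1024, 512, 512]
}

ray.init()
trainer = PPOTrainer(config=config)
for _ in range(100):                 # number of training iterations is arbitrary here
    result = trainer.train()
    print(result["episode_reward_mean"])
```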

E PLOT FACILITY MOTION TRAIL WITH DIFFERENT INITIAL LOCATIONS
Figure 10 shows our main example story initialized on the same map, but with different initial locations for each plot facility.

F A SELECTION OF SAMPLE MAPS
Figure 11 shows a selection of maps generated using the method described in Section 4.1, used to create our plot facility layout task dataset.

Fig. 10. Main story on the same map with different initial locations

Fig. 11. Selection of maps in 3D
