CSD311: Artificial Intelligence
I When the number of (s, a) pairs is very large the tabular
approach above is not feasible, and the only practical option is a
function approximator that learns to predict the expected payoff
Ep[st, at] for (s, a) pairs.
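The idea of replacing a huge (s, a) table with a learned predictor can be sketched with a simple linear approximator trained by gradient steps; the feature map phi and the class QApprox below are illustrative names, not part of the lecture.

```python
import numpy as np

def phi(s, a, n_states=100, n_actions=4):
    """Illustrative one-hot feature vector for a (state, action) pair."""
    x = np.zeros(n_states + n_actions)
    x[s] = 1.0
    x[n_states + a] = 1.0
    return x

class QApprox:
    """Linear stand-in for a neural network: predicts Ep[s, a] as w . phi(s, a)."""
    def __init__(self, n_states=100, n_actions=4, lr=0.1):
        self.w = np.zeros(n_states + n_actions)
        self.lr = lr

    def predict(self, s, a):
        return float(self.w @ phi(s, a))

    def update(self, s, a, target):
        # One gradient step on the squared error (target - prediction)^2.
        err = target - self.predict(s, a)
        self.w += self.lr * err * phi(s, a)

q = QApprox()
for _ in range(50):
    q.update(s=3, a=1, target=1.0)  # repeatedly fit an observed payoff of 1.0
print(round(q.predict(3, 1), 2))   # prediction approaches the target
```

The same interface (predict, update) is what a neural network would expose; only the internal model changes.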
I In a particular state s, the outcome of each action ai after a
rollout using the ε-greedy policy is used to estimate the value
Rp[s, ai] of that action, and the (s, ai) pairs together with these
estimates are fed to a function approximator, typically a neural
network (often a convolutional neural network). The network is
trained on the (s, a) and Rp[s, a] data.
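The rollout-based estimation of Rp[s, ai] can be sketched on a toy problem; the chain environment, the discount factor, and all function names below are assumptions made for illustration, not part of the lecture.

```python
import random

N = 5  # toy chain of states 0..N; reward 1.0 on reaching state N

def step(s, a):
    """Assumed toy dynamics: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, N) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N else 0.0), s2 == N

def epsilon_greedy(s, q, epsilon=0.1):
    """Random action with probability epsilon, else greedy w.r.t. q[s]."""
    if random.random() < epsilon:
        return random.randrange(2)
    return max((0, 1), key=lambda a: q[s][a])

def rollout(s, a, q, epsilon=0.1, gamma=0.9, max_steps=50):
    """Take action a in state s, then follow the epsilon-greedy policy;
    return the (discounted) total reward of the rollout."""
    s, r, done = step(s, a)
    total, discount = r, 1.0
    for _ in range(max_steps):
        if done:
            break
        discount *= gamma
        s, r, done = step(s, epsilon_greedy(s, q, epsilon))
        total += discount * r
    return total

def estimate_action_values(s, q, n_rollouts=30):
    """Average rollout returns: estimates of Rp[s, a] for each action a."""
    return [sum(rollout(s, a, q) for _ in range(n_rollouts)) / n_rollouts
            for a in range(2)]

random.seed(0)
q = {s: [0.0, 0.5] for s in range(N + 1)}  # mildly right-biased prior policy
vals = estimate_action_values(2, q)
print(vals)  # moving right from state 2 should score higher
```

The resulting (s, ai, Rp[s, ai]) triples are exactly the training examples one would feed to the network.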
I In most adversarial games, two artificial agents play each other
a large number of times, and the generated data is used
continuously to train the neural network as play progresses.
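The self-play loop has a simple structure, sketched below on a toy game; the game (Nim with 5 stones, take 1 or 2, last stone wins), the value table standing in for the neural network, and all names are illustrative assumptions.

```python
import random
from collections import defaultdict

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def pick_move(stones, value, epsilon=0.2):
    """Epsilon-greedy: mostly leave the opponent the lowest-valued position."""
    moves = legal_moves(stones)
    if random.random() < epsilon:
        return random.choice(moves)
    return min(moves, key=lambda m: value[stones - m])

def self_play_game(value):
    """One game between two copies of the agent; returns training records
    (position, result from the viewpoint of the player to move)."""
    stones, player, history = 5, 0, []
    while stones > 0:
        history.append((stones, player))
        stones -= pick_move(stones, value)
        player = 1 - player
    winner = 1 - player  # the player who took the last stone wins
    return [(s, 1.0 if p == winner else -1.0) for s, p in history]

def train(value, records, lr=0.1):
    """Stand-in for a network update: move each visited position's value
    toward the observed game result."""
    for stones, result in records:
        value[stones] += lr * (result - value[stones])

random.seed(1)
value = defaultdict(float)
for _ in range(2000):
    # Generate a game and immediately train on it, as play progresses.
    train(value, self_play_game(value))

print(round(value[5], 2), round(value[3], 2))
```

After enough games the agent discovers that 5 stones is a winning position and 3 stones a losing one, purely from its own play, which is the essence of the self-play training loop.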