
DEFINITION

reinforcement learning

Reinforcement learning is a training method based on rewarding desired behaviors
and/or punishing undesired ones. The method has been adopted in artificial
intelligence (AI) as a way of directing machine learning through rewards and
penalties rather than labeled examples. Reinforcement learning is studied and
applied in operations research, information theory, game theory, control theory,
simulation-based optimization, multi-agent systems, swarm intelligence, statistics
and genetic algorithms.

Where supervised learning algorithms are typically trained with a body of known
correct answers, an agent learning by reinforcement is not. Instead, a reinforcement
learning agent learns from the environment in which it performs its task. First, a
method of rewarding desired behaviors and punishing undesired behaviors is devised.
Positive values are assigned to desired behaviors to provide positive reinforcement,
and negative values are assigned to undesired behaviors as a penalty.
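
The interaction described above can be sketched in a few lines of Python. This is only an illustrative toy, not a method taken from any particular library: the `Corridor` environment, its reward values (1.0 for reaching the goal, -0.1 for every other step) and the `run_episode` helper are all assumptions made up for this example.

```python
class Corridor:
    """Hypothetical toy environment: positions 0..4, with the goal at position 4."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        # action is -1 (move left) or +1 (move right); position is clamped to 0..4
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        # Positive value for the desired outcome, small negative value otherwise
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act in the environment and add up the rewards it receives."""
    total = 0.0
    state = env.position
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total


# A fixed policy that always moves right reaches the goal in four steps.
total_reward = run_episode(Corridor(), lambda state: 1)
```

No correct answers are supplied here; the only feedback the agent gets is the number returned by `step`, and a learning algorithm would adjust the policy to make that total larger.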

The agent is programmed to seek the maximum overall reward over the long term in
order to achieve an optimal solution. Long-term goals help prevent the agent from
stalling on lesser goals while avoiding risk. Also of note is the addition of
mechanisms to encourage exploration. Markov decision processes are commonly used to
model these decisions, in which an agent might forgo an immediate reward in order to
explore; to that end, developers might add an incentive, such as curiosity, that aids
in making discoveries.
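
Two common ways of expressing these ideas in code are a discounted return, which scores a whole sequence of rewards rather than only the next one, and epsilon-greedy action selection, which occasionally ignores the best-known action in order to explore. The sketch below assumes both techniques for illustration; the function names, the discount factor `gamma` and the exploration rate `epsilon` are hypothetical choices, not values from the text.

```python
import random


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma raised to how many steps away it is."""
    total = 0.0
    # Work backwards so each earlier step adds its reward to the discounted future
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def epsilon_greedy(action_values, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    return max(range(len(action_values)), key=lambda a: action_values[a])  # exploit


# A small reward now versus a larger reward two steps later.
quick = discounted_return([1, 0, 0])
patient = discounted_return([0, 0, 5])
```

With these assumed numbers the patient sequence scores higher than the quick one, which is the sense in which a long-term objective keeps the agent from settling for a lesser goal.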

A learning algorithm playing Pac-Man might have the ability to move in one of four
possible directions, barring obstruction. From pixel data, an agent might be given a
numeric reward for the result of each unit of travel: 0 for empty space, 1 for a
pellet, 2 for fruit, 3 for a power pellet, 4 for eating a ghost after a power pellet,
and 5 for collecting all pellets and completing a level, with 5 points deducted for a
collision with a ghost. The agent progresses from randomized play to more
sophisticated play, learning the goal of getting all pellets to complete the level.
Given time, an agent might even learn tactics such as conserving power pellets until
they are needed for self-defense.
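
The reward scheme listed above could be written down as a simple lookup table. The numeric values come from the paragraph; the event names and the `episode_score` helper are hypothetical, and a real agent would still need some way to detect these events from pixel data.

```python
# Reward per event, using the values from the Pac-Man example.
# The event names are made up for this sketch.
REWARDS = {
    "empty": 0,
    "pellet": 1,
    "fruit": 2,
    "power_pellet": 3,
    "ghost_eaten": 4,       # ghost caught after a power pellet
    "level_complete": 5,
    "ghost_collision": -5,  # penalty for being caught by a ghost
}


def episode_score(events):
    """Total reward for a sequence of events observed during play."""
    return sum(REWARDS[event] for event in events)


# Two pellets and a fruit, then a collision with a ghost.
score = episode_score(["pellet", "pellet", "fruit", "ghost_collision"])
```

In this assumed sequence the single collision wipes out the gains from the pellets and fruit, which is the kind of signal that pushes an agent toward avoiding ghosts.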

Because it’s based on an understanding of biological systems, reinforcement learning
is a part of bio-inspired computing. As a psychological principle, reinforcement
learning hails from the school of behavioral psychology.
