
KIET Group of Institutions, Ghaziabad

Department of Computer Science and Information Technology


(An ISO – 9001: 2015 Certified & ‘A’ Grade accredited Institution by NAAC)

Class Note of Artificial Intelligence KCS-071

Unit-5

Language modeling (LM) is the use of various statistical and probabilistic techniques to
determine the probability of a given sequence of words occurring in a sentence. Language
models analyze bodies of text data to provide a basis for their word predictions. They are
used in natural language processing (NLP) applications, particularly ones that generate text as
an output. Some of these applications include machine translation and question answering.

How language modeling works


Language models determine word probability by analyzing text data. They interpret this data
by feeding it through an algorithm that establishes rules for context in natural language. Then,
the model applies these rules in language tasks to accurately predict or produce new
sentences. The model essentially learns the features and characteristics of basic language and
uses those features to understand new phrases.

There are several different probabilistic approaches to modeling language, which vary
depending on the purpose of the language model. From a technical perspective, the various
types differ by the amount of text data they analyze and the math they use to analyze it. For
example, a language model designed to generate sentences for an automated Twitter bot may
use different math and analyze text data in a different way than a language model designed
for determining the likelihood of a search query.

Some common statistical language modeling types are:

 N-gram. N-grams are a relatively simple approach to language models. They
create a probability distribution for a sequence of n words. The n can be any
number, and defines the size of the "gram", or sequence of words being assigned a
probability. For example, if n = 5, a gram might look like this: "can you please
call me." The model then assigns probabilities to sequences of size n. Basically, n
can be thought of as the amount of context the model is told to consider. Some
types of n-grams are unigrams, bigrams, trigrams and so on.
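As an illustration (not part of the original note), an n-gram model with n = 2 can be estimated by simple counting over a corpus. The toy corpus and probabilities below are invented for the example:

```python
from collections import defaultdict

def bigram_probs(corpus):
    """Estimate P(word | previous word) from raw bigram counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    # Normalize counts into conditional probabilities
    return {prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
            for prev, nxt in counts.items()}

corpus = ["can you please call me", "can you help me"]
p = bigram_probs(corpus)
print(p["can"]["you"])     # 1.0 -- "can" is always followed by "you" in this corpus
print(p["you"]["please"])  # 0.5 -- "you" is followed by "please" half the time
```

A trigram or 5-gram model is the same idea with a longer context window, at the cost of many more counts to estimate.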

 Unigram. The unigram is the simplest type of language model. It doesn't look at
any conditioning context in its calculations. It evaluates each word or term
independently. Unigram models commonly handle language processing tasks such
as information retrieval. The unigram is the foundation of a more specific model
variant called the query likelihood model, which is used in information retrieval to
examine a pool of documents and match the most relevant one to a specific query.
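A minimal sketch of the query likelihood idea (the documents and query are made up): each document is scored by the probability that its unigram model generates the query.

```python
def query_likelihood(query, doc):
    """P(query | doc) under a unigram model: product of term frequencies."""
    tokens = doc.lower().split()
    score = 1.0
    for term in query.lower().split():
        score *= tokens.count(term) / len(tokens)  # zero if the term is absent
    return score

docs = ["the cat sat on the mat", "dogs chase cats in the park"]
scores = [query_likelihood("cat mat", d) for d in docs]
best = docs[scores.index(max(scores))]
print(best)  # the cat sat on the mat
```

Real retrieval systems smooth the zero probabilities (e.g. with corpus-wide statistics) so that one missing term does not eliminate a document entirely.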

 Bidirectional. Unlike n-gram models, which analyze text in one direction
(conditioning each word only on the words that precede it), bidirectional models
analyze text in both directions, backwards and forwards. These models can predict
any word in a sentence or body of text by using every other word in the text.
Examining text bidirectionally increases result accuracy. This type is often
utilized in machine learning and speech generation applications. For example,
Google uses a bidirectional model to process search queries.

 Exponential. Also known as maximum entropy models, this type is more
complex than n-grams. Simply put, the model evaluates text using an equation
that combines feature functions and n-grams. Basically, this type specifies
features and parameters of the desired results, and unlike n-grams, leaves analysis
parameters more ambiguous -- it doesn't specify individual gram sizes, for
example. The model is based on the principle of maximum entropy, which states
that the probability distribution with the most entropy, subject to the feature
constraints, is the best choice. In other words, the model that makes the fewest
additional assumptions beyond what the data supports is the most accurate.
Exponential models are designed to maximize entropy, which minimizes the
number of statistical assumptions that can be made. This enables users to better
trust the results they get from these models.

 Continuous space. This type of model represents words as a non-linear
combination of weights in a neural network. The process of assigning a weight
to a word is also known as word embedding. This type becomes especially useful
as data sets get increasingly large, because larger datasets often include more
unique words. The presence of a lot of unique or rarely used words can cause
problems for a linear model like an n-gram. This is because the number of possible
word sequences increases, and the patterns that inform results become weaker. By
weighting words in a non-linear, distributed way, this model can "learn" to
approximate words and therefore not be misled by any unknown values. Its
"understanding" of a given word is not as tightly tethered to the immediate
surrounding words as it is in n-gram models.
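The word-embedding idea can be illustrated with toy vectors (the three-dimensional values below are invented; trained models learn hundreds of dimensions): words with similar usage end up close together under cosine similarity.

```python
import math

# Invented toy embeddings for illustration only
embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return dot / (norm(u) * norm(v))

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine(embeddings["king"], embeddings["apple"]))  # much smaller
```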

Natural Language Processing (NLP)


• Natural Language Processing (NLP) refers to the AI method of communicating with
intelligent systems using a natural language such as English.

• Processing of natural language is required when you want an intelligent system like
a robot to perform as per your instructions, when you want to hear a decision from a
dialogue-based clinical expert system, etc.

Components of NLP
1. Natural Language Understanding (NLU)
Understanding involves the following tasks −
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
2. Natural Language Generation (NLG)
It is the process of producing meaningful phrases and sentences in the form of natural
language from some internal representation. It involves:
• Text planning − It includes retrieving the relevant content from the knowledge base.
• Sentence planning − It includes choosing the required words, forming meaningful
phrases, and setting the tone of the sentence.
• Text Realization − It is mapping the sentence plan into sentence structure.

NLP TERMINOLOGY
• Syntax − It refers to arranging words to make a sentence. It also involves determining
the structural role of words in the sentence and in phrases.
• Semantics − It is concerned with the meaning of words and how to combine words
into meaningful phrases and sentences.
• Pragmatics − It deals with using and understanding sentences in different situations
and how the interpretation of the sentence is affected.

• Discourse − It deals with how the immediately preceding sentence can affect the
interpretation of the next sentence.
• Phonology − It is the study of organizing sounds systematically.
• Morphology − It is the study of the construction of words from primitive meaningful
units.
• Morpheme − It is the primitive unit of meaning in a language. Ex: "un-", "break",
and "-able" in the word "unbreakable".

STEPS IN NLP PIPELINE


1. Sentence Segmentation: it breaks the paragraph into separate sentences.
2. Word Tokenization: it is used to break the sentence into separate words or tokens.
3. Stemming: it is used to normalize words into their base or root form. For
example, celebrates, celebrated and celebrating all originate from a single root
word, "celebrate." The big problem with stemming is that it sometimes produces a
root word which may not have any meaning.
4. Lemmatization: it is quite similar to stemming. It is used to group different
inflected forms of a word under its lemma. In lemmatization, the words
intelligence, intelligent, and intelligently have the root word intelligent, which has
a meaning.
5. Identifying Stop Words: in English, there are a lot of words that appear very
frequently, like "is", "and", "the", and "a". NLP pipelines flag these words as stop
words. Stop words might be filtered out before doing any statistical analysis.
6. Dependency Parsing: it is used to find how all the words in the sentence are
related to each other.
7. POS Tags: POS stands for parts of speech, which includes noun, verb, adverb,
and adjective. It indicates how a word functions in its meaning.
8. Named Entity Recognition (NER): it is the process of detecting named entities
such as a person name, movie name, organization name, or location.
9. Chunking: it is used to collect individual pieces of information and group
them into bigger pieces (meaningful phrases).
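The first few pipeline stages above can be sketched in plain Python (a simplified illustration; real pipelines use libraries such as NLTK or spaCy and handle far more edge cases):

```python
import re

STOP_WORDS = {"is", "and", "the", "a"}

def pipeline(paragraph):
    # 1. Sentence segmentation: split after ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    result = []
    for sentence in sentences:
        # 2. Word tokenization (keeping only alphabetic tokens)
        tokens = re.findall(r"[a-zA-Z]+", sentence.lower())
        # 5. Stop-word removal
        tokens = [t for t in tokens if t not in STOP_WORDS]
        result.append(tokens)
    return result

text = "The sky is blue. The grass is green and soft."
print(pipeline(text))  # [['sky', 'blue'], ['grass', 'green', 'soft']]
```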

PHASES IN NLP
• Lexical Analysis − It involves identifying and analyzing the structure of words.
Lexicon of a language means the collection of words and phrases in a language.
Lexical analysis is dividing the whole chunk of text into paragraphs, sentences, and
words.
• Syntactic Analysis (Parsing) − It involves analysis of the words in the sentence for
grammar, and arranging the words in a manner that shows the relationships among
them. A sentence such as "The school goes to boy" is rejected by an English
syntactic analyzer.
• Semantic Analysis − It draws the exact meaning or the dictionary meaning from the
text. The text is checked for meaningfulness. This is done by mapping syntactic
structures to objects in the task domain. The semantic analyzer disregards sentences
such as "hot ice-cream".
• Discourse Integration − The meaning of any sentence depends upon the meaning of
the sentence just before it. In addition, it also shapes the meaning of the
immediately succeeding sentence.
• Pragmatic Analysis − During this phase, what was said is re-interpreted to determine
what it actually meant.

MACHINE TRANSLATION
Machine translation is the process of using artificial intelligence (AI) to automatically
translate content from one language (the source) to another (the target) without any human
input.

The most common types of machine translation include:

Rule-based machine translation (RBMT)


The earliest form of MT, rule-based MT has several serious disadvantages, including
the need for significant amounts of human post-editing, the requirement to add
languages manually, and low quality in general. It has some uses in very basic
situations where a quick understanding of meaning is required.

Statistical machine translation (SMT)


Statistical MT builds a statistical model of the relationships between words, phrases, and
sentences in a text. It applies the model to a second language to convert those elements to
the new language. It thereby improves on rule-based MT but shares many of the same
problems.

Neural machine translation (NMT)


The neural MT model uses artificial intelligence to learn languages and constantly
improve that knowledge, much like the neural networks in the human brain. It is more
accurate, makes it easier to add languages, and translates much faster once trained.
Neural MT is rapidly becoming the standard in MT engine development.

Hybrid Machine Translation or HMT


HMT, as the name suggests, is a mix of RBMT and SMT. It uses a translation memory,
which makes it unquestionably more successful in terms of quality. Nevertheless, even
HMT has a lot of downsides, the biggest of which is the need for extensive editing;
human translators will also be needed. There are several approaches to HMT, like
multi-engine, statistical rule generation, multi-pass, and confidence-based.

SPEECH RECOGNITION

Speech recognition, also known as automatic speech recognition (ASR), computer speech
recognition, or speech-to-text, is a capability which enables a program to process human
speech into a written format.

Various algorithms and computation techniques are used to convert speech into text and
improve the accuracy of transcription. Below are brief explanations of some of the most
commonly used methods:

 Natural language processing (NLP): While NLP isn't necessarily a specific
algorithm used in speech recognition, it is the area of artificial intelligence which
focuses on the interaction between humans and machines through language, both
speech and text. Many mobile devices incorporate speech recognition into their
systems to conduct voice search—e.g. Siri—or provide more accessibility around
texting.
 Hidden Markov models (HMM): Hidden Markov models build on the Markov
chain model, which stipulates that the probability of a given state hinges on the
current state, not its prior states. While a Markov chain model is useful for
observable events, such as text inputs, hidden Markov models allow us to
incorporate hidden events, such as part-of-speech tags, into a probabilistic model.
They are utilized as sequence models within speech recognition, assigning labels to
each unit—i.e. words, syllables, sentences, etc.—in the sequence. These labels
create a mapping with the provided input, allowing the model to determine the most
appropriate label sequence.
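The label-sequence idea can be made concrete with the Viterbi algorithm over a tiny hand-made HMM. All states and probabilities below are invented for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an observation sequence."""
    # V[t][s] = (best probability of reaching state s at step t, best path so far)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-2][prev][1] + [s])
                for prev in states
            )
            V[-1][s] = (prob, path)
    return max(V[-1].values())[1]

# Hypothetical POS-tagging-style example with two hidden states
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.7, "run": 0.3}, "VERB": {"dogs": 0.1, "run": 0.9}}
print(viterbi(["dogs", "run"], states, start_p, trans_p, emit_p))  # ['NOUN', 'VERB']
```

Speech recognizers apply the same machinery with phones or words as hidden states and acoustic features as observations.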
 N-grams: This is among the simplest types of language model (LM), which
assigns probabilities to sentences or phrases. An N-gram is a sequence of N words.
For example, "order the pizza" is a trigram or 3-gram and "please order the pizza" is
a 4-gram. Grammar and the probability of certain word sequences are used to
improve recognition accuracy.
 Neural networks: Primarily leveraged for deep learning algorithms, neural
networks process training data by mimicking the interconnectivity of the human
brain through layers of nodes. Each node is made up of inputs, weights, a bias (or
threshold) and an output. If that output value exceeds a given threshold, it “fires” or
activates the node, passing data to the next layer in the network. Neural
networks learn this mapping function through supervised learning, adjusting based
on the loss function through the process of gradient descent. While neural networks
tend to be more accurate and can accept more data, this comes at a performance
efficiency cost as they tend to be slower to train compared to traditional language
models.
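A single node of the kind described above can be written in a few lines (the weights, inputs, and bias are arbitrary illustrative numbers):

```python
def neuron(inputs, weights, bias, threshold=0.0):
    """A single node: weighted sum of inputs plus bias; 'fires' if it exceeds threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activation > threshold else 0

print(neuron([1.0, 0.5], [0.6, 0.4], -0.5))  # 1  (0.6 + 0.2 - 0.5 = 0.3 > 0)
print(neuron([1.0, 0.5], [0.1, 0.1], -0.5))  # 0  (0.1 + 0.05 - 0.5 < 0)
```

Training by gradient descent replaces the hard threshold with a differentiable activation and adjusts the weights and bias to reduce the loss.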
 Speaker Diarization (SD): Speaker diarization algorithms identify and segment
speech by speaker identity. This helps programs better distinguish individuals in a
conversation and is frequently applied at call centers distinguishing customers and
sales agents.

ROBOT (hardware, perception, planning, moving)


Robots are physical agents that perform tasks by manipulating the physical world. Effectors
have a single purpose: to assert physical forces on the environment. Robots are also
equipped with sensors, which allow them to perceive their environment.

Most of today’s robots fall into one of three primary categories.

1. MANIPULATORS: Manipulator motion usually involves a chain of controllable joints,
enabling such robots to place their effectors in any position within the workspace. Few car
manufacturers could survive without robotic manipulators, and some manipulators have even
been used to generate original artwork.

2. MOBILE ROBOT: The second category is the mobile robot. Mobile robots move about
their environment using wheels, legs, or similar mechanisms. They have been put to use
delivering food in hospitals, moving containers at loading docks, and similar tasks. Other
types of mobile robots include unmanned air vehicles, Autonomous underwater vehicles etc.

3. MOBILE MANIPULATOR: The third type of robot combines mobility with


manipulation, and is often called a mobile manipulator. Humanoid robots mimic the human
torso. The field of robotics also includes prosthetic devices, intelligent environments and
multibody systems, wherein robotic action is achieved through swarms of small cooperating
robots. Robotics brings together many of the concepts we have seen earlier in the book,
including probabilistic state estimation, perception, planning, unsupervised learning, and
reinforcement learning.

ROBOT HARDWARE

The robot hardware mainly depends on:

1. sensors
2. effectors

SENSORS: Sensors are the perceptual interface between robot and environment.

PASSIVE SENSORS: Passive sensors, such as cameras, are true observers of the
environment: they capture signals that are generated by other sources in the environment.

ACTIVE SENSORS: Active sensors, such as sonar, send energy into the environment. They
rely on the fact that this energy is reflected back to the sensor. Range finders are sensors that
measure the distance to nearby objects. In the early days of robotics, robots were commonly
equipped with sonar sensors. Sonar sensors emit directional sound waves, which are reflected
by objects, with some of the sound making it back into the sensor. Stereo vision relies on
multiple cameras to image the environment from slightly different viewpoints, analyzing the
resulting parallax in these images to compute the range of surrounding objects. Other
common range sensors include radar, which is often the sensor of choice for UAVs; radar
sensors can measure distances of multiple kilometers. At the other extreme of range
sensing are tactile sensors such as whiskers, bump panels, and touch-sensitive skin.

A second important class of sensors is location sensors. Most location sensors use range
sensing as a primary component to determine location. Outdoors, the Global Positioning
System (GPS) is the most common solution to the localization problem. GPS measures the
distance to satellites that emit pulsed signals.

The third important class is proprioceptive sensors, which inform the robot of its own
motion. To measure the exact configuration of a robotic joint, motors are often equipped
with shaft decoders that count the revolutions of motors in small increments. Other
important aspects of robot state are measured by force sensors and torque sensors. These
are indispensable when robots handle fragile objects or objects whose exact shape and
location is unknown.
KIET Group of Institutions, Ghaziabad
Department of Computer Science and Information Technology
(An ISO – 9001: 2015 Certified & ‘A’ Grade accredited Institution by NAAC)

EFFECTORS: Effectors are the means by which robots move and change the shape of their
bodies. To understand the design of effectors, we use the concept of a degree of freedom. We
count one degree of freedom for each independent direction in which a robot, or one of its
effectors, can move. For example, a rigid mobile robot such as an AUV has six degrees of
freedom: three for its (x, y, z) location in space and three for its angular orientation, known as
yaw, roll, and pitch. These six degrees define the kinematic state or pose of the robot. The
dynamic state of a robot includes these six plus an additional six dimensions for the rate of
change of each kinematic dimension, that is, their velocities.
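The six-versus-twelve distinction can be captured directly in code (a minimal sketch; the field names follow the text):

```python
from dataclasses import dataclass, fields

@dataclass
class KinematicState:
    """Pose of a rigid free-flying robot such as an AUV: six degrees of freedom."""
    x: float
    y: float
    z: float
    yaw: float
    roll: float
    pitch: float

@dataclass
class DynamicState(KinematicState):
    """Kinematic state plus the rate of change of each dimension: twelve in all."""
    vx: float = 0.0
    vy: float = 0.0
    vz: float = 0.0
    vyaw: float = 0.0
    vroll: float = 0.0
    vpitch: float = 0.0

print(len(fields(KinematicState)), len(fields(DynamicState)))  # 6 12
```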

For nonrigid bodies, there are additional degrees of freedom within the robot itself. For
example, the elbow of a human arm possesses two degrees of freedom: it can flex the upper
arm towards or away, and can rotate right or left. The wrist has three degrees of freedom: it
can move up and down, side to side, and can also rotate. Robot joints also have one, two, or
three degrees of freedom each. Six degrees of freedom are required to place an object, such as
a hand, at a particular point in a particular orientation. A robot arm with five revolute joints
(which generate rotational motion) and one prismatic joint (which generates sliding motion)
has exactly six degrees of freedom. For mobile robots, the DOFs are not necessarily the same
as the number of actuated elements. Consider, for example, your average car: it can move
forward or backward, and it can turn, giving it two DOFs. In contrast, a car's kinematic
configuration is three-dimensional: on an open flat surface, one can easily maneuver a car to
any (x, y) point, in any orientation. (See Figure 25.4(b).) Thus, the car has three effective
degrees of freedom but two controllable degrees of freedom. We say a robot is nonholonomic
if it has more effective DOFs than controllable DOFs, and holonomic if the two numbers are
the same.

Sensors and effectors alone do not make a robot. A complete robot also needs a source of
power to drive its effectors. The electric motor is the most popular mechanism for both
manipulator actuation and locomotion, but pneumatic actuation using compressed gas and
hydraulic actuation using pressurized fluids also have their application niches.

ROBOTIC PERCEPTION: Perception is the process by which robots map sensor


measurements into internal representations of the environment. Perception is difficult because
sensors are noisy, and the environment is partially observable, unpredictable, and often
dynamic. As a rule of thumb, good internal representations for robots have three properties:
they contain enough information for the robot to make good decisions, they are structured so
that they can be updated efficiently, and they are natural in the sense that internal variables
correspond to natural state variables in the physical world.

For robotics problems, we include the robot’s own past actions as observed variables in the
model. Xt is the state of the environment (including the robot) at time t, Zt is the observation
received at time t, and At is the action taken after the observation is received. We would like
to compute the new belief state, P(Xt+1 | z1:t+1, a1:t), from the current belief state P(Xt |
z1:t, a1:t−1) and the new observation zt+1. Thus, we modify the recursive filtering equation
(15.5 on page 572) to use integration rather than summation:

P(Xt+1 | z1:t+1, a1:t) = α P(zt+1 | Xt+1) ∫ P(Xt+1 | xt, at) P(xt | z1:t, a1:t−1) dxt

This equation states that the posterior over the state variables X at time t + 1 is calculated
recursively from the corresponding estimate one time step earlier. This calculation involves
the previous action at and the current sensor measurement zt+1. The probability P(Xt+1 | xt,
at) is called the transition model or motion model, and P(zt+1 | Xt+1) is the sensor model.

1. Localization and mapping

Localization is the problem of finding out where things are—including the robot itself.
Knowledge about where things are is at the core of any successful physical interaction with
the environment. To keep things simple, let us consider a mobile robot that moves slowly in
a flat 2D world. Let us also assume the robot is given an exact map of the environment. The
pose of such a mobile robot is defined by its two Cartesian coordinates with values x and y
and its heading with value θ. If we arrange those three values in a vector, then any particular
state is given by Xt = (xt, yt, θt)^T.
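The recursive filtering equation can be implemented directly for a discrete state space, where the integral becomes a sum. Everything below (the five-cell corridor, the motion and sensor probabilities) is invented for illustration:

```python
def bayes_filter_update(belief, action, z, motion_model, sensor_model):
    """One step of the recursive filter:
    bel'(x') = eta * P(z | x') * sum_x P(x' | x, a) * bel(x)"""
    n = len(belief)
    # Prediction step: push the belief through the motion model
    predicted = [sum(motion_model(xp, x, action) * belief[x] for x in range(n))
                 for xp in range(n)]
    # Correction step: weight by the sensor model and normalize
    updated = [sensor_model(z, xp) * predicted[xp] for xp in range(n)]
    eta = sum(updated)
    return [u / eta for u in updated]

# Hypothetical 5-cell corridor: moving right succeeds with prob 0.8, stays put with 0.2
def motion_model(xp, x, action):
    if action == "right":
        if xp == min(x + 1, 4):
            return 0.8 if x < 4 else 1.0
        return 0.2 if xp == x and x < 4 else 0.0
    return 1.0 if xp == x else 0.0

# Noisy sensor that reports whether the robot is at the door cell (index 2)
def sensor_model(z, x):
    if z == "door":
        return 0.9 if x == 2 else 0.1
    return 0.1 if x == 2 else 0.9

belief = [0.2] * 5  # uniform prior over the five cells
belief = bayes_filter_update(belief, "right", "door", motion_model, sensor_model)
print(max(range(5), key=lambda i: belief[i]))  # 2 -- mass concentrates at the door
```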

PLANNING TO MOVE: All of a robot’s deliberations ultimately come down to deciding


how to move effectors. The point-to-point motion problem is to deliver the robot or its end
effector to a designated target location. A greater challenge is the compliant motion problem,
in which a robot moves while being in physical contact with an obstacle. There are two main
approaches: cell decomposition and skeletonization. Each reduces the continuous path-
planning problem to a discrete graph-search problem.

Configuration space: We will start with a simple representation for a simple robot motion
problem: a robot arm with two joints that move independently. The robot's configuration can
be described by a four-dimensional coordinate: (xe, ye) for the location of the elbow relative
to the environment and (xg, yg) for the location of the gripper. These coordinates constitute
what is known as the workspace representation. The problem with the workspace
representation is that not all workspace coordinates are actually attainable, even in the
absence of obstacles. This is because of the linkage constraints on the space of attainable
workspace coordinates.

Cell decomposition methods: The first approach to path planning uses cell decomposition—
that is, it decomposes the free space into a finite number of contiguous regions, called cells.
A decomposition has the advantage that it is extremely simple to implement, but it also
suffers from three limitations. First, it is workable only for low-dimensional configuration
spaces. Second, there is the problem of what to do with cells that are "mixed" (partly free
and partly occupied). And third, any path through a discretized state space will not be
smooth. Cell decomposition methods can be improved in a number of ways to alleviate some
of these problems. The first approach allows further subdivision of the mixed cells—perhaps
using cells of half the original size. A second way to obtain a complete algorithm is to insist
on an exact cell decomposition of the free space.
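Once the free space is decomposed into cells, path planning reduces to graph search. A minimal sketch on an invented occupancy grid (0 = free, 1 = obstacle), using breadth-first search:

```python
from collections import deque

def grid_path(grid, start, goal):
    """BFS over free cells; returns a shortest cell path or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        r, c = path[-1]
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(path + [(nr, nc)])
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = grid_path(grid, (0, 0), (2, 0))
print(path)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```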

Modified cost functions: Shortest paths through a cell decomposition tend to graze
obstacles. This problem can be solved by introducing a potential field. A potential field is a
function defined over state space whose value grows as the distance to the closest obstacle
shrinks. The potential field can be used as an additional cost term in the shortest-path
calculation. This induces an interesting trade-off. On the one hand, the robot seeks to
minimize path length to the goal. On the other hand, it tries to stay away from obstacles by
virtue of minimizing the potential function. Clearly, the resulting path is longer, but it is also
safer. There exist many other ways to modify the cost function. However, it is often easy to
smooth the resulting trajectory after planning, using conjugate gradient methods. Such post-
planning smoothing is essential in many real-world applications.
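The trade-off between path length and obstacle clearance can be sketched with a simple combined cost. The paths, obstacle position, and 1/distance potential below are invented for illustration:

```python
def path_cost(path, obstacles, weight=1.0):
    """Path length plus a potential-field penalty that grows near obstacles."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    length = sum(dist(a, b) for a, b in zip(path, path[1:]))
    # Potential term: inverse distance to the closest obstacle at each waypoint
    potential = sum(1.0 / min(dist(p, o) for o in obstacles) for p in path)
    return length + weight * potential

obstacles = [(1.0, 1.0)]
hugging = [(0.0, 0.0), (1.0, 0.9), (2.0, 2.0)]  # passes right next to the obstacle
detour  = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]  # longer, but keeps clear
print(path_cost(hugging, obstacles) > path_cost(detour, obstacles))  # True
```

With the potential term included, the longer but safer detour wins; setting weight=0 recovers plain shortest-path length.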

Skeletonization methods: The second major family of path-planning algorithms is based on
the idea of skeletonization. These algorithms reduce the robot's free space to a one-
dimensional representation, for which the planning problem is easier. This lower-dimensional
representation is called a skeleton of the configuration space. A common skeleton is the
Voronoi graph of the free space: the set of all points that are equidistant to two or more
obstacles. To do path planning with a Voronoi graph, the robot first changes its present
configuration to a point on the Voronoi graph. It is easy to show that this can always be
achieved by a straight-line motion in configuration space. Second, the robot follows the
Voronoi graph until it reaches the point nearest to the target configuration. Finally, the robot
leaves the Voronoi graph and moves to the target. Again, this final step involves straight-line
motion in configuration space.
