
SOFT COMPUTING IN INTELLIGENT MULTI-MODAL SYSTEMS
R. NAGARJUNA (07EK1A1243)
Department of Information Technology
Abdul Kalam Institute of Technological Sciences
Vepalagadda, Kothagudem

ABSTRACT:
In this paper we describe an intelligent multi-modal interface for a large workforce management system called the Smart Work Manager. The main characteristics of the Smart Work Manager are that it can process speech, text, face images, gaze information and simulated gestures using the mouse as input modalities, and its output is in the form of speech, text or graphics. The main components of the system are a reasoner, a speech system, a vision system, an integration platform and an application interface. The overall architecture of the system is described together with the integration platform and the components of the system, which include a non-intrusive, neural-network-based gaze tracking system. Fuzzy and probabilistic techniques have been used in the reasoner to establish temporal relationships and to learn interaction sequences.

INTRODUCTION:
Soft computing techniques are beginning to penetrate new application areas such as intelligent interfaces, information retrieval and intelligent assistants. The common characteristic of all these applications is that they are human-centred. Soft computing techniques are a natural way of handling the inherent flexibility with which humans communicate, request information, describe events or perform actions.

A multi-modal system is one that uses a variety of modes of communication between human and computer, either in combination or in isolation. Typically, research has concentrated on enhancing standard devices such as the keyboard and mouse with non-standard ones such as speech and vision.

An intelligent multi-modal system is defined as a system that combines, reasons with and learns from information originating from different modes of communication between the human and the computer.
The main reason for using multi-modality in a system is to provide a richer set of channels through which the user and computer can communicate. The necessary technologies for such systems are:
• AI and soft computing for representation and reasoning
• User interfaces providing effective communication channels between the user and the system

In this paper we describe the development of an intelligent multi-modal system, referred to as the Smart Work Manager (SWM), for a large-scale workforce scheduling application.

RELATED WORK
Research in human-computer interaction has mainly focused on natural language, text, speech and vision, primarily in isolation. Recently a number of research projects have concentrated on the integration of such modalities using intelligent reasoners. The rationale is that many inherent ambiguities in a single mode of communication can be resolved if extra information is available. Among the projects reviewed in the references are CUBRICON from the Calspan-UB Research Centre, XTRA from the German Research Centre for AI and the SRI system from SRI International.

The main characteristics of the SWM are that it can process speech, text, face images, gaze information and simulated gestures using the mouse as input modalities, and its output is in the form of speech, text or graphics. The main components of the system are the reasoner, a speech system, a vision system, an integration platform and the application interface.

ENGINEERING ISSUES
Intelligent multi-modal systems use a number of input or output modalities to communicate with the user, exhibiting some form of intelligent behaviour in a particular domain. The functional requirements of such systems include the ability to receive and process user input in various forms such as:
• typed text from the keyboard,
• mouse movement or clicking,

• speech from a microphone,
• the focus of attention of the human eye, captured by a camera.

The system must also be able to generate output for the user using speech, graphics and text.

A system which exhibits the above features is called a multi-modal system. For a multi-modal system also to be called intelligent, it should be capable of reasoning in a particular domain, automating human tasks, helping humans perform tasks more complex than before, or exhibiting behaviour that the users of the system would characterise as intelligent.

Given these requirements for intelligent multi-modal systems, it becomes obvious that such systems are, in general, difficult to develop. A modular approach is therefore necessary, breaking the required functionality down into a number of sub-systems which are easier to develop or for which software solutions already exist. Other requirements for such systems are concurrency, a communication mechanism and distribution of processes across a network.
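The paper does not detail the integration platform beyond these requirements. As a rough, hypothetical sketch of such a modular, concurrent arrangement, the following Python fragment shows modality modules posting time-stamped events onto a shared bus that the reasoner consumes; the module names, event format and use of threads are illustrative assumptions, not details of the actual SWM platform.

```python
# Hypothetical sketch of a modular, message-passing integration platform.
# Module and event names are illustrative only.
import queue
import threading
import time

event_bus = queue.Queue()  # shared channel between modality modules and the reasoner

def modality_module(name, events):
    """Simulate a modality (speech, gaze, mouse, ...) posting time-stamped events."""
    for payload in events:
        event_bus.put({"source": name, "payload": payload, "time": time.time()})
        time.sleep(0.1)

def reasoner():
    """Consume events arriving concurrently from all modalities on the bus."""
    while True:
        try:
            event = event_bus.get(timeout=1.0)
        except queue.Empty:
            break
        print(f"[reasoner] {event['source']}: {event['payload']}")

threads = [
    threading.Thread(target=modality_module, args=("speech", ["show jobs for this"])),
    threading.Thread(target=modality_module, args=("gaze", ["screen object: technician_42"])),
]
for t in threads:
    t.start()
reasoner()
for t in threads:
    t.join()
```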
ARCHITECTURE
The overall architecture of the SWM, with the various modules and the communications between them, is given in Fig. 1.

THE REASONER
The main functions of the reasoner are twofold. First, it must be able to handle ambiguities such as "give me this of that". Second, it must be able to deal with the often-conflicting information arriving from the various modalities. The capabilities of the reasoner are to a large extent dependent upon the capabilities provided by the platform on which it is implemented. The platform used for the reasoner is CLIPS, a well-known expert system shell developed by NASA with object-oriented, declarative and procedural programming capabilities, together with the fuzzyCLIPS extension. The reasoner handles ambiguities by using a knowledge base that is continually updated by the information arriving from the various modalities. The structure of the reasoner is shown in Fig. 2. There are five modules in the reasoner: fuzzy temporal reasoning, query pre-processing, constraint checking, resolving ambiguities (the WIZARD) and post-processing.
The fuzzy temporal reasoning module receives time-stamped events from the various modalities and determines the fuzzy temporal relationship between them, i.e. the degree to which two events have a temporal relationship such as before, during or overlapping. Using the certainty factors (CF) of fuzzyCLIPS, the temporal reasoner can answer questions such as:
• what is the CF that event 1 took place before event 2?
• what is the CF that event 1 took place just_before event 2?
• what is the CF that event 1 took place during event 2?
• what is the CF that event 1 is overlapping with event 2?
The relationship with the highest CF is chosen as the most likely relationship between the two events. This relationship can be used later by the reasoner to resolve conflicts between, and to check dependencies of, the modalities.
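The reasoner itself uses fuzzyCLIPS, which is not reproduced here. Purely as an illustration of how certainty factors for temporal relations between two time-stamped events might be derived, the Python sketch below uses simple interval arithmetic; the membership functions and the time scale are assumptions, not the paper's actual definitions.

```python
# Illustrative only: simple fuzzy temporal relations between two events,
# each given as a (start, end) interval in seconds.

def cf_before(a, b, scale=2.0):
    """CF that event a finished before event b started; decays as they overlap."""
    gap = b[0] - a[1]
    return max(0.0, min(1.0, 0.5 + gap / (2 * scale)))

def cf_during(a, b):
    """CF that event a took place during event b (fraction of a inside b)."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    duration = max(a[1] - a[0], 1e-9)
    return overlap / duration

def cf_overlapping(a, b):
    """CF that the two events overlap, relative to the shorter event."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = max(min(a[1] - a[0], b[1] - b[0]), 1e-9)
    return overlap / shorter

event1, event2 = (0.0, 1.0), (0.8, 3.0)   # e.g. a mouse click and a spoken query
relations = {
    "before": cf_before(event1, event2),
    "during": cf_during(event1, event2),
    "overlapping": cf_overlapping(event1, event2),
}
# The relationship with the highest CF is taken as the most likely one.
print(max(relations, key=relations.get), relations)
```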
In the query pre-processing module, a sentence in natural language form is converted into a query which conforms to the system's pre-defined grammar. Redundant words are removed, keywords are placed in the right order and multiple-word attributes are converted into single strings.
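As an illustration of this kind of pre-processing, the sketch below removes redundant words, joins multi-word attributes and re-orders recognised keywords; the stop-word list, keyword order and attribute table are hypothetical, since the SWM grammar itself is not given in the paper.

```python
# Hypothetical sketch of query pre-processing: remove redundant words,
# join multi-word attributes and put recognised keywords in a canonical order.

REDUNDANT = {"please", "the", "a", "an", "me", "can", "you"}
KEYWORD_ORDER = ["show", "schedule", "technician", "job", "area"]
MULTI_WORD = {("north", "east"): "north_east", ("south", "west"): "south_west"}

def preprocess(sentence: str) -> list[str]:
    words = [w.lower() for w in sentence.split() if w.lower() not in REDUNDANT]
    # join multi-word attributes into single strings
    joined, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in MULTI_WORD:
            joined.append(MULTI_WORD[pair])
            i += 2
        else:
            joined.append(words[i])
            i += 1
    # place recognised keywords in the pre-defined order, keep the rest after them
    keywords = [w for w in KEYWORD_ORDER if w in joined]
    rest = [w for w in joined if w not in KEYWORD_ORDER]
    return keywords + rest

print(preprocess("Please show me the jobs in the north east area"))
```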
The constraint checking module examines the content of the queries. If individual parts of a query do not satisfy the pre-defined constraints, they are replaced by ambiguous terms (this, that) to be resolved later; otherwise the query is passed on to the next module.
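A minimal sketch of this idea is given below: query parts that fail a constraint are replaced with an ambiguous placeholder for the WIZARD to resolve later. The constraint table and query format are assumptions for illustration only.

```python
# Hypothetical constraint checking: invalid query parts become ambiguous terms.

CONSTRAINTS = {
    "technician": lambda v: v.isdigit(),                  # technician must be an id
    "area": lambda v: v in {"north_east", "south_west"},  # area must be known
}

def check_constraints(query: dict) -> dict:
    checked = {}
    for field, value in query.items():
        valid = CONSTRAINTS.get(field, lambda v: True)(value)
        checked[field] = value if valid else "this"       # ambiguous, resolved later
    return checked

print(check_constraints({"technician": "42", "area": "somewhere"}))
# {'technician': '42', 'area': 'this'}
```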
The WIZARD is at the heart of the reasoner and is the module that resolves ambiguities. The ambiguities in this application take the form of reserved words such as this or that, and they refer to objects that the user is or has been talking about, pointing at or looking at. The ambiguities are resolved in a hierarchical manner, as shown in Fig. 3.

The context of the interactions between the user and the system, if it exists, is maintained by the reasoner in the knowledge base. When a new query is initiated by the user, it is checked against the context. If there are ambiguities in the query and the context contains relevant information, the context is used to create a complete query that is sent to the application interface for processing. Another mechanism used to resolve ambiguities is the focus of attention, which is obtained from the user when pointing with the mouse or gazing at an object on the screen. At the same time there could be dialogue with the user through the speech recogniser or the keyboard. CLIPS is mainly used in a declarative mode in this system, and therefore all the modules work in parallel. This can cause conflicts between information arriving from the various modalities. The conflict resolution strategy used in the reasoner is hierarchical, as shown in Fig. 3, with the focus of attention having the highest priority and the dialogue system the lowest. This means that the dialogue system acts as a safety net for the other modalities if all else fails, or if inconsistent information is received from the modalities. In cases where text input is required, however, the dialogue system is the only modality that will be called upon. In all other cases the dialogue system is redundant unless all others fail, in which case a simple dialogue in the form of direct questions and answers is initiated by the system. The WIZARD sends the completed queries to the post-processing module.
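The following sketch illustrates this hierarchical resolution: candidate referents are collected per modality, and the highest-priority modality that can supply one wins, with the dialogue system as the safety net. The priority of the focus of attention (highest) and the dialogue system (lowest) follows the paper; the intermediate ordering and the data shapes are assumptions.

```python
# Sketch of hierarchical conflict resolution for an ambiguous term.
# Priority order is illustrative except for the two ends stated in the paper.

PRIORITY = ["focus_of_attention", "mouse", "speech", "context", "dialogue"]

def resolve_ambiguity(term: str, candidates: dict) -> str:
    """Return the referent for 'this'/'that' from the best available modality."""
    for modality in PRIORITY:
        referent = candidates.get(modality)
        if referent is not None:
            return referent
    # safety net: ask the user directly through the dialogue system
    return input(f"Which object do you mean by '{term}'? ")

candidates = {"focus_of_attention": "job_117", "mouse": None, "context": "job_052"}
print(resolve_ambiguity("this", candidates))   # -> job_117
```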

The post-processing module simply converts the completed queries into a form suitable for the application. This involves simple operations such as formatting the query or extracting key words from it. Table 4 contains some examples of interactions with the reasoner and how it works.

GAZE TRACKING: TECHNICAL ISSUES
The initial objective for the gaze system is the capability of tracking eye movements within slightly restricted environments. More specifically, the scenario is a user working in front of a computer screen, viewing objects on display, while a camera is pointed at the user's face. The objective is to find out where (which object) on the screen the user is looking, or to what context he is paying attention. This information, in combination with the inputs provided by speech and the other modalities, would be most useful in resolving real application tasks.

The general difficulties faced in building a gaze tracking system are as follows:
• imprecise data: the head may pan and tilt, so that many different eye images (relative to the viewing camera) correspond to the same co-ordinates on the screen,
• noisy images, mainly due to changes in lighting; this is typical of an uncontrolled open-plan office environment,
• a possibly infinitely large image set, needed in order to learn the variations of the images and make the system generalise better,
• the compromise between accuracy and speed: in a real-time system, complicated computation-intensive algorithms have to give way to simple ones.

Gaze Tracking: System description
Neural networks have been chosen as the core technique to implement the gaze tracking system.

• A three-layer feed-forward neural network is used. The net has 600 input units, one divided hidden layer of 8 hyperbolic units per part, and a correspondingly divided output layer with 40 and 30 units respectively, indicating the positions along the x- and y-directions of the screen grid.
• Grey-scale images of the right eye are automatically segmented from the head images inside a search window; the images, of size 40 × 15, are then normalised, and a value between -1 and 1 is obtained for each pixel. Each normalised image forms the input of a training pattern.
• The pre-planned co-ordinates of this image, which are the desired output of the training pattern, are used to stimulate the two related output units along the x and y output layers respectively.
• The training data are grabbed and segmented automatically while the user tracks with the eyes a cursor moving along a pre-designed zigzag path across, or up and down, the screen. A total of 2000 images is needed to train the system to achieve good performance.
The neural network described has a total of 11,430 connection weights. The off-line training takes about half an hour on an Ultra-1 workstation. Once trained, the system works in real time.
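The description above is enough to sketch the shape of such a network, although the original training procedure and exact weight layout are not reproduced here. The NumPy fragment below (initialisation and forward pass only, all values illustrative) shows a 40 × 15 eye image feeding two small hyperbolic-tangent hidden branches that drive 40 x-position and 30 y-position output units.

```python
# Illustrative NumPy sketch of a divided gaze-tracking network: 600 inputs,
# two tanh hidden branches of 8 units, and separate x (40) and y (30) outputs.
import numpy as np

rng = np.random.default_rng(0)

class GazeNet:
    def __init__(self, n_in=600, n_hidden=8, n_x=40, n_y=30):
        self.w_hx = rng.normal(0, 0.05, (n_in, n_hidden))   # input -> hidden (x branch)
        self.w_hy = rng.normal(0, 0.05, (n_in, n_hidden))   # input -> hidden (y branch)
        self.w_ox = rng.normal(0, 0.05, (n_hidden, n_x))    # hidden -> x output units
        self.w_oy = rng.normal(0, 0.05, (n_hidden, n_y))    # hidden -> y output units

    def forward(self, eye_image):
        x = eye_image.reshape(-1)                 # 40*15 = 600 pixel values in [-1, 1]
        hx = np.tanh(x @ self.w_hx)               # hyperbolic hidden units, x branch
        hy = np.tanh(x @ self.w_hy)               # hyperbolic hidden units, y branch
        return hx @ self.w_ox, hy @ self.w_oy     # activations over the screen grid

net = GazeNet()
eye = rng.uniform(-1, 1, (40, 15))                # stand-in for a normalised eye image
out_x, out_y = net.forward(eye)
# the most activated unit in each output layer indicates the gaze position
print(int(np.argmax(out_x)), int(np.argmax(out_y)))
```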

CONCLUSION:
In this paper I have shown how such flexibility can be exploited within the context of an intelligent multi-modal interface. Soft computing techniques have been used at the interface for temporal reasoning, approximate query matching, learning action sequences and gaze tracking. It is important to note that soft computing has been used in conjunction with other AI-based systems performing dynamic scheduling, logic programming, speech recognition and natural language understanding. I believe that soft computing in combination with other AI techniques can make a significant contribution to human-centred computing in terms of development time, robustness, cost and reliability.

We plan to investigate the following improvements to the reasoner in the near future:
• The context can be extended to have a tree-like structure so that the user is able to refer to previously used contexts.
• The temporal reasoner can be used more extensively in conflict resolution.
• The grammar can be extended to include multiple format grammars.
• The dialogue system can be improved to become more user-friendly.
• Different approaches can be used for resolving ambiguities, such as competition between modalities or bidding.
REFERENCES:
Azvine, B., Azarmi, N. and Tsui, K.C. 1996. Soft computing - a tool for building intelligent systems. BT Technology Journal, vol. 14, no. 4, pp. 37-45, October.

Bishop, C. 1995. Neural Networks for Pattern Recognition. Oxford University Press.

Lesaint, D., Voudouris, C., Azarmi, N. and Laithwaite, B. 1997. Dynamic Workforce Management. UK IEE Colloquium on AI for Network Management Systems (Digest No. 1997/094), London, UK.

Negroponte, N. 1995. Being Digital. Coronet Books.

Nigay, L. and Coutaz, J. 1995. A Generic Platform for Addressing the Multimodal Challenge. CHI '95 Proceedings.

Page, J.H. and Breen, A.P. 1996. The Laureate Text-to-Speech System: Architecture and Applications. BT Technology Journal, 14(1), 84-99.

Schomaker, L., Nijtmans, J., Camurri, A., Lavagetto, F., Morasso, P., Benoit, C., Guiard-Marigny, A., Le Goff, B., Robert-Ribes, J., Adjoudani, A., Defee, I., Munch, S., Hartung, K. and Blauert, J. 1997. A Taxonomy of Multimodal Interaction in the Human Information Processing System. Report of the ESPRIT PROJECT 8579 MIAMI, February.

Sullivan, J.W. and Tyler, S.W., eds. 1991. Intelligent User Interfaces. Frontier Series. New York: ACM Press.

Tsui, K.C., Azvine, B., Djian, D., Voudouris, C. and Xu, L.Q. 1998. Intelligent Multimodal Systems. BT Technology Journal, vol. 16, no. 3, July.
