3D Motion Detection Using Neural Networks
CHAPTER 1
INTRODUCTION
One of the most important tasks of an on-board vision system is to detect moving obstacles such as cars, bicycles or even pedestrians while the vehicle itself is travelling at high speed. Image-differencing methods, against a clean background or between adjacent frames, are widely used for motion detection. But when the observer is also moving, the background scene in the perspective projection image changes continuously, and it becomes much harder to detect the truly moving objects by differencing methods. Many approaches have been proposed in recent years to deal with this problem. Previous work in this area falls mainly into two categories: 1) using the difference between the optical flow vectors of the background and those of the moving objects; 2) compensating for the background displacement using the results of the camera's 3D motion analysis. In the first approach, the optical flow between adjacent frames is calculated and the reliability of each flow vector is estimated. The major flow, which represents the motion of the background, can then be used to classify and extract the flow vectors of the truly moving objects. However, because of its huge calculation cost and the difficulty of determining accurate flow vectors, this approach is still unavailable for real applications. Analysing the camera's 3D motion and compensating for the background is the other main method for moving-object detection. For on-board camera motion analysis, many motion-detection algorithms have been proposed that depend on previously recognized features such as road lane marks and the horizon. These methods show good accuracy and efficiency because of their detailed analysis of road structure and measured vehicle locomotion, but they are computationally expensive and over-dependent on road features such as lane marks, and therefore give unsatisfactory results when the lane marks are occluded by other vehicles or do not exist at all. Compared with these previous works, a new method of moving-object detection from an on-board camera is presented in this paper. To deal with the background-change problem, our method uses the camera's 3D motion analysis results to compensate for the background scene. With pure point matching and the introduction of the camera's Focus of Expansion (FOE), our method can theoretically determine the camera's rotation and translation parameters using only three pairs of matching points between adjacent frames, which makes it faster and more efficient for real-time applications.
There are different topologies of neural networks that may be employed for
time series modeling. In our investigation we used radial basis function networks
which have shown considerably better scaling properties, when increasing the
number of hidden units, than networks with sigmoid activation function.
RBF networks were introduced into the neural network literature by Broomhead/Lowe and Poggio/Girosi in the late 1980s. The RBF network model is motivated by the locally tuned response observed in biological neurons, e.g. in the visual or in the auditory system. RBFs have been studied in multivariate
approximation theory, particularly in the field of function interpolation. The RBF
neural network model is an alternative to the multilayer perceptron, which is perhaps the most often used neural network architecture. A radial basis function (RBF) network,
therefore, has a hidden layer of radial units, each actually modeling a Gaussian
response surface. Since these functions are nonlinear, it is not actually necessary to
have more than one hidden layer to model any shape of function: sufficient radial
units will always be enough to model any function.
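The single-hidden-layer claim can be illustrated with a small sketch. The sketch below is in Python/NumPy (the project itself is implemented in MATLAB); the target function, the number of radial units, and the radius are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_rbf(x, c, r):
    """Gaussian radial unit: response decays with distance from centre c."""
    return np.exp(-((x - c) ** 2) / r ** 2)

# Target function to model (any smooth shape would do).
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x)

# Hidden layer: Gaussian radial units with fixed centres and a fixed radius.
centres = np.linspace(0.0, 2.0 * np.pi, 12)
r = 0.8
H = np.stack([gaussian_rbf(x, c, r) for c in centres], axis=1)  # design matrix

# The output layer is linear in the weights, so fitting is one
# least-squares solve rather than iterative backpropagation.
w, *_ = np.linalg.lstsq(H, y, rcond=None)
error = np.max(np.abs(H @ w - y))
print(error)  # small: a dozen radial units reproduce sin(x) closely
```

Because the output layer is linear in the weights, the fit reduces to a single matrix operation, a point the RBF chapter returns to later.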
After the estimation process, the detected motion has to be extracted. With the obtained boundary, two objects (with background) can then be extracted from the two image frames (the current frame and the previous frame). Extracting the moving object from its background can be done by the edge enhancement network and the background remover.
At the algorithm level, complexity, regularity and precision are the main factors that directly affect the power consumed in executing a motion estimation algorithm. Concurrency and modularity are the requirements on algorithms that are intended to execute on a low-power architecture. This project aims to reduce the power consumption of motion estimation at the algorithm and architectural levels by using neural network concepts.
Another goal has been to search for algorithms that can be used to implement
the RBF neural network.
A final goal has been to design and implement an algorithm including object extraction. This should be done in a high-level language or MATLAB. The source code should be easy to understand so that it can serve as a reference for designers who need to implement real-time motion detection.
CHAPTER 2
Traditionally, the term neural network has been used to refer to a network of
biological neurons. In modern usage, the term is often used to refer to artificial
neural networks, which are composed of artificial neurons or nodes. Thus the term
'Neural Network' has two distinct connotations:
1. Biological neural networks are made up of real biological neurons that are
connected or functionally-related in the peripheral nervous system or the
central nervous system. In the field of neuroscience, they are often identified
as groups of neurons that perform a specific physiological function in
laboratory analysis.
The question of what degree of complexity and which properties individual neural elements should have in order to reproduce something resembling animal intelligence is a subject of current research in theoretical neuroscience.
2.3 Background
2.4 Models
• Choice of model: This will depend on the data representation and the
application. Overly complex models tend to lead to problems with learning.
• Learning algorithm: There are numerous tradeoffs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed dataset. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.
• Robustness: If the model, cost function and learning algorithm are selected
appropriately the resulting ANN can be extremely robust.
With the correct implementation, ANNs can be used naturally in online learning and large-dataset applications. Their simple implementation and the mostly local dependencies exhibited in their structure allow for fast, parallel implementations in hardware.
Feed-forward neural networks are the first and arguably simplest type of artificial neural network devised. In this network, the information moves in only one
direction, forward, from the input nodes, through the hidden nodes (if any) and to the
output nodes. There are no cycles or loops in the network.
A perceptron can be created using any values for the activated and
deactivated states as long as the threshold value lies between the two. Most
perceptrons have outputs of 1 or -1 with a threshold of 0 and there is some evidence
that such networks can be trained more quickly than networks created from nodes
with different activation and deactivation values.
The universal approximation theorem for neural networks states that every
continuous function that maps intervals of real numbers to some output interval of
real numbers can be approximated arbitrarily closely by a multi-layer perceptron
with just one hidden layer. This result holds only for restricted classes of activation
functions, e.g. for the sigmoid functions.
[Figure: A three-layer perceptron net capable of calculating XOR. The numbers within the perceptrons represent each perceptron's explicit threshold; the numbers annotating the arrows represent the weights of the inputs. This net assumes that if the threshold is not reached, zero (not -1) is output. Note that the bottom layer of inputs is not always considered a real perceptron layer.]
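Such a net can be sketched in a few lines of Python. The weights and thresholds below are one standard choice for XOR under the 0/1 output convention just described; they are not necessarily the exact values of the original figure.

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum reaches the threshold, else 0 (not -1)."""
    s = sum(i * w for i, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

def xor_net(x1, x2):
    # Hidden layer: an OR-like unit (threshold 1) and an AND unit (threshold 2).
    h1 = perceptron([x1, x2], [1, 1], 1)
    h2 = perceptron([x1, x2], [1, 1], 2)
    # Output unit: fires for OR but is inhibited by AND (weight -2).
    return perceptron([h1, h2], [1, -2], 1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The hidden layer is what makes XOR possible: a single perceptron cannot separate the XOR truth table linearly.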
The Echo State Network (ESN) is a recurrent neural network with a sparsely
connected random hidden layer. The weights of the output neurons are the only part of the network that can change and be learned. ESNs are good at (re)producing temporal patterns.
A stochastic neural network differs from a regular neural network in the fact
that it introduces random variations into the network. In a probabilistic view of
neural networks, such random variations can be viewed as a form of statistical
sampling, such as Monte Carlo sampling.
CHAPTER 3
RBF NETWORK
Radial functions are a special class of functions. Their characteristic feature is that their response decreases (or increases) monotonically with distance from a central point. The centre, the distance scale, and the precise shape of the radial function are parameters of the model, all fixed if it is linear.
A typical radial function is the Gaussian which, in the case of a scalar input, is

    h(x) = exp( -(x - c)^2 / r^2 )

Its parameters are its centre c and its radius r. The figure illustrates a Gaussian RBF with centre c = 0 and radius r = 1.
A Gaussian RBF decreases monotonically with distance from the centre. In contrast, a multiquadric RBF, which in the case of a scalar input is

    h(x) = sqrt( r^2 + (x - c)^2 ) / r

increases monotonically with distance from the centre. Gaussian-like RBFs are local (they give a significant response only in a neighbourhood near the centre) and are more commonly used than multiquadric-type RBFs, which have a global response.
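The local/global contrast between the two basis functions can be checked with a short Python sketch of the formulas above:

```python
import numpy as np

def gaussian(x, c=0.0, r=1.0):
    # Local response: significant only near the centre c.
    return np.exp(-((x - c) ** 2) / r ** 2)

def multiquadric(x, c=0.0, r=1.0):
    # Global response: grows without bound away from the centre.
    return np.sqrt(r ** 2 + (x - c) ** 2) / r

d = np.linspace(0.0, 5.0, 6)   # distances from the centre c = 0
print(gaussian(d))             # monotonically decreasing
print(multiquadric(d))         # monotonically increasing
```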
An RBF is a function with a built-in distance criterion with respect to a centre. Radial basis functions have been applied in the area of neural networks, where they may be used as a replacement for the sigmoidal hidden-layer transfer characteristic in multi-layer perceptrons. RBF networks have two layers of processing:
In the first, input is mapped onto each RBF in the 'hidden' layer. The RBF chosen is
usually a Gaussian. In regression problems the output layer is then a linear
combination of hidden layer values representing the mean predicted output. The interpretation of this output layer value is the same as that of a regression model in statistics.
In classification problems the output layer is typically a sigmoid function of a linear
combination of hidden layer values, representing a posterior probability.
Performance in both cases is often improved by shrinkage techniques, known as
ridge regression in classical statistics and known to correspond to a prior belief in
small parameter values (and therefore smooth output functions) in a Bayesian
framework.
RBF networks have the advantage of not suffering from local minima in the
same way as multi-layer perceptrons. This is because the only parameters that are
adjusted in the learning process are the linear mapping from hidden layer to output
layer. Linearity ensures that the error surface is quadratic and therefore has a single
easily found minimum. In regression problems this can be found in one matrix
operation. In classification problems the fixed non-linearity introduced by the sigmoid output function is most efficiently dealt with using iteratively reweighted least squares.
RBF networks have the disadvantage of requiring good coverage of the input
space by radial basis functions. RBF centers are determined with reference to the
distribution of the input data, but without reference to the prediction task. As a result,
representational resources may be wasted on areas of the input space that are
irrelevant to the learning task. A common solution is to associate each data point
with its own centre, although this can make the linear system to be solved in the final layer rather large, and requires shrinkage techniques to avoid overfitting.
Associating each input datum with an RBF leads naturally to kernel methods
such as Support Vector Machines and Gaussian Processes (the RBF is the kernel
function). All three approaches use a non-linear kernel function to project the input
data into a space where the learning problem can be solved using a linear model.
Like Gaussian Processes, and unlike SVMs, RBF networks are typically trained in a
Maximum Likelihood framework by maximizing the probability (minimizing the
error) of the data under the model. SVMs take a different approach to avoiding overfitting by instead maximizing a margin. RBF networks are outperformed in most classification applications by SVMs.
RBF networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer. The output is

    phi(x) = sum over i = 1..N of  a_i rho( ||x - c_i|| )

where N is the number of neurons in the hidden layer, c_i is the centre vector for neuron i, and a_i is the weight of neuron i in the linear output neuron. In the basic form all inputs are connected to each hidden neuron. The norm is typically taken to be the Euclidean distance and the basis function is taken to be Gaussian:

    rho( ||x - c_i|| ) = exp( -beta ||x - c_i||^2 )

The weights a_i, c_i, and beta are determined in a manner that optimizes the fit between phi and the data.
3.4 Training
In an RBF network there are three types of parameters that need to be chosen to adapt the network for a particular task: the centre vectors c_i, the output weights w_i, and the RBF width parameters beta_i. In sequential training, the weights are updated at each time step as data streams in.
For some tasks it makes sense to define an objective function and select the parameter values that minimize it. The most common objective function is the least-squares function

    K(w) = sum over t of  [ y(t) - phi( x(t) ) ]^2

where y(t) is the desired output at time t.
3.5 Interpolation
In strict interpolation, each input point x_i serves as a centre and the weights are chosen so that the network output equals the target y_i exactly at every x_i, i.e. G w = y with G_ji = rho( ||x_j - x_i|| ). It can be shown that this interpolation matrix is non-singular if the points x_i are distinct, and thus the weights w can be solved by simple linear algebra:

    w = G^-1 y
If the purpose is not to perform strict interpolation but instead more general
function approximation or classification the optimization is somewhat more complex
because there is no obvious choice for the centres. The training is typically done in two phases: first fixing the widths and centres, and then the weights. This can be
justified by considering the different nature of the non-linear hidden neurons versus
the linear output neuron.
Basis function centres can be either randomly sampled among the input instances or found by clustering the samples and choosing the cluster means as the centres. The RBF widths are usually all fixed to the same value, which is proportional to the maximum distance between the chosen centres.
After the centres c_i have been fixed, the weights that minimize the error at the output are computed with a linear pseudoinverse solution:

    w = G^+ y

where the entries of G are the values of the radial basis functions evaluated at the points x_j: g_ji = rho( ||x_j - c_i|| ).
The existence of this linear solution means that, unlike multi-layer perceptron (MLP) networks, RBF networks have a unique local minimum (when the centres are fixed).
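The two-phase recipe above (fix the centres and a shared width, then solve for the weights) can be sketched in Python/NumPy. The dataset, the number of centres, and the particular width heuristic are illustrative assumptions, not values from this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data for a small 1-D regression task (illustrative only).
X = np.sort(rng.uniform(-3.0, 3.0, 60))
y = np.tanh(X)

# Phase 1: fix the centres (here: random samples from the inputs) and a
# common width proportional to the maximum distance between centres.
centres = rng.choice(X, size=10, replace=False)
d_max = np.max(np.abs(centres[:, None] - centres[None, :]))
sigma = d_max / np.sqrt(2 * len(centres))   # one common heuristic
beta = 1.0 / (2.0 * sigma ** 2)

# Phase 2: with centres fixed, the output weights follow from one
# linear pseudoinverse solution  w = G+ y,  g_ji = rho(|x_j - c_i|).
G = np.exp(-beta * (X[:, None] - centres[None, :]) ** 2)
w = np.linalg.pinv(G) @ y

pred = G @ w
print(np.max(np.abs(pred - y)))  # small training error
```

Only the second phase involves the training targets, which is why the solution is unique once the centres are fixed.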
3.9 Advantages/Disadvantages
CHAPTER 4
Given a number of sequential video frames from the same source, the goal is to detect motion in the area observed by that source. When there is no motion, all sequential frames should be identical up to noise. When motion is present, there is some difference between the frames. Every low-cost system suffers from some noise, so even with no motion two sequential frames will not be identical. This is why the system must be smart enough to distinguish between noise and real motion. When the system is calibrated and stable enough, the noise has the character that every pixel value may differ slightly from the corresponding value in the other frame. To a first approximation it is possible to define a per-pixel noise threshold parameter, adaptable to the current state, which specifies by how much the value of the same pixel position in two sequential frames may differ while the underlying value is still considered the same. More precisely, if the pixel with coordinates (Xa, Ya) in frame A differs from the pixel with coordinates (Xb, Yb) in frame B by less than the TPP (threshold per pixel) value, we treat them as pixels with equal values:
    Pixel(Xa, Ya) equals Pixel(Xb, Yb)
    if  abs( Pixel(Xa, Ya) - Pixel(Xb, Yb) ) < TPP
By adapting the TPP value to the current system state we can make the system noise-stable. By applying this threshold operation to every pixel pair we may assume that all the preprocessed pixel values are noise-free; the noise that is not cancelled will be small relative to the real signal. These values must then be post-processed to detect motion, if any. As noted above, we work with the differing pixels in two sequential frames to draw a conclusion about motion.
Firstly, to keep the system sensitive enough, the TPP value must not be fixed too high. This means that, with the sensitivity kept high, any two frames will contain some small (TPP-related) number of differing pixels, and in this case we must not treat them as motion. That is the first reason to define a TPF (threshold per frame) value, again adaptable to the current state, which specifies at least how many pixels inside two sequential frames must differ before the difference is regarded as motion. The second reason to use the TPF is to filter out (drop) small motions: for instance, by tuning the TPF value we can neutralize the motion of small objects (bugs etc.) while still detecting the motion of people. The exact meaning of the TPF can be written as:

    motion detected  if  count{ (x, y) : abs( PixelA(x, y) - PixelB(x, y) ) >= TPP } > TPF
Both the TPP and TPF values are adjustable through the UI to obtain the optimal system sensitivity. The TPF value also has a visual equivalent, which is used as follows. After the pixel pre-processing (by TPP), all static pixels (those not involved in motion) are coloured, say, black, while all dynamic pixels (those indicating motion) keep their original colour. This produces the effect of motion extraction: all the static parts of the frames are black, and only the moving parts are seen normally. This effect can be enabled or disabled through the GUI.
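The TPP/TPF scheme can be sketched as follows in Python/NumPy; the frame data and threshold values are synthetic, chosen only to illustrate the two gates.

```python
import numpy as np

def detect_motion(frame_a, frame_b, tpp, tpf):
    """TPP/TPF motion test as described above.

    A pixel pair counts as different only if it differs by at least TPP;
    motion is declared only if more than TPF pixels differ.  The extracted
    image keeps the moving pixels and blacks out the static ones.
    """
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int))
    moving = diff >= tpp                      # per-pixel noise gate (TPP)
    motion = int(moving.sum()) > tpf          # per-frame motion gate (TPF)
    extracted = np.where(moving, frame_b, 0)  # static parts go black
    return motion, extracted

# Two synthetic 8x8 frames: small noise everywhere, a real change in one corner.
rng = np.random.default_rng(1)
a = rng.integers(100, 105, size=(8, 8))
b = a + rng.integers(-2, 3, size=(8, 8))      # sensor noise within +/-2
b[0:3, 0:3] += 50                             # a 3x3 "moving object"

motion, extracted = detect_motion(a, b, tpp=10, tpf=5)
print(motion)  # the 9 changed pixels exceed TPF = 5
```

With TPP = 10 the +/-2 noise is absorbed entirely, while the 9-pixel object clears TPF = 5; comparing a frame against itself yields no motion.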
The Camera Manager provides routines for acquiring video frames from
CCD cameras. Any process can request a video frame from any video source. The
system manages a request queue for each source and executes them cyclically.
CHAPTER 5
This chapter presents the main software design and implementation issues. It
starts by describing the general flow chart of the main program that was implemented
in MATLAB. It then explains each component of the flow chart with some details.
Finally it shows how the graphical user interface GUI was designed.
The above block diagram shows the surveillance system, which consists of a camera system that monitors a particular area, a video daughter card that converts the video signal into an electrical signal, a network card that connects the system to a network, and the motion detection algorithms (SAD and correlation) along with the RBF network.
The main task of the software was to read the still images recorded by the camera and then process these images to detect motion and take the necessary actions accordingly. Figure 6 below shows the general flow chart of the main program.
[Figure 6: Main program flow chart — Start → Setup & Initializations → Image Acquisition → Motion Detection Algorithm → (is image > threshold?) → if yes, Actions on Motion Detection and Data Record; if no, Break & Clear → Stop]
[Figure 7: Setup and initialization flow chart — Start → Launch GUI → (Start button pressed?) → Read Threshold Value → Stop]
Figure 7 shows the flow chart for the setup and initialization process. This process includes the launch of the graphical user interface (GUI), where the type of motion detection algorithm is selected and the threshold value (the sensitivity of the detection) is initialized.
Also during this stage, a setup process for both the serial port and the video object is carried out. This process takes approximately 15 seconds to complete (depending on the specifications of the PC used). For the serial port it starts by selecting a communication port and reserving the memory addresses for that port; the PC then connects to the device using the communication settings mentioned in the previous chapter. The video object is part of the image acquisition process, but it must be set up at the start of the program.
[Figure 8: Image acquisition flow chart — Start → Read First Frame → Convert to Grayscale → Read Second Frame → Convert to Grayscale → Stop]
After the setup stage the image acquisition starts, as shown in figure 8 above. This process reads images from the PC camera and saves them in a format suitable for the motion detection algorithm.
There were three possible options, of which one was implemented. The first option was to use auto-snapshot software that takes images automatically and saves them to the hard disk in JPEG format, after which another program reads these images in the same sequence as they were saved. It was found that the maximum speed attainable with this software is one frame per second, which limits the speed of detection. Synchronization was also required between the image processing and the auto-snapshot software, since the next images need to be available on the hard disk before they can be processed.
The second option was to display live video on the screen and then capture the images from the screen. This is faster than the previous approach, but it again faced a synchronization problem: when the computer monitor goes into power-saving mode, black images are produced for the whole period of the black screen.
The third option was by using the image acquisition toolbox provided in
MATLAB 6.5.1 or higher versions. The image acquisition toolbox is a collection of
functions that extend the capability of MATLAB. The toolbox supports a wide range of image acquisition operations, including acquiring images through many types of image acquisition devices, such as frame grabbers and USB PC cameras, viewing a preview of the live video on the monitor, and reading the image data directly into the MATLAB workspace.
For this project the videoinput function was used to initialize a video object that connects directly to the PC camera. The preview function was then used to display live video on the monitor, and the getsnapshot function was used to read images from the camera and place them in the MATLAB workspace.
The latter approach was implemented because it has many advantages over the others. It achieved the fastest capturing speed, at a rate of five frames per second depending on algorithm complexity and PC processor speed. Furthermore, the synchronization problem was solved because both the capturing and the processing of images were done by the same software.
and D(t) = 0. However, noise is always present in images, and a better model of the images in the absence of motion is

    I(t_i) = I(t_j) + n(p)

where n(p) is the noise at pixel p.
The figure also shows a test case that contains a large change in the scene being monitored by the camera; this was done by moving the camera. Before the camera was moved the SAD value was around 1.87, and while it was moved the SAD value was around 2.2. If the detection threshold were fixed at a value less than 2.2, the system would continuously detect motion even after the camera stopped moving.
This approach removes the need to continuously re-estimate the threshold value. Choosing a threshold of 1×10^-3 will detect only the times when the camera itself is moved. The result is a robust motion detection algorithm that is not affected by illumination changes or camera movements.
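The SAD stage of the block diagram (frame separation, quadrant division, sum of absolute differences compared against a threshold T) can be sketched in Python/NumPy; the frames and the threshold here are synthetic illustrations, not values from this project.

```python
import numpy as np

def quadrant_sad(prev, curr):
    """Mean absolute difference of each quadrant of two grayscale frames."""
    h, w = prev.shape
    h2, w2 = h // 2, w // 2
    sads = []
    for rows in (slice(0, h2), slice(h2, h)):
        for cols in (slice(0, w2), slice(w2, w)):
            d = np.abs(prev[rows, cols].astype(float) -
                       curr[rows, cols].astype(float))
            sads.append(d.mean())
    return sads  # [top-left, top-right, bottom-left, bottom-right]

prev = np.zeros((8, 8))
curr = np.zeros((8, 8))
curr[0:4, 0:4] = 10.0               # motion confined to the top-left quadrant

T = 1.0                             # detection threshold (illustrative)
sads = quadrant_sad(prev, curr)
print([s > T for s in sads])        # only the first quadrant fires
```

Dividing the frame into quadrants localizes the detection: only the quadrant containing the change exceeds the threshold.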
[Flow chart: actions on motion detection — Start → Trigger Serial Port → Update Log File (time, date, frame #) → Display Image → Convert Image to Movie Frame → Stop]
As the above flow chart shows, a number of activities happen when motion is detected. First, the serial port is triggered by a pulse from the PC; this pulse is used to activate external circuits connected to the PC. A log file is also created and appended with the time and date of the motion, and the number of the frame in which the motion occurred is recorded in the log file. Another process displays the detected image on the monitor. Finally, the image in which motion was detected is converted to a movie frame and added to the film structure.
After the motion detection algorithm has been applied to the images, the program checks whether the stop button on the GUI was pressed. If it was, the flag value is changed from one to zero, the program breaks out of the loop, and control returns to the GUI. Next, both the serial port object and the video object are cleared. This is a cleaning stage in which the devices connected to the PC through those objects are released and the memory space is freed.
Finally, when the program is terminated, a data collection process starts in which the variables and arrays in memory that contain the results are stored on the hard disk. This approach was used to separate the real-time image processing from the results processing, and it has the advantage that the data can be called back whenever required. The variables stored from memory to the hard disk are the variance values and the movie structure that contains all the frames with motion. At this point control returns to the GUI, where the operator can call back the results archived while the system was turned on. The next section explains the design of the GUI, highlighting the result and callback of each button.
[Block diagram: SAD-based detection — Start → Image Acquisition → Frame Separation → Divide into Quadrants → Sum of Absolute Differences → (> T?) → Data Record]
The GUI was designed to facilitate interactive system operation. It can be used to set up the program, launch it, stop it and display results.
[Flow chart: GUI operation — Start → Launch Program → Terminate Program → View Results → (Start Again? yes/no) → Exit → End]
The term correlation can also mean the cross-correlation of two functions or
electron correlation in molecular systems. In probability theory and statistics,
correlation, also called correlation coefficient, indicates the strength and direction of
a linear relationship between two random variables. In general statistical usage,
correlation or co-relation refers to the departure of two variables from independence,
although correlation does not imply causation. In this broad sense there are several
coefficients, measuring the degree of correlation, adapted to the nature of data. A
number of different coefficients are used for different situations. The best known is
the Pearson product-moment correlation coefficient, which is obtained by dividing
the covariance of the two variables by the product of their standard deviations.
The correlation ρ_{X,Y} between two random variables X and Y with expected values μ_X and μ_Y and standard deviations σ_X and σ_Y is defined as:

    ρ_{X,Y} = cov(X, Y) / (σ_X σ_Y) = E[ (X − μ_X)(Y − μ_Y) ] / (σ_X σ_Y)

where E is the expected value operator and cov means covariance. Since μ_X = E(X), σ_X² = E(X²) − E²(X), and likewise for Y, we may also write

    ρ_{X,Y} = ( E(XY) − E(X)E(Y) ) / sqrt( (E(X²) − E²(X)) (E(Y²) − E²(Y)) )
The correlation is defined only if both of the standard deviations are finite and both
of them are nonzero. It is a corollary of the Cauchy-Schwarz inequality that the
correlation cannot exceed 1 in absolute value.
If the variables are independent then the correlation is 0, but the converse is not true
because the correlation coefficient detects only linear dependencies between two
variables. Here is an example: Suppose the random variable X is uniformly
distributed on the interval from −1 to 1, and Y = X2. Then Y is completely determined
by X, so that X and Y are dependent, but their correlation is zero; they are
uncorrelated. However, in the special case when X and Y are jointly normal, independence is equivalent to uncorrelatedness. A correlation between two variables is diluted in the presence of measurement error around the estimates of one or both variables, in which case disattenuation provides a more accurate coefficient.
The correlation coefficient can also be viewed as the cosine of the angle
between the two vectors of samples drawn from the two random variables.
This method only works with centered data, i.e., data which have been shifted
by the sample mean so as to have an average of zero. Some practitioners prefer an
uncentered (non-Pearson-compliant) correlation coefficient. See the example below
for a comparison.
Note that the above data were deliberately chosen to be perfectly correlated: y = 0.10 + 0.01 x. The Pearson correlation coefficient must therefore be exactly one. Centering the data (shifting x by E(x) = 3.8 and y by E(y) = 0.138) yields x = (−2.8, −1.8, −0.8, 1.2, 4.2) and y = (−0.028, −0.018, −0.008, 0.012, 0.042), from which the correlation coefficient is 1, as expected.
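Using the x values implied by the centered data above (mean 3.8, so x = (1, 2, 3, 5, 8)), a short Python sketch confirms the coefficient:

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# The perfectly correlated data above: y = 0.10 + 0.01 x
x = [1, 2, 3, 5, 8]
y = [0.10 + 0.01 * xi for xi in x]
print(pearson(x, y))  # 1.0 up to floating-point rounding
```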
As Cohen himself has observed, however, all such criteria are in some ways arbitrary and should not be observed too strictly, because the interpretation of a correlation coefficient depends on the context and purposes. A correlation of 0.9 may be very low if one is verifying a physical law using high-quality instruments, but may be regarded as very high in the social sciences, where there may be a greater contribution from complicating factors.
[Block diagram: correlation-based detection — Start → Image Acquisition → Frame Separation → Divide into Quadrants → Correlation Network → Decision → Data Record]
CHAPTER 6
PROPOSED OBJECT EXTRACTION
Many attempts have been made to extract data from video and film in a form
suitable for use by animators and modelers. Such an approach is attractive, since
motions and movements for people and animals may be obtained in this way that
would be difficult using mechanical or magnetic motion capture systems. Visual
extraction is also appealing since it is non-intrusive and has the potential to capture,
from film, the motion and characteristics of people or animals long dead or extinct.
Almost all attempts to perform visual extraction have been based around
bespoke computer vision applications which are difficult for non-experts to use or
adapt to their own needs. This paper presents a generic approach to extracting data
from video. Whilst our approach allows low-level information to be extracted we
show that higher-level functionality is available also. This functionality can be
utilized in a manner that requires little knowledge of the underlying techniques and
principles. Our approach is to approximate an image using principal component
analysis, and then to train a multi-layer perceptron to predict the feature required by
the user. This requires the user to hand-label the features of interest in some of the
frames of the image sequence. One of the aims of this work is to keep to a minimum the number of frames that need to be labeled by the user. The trained multi-layer perceptron is then used to predict features for images that have never been labeled by the user.
Other attempts to extract useful information from video sequences include the
use of edge-detection and contour or edge tracking, template matching and template
tracking. All such systems work well in some circumstances, but fail or require
adaptation to meet the requirements of new users. For instance, in the case of
template tracking, the user needs to be aware of the kinds of features that can be
tracked well in an image and also choose a suitable template size. This is not a trivial
task for non-specialists.
6.1 Method
The main steps in extraction using our system are detailed below:
The user selects the sequence (or set) of images from which they wish data to be extracted. This may well comprise several shorter clips taken from different parts of a film.
The user decides what feature(s) they wish to extract and labels this feature by
hand in a fraction of the images chosen at random. The labeling process may
involve clicking on a point to be tracked, labeling a distance or ratio of distances,
measuring an angle, making a binary decision (yes/no, near/far etc.) or classifying the
feature of interest into one of several classes.
Once this ground-truth data is available, a neural network is trained to predict the
feature values in images that have not been labeled by the user.
Principal components analysis (also known as eigenvector analysis) has been used
extensively in computer vision for image reconstruction, pattern matching and
classification.
The eigenvalues λ_i give a measure of the amount of variance each of the eigenvectors accounts for. Unfortunately, the matrix XXᵀ (where X is the M × N matrix whose columns are the example images) is typically too large to manipulate, since it is of size M by M. Such computation is wasteful anyway, since only N principal modes are meaningful, where N is the number of example images; in all our work N ≪ M. Therefore we compute:

    XᵀX u_i = λ_i u_i        (2)
    q_i = X u_i              (3)
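Equations (2) and (3) can be checked numerically with a Python/NumPy sketch; the "images" here are random vectors, used only to verify that each q_i = X u_i is an eigenvector of the large matrix XXᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 1000          # N example images, each with M pixels (N << M)
X = rng.normal(size=(M, N))
X -= X.mean(axis=1, keepdims=True)   # subtract the mean image

# Eigen-decompose the small N x N matrix X^T X instead of the M x M XX^T.
lam, U = np.linalg.eigh(X.T @ X)     # X^T X u_i = lambda_i u_i   (2)
Q = X @ U                            # q_i = X u_i                (3)

# Each q_i is an (unnormalized) eigenvector of the large matrix XX^T:
#   (XX^T) q_i = X (X^T X) u_i = lambda_i X u_i = lambda_i q_i
residual = np.linalg.norm(X @ (X.T @ Q) - Q * lam)
print(residual)  # ~0: the small-matrix trick recovers the principal modes
```

This is the standard "snapshot" trick for eigen-analysis of image sets: the expensive M × M problem is replaced by an N × N one.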