
1. INTRODUCTION

Object tracking is an important task within the field of computer vision.

The proliferation of high-powered computers, the availability of high-quality and inexpensive video cameras, and the increasing need for automated video analysis have generated a great deal of interest in object tracking algorithms. There are three key steps in video analysis: detection of interesting moving objects, tracking of such objects from frame to frame, and analysis of object tracks to recognize their behavior. Therefore, object tracking is pertinent to tasks such as: motion-based recognition, that is, human identification based on gait, automatic object detection, etc.; automated surveillance, that is, monitoring a scene to detect suspicious activities or unlikely events; and video indexing, that is, automatic annotation and retrieval of videos in multimedia databases.

2. OBJECT REPRESENTATION

In a tracking scenario, an object can be defined as anything that is of interest for further analysis. For instance, boats on the sea, fish inside an aquarium, vehicles on a road, planes in the air, people walking on a road, or bubbles in the water are examples of objects that may be important to track in a specific domain. Objects can be represented by their shapes and appearances. In this section, we first describe the object shape representations commonly employed for tracking and then address the joint shape and appearance representations.

Points. The object is represented by a point, that is, the centroid (Figure 1(a)) [Veenman et al. 2001], or by a set of points (Figure 1(b)) [Serby et al. 2004]. In general, the point representation is suitable for tracking objects that occupy small regions in an image.
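As a concrete illustration of the point representation, the sketch below computes the centroid of an object from a binary foreground mask using NumPy. The mask and the helper name are hypothetical; the document itself does not prescribe any particular implementation.

```python
import numpy as np

def object_centroid(mask):
    """Return the (row, col) centroid of the foreground pixels in a binary mask.

    `mask` is a 2-D boolean or 0/1 array marking the object region; the
    centroid is simply the mean coordinate of all foreground pixels.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None  # no object present in this frame
    return rows.mean(), cols.mean()

# Hypothetical example: a small rectangular object in a 10 x 10 frame.
mask = np.zeros((10, 10), dtype=bool)
mask[3:6, 4:8] = True
print(object_centroid(mask))  # -> (4.0, 5.5)
```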

3. FEATURE SELECTION FOR TRACKING

Selecting the right features plays a critical role in tracking. In general, the most desirable property of a visual feature is its uniqueness, so that objects can be easily distinguished in the feature space. Feature selection is closely related to object representation. For example, color is used as a feature for histogram-based appearance representations, while for contour-based representations, object edges are usually used as features. In general, many tracking algorithms use a combination of these features. The details of common visual features are as follows.

Color. The apparent color of an object is influenced primarily by two physical factors: 1) the spectral power distribution of the illuminant and 2) the surface reflectance properties of the object. In image processing, the RGB (red, green, blue) color space is usually used to represent color. However, the RGB space is not a perceptually uniform color space; that is, differences between colors in the RGB space do not correspond to the color differences perceived by humans [Paschos 2001]. Additionally, the RGB dimensions are highly correlated. In contrast, Luv and Lab are perceptually uniform color spaces, while HSV (Hue, Saturation, Value) is an approximately uniform color space. However, these color spaces are sensitive to noise.

4. OBJECT DETECTION

Every tracking method requires an object detection mechanism, either in every frame or when the object first appears in the video. A common approach for object detection is to use information in a single frame. However, some object detection methods make use of the temporal information computed from a sequence of frames to reduce the number of false detections. This temporal information is usually in the form of frame differencing, which highlights changing regions in consecutive frames. Given the object regions in the image, it is then the tracker's task to perform object correspondence from one frame to the next to generate the tracks. We tabulate several common object detection methods in Table I. Although object detection itself requires a survey of its own, here we outline the popular methods in the context of object tracking for the sake of completeness.

4.1. Point Detectors

Point detectors are used to find interest points in images which have an expressive texture in their respective localities. Interest points have long been used in the context of motion, stereo, and tracking problems. A desirable quality of an interest point is its invariance to changes in illumination and camera viewpoint. In the literature, commonly used interest point detectors include Moravec's interest operator [Moravec 1979], the Harris interest point detector [Harris and Stephens 1988], the KLT detector [Shi and Tomasi 1994], and the SIFT detector [Lowe 2004]. For a comparative evaluation of interest point detectors, we refer the reader to the survey by Mikolajczyk and Schmid [2003]. To find interest points, Moravec's operator computes the variation of the image intensities in a 4 × 4 patch in the horizontal, vertical, diagonal, and antidiagonal directions and selects the minimum of the four variations as the representative value for the window.

A point is declared interesting if the intensity variation is a local maximum in a 12 × 12 patch.

Fig. 2. Interest points detected by applying (a) the Harris, (b) the KLT, and (c) the SIFT operators.

The Harris detector computes the first-order image derivatives, (Ix, Iy), in the x and y directions to highlight the directional intensity variations; then a second moment matrix, which encodes this variation, is evaluated for each pixel in a small neighborhood:

M = [ ΣIx²    ΣIxIy
      ΣIxIy   ΣIy²  ]

5. OBJECT TRACKING

The aim of an object tracker is to generate the trajectory of an object over time by locating its position in every frame of the video. An object tracker may also provide the complete region in the image that is occupied by the object at every time instant. The tasks of detecting the object and establishing correspondences between object instances across frames can either be performed separately or jointly. In the first case, possible object regions in every frame are obtained by means of an object detection algorithm, and then the tracker corresponds objects across frames.
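Returning to the Harris second moment matrix above, the NumPy/SciPy sketch below assembles its entries and converts them into the standard Harris corner response R = det(M) − k·trace(M)². The Gaussian window, the constant k = 0.04, and the function name are conventional assumptions for illustration, not values given in the text.

```python
import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.0, k=0.04):
    """Compute the Harris corner response for a grayscale float image.

    Builds the entries of the second moment matrix M (neighborhood sums of
    Ix^2, Iy^2, Ix*Iy, here Gaussian-weighted) and returns
    R = det(M) - k * trace(M)^2 for every pixel.
    """
    img = image.astype(float)
    Iy, Ix = np.gradient(img)              # first-order derivatives in y and x

    # Neighborhood sums of the derivative products.
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)

    det_M = Sxx * Syy - Sxy ** 2
    trace_M = Sxx + Syy
    return det_M - k * trace_M ** 2

# Interest points are then taken as local maxima of the response above a threshold.
```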

In point tracking, correspondence between detections in consecutive frames is typically constrained by assumptions on object motion. Proximity assumes that the location of the object does not change notably from one frame to the next (see Figure 10(a)).
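The proximity constraint can be turned into a simple correspondence rule: match each detection in one frame to the nearest detection in the next. The sketch below does this with an optimal assignment on the Euclidean distance matrix; the point coordinates are hypothetical, and this is only one simple way of using proximity, not the specific method of the surveyed trackers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_proximity(points_t, points_t1):
    """Match object centroids across two frames using only proximity.

    `points_t` and `points_t1` are (N, 2) and (M, 2) arrays of (row, col)
    positions. Returns index pairs (i, j) minimizing the total Euclidean distance.
    """
    diff = points_t[:, None, :] - points_t1[None, :, :]
    cost = np.linalg.norm(diff, axis=2)          # pairwise distances
    rows, cols = linear_sum_assignment(cost)     # optimal one-to-one assignment
    return list(zip(rows.tolist(), cols.tolist()))

# Hypothetical detections in two consecutive frames.
frame_t = np.array([[10.0, 12.0], [40.0, 55.0]])
frame_t1 = np.array([[41.0, 54.0], [11.0, 13.0]])
print(match_by_proximity(frame_t, frame_t1))     # -> [(0, 1), (1, 0)]
```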

5.2.3. Discussion. The main goal of the trackers in this category is to estimate the object motion. With a region-based object representation, the computed motion implicitly defines the object region as well as the object orientation in the next frame, since, for each point of the object in the current frame, its location in the next frame can be determined using the estimated motion model. Depending on the context in which these trackers are being used, only one of these three properties might be more important than the others.

8. CONCLUDING REMARKS

In this article, we present an extensive survey of object tracking methods and also give a brief review of related topics. We divide the tracking methods into three categories based on the use of object representations, namely, methods establishing point correspondence, methods using primitive geometric models, and methods using contour evolution. Note that all of these classes require object detection at some point. For instance, point trackers require detection in every frame, whereas geometric region- or contour-based trackers require detection only when the object first appears in the scene.

Active Contour: Abstract

Because the Active Appearance Model (AAM) is sensitive to its initial parameters, it is difficult to track an object that exhibits large motion. This paper proposes an active-contour-combined AAM that can track an object whose motion is large. The proposed AAM fitting algorithm consists of two alternating procedures: active contour fitting to find the contour sample that best fits the face image, and then active appearance model fitting that begins from the estimated motion parameters. Experimental results show that the proposed active-contour-combined AAM provides better accuracy and convergence characteristics, in terms of RMS error and convergence rate, than the existing AAM. The combination of the existing robust AAM and the proposed active-contour-based AAM (AC-R-AAM) had the best accuracy and convergence performance.

Image Processing: Colour Adjustment

In photography and image processing, color balance is the global adjustment of the intensities of the colors (typically the red, green, and blue primary colors). An important goal of this adjustment is to render specific colors, particularly neutral colors, correctly; hence, the general method is sometimes called gray balance, neutral balance, or white balance. Color balance changes the overall mixture of colors in an image and is used for color correction; generalized versions of color balance are used to make colors other than neutrals also appear correct or pleasing. Image data acquired by sensors, either film or electronic image sensors, must be transformed from the acquired values to new values that are appropriate for color reproduction or display. Several aspects of the acquisition and display process make such color correction essential, including the fact that the acquisition sensors do not match the sensors in the human eye, that the properties of the display medium must be accounted for, and that the ambient viewing conditions of the acquisition differ from the display viewing conditions. The color balance operations in popular image editing applications usually operate directly on the red, green, and blue channel pixel values,[1][2] without respect to any color sensing or reproduction model. When shooting film, color balance is typically achieved by using color correction filters over the lights or on the camera lens.
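As an example of a color balance operation that acts directly on the red, green, and blue channel values, the sketch below applies the gray-world assumption: each channel is scaled so that its mean matches the overall mean intensity. This particular algorithm and the function name are illustrative assumptions, not something mandated by the text.

```python
import numpy as np

def gray_world_balance(rgb):
    """Apply gray-world white balance to an RGB image with values in [0, 1].

    Each channel is multiplied by a gain that makes its mean equal to the
    mean intensity of the whole image, which neutralizes a global color cast.
    """
    img = rgb.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)   # mean of R, G, B
    gains = channel_means.mean() / channel_means      # per-channel gains
    return np.clip(img * gains, 0.0, 1.0)

# Hypothetical image with a reddish cast: the gains pull the channels back together.
img = np.random.default_rng(0).random((4, 4, 3)) * np.array([1.0, 0.8, 0.7])
balanced = gray_world_balance(img)
```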

YCbCr:
YCbCr or Y′CbCr, sometimes written YCBCR or Y′CBCR, is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y′ is the luma component, and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance, meaning that light intensity is non-linearly encoded using gamma correction. YCbCr is not an absolute color space; rather, it is a way of encoding RGB information. The actual color displayed depends on the actual RGB primaries used to display the signal. Therefore, a value expressed as YCbCr is predictable only if standard RGB primary chromaticities are used. YCbCr is sometimes abbreviated to YCC. YCbCr is often called YPbPr when used for analog component video, although the term YCbCr is commonly used for both systems, with or without the prime. YCbCr is often confused with the YUV color space, and typically the terms YCbCr and YUV are used interchangeably, leading to some confusion; when referring to signals in video or digital form, the term "YUV" mostly means "YCbCr". YCbCr signals (prior to scaling and offsets to place the signals into digital form) are called YPbPr, and are created from the corresponding gamma-adjusted RGB (red, green, and blue) source using two defined constants KB and KR as follows:

Y′ = KR · R′ + (1 − KR − KB) · G′ + KB · B′
PB = (B′ − Y′) / (2 · (1 − KB))
PR = (R′ − Y′) / (2 · (1 − KR))

where KB and KR are ordinarily derived from the definition of the corresponding RGB space. (The equivalent matrix manipulation is often referred to as the "color matrix".)

[Figure: CbCr planes at different Y′ values]
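A minimal sketch of the conversion above, using the ITU-R BT.601 constants KR = 0.299 and KB = 0.114 as an assumed example (the text does not fix a particular standard). The input is gamma-adjusted R′G′B′ in [0, 1] and the output is analog-range Y′PbPr, to which the usual scaling and offsets would be applied to obtain digital YCbCr.

```python
import numpy as np

def rgb_to_ypbpr(rgb, kr=0.299, kb=0.114):
    """Convert gamma-adjusted R'G'B' (values in [0, 1]) to Y'PbPr.

    Implements Y' = KR*R' + KG*G' + KB*B' with KG = 1 - KR - KB, and the
    chroma differences Pb = (B' - Y') / (2*(1 - KB)) and
    Pr = (R' - Y') / (2*(1 - KR)). The defaults are the BT.601 constants;
    other standards define other values of KR and KB.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    pb = (b - y) / (2.0 * (1.0 - kb))
    pr = (r - y) / (2.0 * (1.0 - kr))
    return np.stack([y, pb, pr], axis=-1)

# Example: pure white maps to Y' = 1, Pb = Pr = 0.
print(rgb_to_ypbpr(np.array([[1.0, 1.0, 1.0]])))
```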

Digital Image Capture and Processing

Section Overview: The Olympus MIC-D inverted digital microscope captures images with a complementary metal oxide semiconductor (CMOS) image sensor housed in the base and coupled to a host computer for acquisition, cataloging, and processing of digital images.

Basic Properties of Digital Images

Addressed in this discussion are the acquisition, sampling, and quantization of digital images, as well as other important concepts such as spatial resolution and dynamic range.

Basic Concepts in Digital Image Processing


Digital image processing enables reversible, noise-free modification of an image, represented as a matrix of integers, in place of classical darkroom manipulations or the filtering of time-dependent voltages used for analog images and video signals.

Filtering an Image

Image filtering is useful for many applications, including smoothing, sharpening, noise removal, and edge detection. A filter is defined by a kernel, which is a small array applied to each pixel and its neighbors within an image. In most applications, the center of the kernel is aligned with the current pixel, and the kernel is a square with an odd number (3, 5, 7, etc.) of elements in each dimension. The process used to apply filters to an image is known as convolution, and it may be applied in either the spatial or the frequency domain. The following examples in this section focus on some of the basic filters applied within the spatial domain using the CONVOL function:

Low Pass Filtering
High Pass Filtering
Directional Filtering
Laplacian Filtering

Low Pass Filtering

A low pass filter is the basis for most smoothing methods. An image is smoothed by decreasing the disparity between pixel values by averaging nearby pixels (see Smoothing an Image for more information). Using a low pass filter tends to retain the low frequency information within an image while reducing the high frequency information. An example is an array of ones divided by the number of elements within the kernel, such as the following 3 by 3 kernel:
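The averaging kernel is shown below as a NumPy array. The document applies such kernels with IDL's CONVOL function; as a hedged stand-in, this sketch uses SciPy's convolve, and the test image is hypothetical.

```python
import numpy as np
from scipy import ndimage

# 3 x 3 low pass (averaging) kernel: ones divided by the number of elements.
low_pass = np.ones((3, 3)) / 9.0

image = np.random.default_rng(1).random((64, 64))            # hypothetical image
smoothed = ndimage.convolve(image, low_pass, mode="nearest")  # spatial-domain convolution
```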

High Pass Filtering

A high pass filter is the basis for most sharpening methods. An image is sharpened when contrast is enhanced between adjoining areas with little variation in brightness or darkness (see Sharpening an Image for more detailed information). A high pass filter tends to retain the high frequency information within an image while reducing the low frequency information. The kernel of the high pass filter is designed to increase the brightness of the center pixel relative to neighboring pixels. The kernel array usually contains a single positive value at its center, which is completely surrounded by negative values. The following array is an example of a 3 by 3 kernel for a high pass filter:
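A representative 3 by 3 high-pass kernel matching this description (a single positive center surrounded by negative values) is sketched below; the exact array in the original document is not reproduced here, so treat these coefficients as an illustrative assumption.

```python
import numpy as np
from scipy import ndimage

# Positive center surrounded by negative values; the coefficients sum to zero,
# so flat regions map to zero and high-frequency detail is emphasized.
high_pass = np.array([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=float)

image = np.random.default_rng(2).random((64, 64))            # hypothetical image
detail = ndimage.convolve(image, high_pass, mode="nearest")
```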

Directional Filtering

A directional filter forms the basis for some edge detection methods. An edge within an image is visible when a large change (a steep gradient) occurs between adjacent pixel values. This change in values is measured by the first derivatives (often referred to as slopes) of an image. Directional filters can be used to compute the first derivatives of an image (see Detecting Edges for more information on edge detection). Directional filters can be designed for any direction within a given space. For images, x- and y-directional filters are commonly used to compute derivatives in their respective directions. The following array is an example of a 3 by 3 kernel for an x-directional filter (the kernel for the y-direction is the transpose of this kernel):
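An illustrative 3 by 3 x-directional kernel (a Prewitt-style filter) is sketched below, with the y-directional kernel as its transpose, as noted above. The specific coefficients are an assumption chosen to match the description rather than copied from the original document.

```python
import numpy as np
from scipy import ndimage

# x-directional kernel: approximates the first derivative along the x (column) axis.
x_directional = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
y_directional = x_directional.T           # first derivative along the y (row) axis

image = np.random.default_rng(3).random((64, 64))             # hypothetical image
gx = ndimage.convolve(image, x_directional, mode="nearest")    # horizontal gradient
gy = ndimage.convolve(image, y_directional, mode="nearest")    # vertical gradient
```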

Laplacian Filtering

A Laplacian filter forms another basis for edge detection methods. A Laplacian filter can be used to compute the second derivatives of an image, which measure the rate at which the first derivatives change. This helps to determine if a change in adjacent pixel values is an edge or a continuous progression (see Detecting Edges for more information on edge detection). Kernels of Laplacian filters usually contain negative values in a cross pattern (similar to a plus sign), which is centered within the array. The corners are either zero or positive values. The center value can be either negative or positive. The following array is an example of a 3 by 3 kernel for a Laplacian filter:
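One common 3 by 3 Laplacian kernel consistent with this description (negative values in a cross pattern, zero corners, positive center) is sketched below; again, the coefficients are representative, not taken verbatim from the document.

```python
import numpy as np
from scipy import ndimage

# Laplacian kernel: negative values in a cross pattern, zero corners, positive center.
laplacian = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]], dtype=float)

image = np.random.default_rng(4).random((64, 64))              # hypothetical image
second_derivative = ndimage.convolve(image, laplacian, mode="nearest")
```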

Spatial Filtering

Digital holography is a new imaging technique that uses a charge-coupled device (CCD) camera for hologram recording and a numerical method for hologram reconstruction. In comparison with classical holography, which employs photographic plates as recording media, the major advantage of digital holography is that chemical processing of the hologram is suppressed, thus adding more flexibility and speed to the holographic process. Advances in computer performance and electronic image acquisition devices have made digital holography an attractive option for many applications.

Dilation and Erosion

Morphology is a broad set of image processing operations that process images based on shapes. Morphological operations apply a structuring element to an input image, creating an output image of the same size. In a morphological operation, the value of each pixel in the output image is based on a comparison of the corresponding pixel in the input image with its neighbors. By choosing the size and shape of the neighborhood, you can construct a morphological operation that is sensitive to specific shapes in the input image. The most basic morphological operations are dilation and erosion. Dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels on object boundaries.

The number of pixels added or removed from the objects in an image depends on the size and shape of the structuring element used to process the image. In the morphological dilation and erosion operations, the state of any given pixel in the output image is determined by applying a rule to the corresponding pixel and its neighbors in the input image. The rule used to process the pixels defines the operation as a dilation or an erosion. The rules for dilation and erosion are as follows.

Dilation: The value of the output pixel is the maximum value of all the pixels in the input pixel's neighborhood. In a binary image, if any of the pixels is set to the value 1, the output pixel is set to 1.

Erosion: The value of the output pixel is the minimum value of all the pixels in the input pixel's neighborhood. In a binary image, if any of the pixels is set to 0, the output pixel is set to 0.

[Figure: Dilation of a Binary Image]

The following figure illustrates this processing for a grayscale image. The figure shows the processing of a particular pixel in the input image. Note how the function applies the rule to the input pixel's neighborhood and uses the highest value of all the pixels in the neighborhood as the value of the corresponding pixel in the output image.

[Figure: Morphological Dilation of a Grayscale Image]
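The maximum rule described above can be reproduced directly with SciPy's grayscale dilation, used here as an assumed stand-in for the toolbox function discussed in the text; the 3 x 3 square structuring element and the tiny example image are illustrative choices.

```python
import numpy as np
from scipy import ndimage

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)

# Grayscale dilation with a 3 x 3 square structuring element: each output pixel is
# the maximum of its 3 x 3 neighborhood in the input image.
dilated = ndimage.grey_dilation(image, size=(3, 3))
print(dilated)   # the center pixel becomes 9, the maximum of its full neighborhood

# Erosion is the dual operation: each output pixel is the neighborhood minimum.
eroded = ndimage.grey_erosion(image, size=(3, 3))
```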

Combining Dilation and Erosion

Dilation and erosion are often used in combination to implement image processing operations. For example, the definition of a morphological opening of an image is an erosion followed by a dilation, using the same structuring element for both operations. The related operation, morphological closing of an image, is the reverse: it consists of a dilation followed by an erosion with the same structuring element. The following section uses imdilate and imerode to illustrate how to implement a morphological opening. Note, however, that the toolbox already includes the imopen function, which performs this processing. The toolbox includes functions that perform many common morphological operations.

Dilation- and Erosion-Based Functions

This section describes two common image processing operations that are based on dilation and erosion:

Skeletonization
Perimeter determination

This table lists other functions in the toolbox that perform common morphological operations that are based on dilation and erosion. For more information about these functions, see their reference pages.

Dilation- and Erosion-Based Functions

bwhitmiss: Logical AND of an image, eroded with one structuring element, and the image's complement, eroded with a second structuring element.
imbothat: Subtracts the original image from a morphologically closed version of the image. Can be used to find intensity troughs in an image.
imclose: Dilates an image and then erodes the dilated image using the same structuring element for both operations.
imopen: Erodes an image and then dilates the eroded image using the same structuring element for both operations.
imtophat: Subtracts a morphologically opened image from the original image. Can be used to enhance contrast in an image.
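The text describes these operations in terms of MATLAB's imdilate, imerode, imopen, imclose, imtophat, and imbothat. The sketch below reproduces the same definitions with SciPy's grayscale morphology as an assumed equivalent, including opening built explicitly as an erosion followed by a dilation, as discussed above; the image and structuring element size are hypothetical.

```python
import numpy as np
from scipy import ndimage

image = np.random.default_rng(5).random((64, 64))   # hypothetical grayscale image
size = (3, 3)                                       # square structuring element

# Opening: erosion followed by dilation with the same structuring element.
opened_manually = ndimage.grey_dilation(ndimage.grey_erosion(image, size=size), size=size)
opened = ndimage.grey_opening(image, size=size)      # built-in equivalent

# Closing: dilation followed by erosion with the same structuring element.
closed = ndimage.grey_closing(image, size=size)

# Top-hat: original minus opened image (enhances small bright details).
tophat = ndimage.white_tophat(image, size=size)

# Bottom-hat: closed image minus original (finds intensity troughs).
bothat = ndimage.black_tophat(image, size=size)

assert np.allclose(opened, opened_manually)
```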

Thresholding (image processing)

Thresholding is the simplest method of image segmentation. From a grayscale image, thresholding can be used to create binary images.

Method

During the thresholding process, individual pixels in an image are marked as "object" pixels if their value is greater than some threshold value (assuming an object to be brighter than the background) and as "background" pixels otherwise. This convention is known as threshold above. Variants include threshold below, which is the opposite of threshold above; threshold inside, where a pixel is labeled "object" if its value is between two thresholds; and threshold outside, which is the opposite of threshold inside (Shapiro, et al. 2001:83). Typically, an object pixel is given a value of 1 while a background pixel is given a value of 0. Finally, a binary image is created by coloring each pixel white or black, depending on the pixel's label.

Threshold selection

The key parameter in the thresholding process is the choice of the threshold value (or values, as mentioned earlier). Several different methods for choosing a threshold exist:

Users can manually choose a threshold value, or a thresholding algorithm can compute a value automatically, which is known as automatic thresholding. One method that is relatively simple, does not require much specific knowledge of the image, and is robust against image noise is the following iterative method:

1. An initial threshold (T) is chosen; this can be done randomly or according to any other method desired.
2. The image is segmented into object and background pixels as described above, creating two sets:
   G1 = {f(m,n) : f(m,n) > T} (object pixels)
   G2 = {f(m,n) : f(m,n) ≤ T} (background pixels)
   (Note: f(m,n) is the value of the pixel located in the mth column, nth row.)
3. The average of each set is computed:
   m1 = average value of G1
   m2 = average value of G2
4. A new threshold is created that is the average of m1 and m2:
   T = (m1 + m2) / 2
5. Go back to step two, now using the new threshold computed in step four; keep repeating until the new threshold matches the one before it (i.e., until convergence has been reached). A code sketch of this procedure is given below, after the multiband discussion.

Multiband thresholding

Colour images can also be thresholded. One approach is to designate a separate threshold for each of the RGB components of the image and then combine them with an AND operation. This reflects the way the camera works and how the data is stored in the computer, but it does not correspond to the way that people recognize colour. Therefore, the HSL and HSV colour models are more often used. It is also possible to use the CMYK colour model.
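A minimal sketch of the iterative threshold-selection method above, assuming a grayscale image stored as a NumPy array; taking the mean intensity as the initial threshold is one of the arbitrary choices that step 1 allows, and the function name is hypothetical.

```python
import numpy as np

def iterative_threshold(image, tol=0.5):
    """Select a global threshold by the iterative (mean-of-means) method.

    Starts from the mean intensity, splits pixels into object (> T) and
    background (<= T) sets, and replaces T by the average of the two class
    means until T stops changing (within `tol`).
    """
    img = image.astype(float)
    t = img.mean()                                   # step 1: initial threshold
    while True:
        g1 = img[img > t]                            # object pixels
        g2 = img[img <= t]                           # background pixels
        if g1.size == 0 or g2.size == 0:
            return t                                 # degenerate split; stop
        new_t = 0.5 * (g1.mean() + g2.mean())        # steps 3-4
        if abs(new_t - t) <= tol:                    # step 5: convergence check
            return new_t
        t = new_t

# Hypothetical image: threshold it into a binary (object = 1, background = 0) image.
image = np.random.default_rng(6).integers(0, 256, size=(64, 64))
T = iterative_threshold(image)
binary = (image > T).astype(np.uint8)
```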
