This document summarizes a research paper that proposes a method for recognizing objects in compressed images without needing to decompress the images first. It involves developing a 3D image compression algorithm that retains information relevant for recognition like edge locations and color. This would allow tasks like vehicle counting, speed detection and matching to be performed directly on the compressed traffic monitoring images as they are transmitted through a hierarchical network, improving efficiency. The method acquires 3D information from 3 cameras, compresses it while coding contours, distances, colors and more, and recognizes objects in the compressed domain for applications like traffic monitoring.
This document summarizes a research paper that proposes a method for recognizing objects in compressed images without needing to decompress the images first. It involves developing a 3D image compression algorithm that retains information relevant for recognition like edge locations and color. This would allow tasks like vehicle counting, speed detection and matching to be performed directly on the compressed traffic monitoring images as they are transmitted through a hierarchical network, improving efficiency. The method acquires 3D information from 3 cameras, compresses it while coding contours, distances, colors and more, and recognizes objects in the compressed domain for applications like traffic monitoring.
This document summarizes a research paper that proposes a method for recognizing objects in compressed images without needing to decompress the images first. It involves developing a 3D image compression algorithm that retains information relevant for recognition like edge locations and color. This would allow tasks like vehicle counting, speed detection and matching to be performed directly on the compressed traffic monitoring images as they are transmitted through a hierarchical network, improving efficiency. The method acquires 3D information from 3 cameras, compresses it while coding contours, distances, colors and more, and recognizes objects in the compressed domain for applications like traffic monitoring.
Dearborn (MI), USA October 3-5, 2000 Recognition of 3D Compressed Images and Its Traffic Monitoring Applications Nicole S. Love, Ichiro Masaki*, Berthold K.P. Horn Massachusetts Institute of Technology Artificial Intelligence Laboratories 545 Technology Square, Cambridge, MA 02 139 nsl@ai.mit.edu, bkph@ai.mit.edu *Massachusetts Institute of Technology Microsystems Technology Laboratories Intelligent Transportation Research Center 50 Vassar Street, Cambridge, MA 02139 masaki @ mi t .edu Abstract In a digital image network for trafJic monitoring a large number of cameras are connected to control ten- ters through a hierarchical network. Compressed image data and recognition results are transmitted over the net- work. with conventional approaches, each control ten- ter receives compressed image data along with prelimi- nary recognition results from low level control centers or surveillance cameras. Each center needs to decompress image data for further recognition processing, and if nec- essary the center sends the compressed image data and recognition results to the upper-level control centeer: In order to increase the cost-efJiciency of the digital image network, we propose eliminating the decompres- sion required at each center by developing a recognition method which works in the compressed domain. The main stream of conventional image compression methods such as Discrete Cosine Transfonn is based on spatial frequency which makes it difJicult to carry out recognition processes in the compressed domain. In contrast, we will compress the image data by using attributes which are relevant both for compression and recognition. Examples of the common attributes are binary edge locations and the color i nf om- tion surrounding the edge. This and other information is retained in the compression domain to enable recognition without decompression. 1 Introduction Until recently, image analysis and image compression have been studied independently. Over the past few years the interest in combining the two has increased. Re- searchers are beginning to see the benefits of analyzing compressed images without decompression. A few of the advantages of combining image analysis and image com- pression techniques include a decrease in memory usage, a decrease in processing time and an increased efficiency in database retrieval. Many applications rely on efficient image compression schemes. Video conferencing, image databases, surveil- lance, and image networks are examples of such applica- tions. These applications require the compression of im- ages for efficiency. In some cases, image quality, and/or data that can be extracted from the images is important. The need for image compression and analysis remains key to these applications. This paper demonstrates recognition of 3D compressed images illustrated on traffic monitoring applications. A traffic monitoring network which relies on image compression schemes may consist of a hierarchical image network as shown in Figure 1. This network utilizes real- time image processing to count vehicles, and determines vehicle speeds and vehicle density. These networks also transmit image data to higher levels in the network for ac- cident or traffic flow analysis. A traffic monitoring system which saves on processing time is essential for efficient op- eration of the system. The goal of this research is to de- velop a 3D compression algorithm which allows recogni- tion from compressed image data without decompression. 2 Recognition of Compressed Image without Decompression Three applications are being developed for recognition of an image in the 3D compressed domain. These applica- tions are vehicle counting, vehicle speed detection and ve- hicle matching. Vehicle counting requires object detection and tracking. Vehicle speed detection requires object mo- 0-7803-6363-91001$10.00 0 2000 IEEE 463 Figure 1 : Traffic Monitoring System Architecture Acquisition Acquisition t t Feature Extraction Feature Extraction tion estimation. Vehicle matching utilizes template match- ing and scaling of the images. These applications, though specific to traffic monitoring, demonstrate the capabilities of the system and can be used on any system requiring recognition of 3D compressed images. Acquisition t Feature Extraction \- Figure 2: System Overview 3 Three-Dimensional Image Acquisition The system consists of three modules: 3D image acqui- sition, compression, and recognition. Figure 2 shows the overview of the system. The 3D image acquisition module receives three input images and produces the edge depth map of the center image. The compression module uses the edge depth map, edge map and the center image to com- press the attributes of the image. The recognition unit uses the 3D compressed data to determine the vehicle count, ve- hicle speed or vehicle matching without decompressing the image. The results can be stored into memory or be trans- mitted over the network to higher level control centers. The 3D compression algorithm begins with a 3D image acquisition. 3D information is used to distinguish objects in a scene. Conventional 2D compression algorithms are not conducive to detection of partially occluded objects, but the 3D information along with color allows the system to detect partially occluded objects. Stereo vision algorithms have used two or more cam- eras. The two camera approach produces correspondence Figure 3: Flow chart. errors. Our system uses three cameras. The third camera is used to reduce the number of missed correspondences. Additionally, our system [ 11uses feature correlation rather than area correlation as in Kanade's system [3]. Feature correlation is faster due to the decrease in data and the decrease in calculations in the algorithm. The 3D image acquisition is based on a trinocular vision algorithm [2] which produces an edge depth map of the center image. Figure 3 is a flow chart of the algorithm. The images are acquired simultaneously from all three cameras. The cam- eras are equally spaced, with their optical axes aligned. Figure 4 shows a sample of three images taken by the cam- eras. The first step is to generate the vertical edge gradient Figure 4: Sample input triple. for each image. Edge points in the left and right images are matched using the center camera to help eliminate false 464 correspondences. using The depth may be calculated directly for each disparity f 4- 4 z=b- where b is the length of the baseline (distance between left and right camera), f is the focal length, and 4 and 4 are the x-coordinates in the left and right images respectively. Figure 6 displays a histogram of the number of edge points versus depth that can be constructed after bin averaging. The significant peaks correspond to objects in the original image. The three peaks correspond to the sign on the left, the vehicle in the closest lane and the parked vehicle which is partially occluded. 4 Compression The goal of the compression algorithm is two-fold; (I ) to compress the data as much as possible without los- ing any of the relevant information, and (2) to provide a representation of the data which is conducive to ob- ject recognition. The decision was made to use a contour based lossy compression method based on Mizukis algo- rithm [8]. Mizuki proposed a 2D compression algorithm, which has been extended to 3D in this research. The algo- rithm produces a high compression ratio and enhances the recognition component of the system. Marshall [6] uses a contour based compression method for shape recognition. Marshalls work assumes that a con- tour with length larger than some threshold is an object. This assumption creates a problem if the contour of an ob- ject is disconnected. An object with 2 or more long con- tours would be considered to be more than one object. The differences between Marshalls algorithm and this com- pression algorithm is the addition of three-dimensional data and the allowance of broken contours. Although Mar- shalls method has problems with overlapping objects and objects consisting of several contours, his work demon- strates the potential of a contour based compression do- main used for shape recognition. Many image processing schemes use the edge infor- mation as a guide in object recognition, for this reason a contour based algorithm was used [7]. The contours provide a skeleton of the contents of the image which is used for recognition of objects as well as compression. The algorithm focuses on retaining information relevant to recognition, such as contour, color, and distance attributes. A flow chart of the steps are shown in Figure 5. A description of the components are as follows: Depth Map The depth map contains the distance of edge pixels from the center camera. The method of deter- mining the distance is described in Section 3. Contour Coding The contours are determined by tracing edges with similar color and distance information. The contours are then coded by using the start loca- tion and the directional codes for subsequent points in the contour. Short contours are eliminated. Meancoding The mean RGB value of pixels between contours is coded. Distance Coding The mean distance value of contour pix- els is coded. Color Extraction RGB values for pixels between con- tours is determined and a line approximation is used to represent the information. Color Coding The endpoints of the line approximating the RGB values for pixels between contours is coded. Binary Block Matching The edge maps from two con- secutive images are used to determine the motion vec- tors of each n x n block. Error Coding The reconstructed image from the motion vectors is compared with the second image to produce an error which is encoded. . Motion 1 vectors -- center U- ! RecOnstruct 1 Compressed age Concatenation Figure 5: Compression Algorithm The algorithm combines contour, color, and distance at- tributes to produce an image coding system which can be used for recognition. 5 Recognition The transportation industry has always been concerned with the efficient and safe movement of traffic. The data gathered from traffic monitoring is used to make real-time traffic control decisions which affect the movement and 465 safety of traffic. This research is geared towards high- way monitoring to demonstrate the feasibility and bene- fits of recognition of 3D compressed images. Current traf- fic monitoring systems have problems with shadows and overlapping vehicles [4], [ 5] . The 3D data will provide the necessary information to eliminate the problems of shad- ows and overlapping vehicles. Three applications (vehicle counting, vehicle speed detection and vehicle matching) will demonstrate the feasibility of the system. All three applications require the detection of vehicles in the image. Vehicle counting and vehicle speed detection require the tracking of vehicles in subsequent scenes. Vehicle speed detection utilizes motion estimates to determine the vehicle speeds. The following sections discuss the significant com- ponents for recognition: detection of vehicles in a scene, tracking of vehicles in subsequent scenes, and motion esti- mations of vehicles. 5.1 Detection of Vehicles Vehicle counting, speed detection and vehicle match- ing rely on the detection of vehicles in a scene. Detection methods utilizing color, edge and depth maps still apply in the 3D compressed domain. The 3D compressed im- age retains these attributes which allows for object detec- tion. Each vehiclein the scene consists of several contours. In detecting the vehicle within the 3D compressed data, a histogram of the number of edges at given distances can be constructed and used to classify the vehicles. Figure 6 shows the histogram of the edges from the image used in Figure 4. Based on the histogram the edges can be classi- fied, distinguishing the objects in the scene. Figure 6 shows an example of the detection of vehicles using only the dis- tance and proximity information. In vehicle matching, the color can be used in the detec- tion of a vehicle. The probability of finding the same car in different images can increase if the color is used as an in- dicator. Database searches also rely on the color of objects in the images. In the compression method, color attributes are retained and can be used to detect vehicles. A search of the compressed image database may be initiated for a par- ticular color vehicle. The correct image(s) and the location of the vehicle(s) in the images can be determined from the color information. Figure 7 displays the result of searching for a yellow car in the compressed image. Using both dis- tance and color to detect vehicles in an image or database is extremely useful for traffic monitoring applications. 5.2 'hacking Vehicles in Subsequent Images Once a vehicle has been detected, vehicle counting and speed detection require tracking of the vehicle through sub- sequent images. Tracking of the vehicle can be determined assuming only translational motion of the contours. The f 160 a0 40 1 Figure 6: Histogram and detected objects Figure 7: Detection of yellow car from compressed image matching of contours is based on color, shape, and re- stricted locations of the vehicle. Groups of contours are assigned to a vehicle by the de- tection component. Tracking the vehicle will consist of matching the overall color of the vehicle or if necessary the color distribution of the contours. Also matching the shape or more precisely the grouping of the contours which make up the car is important in ensuring the correct vehicle has been located. Another key component is the method to search for the corresponding vehicle. Specifically, the difficulty arises in deciding where to search. Limiting the search of vehicles and checking the most likely positions first can increase the performance and decrease the pro- cessing time of the system. 466 5.3 Speed Estimation Once a vehicle has been detected and tracked through subsequent images, an estimate of the speed can be deter- mined using the three-dimensional data. The location of the vehicle can be determined in the 3D domain from the calculation of the depth map. The tracking and frame speed is enough information to estimate the speed of the vehi- cle through subsequent images. To estimate the speed, the x,y, and z component of the average velocity is calculated and the magnitude of the average velocity is taken as the estimated speed of the vehicle. All velocity components follow this basic form: where Xi is the x-coordinate in the real world of a point in the izh frame, j is the difference between the two frames used to calculate the average speed, and F, is the frame rate. The magnitude of the velocity is The speed of the vehicle will be calculated by taking the average of all points with known estimated speed for the particular vehicle. The motion vectors can be used to esti- mate the speed once the vehicles have been detected. 6 Conclusion This research focuses on recognition of 3D compressed images without decompression. Traffic monitoring ap- plications are developed to demonstrate the benefits of combining image processing and image compression tech- niques to produce an efficient system which can be used for image networking systems and dynamic route guid- ance. Any applications requiring the transmission, storage and processing of images can benefit in reduced usage of memory and processing time by combining processing and compression. The combination of compression and pro- cessing is a natural extension of image compression, utiliz- ing its attributes and increasing its benefits. The increasing interest in this combination may lead to new image com- pression standards which will focus on the capability of recognition without decompression as well as compression performance. References [ 11J. Bergendahl, A Computationally Efficient Stereo Vision Algorithmfor AdaptiveCruise Control, MIT Master's The- sis, May 1997. [2] J. Bergendahl, 1. Masaki, B.K.P. Hom, Three-camerastereo vision for intelligent transportation systems, Proceedings of SPl Es Photonics East '96 Symposium, Boston, MA, November 18-22, 1996. 467 U1 [41 T. Kanade, A Stereo Machinefor Video-RateDenseDepth Mapping and Its NewApplications, Proc. ARPA l mg e Un- derstanding Workshop, pp. 805-814, PalmSprings, 1996. N. Kehtamavaz, C. Huang, T. Urbanik, Video ImageSens- ing for a Smart Controller at Diamond Interchanges, Pro- ceedings of the 1995 Annual Meeting of ITS America, pp. 447-451, March1995. [5] J . Malik, S. Russell, J. Weber, T. Huang, and D. Koller. A MachineVision Based Surveillance Systemfor California Roads, PATHproject MOU-83 Final Report, 1995. [6] S . Marshall. Application of ImageContours to Three As- pects of ImageProcessing: Compression, ShapeRecog- nition andStereopsis, IEE Proceedings-I Communications Speech & Vision, Vol. 139, No. 1, pp. 1-8, Feb. 1992. [7] I. Masaki. Industrial Vision Systems Based onApplication- Specific IC Chips, IEICE Transactions, Vol. E 74, NO. 6, June1991. [8] M. M. Mizuki, EdgeBased Video ImageCompression for Low Bit RateApplications, b4.S.E.E Thesis, MIT, Cam- bridge, September 1996.