Multiple Object Tracking Using Motion Vectors From Compressed Video
Abstract—Motion vectors extracted from a compressed video file can be used to track objects in the video, and doing so can be efficient because motion vectors provide trajectory information about the objects. However, tracking objects represented by motion vectors can be inaccurate because of camera movement, small sets of motion vectors acting as noise, objects that stop moving, and occlusion. These conditions occur in most real-world video applications. The system in this paper uses the statistical and distributional information of motion vectors to overcome these problems in three stages. 1) Frame preprocessing uses a Mode reduction technique to remove unwanted motion vectors created by camera movement. 2) Intra-frame processing: k-means is used to segment and cluster moving objects. The statistical standard deviation is used to extract each object's torso and to remove small sets of motion vectors. 3) Inter-frame processing: by comparing positional information between successive frames, a tracked object is assigned the same label in successive frames. A copying rule is used to represent a tracked object that has stopped. The direction and velocity information of the motion vectors is used to handle occlusion. Overall, an experiment on tracking multiple basketball players demonstrates good results for the system.

Keywords—motion vectors; multiple object tracking; clustering; compressed video

I. INTRODUCTION

Object tracking in video has many practical applications, such as video surveillance, human activity recognition and robotics. Because video is now widely available, compressed-domain methods are more suitable than pixel-domain methods owing to their lower computational requirements [1]. In video formats such as MPEG and H.264, motion vectors are used in compression to reduce transmission and storage requirements [2]. They record positional information that indicates a pixel's source coordinates and destination coordinates between two consecutive frames. This information is comparable to optical flow. These coordinates are useful for object tracking because they provide trajectory information about moving objects in the scene, together with derived attributes of the objects such as direction of travel and velocity. There are other advantages of using motion vectors. In contrast to popular object tracking methods [3] that use a target shape model, there is no need to predefine a target shape template, because an object is represented by a set of motion vectors. Compared with feature-based methods, which segment objects by edge detection in the pixel domain in every frame, the motion vector approach has a much lower computational cost. More importantly, motion vectors carry trajectory information that is valuable for tracking objects and that the above methods do not possess.

Motion vectors are determined during the motion estimation step of video encoding by matching similar blocks between successive frames. After encoding, the motion vectors are stored as part of the compressed video. Extracting the motion vectors without decoding the video and using them for object tracking is a current research trend. A few articles discuss the use of motion vectors to track multiple objects. These approaches mainly consist of two steps: object detection [4-6], which extracts the sets of motion vectors representing objects, and object correspondence [7-9], which associates the same object between consecutive frames. You et al. [10] use probabilistic spatio-temporal MB filtering (PSMF) to detect objects and track multiple objects with color information. Käs et al. [11] use an outlier filter to smooth object boundaries and remove noise. For object correspondence, object size and moving direction are used to associate the same object. In the system of this paper, different techniques are used, all based on statistical methods. These methods are designed to overcome the following problems. First, camera movement during recording generates homogeneous background motion vectors, which are present in the data extracted from the compressed video file. These background motion vectors need to be removed for motion segmentation. Second, the number of moving objects in a frame is unknown. This information is needed to group motion vectors so that each object is represented as a set of motion vectors. This system provides a method to apply k-means knowing only an approximate number of clusters. Third, when tracking a person, the movement of the torso is often what is of interest, but the movement of other body parts such as the hands distracts the tracking region. The distributional information of the motion vectors is used to isolate the torso. Lastly, object occlusion is a challenging problem for object tracking, and this system provides a method to differentiate the front object from the back object by the direction and velocity of their motion vectors to ensure correct labelling.

The system is designed for multiple object tracking under these problems. Three stages are performed to track multiple target objects. The first stage is frame preprocessing, which removes unwanted motion vectors generated by camera movement with a Mode reduction technique. The second stage is to use k-means
to cluster the moving objects and the statistical standard deviation to extract each object's torso. The third stage associates the same target between successive frames with the same label by using positional information. In addition, a copying rule is defined to represent a tracked object that has stopped. Finally, the occlusion problem between two objects is solved by using the direction and velocity of the motion vectors.

II. FRAME PREPROCESSING

After the motion vectors are extracted from a compressed video using FFmpeg [12], they are imported into MATLAB. Each frame is stored as a cell array containing its motion vectors, so each cell array contains a four-column matrix:

$$\begin{bmatrix} x_s & y_s & x_d & y_d \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix} \qquad (1)$$

Each row is one motion vector, with source coordinates $(x_s, y_s)$ and destination coordinates $(x_d, y_d)$.

$$v = (x_d - x_s) + (y_d - y_s) \qquad (3)$$

In statistics, the Mode is the value that occurs most often in a dataset within a dimension. By iteratively removing the motion vectors at the Mode in the dimension of (2) until a threshold is reached, the background motion vectors are removed. The threshold can be estimated as the number of motion vectors on the players. When the background motion vectors are removed in each iteration, some motion vectors on the players may have the same direction as the background motion vectors but a different velocity. To avoid removing these motion vectors, the Mode of the background motion vectors in the dimension of (3) is also considered.
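The Mode reduction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the paper: the column order [x_s, y_s, x_d, y_d], the function names `direction_and_speed` and `mode_reduction`, the rounded arctangent angle as the direction measure and the rounded Euclidean length as the velocity measure are all hypothetical choices made for the example.

```python
import numpy as np


def direction_and_speed(mv):
    """Return integer direction (degrees) and speed for each motion vector.

    mv is an (N, 4) array whose rows are assumed to be [x_s, y_s, x_d, y_d].
    Both measures are assumptions chosen for this sketch.
    """
    dx = mv[:, 2] - mv[:, 0]
    dy = mv[:, 3] - mv[:, 1]
    direction = np.round(np.degrees(np.arctan2(dy, dx))).astype(int)
    speed = np.round(np.hypot(dx, dy)).astype(int)
    return direction, speed


def mode_reduction(mv, threshold):
    """Remove background vectors until at most `threshold` vectors remain.

    Each iteration finds the most frequent direction, then the most frequent
    speed among vectors with that direction, and removes only the vectors
    that match both. Vectors that share the background direction but move
    at a different speed are therefore kept.
    """
    mv = np.asarray(mv, dtype=float)
    while len(mv) > threshold:
        direction, speed = direction_and_speed(mv)

        # Mode in the direction dimension.
        values, counts = np.unique(direction, return_counts=True)
        in_mode_direction = direction == values[np.argmax(counts)]

        # Mode in the speed dimension, restricted to the modal direction.
        values, counts = np.unique(speed[in_mode_direction], return_counts=True)
        background = in_mode_direction & (speed == values[np.argmax(counts)])

        mv = mv[~background]
    return mv
```

Because every iteration removes at least one vector, the loop always terminates. It can drop below the threshold in its final iteration, so in practice the threshold would need to be chosen with some margin.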
The second step is to detect objects that have stopped. If no cluster has a high enough number of overlapping motion vectors, then no cluster in the next frame overlaps the tracked object. In other words, the tracked object has stopped moving. The system then copies the coordinates of the tracked object into the next frame. These coordinates represent the tracked object in the next frame.
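The overlap test and the copying rule can be illustrated with a short Python sketch. It assumes that each tracked object and each cluster is represented as a set of (x, y) motion vector coordinates and that overlap is counted as the number of shared coordinates. The name `propagate_labels` and the parameter `min_overlap` are hypothetical and do not come from the paper.

```python
def propagate_labels(tracked, clusters, min_overlap):
    """Carry labels from the current frame to the next frame.

    tracked:     dict mapping label -> set of (x, y) coordinates (current frame)
    clusters:    list of sets of (x, y) coordinates (next frame)
    min_overlap: smallest number of shared coordinates that counts as a match
    """
    next_tracked = {}
    for label, coords in tracked.items():
        # Pick the next-frame cluster sharing the most coordinates.
        best = max(clusters, key=lambda c: len(coords & c), default=set())
        if len(coords & best) >= min_overlap:
            next_tracked[label] = best
        else:
            # Copying rule: no cluster overlaps enough, so the object is
            # treated as stopped and its coordinates are copied forward.
            next_tracked[label] = set(coords)
    return next_tracked
```

With `min_overlap` of at least 1, an empty list of clusters causes every tracked object to be copied forward unchanged.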
The last step deals with the occlusion problem. If a cluster is assigned two labels, two tracked objects are in occlusion. The cluster is then given one of the two objects' labels, determined by the similarity between the cluster and the tracked objects. The object with the higher similarity in terms of direction and velocity is taken to be the front object, and the cluster is assigned that object's label. The back object's label is recorded, and it is assigned to a cluster when a less similar cluster appears close to that cluster. This approach ensures that the front object always has the correct label.

Fig. 6. Tracking players with occlusion. Panels: first frame, occlusion frame, last frame.
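The choice of the front object during occlusion can be sketched as below. The paper does not give the similarity measure in this section, so the sketch assumes one: the wrapped difference between mean directions plus a weighted difference between mean speeds, where a smaller value means higher similarity. The names `mean_motion`, `dissimilarity`, `front_label` and the parameter `speed_weight` are hypothetical.

```python
import math


def mean_motion(vectors):
    """Mean direction (radians) and mean speed of a non-empty list of (dx, dy)."""
    n = len(vectors)
    mean_dx = sum(dx for dx, _ in vectors) / n
    mean_dy = sum(dy for _, dy in vectors) / n
    return math.atan2(mean_dy, mean_dx), math.hypot(mean_dx, mean_dy)


def dissimilarity(a, b, speed_weight=1.0):
    """Assumed measure: angular difference plus weighted speed difference."""
    dir_a, speed_a = mean_motion(a)
    dir_b, speed_b = mean_motion(b)
    # Wrap the angular difference into [0, pi].
    delta = dir_a - dir_b
    angle = abs(math.atan2(math.sin(delta), math.cos(delta)))
    return angle + speed_weight * abs(speed_a - speed_b)


def front_label(cluster, candidates):
    """Return the label whose previous motion is most similar to the cluster.

    cluster:    list of (dx, dy) displacements in the doubly labelled cluster
    candidates: dict mapping label -> list of (dx, dy) from the previous frame
    """
    return min(candidates, key=lambda label: dissimilarity(cluster, candidates[label]))
```

The label returned by `front_label` would be given to the occluding cluster, while the other label is kept aside for the back object. The relative weighting of direction and speed is a design choice that would need tuning on real data.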
In the tracking of two players when occlusion occurs, 2 cases fail. In one case both players are lost; in the other, one player is lost. A total of 10 players are tracked, so the accuracy rate is 70%. In the tracking of three players, 4 out of 15 players are lost, so the accuracy rate is 73.33%. In both situations, the main cause of trajectory loss is occlusion.

Fig. 7. Tracking multiple players. Panels: first frame, last frame.

VI. CONCLUSION

In this paper, it is demonstrated that tracking multiple objects can be efficient when using motion vectors extracted from compressed video. The system applies statistical and clustering techniques to motion vectors to track multiple objects in real video. This approach allows effective use of the positional, distributional and directional information of the motion vectors.

To overcome the problems of using motion vectors for tracking multiple objects, the system performs three stages on the data. The preprocessing stage uses a Mode reduction technique to remove background motion vectors created by camera movement. The intra-frame processing uses k-means to cluster the motion vectors that represent the moving objects. To extract each object's torso as the tracking area, the statistical standard deviation is used to remove unwanted motion vectors. The inter-frame processing uses positional information to associate the same target between successive frames with the same label. The occlusion problem is solved by using direction and velocity information. In the experiment on tracking multiple basketball players, the system reaches accuracy rates of 70% and 73.33% in the two-player and three-player occlusion cases.

Future work will develop techniques for more difficult occlusion situations, such as more than two objects in occlusion. The use of color information may improve the results. The use of k-means with an approximate number of clusters may be replaced by a more advanced statistical method that automatically determines the number of clusters, which would make the application easier to use. The quality of the motion vectors extracted from compressed video is affected by the compression format, such as MPEG or H.264. Parameters such as the size of motion blocks, compression rate and compression speed are to be considered in future work.

REFERENCES

[1] Babu, R., Tom, V., & Wadekar, M. (2016). A survey on compressed domain video analysis techniques. Multimedia Tools and Applications, 75(2), 1043-1078.
[2] Wiegand, T., Sullivan, G. J., Bjøntegaard, G., & Luthra, A. (2003). Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 560-576.
[3] Trucco, E., & Plakas, K. (2006). Video tracking: A concise survey. IEEE Journal of Oceanic Engineering, 31(2), 520-529.
[4] Babu, R., Ramakrishnan, K., & Srinivasan, S. (2004). Video object segmentation: A compressed domain approach. IEEE Transactions on Circuits and Systems for Video Technology, 14(4), 462-474.
[5] Xiao-Dong Yu, Ling-Yu Duan, & Qi Tian. (2003). Robust moving video object segmentation in the MPEG compressed domain. Proceedings of the 2003 International Conference on Image Processing (ICIP 2003), 3, III-933.
[6] Fei, W., & Zhu, S. (2010). Mean shift clustering-based moving object segmentation in the H.264 compressed domain. IET Image Processing, 4(1), 11-18.
[7] Favalli, L., Mecocci, A., & Moschetti, F. (2000). Object tracking for retrieval applications in MPEG-2. IEEE Transactions on Circuits and Systems for Video Technology, 10(3), 427-432.
[8] Yokoyama, T., Iwasaki, T., & Watanabe, T. (2009). Motion vector based moving object detection and tracking in the MPEG compressed domain. Seventh International Workshop on Content-Based Multimedia Indexing (CBMI '09), 201-206.
[9] Lan Dong, Zoghlami, & Schwartz. (2006). Object tracking in compressed video with confidence measures. 2006 IEEE International Conference on Multimedia and Expo, 753-756.
[10] You, W., Sabirin, M., & Kim, M. (2012). Real-time detection and tracking of multiple objects with partial decoding in H.264/AVC bitstream domain. Proceedings of SPIE 2009, 7244, 72440D-72440D-12.
[11] Käs, C., & Nicolas, H. (2009). An approach to trajectory estimation of moving objects in the H.264 compressed domain. Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5414, 318-329.
[12] FFmpeg: Documentation. Retrieved from https://www.ffmpeg.org/documentation.html
[13] MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 281-297.
[14] [NBA]. (2016). Best Highlights From the 2017 NBA Playoffs | 1st Round. Retrieved from https://www.youtube.com/watch?v=yGnKHxNVibw