
Multiple object tracking using motion vectors from compressed video

Weisheng Li
School of Computer Science, Engineering and Mathematics
Flinders University
Adelaide, Australia
li0655@flinders.edu.au

David Powers
School of Computer Science, Engineering and Mathematics
Flinders University
Adelaide, Australia
david.powers@flinders.edu.au

Abstract— Motion vectors extracted from a compressed video file can be used to track objects in the video, and doing so can be efficient because motion vectors provide trajectory information about the objects. However, tracking objects represented by motion vectors can be inaccurate because of camera movement, small sets of motion vectors acting as noise, objects that stop moving, and occlusion. These conditions occur in most real-world video applications. The system in this paper uses the statistical and distributional information of motion vectors to overcome these problems in three stages. 1) Frame preprocessing: a Mode reduction technique removes unwanted motion vectors created by camera movement. 2) Intra-frame processing: k-means is used to segment and cluster moving objects. The statistical standard deviation is used to extract each object's torso and to remove small sets of motion vectors. 3) Inter-frame processing: by comparing positional information between successive frames, a tracked object is assigned the same label in successive frames. A copying rule is used to represent a tracked object that has stopped. The direction and velocity information of the motion vectors is used to handle occlusion. Overall, an experiment on tracking multiple basketball players demonstrates good results for the system.

Keywords—motion vectors; multiple object tracking; clustering; compressed video

I. INTRODUCTION
Object tracking in video has many practical applications, such as video surveillance, human activity recognition and robotics. While video is widely available, compressed domain methods are more suitable than pixel domain methods because of their lower computational requirement [1]. For video formats like MPEG and H.264, motion vectors are used in compression to reduce the transmission and storage requirement [2]. They record positional information: a pixel's source coordinates and destination coordinates between two consecutive frames. This is closely related to optical flow. These coordinates are useful for object tracking as they provide trajectory information about moving objects in the scene, together with derived attributes of the objects such as the direction of travel and the velocity. There are other advantages of using motion vectors. In contrast to the popular object tracking methods [3] which use a target shape model, there is no need to predefine any target shape template, as an object is represented by a set of motion vectors. Compared to feature based methods, which segment objects by edge detection in the pixel domain in every frame, the motion vector approach has a much lower computational cost. More importantly, the trajectory information of motion vectors is valuable for tracking objects, and the above methods do not possess it.

Motion vectors are determined during the motion estimation step of video encoding by matching similar blocks between successive frames. After encoding, the motion vectors are stored as part of the compressed video. Extracting the motion vectors without decoding the video and using them for object tracking is a current research trend. A few articles discuss the use of motion vectors to track multiple objects. The approaches mainly contain two steps: object detection [4-6], which extracts the sets of motion vectors representing objects, and object correspondence [7-9], which associates the same object between consecutive frames. You et al. [10] use probabilistic spatio-temporal MB filtering (PSMF) to detect objects and track multiple objects with color information. Käs et al. [11] use an outlier filter to smooth the boundary of an object and remove noise. For object correspondence, object size and moving direction are used to associate the same object. The system in this paper uses different techniques, which are based on applying statistical methods. These methods are designed to overcome the following problems. First, camera movements during recording generate homogeneous background motion vectors, which are present in the data after extraction from the compressed video file. These background motion vectors need to be removed for motion segmentation. Second, the number of moving objects in a frame is unknown. This information is needed to group motion vectors so that each object is represented as a set of motion vectors. This system provides a method to apply k-means knowing only an approximate number of clusters. Third, when tracking a person, the movement of the torso is usually what is of interest, but the movement of other parts such as the hands will distort the tracking region. The distributional information of the motion vectors is used to address this. Lastly, object occlusion is a challenging problem for object tracking, and this system provides a method to differentiate the front object from the back object by the direction and the velocity of the motion vectors to ensure correct labelling.

The system is designed for multiple object tracking under those problems. Three stages are performed to track multiple target objects. The first stage is frame preprocessing, which removes unwanted motion vectors generated by camera movements with a Mode reduction technique. The second stage is to use k-means

978-1-5386-2839-3/17/$31.00 ©2017 IEEE


to cluster each frame individually. All moving objects in each frame are detected, and the statistical standard deviation is used to extract each object's torso. The third stage associates the same target between successive frames with the same label by using positional information. Also, a copying rule is defined to represent a tracked object that has stopped. Finally, the occlusion problem between two objects is solved by using the direction and the velocity of the motion vectors.

II. FRAME PREPROCESSING
After the motion vectors are extracted from a compressed video using FFMPEG [12], they are imported into MATLAB. The motion vectors of each frame are stored in a cell array, so each cell array contains a four-column matrix:

$\begin{bmatrix} x_s & y_s & x_d & y_d \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix}$ (1)

where $x_s$ and $y_s$ are the coordinates of a pixel in the previous frame ($s$ is for source) and $x_d$ and $y_d$ are the coordinates of a pixel in the current frame ($d$ is for destination). A row containing these four attributes constitutes a motion vector. The number of rows is the number of pixels that have a motion vector in a frame. Figure 1 shows the motion vectors, where the tail of each arrow is the coordinate from the previous frame and the head of the arrow belongs to the current frame. So they are the points that have moved. For visualization purposes, the magnitudes of all motion vectors are set to be the same. Only the direction of each motion vector is shown in Figure 1.

Fig. 1. Motion vectors with same magnitudes. (A YouTube video [14])

It can be seen in Figure 1 that there are a large number of motion vectors that are similar in direction, which is calculated as (2), and in velocity, which is calculated as (3). On the other hand, the number of motion vectors on the players is small compared to these background motion vectors, which are created by the camera movement. Camera movement may create one or more such groups of background motion vectors. A Mode reduction technique is designed to remove these groups.

$\theta = \arctan\left(\frac{y_d - y_s}{x_d - x_s}\right)$ (2)

$v = \sqrt{(x_d - x_s)^2 + (y_d - y_s)^2}$ (3)

In statistics, a Mode is the value that occurs most often in a dataset within a dimension. By iteratively removing the motion vectors of the Mode in the dimension of (2) until a threshold is reached, the background motion vectors are removed. The threshold can be estimated as the number of motion vectors on the players. When the background motion vectors are removed in each iteration, some motion vectors on the players may have the same direction as the background motion vectors but a different velocity. To avoid removing these motion vectors, the Mode of the background motion vectors in the dimension of (3) is also considered.

For each frame the Frame Preprocessing algorithm is summarised as:

1. Find the Mode of the motion vectors based on the direction: in the dimension of $\theta$, the most frequently occurring value is found as the Mode of direction. The motion vectors having this Mode are selected.
2. Among the selected motion vectors, in the dimension of $v$, the most frequently occurring value is found as the Mode of velocity. The motion vectors having this Mode are selected.
3. Remove the selected motion vectors.
4. Repeat steps 1 to 3 until the number of motion vectors in the frame is the number of motion vectors on all the objects.

Fig. 2. After removing background motion vectors.
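The Mode reduction steps of the Frame Preprocessing stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB implementation: the function names (`direction`, `velocity`, `mode_reduction`) and the `target_count` parameter are hypothetical, directions are binned to whole degrees and velocities to whole pixels so that a Mode exists, and `atan2` replaces the plain arctangent of (2) so that a zero horizontal displacement does not cause a division by zero.

```python
import math
from collections import Counter


def direction(mv):
    """Direction of a motion vector (x_s, y_s, x_d, y_d), binned to whole degrees."""
    x_s, y_s, x_d, y_d = mv
    # atan2 is an assumption: it avoids dividing by zero when x_d == x_s.
    return round(math.degrees(math.atan2(y_d - y_s, x_d - x_s)))


def velocity(mv):
    """Displacement length of a motion vector, binned to whole pixels."""
    x_s, y_s, x_d, y_d = mv
    return round(math.hypot(x_d - x_s, y_d - y_s))


def mode_reduction(vectors, target_count):
    """Remove background vectors until at most target_count vectors remain."""
    remaining = list(vectors)
    while len(remaining) > target_count:
        # Step 1: Mode of direction over all remaining vectors.
        dir_mode = Counter(direction(mv) for mv in remaining).most_common(1)[0][0]
        selected = [mv for mv in remaining if direction(mv) == dir_mode]
        # Step 2: Mode of velocity among the vectors selected in step 1.
        vel_mode = Counter(velocity(mv) for mv in selected).most_common(1)[0][0]
        # Step 3: remove the vectors that share both Modes.
        remaining = [
            mv for mv in remaining
            if not (direction(mv) == dir_mode and velocity(mv) == vel_mode)
        ]
    return remaining
```

Each pass removes a whole group at once, so the result can contain fewer vectors than `target_count`; in practice the threshold would be the user's estimate of the number of motion vectors on the objects, as described in the text.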
III. INTRA FRAME PROCESSING
The aim of this process is to detect all moving objects in each individual frame, with each object's torso represented by motion vectors. The first step is to use k-means [13] to cluster the $x_d$ and $y_d$ data of each frame. It clusters the current position data of all moving objects so that each cluster represents an object. However, the difficulty of using k-means is that the number of objects in the frame is unknown. The approach is to use an approximate number of objects estimated by the user and to remove unwanted motion vectors from each cluster after performing k-means. Figure 3 shows a result after using k-means with an approximate number of clusters. It can be seen that some bad clusters have a void inside. The small sparse groups around the void are created by small parts of the object, such as hand movement, or by the object's shadow. They are unwanted for object tracking. They also appear around the edges of the compact clusters. So the unwanted motion vectors lie around the outside of all clusters. The approach is to use the statistical standard deviation to detect and remove them.

Fig. 3. k-means clustering with approximated number of clusters.

Standard deviation is widely used in statistics to remove outliers. It measures how far data points spread from the mean: for a normal distribution, the percentage of data points within a given distance of the mean is determined by the standard deviation, $\sigma$, which is equal to the square root of the variance. For example, around 68% of data points lie within $\pm\sigma$ and around 95% lie within $\pm 2\sigma$. Most data points are within $\pm\sigma$ and fewer data points lie further out. This rule can be used to remove the unwanted motion vectors, as they are mostly located around $\pm 2\sigma$. The multiplier in front of $\sigma$ depends on the resolution of the video. In the system of this paper, 1.2 is selected. So for each cluster, a motion vector is removed if it is outside:

$\mu_x \pm 1.2\sigma_x$ or $\mu_y \pm 1.2\sigma_y$ (4)

where $\mu_x$ is the mean of the $x_d$ coordinates of all the motion vectors in the cluster and $\sigma_x$ is the standard deviation of $x_d$; $\mu_y$ is the mean of the $y_d$ coordinates of all the motion vectors in the cluster and $\sigma_y$ is the standard deviation of $y_d$.

The final step of intra frame processing is to remove clusters consisting of a small number of motion vectors. These motion vectors come from the bad clusters discussed above and from outliers produced when performing k-means.

For each frame the Intra Frame Processing algorithm is summarised as:

1. Perform k-means with an approximate number of clusters.
2. For each cluster, remove motion vectors that lie more than 1.2 standard deviations from the center of the cluster.
3. Remove clusters that have a small number of motion vectors.

Figure 4 shows the resulting clusters, whose sizes are reduced by removing the unwanted motion vectors. The moving objects are now represented by the motion vectors on the torso. They are also represented by ellipses with $4\sigma$ as the diameter.

Fig. 4. k-means with motion vectors removed using 1.2 standard deviations.

IV. INTER FRAME PROCESSING
After every individual frame has been clustered, all moving objects are labeled. However, the same object gets different labels in successive frames. To match the labels between successive frames, the system performs a comparison between each tracked object in the current frame and all the clusters in the next frame. The aim is to predict the tracked object's position in the next frame. If a cluster in the next frame is close to the tracked object in the current frame, it is a good indicator that the cluster is the displaced tracked object.
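As a compact illustration of the Intra Frame Processing summary in Section III (steps 2 and 3), the following Python sketch trims each cluster to 1.2 standard deviations and then drops small clusters. It is a sketch under assumptions rather than the authors' code: the k-means labels are taken as given (any k-means implementation could produce them), the names `trim_cluster`, `intra_frame` and `min_size` are hypothetical, the paper does not state what counts as a "small" cluster, and the population standard deviation is used because the paper does not say which variant it applies.

```python
from statistics import mean, pstdev


def trim_cluster(points, k=1.2):
    """Keep the (x_d, y_d) points that lie within k standard deviations
    of the cluster mean on both axes, as in rule (4)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mu_x, mu_y = mean(xs), mean(ys)
    # Population standard deviation is an assumption of this sketch.
    sigma_x, sigma_y = pstdev(xs), pstdev(ys)
    return [
        (x, y) for x, y in points
        if abs(x - mu_x) <= k * sigma_x and abs(y - mu_y) <= k * sigma_y
    ]


def intra_frame(clusters, k=1.2, min_size=3):
    """clusters maps a k-means label to its list of (x_d, y_d) points.
    Returns the trimmed clusters that still hold at least min_size points."""
    kept = {}
    for label, points in clusters.items():
        trimmed = trim_cluster(points, k)
        # min_size is a hypothetical threshold for a "small" cluster.
        if len(trimmed) >= min_size:
            kept[label] = trimmed
    return kept
```

For example, a cluster of five points near the origin plus one stray point at (10, 10) keeps the five central points, while a two-point cluster is discarded when `min_size` is 3.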
In the first step, each tracked object in the current frame is compared with all the clusters in the next frame. The closeness of a cluster to the tracked object is defined by their overlapping area. This overlapping area is indicated by the number of motion vectors from the cluster residing within the tracked object's ellipse, which is defined by the standard deviations as shown in Figure 5. The extents of the ellipse along the x and y axes are:

$\mu_x \pm 2\sigma_x$ (5)

$\mu_y \pm 2\sigma_y$ (6)

where $\mu_x$ and $\sigma_x$ are the mean and the standard deviation of the $x_d$ coordinates of all the motion vectors on the tracked object, and $\mu_y$ and $\sigma_y$ are the mean and the standard deviation of the $y_d$ coordinates of all the motion vectors on the tracked object.

Motion vectors from a cluster in the next frame that reside in the tracked object's ellipse are counted. If their number is high enough, the cluster is considered a possible position of the tracked object in the next frame. After all the clusters in the next frame have been compared, the cluster with the highest count is selected as the position of the tracked object.

Fig. 5. Overlapping area.

The second step is to detect stopped objects. If no cluster has a high enough number of overlapping motion vectors, then no cluster in the next frame overlaps with the tracked object. In other words, the tracked object has stopped moving. The system then copies the coordinates $x_d$ and $y_d$ of the tracked object into the next frame. These coordinates represent the tracked object in the next frame.

The last step is to deal with the occlusion problem. If a cluster, $C$, is assigned two labels, two tracked objects are in occlusion. $C$ is labelled with one of the two labels of the two objects. This is determined by the similarity between $C$ and the tracked objects. The object with the higher similarity in terms of direction and velocity is taken to be the front object, and $C$ is assigned the same label as that object. The back object's label is recorded, and it will be assigned to a cluster when a less similar cluster appears close to $C$. This approach ensures that the front object always has the correct label.

V. EXPERIMENT
The experiment applies the system to tracking basketball players. Two kinds of situation are included. In the first, all the players run from one side of the court to the other, and three players are selected in the first frame. In the second, a player with the ball runs past the defenders. The player with the ball and the player in occlusion with him are selected in the first frame.

A video downloaded from YouTube [14] contains these situations in different matches. Five video clips are tested for each situation. Each clip is around 3 to 4 seconds long, or about 100 frames. The experiment measures how many players the system loses track of. Figure 6 and Figure 7 are examples of the two situations.

Fig. 6. Tracking players with occlusion. (Panels: first frame, occlusion frame, last frame.)
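The overlap test and the copying rule of the inter-frame stage (Section IV) can be illustrated with a short Python sketch. It rests on assumptions that are not taken from the paper: the names `ellipse_of`, `count_inside`, `match_object` and the `min_overlap` threshold are hypothetical (the paper only says the count must be "high enough"), the population standard deviation is used, and occlusion handling is left out.

```python
from statistics import mean, pstdev


def ellipse_of(points):
    """Centre and radii (2 standard deviations) of a tracked object's ellipse."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return mean(xs), mean(ys), 2 * pstdev(xs), 2 * pstdev(ys)


def count_inside(ellipse, points):
    """Number of points that fall inside the axis-aligned ellipse."""
    cx, cy, rx, ry = ellipse
    if rx == 0 or ry == 0:
        return 0  # degenerate ellipse: nothing can overlap it
    return sum(
        1 for x, y in points
        if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1
    )


def match_object(tracked_points, next_clusters, min_overlap=2):
    """Return (label, points) for the tracked object in the next frame.

    next_clusters maps a cluster label to its (x_d, y_d) points. If no
    cluster overlaps enough, the object is treated as stopped and its own
    points are copied forward with the label None."""
    ellipse = ellipse_of(tracked_points)
    counts = {
        label: count_inside(ellipse, pts)
        for label, pts in next_clusters.items()
    }
    best = max(counts, key=counts.get, default=None)
    if best is None or counts[best] < min_overlap:
        return None, list(tracked_points)  # copying rule
    return best, next_clusters[best]
```

A caller would carry the tracked object's existing label over to the returned points; the `None` label only signals that the copying rule was applied.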
In the tracking of two players when occlusion occurs, 2 cases fail. In one case, both players are lost. In the other, one player is lost. In total 10 players are tracked, so 3 of 10 are lost and the accuracy rate is 70%. In the tracking of three players, 4 out of 15 players are lost, so the accuracy rate is 73.33%. In both situations, the main cause of trajectory loss is occlusion.

Fig. 7. Tracking multiple players. (Panels: first frame, last frame.)

VI. CONCLUSION
In this paper, it is demonstrated that tracking multiple objects can be efficient when using motion vectors extracted from compressed video. The system applies statistical and clustering techniques to motion vectors to track multiple objects in real video. This approach allows effective use of the positional, distributional and directional information of the motion vectors.

To overcome the problems of using motion vectors for tracking multiple objects, the system performs three stages on the data. The preprocessing stage uses a Mode reduction technique to remove background motion vectors created by camera movements. The intra frame processing uses k-means to cluster the motion vectors that represent the moving objects. To extract each object's torso as the tracking area, the statistical standard deviation is used to remove unwanted motion vectors. The inter frame processing uses positional information to associate the same target between successive frames with the same label. The occlusion problem is handled by using the direction and velocity information. In the experiment of tracking multiple basketball players, the system achieves accuracy rates of 70% and 73.33% in the two situations tested.

Future work will develop techniques for more difficult occlusion situations, such as more than two objects in occlusion. Use of color information may improve the result. Using k-means with an approximate number of clusters may be replaced with a more advanced statistical method that automatically determines the number of clusters. This would make the application easier to use. The quality of the motion vectors extracted from compressed video is affected by the compression format, such as MPEG or H.264. Parameters such as the size of motion blocks, the compression rate and the compression speed are to be considered in future work.

REFERENCES
[1] Babu, R., Tom, V., & Wadekar, M. (2016). A survey on compressed domain video analysis techniques. Multimedia Tools and Applications, 75(2), 1043-1078.
[2] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, July 2003.
[3] Trucco, E., & Plakas, K. (2006). Video Tracking: A Concise Survey. Oceanic Engineering, IEEE Journal of, 31(2), 520-529.
[4] Babu, R., Ramakrishnan, K., & Srinivasan, S. (2004). Video object segmentation: A compressed domain approach. Circuits and Systems for Video Technology, IEEE Transactions on, 14(4), 462-474.
[5] Xiao-Dong Yu, Ling-Yu Duan, & Qi Tian. (2003). Robust moving video object segmentation in the MPEG compressed domain. Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on, 3, III-933.
[6] Fei, W., & Zhu, S. (2010). Mean shift clustering-based moving object segmentation in the H.264 compressed domain. IET Image Processing, 4(1), 11-18.
[7] Favalli, L., Mecocci, A., & Moschetti, F. (2000). Object tracking for retrieval applications in MPEG-2. Circuits and Systems for Video Technology, IEEE Transactions on, 10(3), 427-432.
[8] Yokoyama, T., Iwasaki, T., & Watanabe, T. (2009). Motion Vector Based Moving Object Detection and Tracking in the MPEG Compressed Domain. Content-Based Multimedia Indexing, 2009. CBMI '09. Seventh International Workshop on, 201-206.
[9] Lan Dong, Zoghlami, & Schwartz. (2006). Object Tracking in Compressed Video with Confidence Measures. Multimedia and Expo, 2006 IEEE International Conference on, 753-756.
[10] You, W., Sabirin, M., & Kim, M. (2012). Real-time detection and tracking of multiple objects with partial decoding in H.264/AVC bitstream domain. Proceedings of SPIE 2009, Volume: 7244, Publisher: SPIE, Pages: 72440D-72440D-12.
[11] Käs, C., & Nicolas, H. (2009). An approach to trajectory estimation of moving objects in the H.264 compressed domain. Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5414, 318-329.
[12] FFmpeg: Documentation. Retrieved from https://www.ffmpeg.org/documentation.html
[13] MacQueen, J. B. (1967). Some Methods for classification and Analysis of Multivariate Observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. pp. 281–297.
[14] [NBA]. (2016). Best Highlights From the 2017 NBA Playoffs | 1st Round. Retrieved from https://www.youtube.com/watch?v=yGnKHxNVibw
