Human Activity Recognition From Video
Introduction
Human movement can be analyzed at different levels:
- Movement of individual body parts
- Single-person activities
- Activities over increasing temporal windows
- Large-scale interaction

Common tasks in human motion analysis:
- Person detection & tracking
- Activity classification
- Behavior interpretation & person identification
Human action interpretation — three approaches:
1. Generic model recovery: fit a 3D model to the person's pose.
2. Appearance-based model: extract a 2D shape model directly from the image.
3. Motion-based model: rely on characteristics of people's motion.
Bobick and Davis [1] used Motion Energy and Motion History Images (MEI and MHI) to classify aerobic-type exercises. Efros et al. [3] compute optical-flow measurements in a spatio-temporal volume to recognize human activities in a nearest-neighbor framework. The CAVIAR [13] sequences are used in [7] to recognize the set of activities, scenarios, and roles. The approach generates a list of features and automatically chooses the smallest set that accurately identifies the desired class. For the classifier design we use a Bayesian classifier; the likelihood functions are modeled as Gaussian mixtures.
Features

There are two large sets of features, each organized in several subgroups.

1. Features coding the instantaneous position & velocity of the tracked subject, organized in 3 subgroups:
   i) instantaneous measurements;
   ii) averaged speed/velocity-based features;
   iii) 2nd-order moments / energy-related indicators.

2. Features based on estimates of the optic flow (instantaneous pixel motion) inside the bounding box, organized in 4 subgroups:
   i) instantaneous measurements;
   ii) spatial 2nd-order moments;
   iii) temporally averaged quantities;
   iv) temporal 2nd-order moments / energy-related indicators.
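The first feature set can be sketched from a tracked bounding-box trajectory. This is a minimal illustration, not the paper's exact feature list: the function name, the `fps` parameter, and the specific quantities returned are assumptions chosen to match the three subgroups above (numpy assumed available).

```python
import numpy as np

def track_features(centers, fps=25.0):
    """Position/velocity feature sketch for a tracked subject.

    centers: (T, 2) array of bounding-box centers (x, y), one per frame.
    Returns one feature per subgroup: instantaneous measurements,
    averaged velocity quantities, and 2nd-order / energy indicators.
    """
    centers = np.asarray(centers, dtype=float)
    vel = np.diff(centers, axis=0) * fps          # instantaneous velocity (px/s)
    speed = np.linalg.norm(vel, axis=1)           # instantaneous speed
    return {
        "position": centers[-1],                  # i) instantaneous measurement
        "velocity": vel[-1],
        "avg_speed": speed.mean(),                # ii) averaged quantities
        "avg_velocity": vel.mean(axis=0),
        "speed_var": speed.var(),                 # iii) 2nd-order moments
        "energy": (speed ** 2).mean(),            #      energy-related indicator
    }
```

For a subject moving a constant 2 px/frame at 25 fps, `avg_speed` is 50 px/s and `speed_var` is zero, so the averaged and 2nd-order subgroups capture complementary information.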
To build the Bayesian classifier, we estimate the likelihood function of the features given each class. The likelihood is approximated by a Gaussian mixture:

p(F(t) | A_k) ≈ Σ_j π_j N(μ_j, Σ_j)

where N(μ_j, Σ_j) denotes a normal distribution and π_j represents the weight of the j-th Gaussian in the mixture for each listed activity.
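A Gaussian-mixture likelihood plus a MAP decision rule can be sketched as follows. The function names and the dictionary layout are illustrative assumptions, not the paper's implementation; in practice the weights, means, and covariances would be fit with EM on training data.

```python
import numpy as np

def gaussian_pdf(f, mu, cov):
    """Density of N(mu, cov) at feature vector f."""
    d = len(mu)
    diff = f - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_likelihood(f, weights, mus, covs):
    """p(F(t) | A_k) ~= sum_j pi_j N(mu_j, Sigma_j)."""
    return sum(w * gaussian_pdf(f, mu, cov)
               for w, mu, cov in zip(weights, mus, covs))

def classify(f, class_mixtures, priors):
    """MAP decision: argmax_k p(A_k) p(F(t) | A_k)."""
    scores = {k: priors[k] * mixture_likelihood(f, *mix)
              for k, mix in class_mixtures.items()}
    return max(scores, key=scores.get)

# Hypothetical one-component mixtures for two activities:
mixtures = {
    "walk": ([1.0], [np.zeros(2)], [np.eye(2)]),
    "run":  ([1.0], [np.array([5.0, 5.0])], [np.eye(2)]),
}
priors = {"walk": 0.5, "run": 0.5}
```

A feature vector near a class mean is assigned to that class, e.g. `classify(np.array([0.1, 0.2]), mixtures, priors)` returns `"walk"`.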
Number of feature combinations evaluated for Nf = 1, 2, 3 features:

Search      Nf=1   Nf=2   Nf=3
Brute       29     406    3654
Lite        29     57     84
Lite-Lite   29     30     30
The Relief algorithm creates a weight vector over all features to quantify their quality. This vector is updated according to:

w_i = w_i + (x_i - nearmiss(x)_i)^2 - (x_i - nearhit(x)_i)^2

where w_i is the weight of the i-th feature, x_i is the i-th feature of data point x, and nearhit(x) and nearmiss(x) denote the nearest points to x from the same and from a different class, respectively.
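The update above can be sketched directly; this is a minimal Relief implementation (function name and sampling scheme are our choices, numpy assumed), using Euclidean distance to find the nearest hit and miss for each sampled point.

```python
import numpy as np

def relief(X, y, n_iter=None, rng=None):
    """Relief feature weighting.

    X: (n, d) data matrix, y: class labels. Returns weight vector w (length d).
    nearhit  = nearest neighbour with the SAME label,
    nearmiss = nearest neighbour with a DIFFERENT label.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    rng = rng or np.random.default_rng(0)
    w = np.zeros(d)
    for i in rng.integers(0, n, n_iter or n):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                          # exclude the point itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        # w_i += (x_i - nearmiss(x)_i)^2 - (x_i - nearhit(x)_i)^2
        w += (X[i] - X[miss]) ** 2 - (X[i] - X[hit]) ** 2
    return w
```

On data where only the first feature separates the classes, the first weight grows while the noise feature's weight stays near zero.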
The following table shows the results obtained using the different feature search criteria, for 1, 2, or 3 features (Feat. = indices of the selected features, R. rate = recognition rate):

Nf | Brute Search       | Lite Search        | Lite-Lite Search   | Relief
   | Feat.     R. rate  | Feat.     R. rate  | Feat.     R. rate  | Feat.       R. rate
1  | 7         83.9%    | 7         83.9%    | 7         83.9%    | 14          46.8%
2  | 9,18      93.5%    | 7,25      89.8%    | 7,18      89.6%    | 14,18       59.2%
3  | 3,9,20    94%      | 7,19,25   92.1%    | 7,18,23   86.7%    | 14,18,23    57.1%
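The difference between the brute and lite search criteria can be sketched as exhaustive evaluation versus greedy forward selection. This is a generic sketch, not the paper's code: `score` is a hypothetical callback that evaluates a feature subset (e.g. cross-validated recognition rate), and the "lite" strategy here simply extends the best set found so far by one feature at a time.

```python
from itertools import combinations

def brute_search(features, score, k):
    """Evaluate every k-subset of features: C(n, k) evaluations."""
    return max(combinations(features, k), key=score)

def lite_search(features, score, k):
    """Greedy forward search: keep the best set so far and try to
    extend it by one feature, so roughly n + (n-1) + ... evaluations."""
    best = ()
    for _ in range(k):
        candidates = [best + (f,) for f in features if f not in best]
        best = max(candidates, key=score)
    return best

# Hypothetical per-feature utilities standing in for recognition rates:
weights = {0: 0.1, 1: 0.2, 2: 1.0, 3: 0.3, 4: 0.1, 5: 0.9}
score = lambda subset: sum(weights[f] for f in subset)
```

With an additive score both strategies agree, but greedy search can miss feature pairs that are only useful jointly, which matches the small gap between the brute and lite recognition rates in the table.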
Classifier Structure

Activities are grouped in subsets and classification is performed in a hierarchical manner. The figure shows a binary hierarchical classifier: classifier 1 first separates Active from Inactive (95.5% recognition rate), and the Active branch is further split into Walking, Running, and Fighting.
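The hierarchical scheme can be sketched as a tree walk over per-node classifiers. The node names, the speed-threshold rules, and the dictionary layout below are illustrative assumptions; the real nodes would be Bayesian classifiers over the selected features.

```python
def hierarchical_classify(features, classifiers):
    """Walk a tree of classifiers: each node function either returns
    the name of the next classifier to apply or a final leaf label."""
    node = "classifier1"
    while node in classifiers:
        node = classifiers[node](features)
    return node

# Hypothetical threshold rules standing in for the trained classifiers:
classifiers = {
    "classifier1": lambda f: "classifier2" if f["avg_speed"] > 1.0 else "Inactive",
    "classifier2": lambda f: "Running" if f["avg_speed"] > 4.0 else "Walking",
}
```

Each node only needs to solve a small decision (e.g. Active vs. Inactive), which is why the per-node recognition rates can be higher than a single flat multi-class decision.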
Taxonomy of recognition approaches:
- Space-time approach
- Sequential approach
  - Exemplar-based approach
  - State model-based approach
1. Action recognition with space-time volumes. These approaches recognize human activities by analyzing space-time volumes of activity videos. The video volumes are constructed by concatenating image frames along a time axis, and are compared to measure their similarities. Figure 4 shows example 3-D XYT volumes corresponding to a human action of 'punching'.
Bobick and Davis [1] represent each action with a template composed of two 2-dimensional images: a binary motion-energy image (MEI) and a scalar-valued motion-history image (MHI). These images are constructed from a sequence of foreground images, which essentially are weighted 2-D (XY) projections of the original 3-D XYT space-time volume.
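Given binary foreground masks, MEI and MHI can be sketched as below. This is a simplified reading of the template idea, not the original implementation: the function name and the `tau` history-window parameter are our conventions (numpy assumed).

```python
import numpy as np

def mei_mhi(foreground, tau=None):
    """Motion-Energy and Motion-History images from a sequence of
    binary foreground masks (list of HxW 0/1 arrays).

    MEI answers WHERE motion occurred (binary union of motion pixels);
    MHI answers HOW RECENTLY (newer motion -> larger value)."""
    tau = tau or len(foreground)
    mhi = np.zeros(foreground[0].shape, dtype=float)
    for t, mask in enumerate(foreground, start=1):
        mhi = np.where(mask > 0, t, mhi)          # stamp latest motion time
    # Keep only the last tau frames of history, scaled to [0, 1].
    mhi = np.clip(mhi - (len(foreground) - tau), 0, None) / tau
    mei = (mhi > 0).astype(np.uint8)              # union of recent motion
    return mei, mhi
```

In the result, a pixel that moved in the last frame is brighter in the MHI than one that moved earlier, while the MEI marks both the same way.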
Shechtman and Irani estimated motion flows from a 3-D space-time volume to recognize human actions. Rodriguez et al. analyzed 3-D space-time volumes by synthesizing filters: they adapted the maximum average correlation height (MACH) filters used for image analysis (e.g., object recognition) to the action recognition problem. Disadvantage: difficulty in recognizing actions when multiple persons are present in the scene.
2. Action recognition with space-time trajectories. These approaches interpret an activity as a set of space-time trajectories. A person is generally represented as a set of 2-dimensional (XY) or 3-dimensional (XYZ) points corresponding to his/her joint positions.
Sequential approaches

These recognize human activities by analyzing sequences of features. They fall into two categories:
- Exemplar-based recognition approaches
- State model-based recognition approaches

Exemplar-based sequential approaches describe classes of human actions directly with training samples. State model-based sequential approaches represent a human action by constructing a model that is trained to generate sequences of feature vectors corresponding to the activity.
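The exemplar-based idea can be sketched as nearest-exemplar matching under dynamic time warping, a common choice for comparing feature sequences of different lengths. This is a generic illustration, not a method from the text; the function names and the 1-D feature sequences are assumptions.

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between two feature sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Cost of matching a[i-1] with b[j-1], plus the cheapest
            # way to align the prefixes (insert / delete / match).
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def exemplar_classify(seq, exemplars):
    """Label of the training exemplar nearest to seq under DTW."""
    return min(exemplars, key=lambda label: dtw(seq, exemplars[label]))
```

A state model-based approach would instead score `seq` under a generative sequence model (e.g. a hidden Markov model) trained per activity, rather than comparing against stored examples.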