
HUMAN ACTIVITY RECOGNITION from VIDEO

Guide: Prof. Vijay Bhosale

Presented by: Ms. Rajashri S., Ms. Bhagyashri S., Ms. Dipashri S.

Human Activity Recognition


Focus is on three fundamental issues:
- Design of a classifier & data modeling for activity recognition
- How to perform feature selection
- How to define the structure of the classifier

Introduction
Human movement can be studied at different levels:
- Analysis of the movement of body parts
- Single-person activities
- Over increasing temporal windows
- Large-scale interaction

Common tasks in human motion analysis:
- Person detection & tracking
- Activity classification
- Behavior interpretation & person identification

Human action interpretation. Three approaches:
1. Generic model recovery: try to fit a 3D model to the person's pose
2. Appearance-based models: based on extraction of a 2D shape model directly from the image
3. Motion-based models: rely on people's motion characteristics

Bobick and Davis [1] used Motion Energy and Motion History Images (MEI and MHI) to classify aerobic-type exercises. Efros et al. [3] compute optical flow measurements in a spatio-temporal volume to recognize human activities in a nearest-neighbor framework. The CAVIAR [13] sequences are used in [7] to recognize the set of activities, scenarios and roles. The approach generates a list of features and automatically chooses the smallest set that accurately identifies the desired class. For the design of the classifier, a Bayesian classifier is used, with the likelihood functions modeled as Gaussian mixtures.

Low level activities & features


The activities can be detected from relatively short video sequences and are described below:

id  #frames  Activity   Description
1   3,211    Inactive   A static person/object
2   1,974    Active     Person making movements but without translating in the image
3   9,831    Walking    There are movements & overall image translation
4   297      Running    As in walking but with larger translation
5   594      Fighting   Large quantities of movement with little translation

Features

There are two large sets of features, each organized in several subgroups.

1. Features coding the instantaneous position & velocity of the tracked subject, organized in 3 subgroups:
   i) instantaneous measurements
   ii) average speed/velocity based features
   iii) 2nd-order moments/energy related indicators

2. Features based on estimates of the optic flow, or instantaneous pixel motion, inside the bounding box, organized in 4 subgroups:
   i) instantaneous measurements
   ii) spatial 2nd-order moments
   iii) temporally averaged quantities
   iv) temporal 2nd-order moments/energy related indicators
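As an illustration of the first feature set, the sketch below derives an instantaneous velocity, an average speed, and a 2nd-order (variance) indicator from a tracked bounding-box centroid. The track values and feature names are hypothetical; only the three subgroups follow the slide.

```python
import math

# Hypothetical (x, y) centroid of the tracked bounding box, one per frame
track = [(0.0, 0.0), (1.0, 0.2), (2.1, 0.3), (3.0, 0.5)]

# i) instantaneous velocity between the last two frames
vx = track[-1][0] - track[-2][0]
vy = track[-1][1] - track[-2][1]

# ii) average speed over the whole track
speeds = [math.dist(a, b) for a, b in zip(track, track[1:])]
avg_speed = sum(speeds) / len(speeds)

# iii) 2nd-order moment (variance of speed), an energy-style indicator
speed_var = sum((s - avg_speed) ** 2 for s in speeds) / len(speeds)
```

A real system would compute these per frame from the tracker output; high average speed with low variance would suggest steady locomotion, while high variance points toward erratic motion.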

Feature Selection & Recognition


1. The recognition strategy: Given a set of activities Aj, j = 1..n, the posterior probability of a certain activity taking place can be computed using Bayes' rule:

P(Aj | F(t)) = p(F(t) | Aj) P(Aj) / p(F(t))

where:
- P(Aj | F(t)) is the posterior probability of activity Aj given the observed features F(t)
- p(F(t) | Aj) is the likelihood of the features given activity Aj
- P(Aj) is the prior probability of the same activity
- p(F(t)) is the probability of observing F(t), irrespective of the underlying activity.

To build the Bayesian classifier, estimate the likelihood function of the features given each class. The likelihood function is approximated by a Gaussian mixture:

p(F(t) | Ak) ≈ Σj αj N(μj, Σj)

where N(μj, Σj) denotes a normal distribution with mean μj and covariance Σj, and αj is the weight of the j-th Gaussian in the mixture for each listed activity.
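The classification rule can be sketched in a few lines of Python. The mixture parameters, priors, and the single scalar feature below are hypothetical stand-ins, not the fitted values from the presented work; only the Bayes-rule-with-GMM-likelihood structure is from the slides.

```python
import math

# Hypothetical one-feature mixtures (weight, mean, variance) per class;
# in practice these parameters are estimated from training data.
MIXTURES = {
    "walking":  [(0.6, 2.0, 0.5), (0.4, 3.0, 0.8)],
    "inactive": [(1.0, 0.1, 0.2)],
}
PRIORS = {"walking": 0.5, "inactive": 0.5}

def gauss(x, mu, var):
    """Normal density N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def likelihood(x, mixture):
    """p(F | A) approximated as a weighted sum of Gaussians."""
    return sum(w * gauss(x, mu, var) for w, mu, var in mixture)

def posterior(x):
    """Bayes rule: P(A | F) = p(F | A) P(A) / p(F)."""
    joint = {a: likelihood(x, m) * PRIORS[a] for a, m in MIXTURES.items()}
    evidence = sum(joint.values())           # p(F), the normalizer
    return {a: j / evidence for a, j in joint.items()}

post = posterior(2.5)                        # a fast-moving observation
print(max(post, key=post.get))               # -> walking
```

The evidence term cancels out when only the arg-max class is needed, but computing it keeps the posteriors interpretable as probabilities.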

2.Selecting promising features:


Three approaches:
1. Brute search
2. Lite search
3. Lite-lite search

The following table summarizes the cost of these different methods, for M = 29 features and Nf selected features:

Nf  Brute search: C(M,Nf) = M!/(Nf!(M-Nf)!)   Lite search: M + (M-1) + ... (Nf terms)   Lite-lite: M + 1
1   29                                        29                                        29
2   406                                       57                                        30
3   3654                                      84                                        30
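The three cost formulas can be checked directly. The interpretation of lite search as a greedy scheme that keeps the best features found so far (hence M + (M-1) + ... evaluations) is an assumption consistent with the formula; the counts reproduce the table for M = 29.

```python
from math import comb

M = 29  # total number of candidate features, as in the table

def brute_cost(nf):
    """Brute search: evaluate every Nf-element subset of the M features."""
    return comb(M, nf)

def lite_cost(nf):
    """Lite search: M + (M-1) + ... (Nf terms), assuming a greedy scheme
    that fixes the best features found so far."""
    return sum(M - i for i in range(nf))

def lite_lite_cost(nf):
    """Lite-lite search: M evaluations for the first feature, then one
    extra evaluation per added feature (M + 1 total)."""
    return M if nf == 1 else M + 1

for nf in (1, 2, 3):
    print(nf, brute_cost(nf), lite_cost(nf), lite_lite_cost(nf))
```

The gap grows quickly: for Nf = 3 the brute search already needs 3654 classifier evaluations against 84 and 30 for the cheaper schemes.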

The Relief algorithm creates a weight vector over all features to quantify their quality. This vector is updated according to:

wi = wi + (xi - nearmiss(x)i)² - (xi - nearhit(x)i)²

where:
- wi is the weight of the i-th feature
- xi is the i-th feature of data point x
- nearhit(x) is the nearest point to x from the same class
- nearmiss(x) is the nearest point to x from a different class.
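A small pure-Python sketch of this update rule follows. The four-point dataset is hypothetical: feature 0 separates the classes while feature 1 is noise, so Relief should weight feature 0 higher.

```python
import math

def relief_weights(X, y):
    """Accumulate Relief weights over every point in the dataset:
    w_i += (x_i - nearmiss(x)_i)^2 - (x_i - nearhit(x)_i)^2"""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    for idx, x in enumerate(X):
        same = [p for j, p in enumerate(X) if y[j] == y[idx] and j != idx]
        diff = [p for j, p in enumerate(X) if y[j] != y[idx]]
        hit = min(same, key=lambda p: math.dist(p, x))   # nearest, same class
        miss = min(diff, key=lambda p: math.dist(p, x))  # nearest, other class
        for i in range(n_feat):
            w[i] += (x[i] - miss[i]) ** 2 - (x[i] - hit[i]) ** 2
    return w

# Toy data: feature 0 is discriminative, feature 1 is noise
X = [(0.0, 0.5), (0.1, 0.4), (1.0, 0.5), (1.1, 0.4)]
y = [0, 0, 1, 1]
print(relief_weights(X, y))  # feature 0 gets a much larger weight
```

Features that differ from the nearest miss but agree with the nearest hit accumulate positive weight, which is exactly the "promising feature" signal used for selection.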

The following table shows the results obtained using these different feature search criteria, for 1, 2, or 3 selected features (selected feature ids and recognition rate):

Nf  Brute search         Lite search          Lite-lite search     Relief
    Feat.      R. rate   Feat.      R. rate   Feat.      R. rate   Feat.       R. rate
1   7          83.9%     7          83.9%     7          83.9%     14          46.8%
2   9 18       93.5%     7 25       89.8%     7 18       89.6%     14 18       59.2%
3   3 9 20     94.0%     7 19 25    92.1%     7 18 23    86.7%     14 18 23    57.1%

Classifier Structure
Group activities in subsets & perform classification in a hierarchical manner. The figure shows the binary hierarchical classifier over the five activities (Inactive, Active, Walking, Running, Fighting) and the recognition rate at each node:

- Classifier 1 (95.5%): separates {Inactive, Active} from {Walking, Running, Fighting}
- Classifier 2 (99.3%): Inactive vs. Active
- Classifier 3 (98.8%): {Walking, Running} vs. Fighting
- Classifier 4 (100%): Walking vs. Running
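The routing through such a binary hierarchy can be sketched as nested decisions. The threshold rules below are hypothetical stand-ins for the four trained node classifiers; only the tree structure (four binary splits over the five activities) follows the slide.

```python
# Minimal sketch of a binary classification hierarchy. The speed /
# translation thresholds are invented for illustration; a real system
# would call the trained Bayesian classifier at each node.

def classify(speed, translation):
    # classifier 1: little overall motion -> {inactive, active}
    if speed < 0.2:
        # classifier 2: inactive vs. active
        return "inactive" if speed < 0.05 else "active"
    # classifier 3: clear image translation -> {walking, running}
    if translation > 0.5:
        # classifier 4: walking vs. running
        return "walking" if speed < 1.0 else "running"
    # lots of motion but little translation
    return "fighting"

print(classify(speed=0.01, translation=0.0))  # inactive
print(classify(speed=2.0, translation=0.9))   # running
print(classify(speed=0.5, translation=0.1))   # fighting
```

Splitting the problem this way lets each node solve an easier two-way decision, which is consistent with the high per-node rates reported in the figure.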

Human Activity Analysis


Categories:
1. Gestures: elementary movements of a person's body parts; the atomic components describing the meaningful motion of a person. E.g. stretching an arm, raising a leg.
2. Actions: single-person activities that may be composed of multiple gestures organized temporally, such as walking, waving and punching.
3. Interactions: involve 2 or more persons or objects. E.g. two persons fighting.
4. Group activities: activities of a conceptual group composed of multiple persons or objects. E.g. a group having a meeting, two groups fighting.

Human Activity Recognition methodologies

1. Single-layered approaches

Represent and recognize human activities directly based on sequences of images. They analyze sequential movements of humans such as walking, jumping and waving. Categorized into 2 classes:

Space-time approaches:
- Space-time volumes
- Space-time trajectories
- Space-time features

Sequential approaches:
- Exemplar-based approaches
- State model-based approaches

Space-time approaches

Approaches that recognize human activities by analyzing space-time volumes of activity videos. The video volumes are constructed by concatenating image frames along a time axis, and are compared to measure their similarities. Figure 4 shows example 3-D XYT volumes corresponding to a human action of 'punching'.

Various recognition algorithms using space-time representations


- Template matching: constructs a representative model (i.e. a volume) per action using training data.
- Neighbor-based matching: the system maintains a set of sample volumes (or trajectories) to describe an activity.
- Statistical modeling algorithms: match videos by explicitly modeling a probability distribution of an activity.

1. Action Recognition with Space-time volumes:


The core of the recognition is in the similarity measurement between two volumes. Bobick & Davis constructed a real-time action recognition system using template matching.

Each action is represented with a template composed of two 2-dimensional images: a binary motion-energy image (MEI) and a scalar-valued motion-history image (MHI). These images are constructed from a sequence of foreground images, which essentially are weighted 2-D (XY) projections of the original 3-D XYT space-time volume.
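The MEI/MHI construction described above can be sketched from binary foreground masks. The masks are tiny invented grids (1 marks a foreground pixel); the timestamp-and-decay update is one common formulation, a sketch rather than the exact rule of the original system.

```python
def mei_mhi(masks):
    """Build an MEI (where motion occurred) and an MHI (how recently it
    occurred) from a sequence of binary foreground masks, one per frame."""
    H, W = len(masks[0]), len(masks[0][0])
    mei = [[0] * W for _ in range(H)]
    mhi = [[0] * W for _ in range(H)]
    for t, mask in enumerate(masks, start=1):
        for y in range(H):
            for x in range(W):
                if mask[y][x]:
                    mei[y][x] = 1    # motion-energy: union of silhouettes
                    mhi[y][x] = t    # motion-history: timestamp of motion
                elif mhi[y][x] > 0:
                    mhi[y][x] -= 1   # older motion fades away
    return mei, mhi

# A bright blob moving down-left across three 2x2 frames
masks = [
    [[1, 0], [0, 0]],  # t = 1
    [[0, 1], [0, 0]],  # t = 2
    [[0, 0], [1, 0]],  # t = 3
]
mei, mhi = mei_mhi(masks)
print(mei)  # where any motion happened
print(mhi)  # recent motion has larger values
```

The MEI answers "where did motion occur?" while the MHI's brighter pixels answer "where did it occur most recently?", which is what makes the pair usable as a compact action template.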

Shechtman and Irani estimated motion flows from a 3-D space-time volume to recognize human actions. Rodriguez et al. analyzed 3-D space-time volumes by synthesizing filters: they adopted the maximum average correlation height (MACH) filters used for image analysis (e.g. object recognition) to solve the action recognition problem. Disadvantage: difficulty in recognizing actions when multiple persons are present in the scene.

2. Action recognition with space-time trajectories

Interpret an activity as a set of space-time trajectories. A person is generally represented as a set of 2-dimensional (XY) or 3-dimensional (XYZ) points corresponding to his/her joint positions.

Advantage: ability to analyze detailed levels of human movements.

3. Action recognition using space-time local features

Approaches using local features extracted from 3-dimensional space-time volumes to represent and recognize activities, focusing on three aspects: what 3-D local features the approaches extract, how they represent an activity in terms of the extracted features, and what methodology they use to classify activities.

Advantages: by their nature, background subtraction and other low-level components are generally not required, and the local features are scale, rotation, and translation invariant in most cases. Suitable for recognizing simple periodic actions such as 'walking' and 'waving'.

Sequential approaches
Recognize human activities by analyzing sequences of features. Two categories:
- Exemplar-based recognition approaches
- State model-based recognition approaches

Exemplar-based sequential approaches describe classes of human actions directly using training samples. State model-based sequential approaches represent a human action by constructing a model which is trained to generate sequences of feature vectors corresponding to the activity.

*** THANK YOU ***
