The Gait Identification Challenge Problem: Data Sets and Baseline Algorithm

P. Jonathon Phillips¹, Sudeep Sarkar², Isidro Robledo², Patrick Grother¹, and Kevin Bowyer³
¹ NIST, Gaithersburg, MD 20899-8940
² Computer Science and Engineering, University of South Florida, Tampa, Florida 33620-5399
³ Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556
jonathon@nist.gov, {sarkar, irobledo}@csee.usf.edu, pgrother@nist.gov, kwb@cse.nd.edu

Abstract
Recognition of people through gait analysis is an important research topic, with potential applications in video surveillance, tracking, and monitoring. Recognizing the importance of evaluating and comparing possible competing solutions to this problem, we previously introduced the HumanID challenge problem, consisting of a set of experiments of increasing difficulty, a baseline algorithm, and a large set of video sequences (about 300 GB of data covering 452 sequences from 74 subjects) acquired to investigate important dimensions of this problem, such as variations due to viewpoint, footwear, and walking surface. In this paper, we present a detailed investigation of the baseline algorithm, quantify the dependence of gait-based identification on the various covariates, and update the previous baseline performance with optimized numbers. We establish that the performance of the baseline algorithm is robust with respect to its various parameters. The overall identification performance is also stable with respect to the quality of the silhouettes. We find that approximately the lower 20% of the silhouette accounts for most of the recognition achieved. Viewpoint has a barely statistically significant effect on identification rates, whereas footwear and surface type do have significant effects, with the effect due to surface type being approximately 5 times that of shoe type. The data set, the source code for the baseline algorithm, and UNIX scripts to reproduce the basic results reported here are available to the research community at marathon.csee.usf.edu/GaitBaseline/

1 Introduction
To assist in the advancement of human identification from gait analysis, which is currently an active area of computer vision [1, 2, 3, 4, 5, 6, 7], we previously proposed a challenge problem, described a supporting database, and presented a baseline algorithm to solve the problem [9]. The challenge problem addresses the following questions: (1) Under what conditions is gait recognition solvable? (2) What are the important factors affecting a person's gait? (3) What directions appear promising for improving the performance of gait-based recognition? Performance figures of one algorithm on a small proprietary database cannot answer these questions. Rather, answers will come from detailed analysis of performance statistics of multiple algorithms on a large common data set. The gait challenge problem provides this framework. In this paper, we present optimized performance of the gait baseline algorithm, along with detailed analysis of the factors that affect gait-based recognition, as measured by the baseline algorithm.

The database associated with the HumanID challenge problem currently consists of 452 sequences from 74 individuals, with video collected for each individual under up to 8 conditions: two surface conditions, two shoe types, and two camera views. All of the data is collected outdoors, reflecting the added complications of shadows from sunlight, motion in the background, and moving shadows due to cloud cover. This database is the largest available to date in terms of number of people, number of video sequences, and the variety of conditions under which a person's gait is collected. Most works in gait-based recognition use data from 10 to 20 persons, under only one or two factor variations.

The baseline algorithm is a simple one that only touches on some of the important aspects of the gait challenge problem, such as comparing temporal signatures and figure-ground segmentation. It does not consider issues such as modeling of human motion and occlusions. However, even this version of the baseline algorithm performs well in some instances and is able to tease out various factors that affect gait-based identification. It provides a base for measuring improvement in performance. Improvements over the baseline will touch upon some of the areas the baseline ignores, and the connection with the challenge problem could serve as a basis for developing and improving algorithms in these areas.

We make available the infrastructure tools associated with this HumanID gait challenge problem.


The tools include the scripts for running small and large experiments, results from intermediate processing steps, and methods for detailed performance analysis. The small and large experiments provide a variety of starting points for new researchers to investigate recognition by gait. The availability of intermediate results allows researchers to focus on different aspects of the problem. For example, the availability of the silhouette sequences means that a researcher can focus on the recognition part of the problem, while another researcher could focus on the segmentation part. The analysis tools provide a basis for determining under what conditions the problem is solvable, identifying the underlying reasons for these conditions, and pointing to future directions for research and investigation.

2 The Data

Each subject walked two similar-sized elliptical courses, one on concrete pavement and the other on a grass lawn, each with a major axis of about 15 m and a minor axis of about 5 m. Each course was viewed by two cameras, located approximately 15 m from each end of the ellipse, with lines of sight adjusted to view the whole ellipse. Subjects were asked to read, understand, and sign an IRB-approved consent form before participation. Information recorded in addition to the video includes sex (75% male), age (19 to 54 yrs), height (1.47 m to 1.91 m), weight (43.1 kg to 122.6 kg), foot dominance (mostly right), type of shoes (sneakers, sandals, etc.), and heel height. A little over half of the subjects walked in two different shoe types. Thus, for each subject there were up to eight video sequences: (grass (G) or concrete (C)) x (two cameras, L or R) x (shoe A or shoe B).

3 Baseline Algorithm

The baseline algorithm, which we designed to be simple and fast, is composed of three parts. First, using a Java-based GUI, we semi-automatically define bounding boxes around the moving person in each frame of a sequence. Second, we extract the silhouette of the person by processing only the portion within the bounding boxes. For this step, we first estimate the background statistics, in terms of the mean and covariances of the RGB channels at each pixel, using the pixel values outside the bounding boxes. For each frame and for each pixel within the corresponding bounding box, we compute the Mahalanobis distance of the pixel value from the estimated background pixel statistics. We have found that if we smooth the distance image using a 9 by 9 pyramidal averaging filter, the resultant silhouettes have smooth boundaries. Any pixel with this smoothed distance above a user-specified threshold is declared a foreground pixel. We then remove connected regions smaller than a specified number of pixels and scale the silhouette to occupy a 128 by 88 pixel block. This scaling offers some amount of scale invariance and facilitates the fast computation of the similarity measure, which is the third step of the processing.
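This second step is straightforward to prototype. Below is a minimal Python sketch of the per-frame extraction, assuming per-pixel background means and inverse covariances have already been estimated; the names (extract_silhouette, the min_region default, the box-filter stand-in for the pyramidal filter) are our illustrative choices, not identifiers from the released baseline code.

```python
import numpy as np
from scipy.ndimage import uniform_filter, label, zoom

def extract_silhouette(frame, bg_mean, bg_inv_cov, threshold, min_region=200):
    """Per-frame silhouette extraction as described above.

    frame      : (H, W, 3) RGB pixels inside the bounding box
    bg_mean    : (H, W, 3) per-pixel background means
    bg_inv_cov : (H, W, 3, 3) per-pixel inverse background covariances
    threshold  : user-specified cutoff on the smoothed distance
    min_region : regions smaller than this many pixels are deleted
                 (the default here is an arbitrary placeholder)
    """
    # Squared Mahalanobis distance of each pixel from the background model.
    diff = frame.astype(float) - bg_mean
    dist = np.einsum('hwi,hwij,hwj->hw', diff, bg_inv_cov, diff)

    # The paper uses a 9x9 pyramidal (weighted) averaging filter; a plain
    # 9x9 box filter stands in for it here.
    dist = uniform_filter(dist, size=9)

    # Foreground pixels are those above the threshold.
    silhouette = dist > threshold

    # Delete small connected foreground regions (the hole filling mentioned
    # in Sec. 5 would apply the same test to background regions).
    labels, n = label(silhouette)
    for i in range(1, n + 1):
        region = labels == i
        if region.sum() < min_region:
            silhouette[region] = False

    # Scale to a 128x88 block with a zero-order hold (nearest neighbour).
    h, w = silhouette.shape
    scaled = zoom(silhouette.astype(np.uint8), (128.0 / h, 88.0 / w), order=0)
    return scaled.astype(bool)
```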

We denote the probe and the gallery silhouette sequences by $S_P$ and $S_G$, respectively. We first partition the probe sequence into disjoint subsequences of $N_s$ contiguous frames each, such that each subsequence contains roughly one stride. Let the $k$-th probe subsequence be denoted by $S_{Pk}$. We then correlate each of these subsequences with the gallery sequence,

$$\mathrm{Corr}(S_{Pk}, S_G)(l) = \sum_{j} \mathrm{FrameSim}\big(S_{Pk}(j),\, S_G(j+l)\big),$$

and take the similarity to be the median value of the maximum correlation of the gallery sequence with each of these probe subsequences:

$$\mathrm{Sim}(S_P, S_G) = \mathrm{Median}_k \Big( \max_{l}\, \mathrm{Corr}(S_{Pk}, S_G)(l) \Big). \tag{1}$$

At the core of the above computation is, of course, the need to compute the similarity between two silhouette frames, $\mathrm{FrameSim}(A, B)$, which we simply compute as the ratio of the number of pixels in the intersection of the two silhouettes to the number in their union.
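A direct transcription of this measure is short. The following Python sketch assumes silhouettes are boolean 128x88 arrays as produced by the extraction step; names such as frame_sim and gait_similarity are illustrative, not taken from the released code.

```python
import numpy as np

def frame_sim(a, b):
    """FrameSim: ratio of intersection to union of two boolean silhouettes."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def gait_similarity(probe, gallery, n_s=30):
    """Eq. 1: median over probe subsequences of the maximum correlation
    with the gallery sequence.

    probe, gallery : sequences of boolean (128, 88) silhouette arrays;
                     the gallery is assumed to be at least n_s frames long
    n_s            : frames per probe subsequence (roughly one stride)
    """
    # Disjoint subsequences of n_s contiguous frames.
    subsequences = [probe[i:i + n_s]
                    for i in range(0, len(probe) - n_s + 1, n_s)]

    best = []
    for sub in subsequences:
        # Correlate the subsequence with the gallery at every shift l.
        corr = [sum(frame_sim(sub[j], gallery[l + j]) for j in range(n_s))
                for l in range(len(gallery) - n_s + 1)]
        best.append(max(corr))  # inner max of Eq. 1

    return float(np.median(best))  # outer median of Eq. 1
```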

4 The Challenge Experiments

We specified the challenge experiments for gait-based recognition, in increasing order of difficulty, in terms of gallery and probe sets, patterned after the Face Recognition Technology (FERET) evaluations [8]. The data allow 2 possible values for each of the three covariates: concrete (C) or grass (G) walking surface, shoe type A or B, and left (L) or right (R) camera viewpoint. Based on the values of these covariates, we can divide the data set into 8 possible subsets: (G, A, L), (G, A, R), (G, B, L), (G, B, R), (C, A, L), (C, A, R), (C, B, L), (C, B, R). Since not every subject was imaged under every possible combination of factors, the sizes of these subsets differ. We choose one of the large subsets, (G, A, R), i.e. (Grass, Shoe Type A, Right Camera), as the gallery set. The rest of the subsets are probe sets, differing in various ways from the gallery; they are listed in Table 1 below, and their structure can also be written down directly in code, as in the following sketch.
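This sketch merely restates the gallery/probe structure of Table 1; the variable names are hypothetical.

```python
# Each condition is (surface, shoe, camera view).
GALLERY = ('G', 'A', 'R')  # Grass, Shoe A, Right camera

# Probe set for each challenge experiment, in increasing order of
# difficulty (Table 1).
PROBES = {
    'A': ('G', 'A', 'L'),  # differs in view
    'B': ('G', 'B', 'R'),  # shoe
    'C': ('G', 'B', 'L'),  # shoe, view
    'D': ('C', 'A', 'R'),  # surface
    'E': ('C', 'B', 'R'),  # surface, shoe
    'F': ('C', 'A', 'L'),  # surface, view
    'G': ('C', 'B', 'L'),  # surface, shoe, view
}

def differences(probe):
    """Covariates in which a probe differs from the gallery."""
    names = ('surface', 'shoe', 'view')
    return [n for n, p, g in zip(names, probe, GALLERY) if p != g]
```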

Table 1. The challenge experiments. Probe set sizes are in brackets; identification rates are for the baseline algorithm at ranks 1 and 5, and for the optimized baseline at rank 1.

Exp.  Probe           Difference            Rank 1  Rank 5  Optimized (rank 1)
A     (G, A, L) [71]  View                  79%     96%     86%
B     (G, B, R) [41]  Shoe                  66%     81%     76%
C     (G, B, L) [41]  Shoe, View            56%     76%     59%
D     (C, A, R) [70]  Surface               29%     61%     42%
E     (C, B, R) [44]  Surface, Shoe         24%     55%     52%
F     (C, A, L) [70]  Surface, View         30%     46%     41%
G     (C, B, L) [44]  Surface, Shoe, View   10%     33%     36%

5 Parameter Variation

Three parameters need to be chosen for the baseline algorithm: the threshold on the smoothed Mahalanobis distance; the minimum size of connected regions, used to delete small regions and fill in small holes in the thresholded difference image; and the size of each subsequence obtained by partitioning the probe sequence. To optimize these values, we looked at the performance variation of challenge experiment A (Table 1) around the chosen operating point, which we show to be an optimal one. Table 2 lists the identification rates at ranks 1 and 5 for different values of each parameter as one of them is changed, keeping the other two constant. Based on the variation in the observed performance, we can see that the distance threshold is the most sensitive of the three parameters. In addition, the chosen operating point is, at least locally, an optimal one. The last column of Table 1 lists the optimized rank-1 performance of the baseline algorithm.
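The sweep behind Table 2 is a one-factor-at-a-time loop around the operating point. The sketch below illustrates it; run_experiment_a is a hypothetical stand-in for the full experiment-A pipeline, and the values in OPERATING_POINT are assumptions read off Table 2 (each reproduces the 79%/96% baseline rates), not figures stated in the text.

```python
# One-factor-at-a-time sweep around the operating point, as in Table 2.
# run_experiment_a is a hypothetical callable: given the three parameter
# values, it runs experiment A and returns (rank1, rank5) as fractions.

# Assumed operating point; the paper's exact choices are not stated here.
OPERATING_POINT = {'threshold': 7, 'min_region': 200, 'subseq_len': 30}

CANDIDATES = {
    'threshold':  [5, 6, 7, 8, 9],
    'min_region': [100, 200, 300, 400],
    'subseq_len': [1, 10, 20, 30, 40, 50],
}

def sweep(run_experiment_a):
    for name, values in CANDIDATES.items():
        for v in values:
            params = dict(OPERATING_POINT, **{name: v})  # vary one only
            rank1, rank5 = run_experiment_a(**params)
            print(f'{name}={v}: rank 1 = {rank1:.0%}, rank 5 = {rank5:.0%}')
```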

Table 2. Identification rates at ranks 1 and 5 for experiment A as each parameter is varied, keeping the other two constant.

Distance threshold:       5     6     7     8     9
  Rank 1                  83%   79%   79%   75%   72%
  Rank 5                  93%   96%   96%   94%   93%

Minimum region size:      100   200   300   400
  Rank 1                  82%   79%   78%   78%
  Rank 5                  96%   96%   96%   94%

Subsequence length:       1     10    20    30    40    50
  Rank 1                  78%   82%   80%   79%   75%   79%
  Rank 5                  93%   94%   94%   96%   92%   93%

6 Algorithm Variation

We also studied the variation in performance with minor variations of the baseline algorithm. What happens if we (bilinearly) interpolate the Mahalanobis distance and then threshold to obtain the silhouette, instead of interpolating the silhouette with a zero-order-hold strategy, as we presently do? Fig. 2(b) shows the distance-interpolated version of the baseline silhouette shown in Fig. 2(a). The boundaries of the resulting silhouette are smoother, but the performance does not significantly improve, as is seen in the identification rates at ranks 1 and 5 for experiment A listed with each silhouette. On the other hand, if we do not spatially smooth the Mahalanobis distances, the performance drops quite a bit. As we see in Fig. 2(c), the quality of the silhouette is then indeed poor, and the resultant performance for experiment A drops.

[Figure 2: Example silhouettes, with rank-1 and rank-5 identification rates for experiment A: (a) baseline (79%, 96%), (b) interpolated (82%, 93%), (c) not smoothed (69%, 86%).]

Another variation that we considered is related to the silhouette similarity computation strategy specified by Eq. 1. Instead of the median, we considered the mean, the minimum, the maximum, or only the middle probe-subsequence similarity. As we see from Table 3, the performance for the mean and the maximum overlaps with that for the median, but the performance with the minimum and the middle is low.

Table 3. Similarity measure variations (Eq. 1).

          Median  Mean  Min   Max   Mid
  Rank 1  79%     81%   66%   73%   58%
  Rank 5  96%     92%   85%   96%   83%
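The variants in Table 3 amount to swapping the aggregation function in Eq. 1, a one-line change in the similarity sketch given earlier; the helper below, with illustrative names, covers all five.

```python
import numpy as np

# Aggregations over the per-subsequence maximum correlations, i.e. the
# variants of Eq. 1 compared in Table 3. 'mid' keeps only the middle
# probe subsequence's score.
AGGREGATORS = {
    'median': np.median,
    'mean':   np.mean,
    'min':    np.min,
    'max':    np.max,
    'mid':    lambda scores: scores[len(scores) // 2],
}

def gait_similarity_variant(best_correlations, how='median'):
    """best_correlations: maximum correlation of each probe subsequence
    with the gallery (the inner max of Eq. 1)."""
    return float(AGGREGATORS[how](np.asarray(best_correlations)))
```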

7 Silhouette Portions

What portion of the silhouette is important for gait-based recognition? Are all parts of the silhouette equally important? To study this, we plot in Fig. 3 the identification rate (at rank 1) for experiment A as portions of the silhouette from the top are removed (or kept). As we consider increasing portions from the top, the recognition rate increases at a fairly constant rate. As portions are removed from the top, the identification rate falls gradually up to about 80% removal, after which the fall is drastic. This 80% point (on the horizontal axis) is also the point where the two curves intersect: the identification rate for the lower 20% of the silhouette is the same (70%) as that for the upper 80%. In other words, the lower 20% of the silhouette, which is approximately the portion from the knee downwards, accounts for approximately 90% (0.70/0.79) of the achieved total recognition rate.

[Figure 3: Rank-1 identification rate for experiment A versus the percentage of the silhouette, measured from the top, that is kept or removed; the "portion kept from top" and "portion removed from top" curves intersect at about the 80% point.]

8 Covariate Effects

We quantify the effect of a covariate on recognition by comparing performances for two probes that differ with respect to that covariate but are similar in all other respects. For instance, if we want to study the effect of viewpoint on performance, we can consider the probes in experiments B and C, which differ with respect to just viewpoint. For shoe type we can compare experiments A and C, and for surface type we can compare experiments B and E. We quantify the effects as follows. Let the similarity between the probe and gallery sequences for subject $i$ be denoted by $\mathrm{Sim}_i$, and let this similarity for the two choices of the probe sets, Probe 1 and Probe 2, be $\mathrm{Sim}_i^{(1)}$ and $\mathrm{Sim}_i^{(2)}$, respectively. The change in similarity for subject $i$, given by $\Delta\mathrm{Sim}_i = \mathrm{Sim}_i^{(2)} - \mathrm{Sim}_i^{(1)}$, quantifies the effect of the covariate on that subject. The distribution of $\Delta\mathrm{Sim}_i$ over all the subjects that are common between the probes and the gallery provides an idea of the net effect of the covariate. The median drops in similarity scores for viewpoint, shoe, and surface were -0.245, -0.350, and -1.627, respectively. Wilcoxon's signed rank test showed that the drop in scores for viewpoint variation was barely significantly different from zero (P-value = 0.0496), whereas shoe type and surface type have statistically significant effects (P-values = 0.0001). In addition, the surface-type effect seems to be about 5 times that due to shoe type.
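Given per-subject similarity scores for a matched pair of probes, this analysis is a few lines with SciPy; the subject alignment and variable names here are illustrative.

```python
import numpy as np
from scipy.stats import wilcoxon

def covariate_effect(sim_probe1, sim_probe2):
    """Per-subject similarity change between two probes that differ in a
    single covariate, plus a Wilcoxon signed-rank test of whether the
    change is significantly different from zero.

    sim_probe1, sim_probe2 : dicts mapping subject id -> gallery-vs-probe
    similarity (Eq. 1 scores); only subjects present in both are used.
    """
    common = sorted(set(sim_probe1) & set(sim_probe2))
    delta = np.array([sim_probe2[s] - sim_probe1[s] for s in common])
    stat, p_value = wilcoxon(delta)  # H0: changes are symmetric about 0
    return float(np.median(delta)), p_value
```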

9 Conclusions

Continuing our work on establishing a gait-based identification challenge problem, we have presented the optimized performance of the baseline algorithm on the gait challenge experiments and analyzed its behavior beyond overall performance numbers. We found that the lower 20% of the silhouette seems to account for most (90%) of the identification rate. We also found that the effects of shoe type and surface type are statistically significant, whereas viewpoint variation is barely significant. The effect of surface type seems to be almost 5 times that of shoe type.

Acknowledgment: This research was supported by funds from the DARPA Human ID program under contract AFOSR-F49620-00-1-00388. Stan Janet and Karen Marshall from NIST very meticulously assembled the data for distribution and created the bounding boxes for the sequences. Thanks to the HumanID researchers at CMU, Maryland, MIT, Southampton, and Georgia Tech for discussions about potentially important covariates for gait analysis. We also thank Dr. Pat Flynn of the University of Notre Dame for testing the baseline algorithm code and scripts before release.

References

[1] C. BenAbdelkader, R. Cutler, H. Nanda, and L. Davis. EigenGait: Motion-based recognition of people using image self-similarity. In 3rd International Conference on Audio- and Video-Based Biometric Person Authentication, June 2001.
[2] J. B. Hayfron-Acquah, M. S. Nixon, and J. N. Carter. Automatic gait recognition by symmetry analysis. In 3rd International Conference on Audio- and Video-Based Biometric Person Authentication, June 2001.
[3] A. Y. Johnson and A. F. Bobick. A multi-view method for gait recognition using static body parameters. In 3rd International Conference on Audio- and Video-Based Biometric Person Authentication, June 2001.
[4] J. J. Little and J. E. Boyd. Recognizing people by their gait: The shape of motion. Videre, 1(2), 1998.
[5] D. Meyer, J. Posl, and H. Niemann. Gait classification with HMMs for trajectories of body parts extracted by mixture densities. In BMVC, 1998.
[6] H. Murase and R. Sakai. Moving object recognition in eigenspace representation: Gait analysis and lip reading. Pattern Recognition Letters, 17(2):155-162, February 1996.
[7] S. A. Niyogi and E. H. Adelson. Analyzing gait with spatiotemporal surfaces. In Vismod, 1994.
[8] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090-1104, 2000.
[9] P. J. Phillips, S. Sarkar, I. Robledo, P. Grother, and K. Bowyer. Baseline results for the challenge problem of human ID using gait analysis. In International Conference on Automatic Face and Gesture Recognition, May 2002.
