Real World Activity Summary For Senior Home Monitoring: Multimedia Tools and Applications July 2011

Real world activity summary for senior home monitoring

Conference Paper in Multimedia Tools and Applications · July 2011

DOI: 10.1109/ICME.2011.6012117 · Source: DBLP




REAL WORLD ACTIVITY SUMMARY FOR SENIOR HOME MONITORING

Hong Cheng¹, Zicheng Liu², Yang Zhao¹, Guo Ye¹

¹University of Electronic Science and Technology of China, ²Microsoft Research

Email: hcheng@uestc.edu.cn, zliu@microsoft.com, zhaoyang1025@gmail.com, yeguo0112@gmail.com

ABSTRACT

From a senior person's daily activities, one can tell a lot about that person's health condition. Thus we believe that senior home activity analysis will play an important role in the health care of senior people. Toward this goal, we propose a senior home activity summary system. One challenging problem in such a real world application is that the senior's activities are usually accompanied by the nurse's walking. It is impractical to predefine and label all the potential activities of all the potential visitors. To address this problem, we propose a novel feature filtering technique to reduce or eliminate the effects of the interest points that belong to other people. To evaluate the proposed activity summary system, we have collected a senior home activity dataset (SAR), and performed activity recognition for the eating and walking classes. The experimental results show that the proposed system provides quite accurate activity summaries for a real world application scenario.

Index Terms— Activity Recognition, Senior Home Monitoring, Health Care, Activity Summary, Feature Filtering, Temporal Smoothing

1. INTRODUCTION

The world is aging. Many countries will face severe population aging problems in the near future. In Japan, one of the fastest aging countries in the world, the ratio of the population under 20 to the population over 65 was 9.3 in 1950. By 2025, this ratio is predicted to be 0.59. In China, the percentage of elderly people over 65 is 7.6%. As the world ages, the health care of senior people is becoming a major social problem. Researchers have proposed to use various types of sensors to monitor people's health conditions at home [1, 2, 3, 4, 5]. Moreover, people are investigating home design ideas to ensure that the elderly can live in their own homes comfortably [1].

We believe that human activity analysis will play an important role in the health care of senior people. From the daily activities of a senior person, such as how many meals the person eats each day, how much time the person sits on the sofa, how much the person walks, etc., one can tell a lot about the person's health conditions. Such information will be very useful for the senior person's relatives as well as for medical doctors. Hence, some work has already been done on activity recognition using on-body sensors integrated with mobile phones [3, 4, 5]. In general, those solutions can be applied both outdoors and indoors, thus providing accurate reports of the elderly's physical activity. Moreover, their implementation seems feasible because mobile phones are integrated with many sensors, such as accelerometer, audio, event, GPS, etc. However, these solutions do not handle more complex activities, especially for elderly activity recognition. Our goal is to develop a senior person activity summary system that automatically provides daily reports of a senior person's activities.

In the past few years, human activity analysis has attracted more and more attention from researchers in the computer vision and multimedia communities [6, 7, 8, 9, 10]. However, there has been little work on senior home activity analysis, partly because no dataset is available. We hope that our dataset will contribute to the research community and inspire more research activities in this direction. Fig. 1 shows some samples of our dataset. One practical problem that we encountered in working with the data is that when a senior person is doing non-walking activities such as eating, or is not at home, the nurse sometimes walks around in the room. As a result, the video clip may be incorrectly classified as walking. This is a general problem of two different actions occurring at the same time. This problem has not been addressed before. In this paper, we propose to use a feature filtering approach to remove the effect of the nurse's activities.

Fig. 1. Examples of Our Senior Activity Recognition Dataset.

This research was partially supported by the grant from NSFC (No. 61075045), the Program for New Century Excellent Talents in University, the National Basic Research Program of China (No. 2011CB707000), and the Fundamental Research Funds for the Central Universities (No. ZYGX2009X013). We also thank the anonymous reviewers for their valuable suggestions.

978-1-61284-350-6/11/$26.00 ©2011 IEEE
To generate the activity summary for a whole day, we first cut a long video into many short clips, and perform classification for each short clip. The classification result is usually noisy. We then use temporal smoothing to filter out the incorrect classifications before generating the activity summary report.
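The two-stage pipeline above (clip-level classification followed by temporal smoothing) can be sketched as a sliding-window majority vote. This is a minimal illustration rather than the paper's implementation; the window size and label strings are assumptions:

```python
# Sliding-window majority-vote smoothing over per-clip labels.
# Window size and label names are illustrative assumptions.
from collections import Counter

def smooth_labels(labels, window=5):
    """Replace each clip label by the majority label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed

# A single spurious 'Walking' clip inside an 'Eating' run is voted away:
clips = ['Eating', 'Eating', 'Walking', 'Eating', 'Eating', 'Eating']
print(smooth_labels(clips))  # ['Eating', 'Eating', 'Eating', 'Eating', 'Eating', 'Eating']
```

In the paper's setting the labels would be the four clip classes of Sect. 2.2; the smoothed sequence is what the daily summary report is generated from.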

Fig. 2. The proposed framework of activity summary.

Fig. 3. Activity recognition results on our senior activity recognition dataset.

2. THE PROPOSED ACTIVITY RECOGNITION APPROACH

2.1. System Overview

Our activity summary system consists of three steps: training, recognizing video clips, and summarization. Fig. 2 shows the three basic modules. In this paper, we assume that we are only interested in the senior person in his/her room. There is an important difference between our system and traditional activity recognition approaches. Traditional activity recognition approaches typically do not make distinctions between motions from different people. For example, it is perfectly fine to classify both senior walking and nurse walking as walking. But in our application, the system would provide a wrong activity summary of the senior if we did not make distinctions between the senior and the other people.

Similar to [8], we represent an action as a space-time object and characterize it by a collection of Spatio-Temporal Interest Points (STIPs) [7]. We denote a video sequence by V = {I_t} and its STIPs by Q = {d_i}. We use the NBMIM approach as the action classifier due to its efficiency; for details, refer to [8].

2.2. Feature Filtering

In our activity summary system, we are interested in classifying the activities of a senior person into four categories: 'Senior Walking', 'Senior Eating', 'Senior OtherAction', and 'Senior NoAction'. 'Senior NoAction' means the senior is not at home, which is detected when there is no motion. We collected training samples for the other three action categories: 'Senior Walking', 'Senior Eating', and 'Senior OtherAction'. The training data only contains the activities of the senior person.

We observe that in the testing video clips, the nurse often walks around in the room. When the senior person is eating and at the same time the nurse is walking, the video clip may be incorrectly classified as walking if we treat all the interest points in the video clip equally.

One potential approach to address this problem is to use a human tracker [11]. With a human tracker, one could remove all the points that do not belong to the tracked region. But human tracking itself is an open problem. Our system typically handles continuous video sequences that last multiple hours, and the senior person may move in and out of the room many times. These are very challenging problems for a human tracking system. If the tracking fails, the activity summary system would fail completely.

In order to reduce our system's dependence on human tracking, we propose a feature filtering approach to reduce or eliminate the contributions of those interest points which are less likely to belong to the senior person. As will become clear later, this approach uses tracking to update the dynamic template, but the system can still function even if the tracking fails.

Let N_k denote the number of STIPs in frame k, and let p_{k,1}, ..., p_{k,N_k} denote the interest points. Let P_{k,i} = (x_{k,i}, y_{k,i}) denote the pixel position of p_{k,i}, 1 ≤ i ≤ N_k, k ≥ 1. Let H_{k,i} denote the HOG descriptor at p_{k,i}. We use f_k to denote the set of all the HOG descriptors for frame k, that is, f_k = {H_{k,1}, ..., H_{k,N_k}}.

Our system keeps two templates: a static template Θ_0 and a dynamic template Θ_k. The static template Θ_0 is created offline. It consists of the HOG descriptors of all the STIPs that are extracted from a small number of manually selected frames. We manually check to make sure that these STIPs belong to the senior person.

The dynamic template is maintained automatically and changes from frame to frame. Assuming the current frame is k, the dynamic template Θ_{k-1} is the set of the HOG descriptors of the STIPs in frame k-1 that belong to the senior person. Therefore the dynamic template at frame k is a subset of f_{k-1}. To determine which STIPs belong to the senior at each frame, our system keeps track of the center of the senior person; we describe the tracking algorithm later in this section. After we obtain the center of the senior person at frame k-1, we remove all the STIPs whose Euclidean distances from the center are larger than a pre-specified threshold. The HOG descriptors of the remaining points form the dynamic template Θ_{k-1}.

Given the static template Θ_0 and the dynamic template Θ_{k-1}, for each STIP in frame k we compute a weight based on the matching error between the point and the two templates. To simplify notation, we denote Ω_{k-1} = Θ_0 ∪ Θ_{k-1}. Let d(H_{k,i}, Ω_{k-1}) denote the closest distance between H_{k,i} and the vectors in Ω_{k-1}, that is,

    d(H_{k,i}, Ω_{k-1}) = min_{H ∈ Ω_{k-1}} ||H - H_{k,i}||,    (1)

where || · || is the L2 norm. The similarity between H_{k,i} and the templates is defined as

    t_{k,i} = exp(-d(H_{k,i}, Ω_{k-1})² / σ²),    (2)

where σ is a variance parameter which is set empirically [12]. t_{k,i} is used as the contribution weight of H_{k,i} for activity classification. Let s^c_{k,i} denote the score of H_{k,i} with respect to activity class c. The total weighted score of frame k is

    S^c_k = Σ_{i=1}^{N_k} t_{k,i} s^c_{k,i}.    (3)

Next we describe how to update the center of the senior so that we can update the dynamic template to get ready for the next frame. Let P_k denote the position of the center of the senior person. The estimation of P_k depends on two different constraints. The first is the motion smoothness constraint (also called the dynamics constraint). Based on the history of the motion trajectory P_{k-1}, P_{k-2}, ..., one can predict P_k by using any prediction model. In our implementation, we fit a quadratic curve to P_{k-3}, P_{k-2}, P_{k-1}, and use the curve to obtain the predicted position for P_k. The predicted position is denoted as P̄_k.

The second constraint is the matching constraint, that is, we would like to minimize the distances from P_k to the points that are likely to belong to the senior. Formally, P_k is the solution to the following optimization problem:

    argmin_P (1 - λ)||P - P̄_k|| + λ Σ_{i=1}^{N_k} t_{k,i} ||P - P_{k,i}||,    (4)

where λ is a weighting factor that balances the two constraints. The solution to this minimization problem can be derived as

    P_k = (1 - λ) P̄_k + λ Σ_{i=1}^{N_k} t_{k,i} P_{k,i}.    (5)

When the system starts, the dynamic template is initialized to be empty, and the system relies on the static template to identify which interest points are more likely to belong to the senior person. Each time the senior walks out of the room, the dynamic template becomes empty again. After the senior walks back into the room, the system again uses the static template to identify the points that belong to the senior person. Sometimes the tracked position P_k may be incorrect; for example, P_k may follow the nurse after the nurse and the senior overlap. When this happens, the dynamic template will consist of the points from the nurse. Even in this situation, the STIPs of the senior will still contribute to the activity classification because they match the points in the static template. As the points on the senior start to dominate (e.g., when the nurse walks out of the room), the dynamic template will be corrected.

3. OUR SENIOR ACTIVITY RECOGNITION DATASET

In order to evaluate our system on real world senior activity data, we have collected a 4-month-long senior activity dataset (we named this dataset SAR, short for Senior Activity Recognition; it was recorded at Jinrui Honghe Garden, Chengdu, China)¹. Daily activities in the senior homes were recorded using one SONY DCR-SR68E camera per room. In total, 6 senior people were involved in this project. The recording lasted 10 days for each person. The total size of the recorded data is approximately 1.8 TB at 25 f/s. Fig. 1 shows some example images from the recorded video dataset.

Data labeling is extremely labor intensive and is still ongoing. So far, we have finished labeling two activity categories, 'Eating' and 'Walking', for one senior person. In this paper, we report the performance of our system using the data of this senior person. Each video clip is classified as 'Senior Eating', 'Senior Walking', 'Senior OtherAction', or 'Senior NoAction'. A video clip is regarded as no action if no motion is detected from the video clip.

4. EXPERIMENTAL RESULTS AND ANALYSIS

We implement activity summary on our Senior Activity Recognition Dataset of Sect. 3. We use the senior person's video sequences of the first day as the training data, and use a 1-hour video sequence each from the second day and the third day as the test data to evaluate our system. In our experiments, λ in Eqn. (4) is set to 0.1; σ takes 0.8 in Eqn. (2). The dimensions of HOG and HOF are 72 and 90, respectively. We use Ivan Laptev's STIP implementation to extract interest points².

Figure 3 shows the activity recognition results on our senior activity recognition dataset in temporal order. The red texts on the images indicate the frame numbers. The green texts are the activity recognition results. The yellow circles are the detected STIPs. Note that the second image on the first row, the last image on the second row, and the second image on the third row all have the nurse walking in the room. Without feature filtering, these video clips are incorrectly classified as walking. With feature filtering, our system correctly recognizes the senior's activities.

Table 1 compares the activity recognition performance before and after feature filtering for the second day. Note that in these tables we use 'PA' to represent the Action of a specific Person, where 'P' (Person) can take 'S' (Senior) or 'N' (Nurse), and 'A' (Action) can take 'E' (Eating), 'W' (Walking), 'O' (Other actions), or 'N' (No action). The header consists of two sub-rows, where the first sub-row indicates the activities of the senior person and the second sub-row indicates the activities of the nurse. For example, the column of 'SE' and 'NW' consists of video clips with the Senior Eating and the Nurse Walking.

Note that some entries contain two integers separated by '/', such as '14/2' in the column of 'SO' plus 'NW' and row 'SW' in Table 1. The first integer '14' is the number of video clips in this column (Senior Other-action plus Nurse Walking) which are classified as 'SW' (Senior Walking) without feature

¹ http://www.uestcrobot.net/senioractivity/
² http://www.irisa.fr/vista/Equipe/People/Laptev/interestpoints.html
        |       SE        |    SW     |         SO          |         SN
        |  NW   NO   NN   |  NW   NN  |   NW    NO    NN    |   NW    NO    NN
SE      |  10    5   53   |   0    0  |  1/0     2    23    |  0/0   2/0     0
SW      |   1    0    0   |   2   12  | 14/2     3     9    |  5/0   0/0     0
SO      |   1    1    1   |   0    0  | 7/20    16    79    |  2/2   3/0     0
SN      |   0    0    0   |   0    0  |  0/0     0     2    |  0/5   0/5    50
# Clips |  12    6   54   |   2   12  |   22    21   111    |    7     5    50
Rate    | 83.30% 83.30% 98.15% | 100% 100% | 31.82%/90.91% 76.19% 71.17% | 0%/71.42% 0%/100% 100%

Table 1. The activity recognition results of the second day
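Each rate in the final row of Table 1 is simply the number of correctly classified clips divided by the column total. As a sanity check, the before/after rates of the 'SO'+'NW' column can be recomputed from its counts; the counts are read off the table, and the small helper below is illustrative, not the authors' code:

```python
# Recompute the before/after recognition rates for the 'SO'+'NW' column
# of Table 1. The correct class for this column is 'SO' (Senior Other-action).
def rate(counts, correct):
    """Recognition rate (%) for one column of the confusion table."""
    return 100.0 * counts[correct] / sum(counts.values())

before = {'SE': 1, 'SW': 14, 'SO': 7, 'SN': 0}   # without feature filtering
after  = {'SE': 0, 'SW': 2, 'SO': 20, 'SN': 0}   # with feature filtering
print(f"{rate(before, 'SO'):.2f}% -> {rate(after, 'SO'):.2f}%")  # 31.82% -> 90.91%
```

The same computation reproduces the other entries of the 'Rate' row, e.g. 5/7 = 71.42% for the 'SN'+'NW' column after filtering.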

filtering. The second integer '2' is the number of video clips classified as 'SW' with feature filtering. For entries with just a single integer, the results are the same with or without feature filtering. For example, the nurse walking ('NW') usually does not affect the results of the senior walking; thus the results in column 'SW'+'NW' all have a single number. The final row shows the recognition rates of the activities in each column. We summarize the results of Table 1 and Table 2 into Table 3. From Table 3, the absolute improvements of the activity recognition rates are 8.27% and 6.7% for the second day and the third day, respectively. In total, the activity recognition rate improved from 80.82% without feature filtering to 88.45% with feature filtering. Furthermore, with temporal smoothing, the recognition rate improved to 91.59%.

          Before    After     Improve (%)
2nd Day   76.82%    85.10%    8.27%
3rd Day   86.60%    88.52%    6.70%

Table 2. The total activity recognition performance comparison.

5. CONCLUSIONS

In this paper, we have presented an activity summary system for senior home monitoring. We developed a novel feature filtering technique to eliminate the effect of those interest points that belong to other people. To evaluate our system, we have collected a real world dataset consisting of senior people's daily activities. The experimental results show that our system produces accurate activity summaries for a real world application scenario. In the future, we plan to label all the people with more activity categories, and to make the dataset public.

6. REFERENCES

[1] M. Krafft and K. Coskun, "Design aspects for elderly using a health smart home," Report, Department of Applied Information Technology, 2009.

[2] M. Nambu, K. Nakajima, M. Noshiro, and T. Tamura, "An algorithm for the automatic detection of health conditions," IEEE Engineering in Medicine and Biology Magazine, vol. 24, no. 4, pp. 38-42, 2005.

[3] S. Consolvo, D.W. McDonald, T. Toscos, M.Y. Chen, J. Froehlich, B. Harrison, P. Klasnja, A. LaMarca, L. LeGrand, R. Libby, et al., "Activity sensing in the wild: a field trial of UbiFit Garden," in Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems. ACM, 2008, pp. 1797-1806.

[4] T. Gu, S. Chen, X. Tao, and J. Lu, "An unsupervised approach to activity recognition and segmentation based on object-use fingerprints," Data & Knowledge Engineering, vol. 69, no. 6, pp. 533-544, 2010.

[5] E. Miluzzo, N.D. Lane, K. Fodor, R. Peterson, H. Lu, M. Musolesi, S.B. Eisenman, X. Zheng, and A.T. Campbell, "Sensing meets mobile social networks: the design, implementation and evaluation of the CenceMe application," in Proceedings of the 6th ACM conference on Embedded network sensor systems. ACM, 2008, pp. 337-350.

[6] J.C. Niebles, H. Wang, and Fei-Fei Li, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 3, pp. 299-318, 2008.

[7] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies," in IEEE CVPR, 2008.

[8] J. Yuan, Z. Liu, and Y. Wu, "Discriminative subvolume search for efficient action detection," in IEEE CVPR, 2009.

[9] L. Duan, D. Xu, I.W. Tsang, and J. Luo, "Visual event recognition in videos by learning from web data," in IEEE CVPR, 2010.

[10] J. Liu, J. Luo, and M. Shah, "Recognizing realistic actions from videos "in the wild"," in IEEE CVPR, 2009.

[11] R. Messing, C. Pal, and H. Kautz, "Activity recognition using the velocity histories of tracked keypoints," in IEEE ICCV, 2009.

[12] C.M. Bishop, Pattern Recognition and Machine Learning, vol. 4, Springer New York, 2006.
