
2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP)

Human Fall Detection during Activities of Daily Living using Extended CORE9

Shobhanjana Kalita, Dept. of Computer Science & Engg, Tezpur University, Assam, India, kalitas@tezu.ernet.in
Arindam Karmakar, Dept. of Computer Science & Engg, Tezpur University, Assam, India, arindam@tezu.ernet.in
Shyamanta M Hazarika, Dept. of Mechanical Engg, IIT Guwahati, Assam, India, s.m.hazarika@iitg.ac.in

Abstract—There is an ever-increasing need for automated monitoring systems to enable independent living in today's elderly population. Human fall detection is widely researched within the field of assistive technologies. However, fall detection systems based on wearable sensors are often incapacitated by inadvertent neglect to wear the sensors. Computer vision based approaches to detecting a human fall in video hold promise. It has been noted that an informative yet compact representation schema can significantly improve the performance of video understanding. For human activity recognition, Extended CORE9 has been used for obtaining a qualitative spatial description of the video activity. The spatial description of an activity obtained using Extended CORE9, along with the temporal information, can be encoded within a graph structure. Extended CORE9 has been applied to human-human and human-object interactions. In this paper, we show how Extended CORE9 can be applied to single-person activities. We evaluate our approach for detecting falls in an assisted living environment. Experiments performed on the UR Fall Detection dataset show promising results.

Index Terms—Human fall detection, temporal graph, activities of daily living, Extended CORE9

I. INTRODUCTION

One of the fundamental aims of assistive technology is to provide better health care to those in need. Such technologies are expected to allow elderly, disabled, overweight or obese people to live independently in their own homes without changing their lifestyle. To make that possible, there is an increasing demand for intelligent monitoring systems that can detect emergency situations and notify concerned relatives or health-care representatives [1]. This makes human fall detection (during activities of daily living) an important functionality in assisted living technologies. Research has shown that falls are one of the main causes of injuries and even death for seniors or people in need [1]. Monitoring systems that are able to detect human falls and raise appropriate alerts are therefore being widely researched.

Existing approaches for human fall detection can be broadly categorized into two groups: non-vision based and vision based [2]. Non-vision based technologies use a variety of sensors, principal among which is the accelerometer. However, such approaches require the patient or elderly person being monitored to actively cooperate by wearing the sensors at all times. This may be an inconvenience for several reasons - the person may find the sensors uncomfortable or may forget to wear them. Fall detection systems based on wearable sensors are often incapacitated by inadvertent neglect to wear the sensors [3]. Vision based approaches, on the other hand, use only video data and therefore are much less intrusive. Further, they do not require active participation of the person being monitored.

Vision-based fall detection may use videos from a single RGB camera, multiple RGB cameras for depth perception, or depth cameras. Fall detection using a single RGB camera has been extensively studied, as such systems are easy to set up and inexpensive [2]. In this paper we present a vision-based approach for human fall detection that uses the tracking data of the videos to detect falls. We use videos from a single camera but place no restriction on whether the videos come from simple RGB cameras or depth cameras. The only requirement of our approach is a decent underlying part-based tracking mechanism [4], [5]. For videos obtained using Kinect RGBD cameras, skeletal tracking is an inbuilt functionality that can be used for obtaining a part-based model of the human body.

Fig. 1 shows an example of a fall activity from the UR Fall Detection dataset [6]. For human activity recognition, abstracting the human body as a set of rectangles, such that each rectangle bounds a body part, performs better than abstracting the human body using a single rectangular bounding box [7]. Such an abstraction of the human body as a set of related rectangles is termed an extended object. Kalita et al. [7] proposed a framework for computing qualitative spatial relations between a pair of extended objects. The spatial description together with the temporal information can be encoded within a graph representation [8]. Experiments reported therein for multi-class human-human and human-object interactions are comparable to the state-of-the-art. In this paper, we present an approach that makes it possible to apply Extended CORE9 within the graph representation for single-person activities. The paper explores a purely vision based approach that uses the Extended CORE9 based spatial description which has been reported to give good results for human interactions. We evaluate our approach for detecting falls in an assisted living environment. Experiments performed on the UR Fall Detection dataset [6] show promising results.

The rest of the paper is organized as follows: a brief overview of vision based fall detection systems and human activity recognition systems is given in Section II.




Fig. 1. Fall activity from the UR Fall Detection dataset [6]

In Section III we discuss the Extended CORE9 framework and the proposed approach for using it in the context of single person activities. The experimental set-up and the classification results obtained are discussed in Section IV; Section V concludes the paper.

II. LITERATURE REVIEW

Human activity recognition (HAR) deals with the recognition of activities involving humans within a video [9]. In the literature, human activities have been classified into several categories depending on their complexity and the number of people involved. Interactions have been defined as human activities involving two or more humans, or humans and objects, whereas single person activities have been termed actions [9], [10]. HAR is a well-researched area of computer vision because of its wide range of applications. Applications in automated surveillance systems require recognition of suspicious human interactions or group actions; patient monitoring systems and assistive technologies require recognition of single person activities or human-object interactions; human computer interaction requires recognition of human actions or gestures [9].

A. Vision based Fall Detection

Fall detection is an important aspect of patient monitoring systems and assistive technologies. Falls have been classified into fall from sleeping, fall from standing or walking, fall from sitting, and fall from standing on support [1]. The UR Fall Detection dataset, on which we have performed our experiments, includes falls from standing or walking and falls from sitting.

Vision based fall detection techniques follow three broad approaches - detecting the inactivity period of a person on the floor, detecting the change of shape of the person's body when falling down, and analyzing the motion pattern of the head of the human body [11]. Detecting the period of inactivity for fall detection requires tracking the human body and determining whether the person is actively changing position or orientation [12]. Detection of change of body shape is a popular approach that focuses on modelling how the shape of the body changes during a fall event [13]. Analysis of the motion pattern of the head builds on the idea that in a fall activity the head of the person moves downwards to the floor very fast, in contrast to normal activities [14]. However, inactivity of a person could also be due to sleeping or lying down. Therefore, detection of body shape change and of the motion pattern of the head are considered to be more reliable [11]. The approach presented in this paper for fall detection is a combination of modelling the change of body shape and the motion pattern of the head. We consider how the relative position of the head changes with respect to the rest of the body during a fall scenario. This is done by analyzing the change of the qualitative direction relation of the head with the rest of the body over time.

B. KR & R for Activity Recognition in Video

Knowledge Representation and Reasoning (KR & R) deals with how symbolic knowledge, rather than quantitative information, can be used for automated reasoning. Within KR & R, Qualitative Reasoning deals with qualitative abstractions of the quantitative information (obtained from the physical world) to be used as symbolic knowledge. Qualitative Spatio-Temporal Reasoning (QSTR), which deals with qualitative abstractions of space and time, has often been used for a more intuitive description of video activities [15], [16].

Within QSTR several different aspects of space and time have been discussed in the literature [17]. Topology is a popular aspect of space that deals with relations unaffected by change of shape or size of objects. The Region Connection Calculus (RCC) is a notable framework for topological relations [18]. Fig. 2 shows the eight basic relations of the RCC framework (termed RCC8). Another aspect of space for which qualitative abstraction has been extensively discussed is direction. In this context, the Cardinal Direction Calculus (CDC) provides one of the most intuitive notions [19]. Fig. 3 shows how cardinal direction relations can be specified for a pair of objects. Qualitative distance relations have also been discussed in [20]. Such qualitative spatial relations, computed using Extended CORE9, are used within this work for the spatial description of activities.

III. PROPOSED METHOD

A. Extended CORE9

CORE9 is a comprehensive rectangle representation that allows an integrated representation of several kinds of spatial information between a pair of single-piece, axis-aligned rectangles, viz. topology, direction, size, distance, and motion [21]. It uses the state information of the nine cores formed using the pair of rectangles (see Fig. 4) for obtaining the qualitative spatial relations.
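The nine cores can be read directly off the x- and y-extents of the two rectangles: the four x-coordinates give three column intervals, the four y-coordinates give three row intervals, and their intersections form the 3x3 grid of cores. The snippet below is a minimal sketch of this construction based on the description above; it is not the authors' code, and the rectangle layout and state labels are illustrative.

def cores9(a, b):
    """Occupancy states of the nine cores formed by rectangles a and b.

    a and b are axis-aligned rectangles (x1, y1, x2, y2). Assuming the four
    x-coordinates and the four y-coordinates are distinct, they partition the
    region spanned by both rectangles into a 3x3 grid of cores; the state of
    a core records which rectangles cover it: 'A', 'B', 'AB' or 'phi'.
    """
    xs = sorted({a[0], a[2], b[0], b[2]})
    ys = sorted({a[1], a[3], b[1], b[3]})
    col_mids = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]   # 3 column centres
    row_mids = [(lo + hi) / 2 for lo, hi in zip(ys, ys[1:])]   # 3 row centres

    def covers(r, x, y):
        return r[0] < x < r[2] and r[1] < y < r[3]

    grid = []
    for y in row_mids:
        row = []
        for x in col_mids:
            state = ("A" if covers(a, x, y) else "") + ("B" if covers(b, x, y) else "")
            row.append(state or "phi")
        grid.append(row)
    return grid

# Two partially overlapping rectangles: the central core is covered by both.
print(cores9((0, 0, 4, 4), (2, 2, 6, 6)))
# [['A', 'A', 'phi'], ['A', 'AB', 'B'], ['phi', 'B', 'B']]

From such a grid of states, qualitative relations such as overlap versus disjointness, or the direction of one rectangle with respect to the other, can then be read off.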

Fig. 2. The base relations of RCC8 [18]

Fig. 5. The cores of Extended CORE9 for two extended objects A and B [7]. The extended state information of core_xy is {a_i, b_j | a_i, b_j ∈ core_xy} or φ

Since CORE9 is incapable of providing an efficient representation of extended objects, an extension of CORE9 was proposed in [7]. In this context, extended objects are defined as a set of axis-aligned rectangles, where each rectangle is an approximation of a component of the object. An extended object abstraction of the human body uses separate bounding boxes for separate body parts, allowing for a more precise description of the body than abstraction using a single MBR for the whole body. Within the Extended CORE9 framework, the extended state information of the nine cores formed using the MBRs of the sets of rectangles (as shown in Fig. 5) is used along with a recursive algorithm to opportunistically compute the qualitative spatial relations, viz., topological, directional and distance relations, between the objects. It has been found that such an extended object representation leads to better classification of human activities [7].

Fig. 3. (a) Cardinal directions of b (b) a S b (c) a NE:E b (d) a B:S:SW:W:NW:E:SE b [19]

Fig. 4. The nine cores of CORE9 for a pair of rectangles A and B [21]. The state information of a core can be: A, B, AB, or φ

B. Single Person Activities through Extended CORE9

The Extended CORE9 framework described in [7] computes binary qualitative spatial relations between a pair of extended objects for the description and classification of human-human and human-object interactions. However, it is possible to use such a framework for the description and classification of single person human activities.

An extended object, A, comprising m components can be written as

A = {ai | ai is an axis-aligned rectangle bounding a component of A, 1 ≤ i ≤ m}

We can define extended objects as follows:

Definition 1. The collection of extended objects over a set of axis-aligned rectangles can be recursively defined as follows:
1) the empty set φ is an extended object and is termed a null object
2) a singleton set consisting of one axis-aligned rectangle is an extended object
3) if A and B are extended objects then A ∪ B, A ∩ B and A − B are extended objects

Within the Extended CORE9 framework, humans and objects involved in human-human and human-object interactions are abstracted as extended objects. The Extended CORE9 framework is then used to compute binary relations between the pair of extended objects. In a single person activity, there is only one human body involved, and traditionally that would be abstracted as a single extended object. However, to use the Extended CORE9 framework for such activities, we decompose the set of axis-aligned rectangles into two sets. It follows from Definition 1 that these subsets of the extended object are also extended objects.

We partition the set of rectangles A abstracting the human body into two disjoint subsets, H and B, such that:
• Union of H and B is A; H ∪ B = A
• Intersection of H and B is the empty set; H ∩ B = φ
• H and B are non-empty sets; H ≠ φ, B ≠ φ

From Definition 1, H and B are extended objects. The set of rectangles comprising H corresponds to a set of reference components of A. The set B is obtained using the set difference A − H.

The Extended CORE9 framework can now be applied on the two new extended objects, H and B, to compute the topological, directional and distance relations between them.
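Concretely, this decomposition amounts to splitting one collection of part rectangles into two. The sketch below is our illustration rather than the authors' implementation; the part names, coordinates and the mbr helper are assumptions chosen only to show the shape of the data.

# An extended object is a set of axis-aligned rectangles, one per body part;
# here each rectangle is an (x1, y1, x2, y2) tuple keyed by part name.
body = {
    "head":      (110,  20, 150,  60),
    "lefthand":  ( 60,  70, 100, 160),
    "righthand": (160,  70, 200, 160),
    "leftleg":   ( 90, 170, 130, 300),
    "rightleg":  (135, 170, 175, 300),
}

def partition(extended_obj, reference_parts):
    """Split an extended object into (H, B): reference vs. non-reference parts."""
    H = {p: r for p, r in extended_obj.items() if p in reference_parts}
    B = {p: r for p, r in extended_obj.items() if p not in reference_parts}
    assert H and B, "both partitions must be non-empty"
    return H, B

def mbr(extended_obj):
    """Minimum bounding rectangle of all component rectangles."""
    xs1, ys1, xs2, ys2 = zip(*extended_obj.values())
    return (min(xs1), min(ys1), max(xs2), max(ys2))

H, B = partition(body, {"head"})   # H = {head}, B = the remaining parts
# Extended CORE9 would now relate H and B, e.g. starting from their MBRs:
print(mbr(H), mbr(B))

Because H and B are themselves extended objects by Definition 1, whatever routine computes the Extended CORE9 relations for a pair of extended objects can be reused unchanged for a single person.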

In this paper, since we are trying to classify fall activities within the UR Fall Detection dataset, we choose the head of the human body as the reference component. H is the set containing the one rectangle corresponding to the head, and B is the set of rectangles corresponding to the rest of the body parts.

Fig. 6 shows the sets H and B for the fall activity depicted in Fig. 1. The corresponding Extended CORE9 matrix is shown in Fig. 7. The qualitative relations computed using the Extended CORE9 framework between the head and the rest of the body parts give a structural description of the body posture during a fall. We represent the sequence of spatial relations between the extended objects thus computed using graphs; a graph-kernel based SVM classification is then used to classify fall activities.

For example, the human bodies in the activities are abstracted as extended objects, each of which is a set of components corresponding to the head, hands and legs. As discussed above, we partition this set into two disjoint sets - the set of reference components and the set of non-reference components. We consider the head to be the reference component. The partitioned sets - {head} and {lefthand, righthand, leftleg, rightleg} - are also extended objects. The recursive algorithm of Extended CORE9 is then used to compute the topological, directional and distance relations between the two extended objects for each frame considered [7]. The activities are represented as graphs and the graph-kernel based SVM classifier discussed in [8] is used to distinguish fall activities from ADL.

IV. EXPERIMENTAL SET-UP

We evaluate our proposed approach for representing single person activities using Extended CORE9 on the UR Fall Detection dataset [6]. The UR Fall Detection dataset consists of RGBD videos for 30 instances of fall activities and 40 instances of activities of daily living. The activities of daily living videos include instances that involve a single person walking, sitting, lying down and picking up an object from the floor. All activities in the dataset are single-person activities that may or may not involve another object. For each fall activity, there are two videos captured using cameras positioned parallel to the ground and mounted on the ceiling. In addition to the video data captured for each activity, accelerometric data is captured using an accelerometer carried by the person performing the activity. However, for our experiments we consider only videos from the camera parallel to the ground. It has been shown that using the additional accelerometric data significantly improves classification accuracy [6]. But we do not consider the accelerometric data because in a general assisted living environment, additional equipment may not always be available or may raise the cost.

We take the RGBD videos and manually label the humans in every 10th frame of the videos. As discussed in Section III-B, the reference components and non-reference components of the extended objects are identified. It is to be noted that there are several automated part-based tracking algorithms that can be used to obtain the annotated body parts [4], [5]. Even if the body parts are not labeled as head, hands, or legs, the proposed method only requires a differentiation of the parts into reference components and non-reference components.

The qualitative spatial relations between the two sets of components are computed using Extended CORE9 and encoded within a graph structure [8]. For example, the activity shown in Fig. 1 can be represented as a temporal graph as shown in Fig. 8. Here, nodes correspond to components; labels on solid edges represent the qualitative spatial relation between a pair of components; dotted edges represent the temporal evolution of a component throughout the activity. The label on the solid edge between a pair of components, a and b, representing the qualitative spatial relation between them, is a three-tuple ⟨top, dir, dis⟩, where top is the RCC5 relation between a and b, dir is the CDC relation and dis is the qualitative distance relation. We then use a graph-kernel based SVM to classify the activities represented as graphs.

A. Experimental Results

We perform a 10-fold cross-validation and obtain a classification accuracy of 94.28%. The confusion matrix for the experiments is given in Table I.

TABLE I
CONFUSION MATRIX FOR ACTIVITIES IN UR FALL DETECTION DATASET

Actual \ Predicted   ADL   Fall
ADL                   38      2
Fall                   2     28

TABLE II
COMPARISON OF RESULTS REPORTED FOR UR FALL DETECTION DATASET

                      Features               Sens(%)   Spec(%)   Acc(%)
Kwolek et al. [6]     Depth + Acceleration   100       96.67     98.33
Kwolek et al. [6]     Depth                  100       83        90
Yun et al. [22]       RGB                    96.77     89.77     94
Proposed method       RGB                    93.33     95        94.28

B. Discussion

In our experiments, we have considered only the tracking data for RGBD videos obtained from a single camera that is parallel to the floor. Even though tracking is done manually for our experiments, automated tracking can easily be done given the RGBD videos obtained using a Kinect. As mentioned earlier, we do not use the accelerometric data in our experiments. Further, we perform our experiments on all 30 Fall sequences and 40 ADL sequences, of which 11 ADL instances involve the person lying down on a bed, couch or the floor.

We have achieved a classification accuracy of 94.28% in our experiments. The results reported in [6] show a very high classification accuracy of 98.33% using depth and accelerometric data; the classification accuracy when using only depth data is reported to be 90.0%. However, on close examination our results are comparable to theirs. This is because their results are based on experiments performed on 30 fall activity sequences and only 30 ADL sequences.
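The sensitivity, specificity and accuracy quoted for the proposed method in Table II follow directly from the confusion matrix in Table I when Fall is taken as the positive class. The quick check below is our worked example, not part of the original paper:

# Counts from Table I (rows: actual class, columns: predicted class).
tp, fn = 28, 2    # actual Fall predicted as Fall / as ADL
tn, fp = 38, 2    # actual ADL predicted as ADL / as Fall

sensitivity = tp / (tp + fn)                   # 28/30 ≈ 0.9333
specificity = tn / (tn + fp)                   # 38/40 = 0.95
accuracy = (tp + tn) / (tp + tn + fp + fn)     # 66/70 ≈ 0.9429

print(f"Sens {sensitivity:.2%}  Spec {specificity:.2%}  Acc {accuracy:.2%}")
# Sens 93.33%  Spec 95.00%  Acc 94.29%  (reported as 94.28% in Table II, the same value up to rounding)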

Fig. 6. Extended object abstraction of Fall activity shown in Fig. 1

Fig. 7. Extended CORE9 matrix of Fall activity shown in Fig. 1 and Fig. 6

Fig. 8. The temporal graph representation for the Fall activity depicted in Fig. 1
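To make the temporal graph of Fig. 8 concrete, the following sketch shows one way such a graph could be assembled with networkx: one node per component per sampled frame, a solid edge labelled with the ⟨top, dir, dis⟩ tuple between the reference and non-reference components in each frame, and dotted edges linking a component to itself across consecutive sampled frames. This is our illustration; the relation values below are placeholders, not output of Extended CORE9.

import networkx as nx

def temporal_graph(frames):
    """Build a temporal activity graph from per-frame qualitative relations.

    `frames` is a list of dicts, one per sampled frame, mapping a pair of
    component names to its (top, dir, dis) label.
    """
    g = nx.Graph()
    for t, relations in enumerate(frames):
        for (u, v), (top, dir_, dis) in relations.items():
            g.add_node((u, t))
            g.add_node((v, t))
            # Solid edge: spatial relation between the two components at time t.
            g.add_edge((u, t), (v, t), label=(top, dir_, dis), kind="solid")
        if t > 0:
            for comp in {c for pair in relations for c in pair}:
                # Dotted edge: temporal evolution of the same component.
                g.add_edge((comp, t - 1), (comp, t), kind="dotted")
    return g

# Example: head (H) relative to the rest of the body (B) over three sampled frames.
frames = [{("H", "B"): ("DR", "N", "close")},
          {("H", "B"): ("PO", "NE", "close")},
          {("H", "B"): ("DR", "E", "close")}]
g = temporal_graph(frames)
print(g.number_of_nodes(), g.number_of_edges())   # 6 nodes, 3 solid + 4 dotted edges

A graph kernel can then compare two such graphs, and the resulting kernel matrix can be fed to an SVM, as done with the graph-kernel based classifier of [8].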

As shown in Table II, the proposed method outperforms, to the best of our knowledge, all results reported for the UR Fall Detection dataset that use only the video data.

We have noted that the two misclassified ADL activities in our experiments correspond to activities that involve the person lying down on the floor. However, the difference between a person falling down and a person lying down on the floor is the speed at which the person reaches the floor. Therefore, the accelerometric data could have provided the necessary information for a correct classification.

V. CONCLUSION

For human activity recognition, abstracting human bodies as extended objects has been shown to give better results than abstracting human bodies using only a single bounding box [7]. Encoding the spatial information within a graph representation provides classification results that are comparable to the state-of-the-art for human-human and human-object interactions [8]. In this paper we have given an approach by which the Extended CORE9 framework can also be applied to single person activities. We have shown, via experiments performed on the UR Fall Detection dataset, that such an approach can be effectively applied for detecting fall scenarios during activities of daily living in an assisted living environment. We have noted that our approach is not always able to distinguish between a lying down activity and a fall activity because the speed of the activity has not been taken into account. Research towards incorporating speed information within the qualitative framework is part of future work.

Another interesting and important direction of future work could be online recognition of fall activities.
REFERENCES
[1] M. Mubashir, L. Shao, and L. Seed, “A survey on fall detection:
Principles and approaches,” Neurocomputing, vol. 100, pp. 144 – 152,
2013.
[2] Z. Zhang, C. Conly, and V. Athitsos, “A survey on vision-based fall
detection,” in Proceedings of the 8th ACM International Conference on
PErvasive Technologies Related to Assistive Environments. ACM, 2015,
pp. 46:1–46:7.
[3] C. Lin, S. Wang, J. Hong, L. Kang, and C. Huang, “Vision-based fall
detection through shape features,” in 2016 IEEE Second International
Conference on Multimedia Big Data (BigMM), 2016, pp. 237–240.
[4] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan,
“Object Detection with Discriminatively Trained Part-Based Models,”
IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI), vol. 32, no. 9, pp. 1627–1645, 2010.
[5] Y. Yang and D. Ramanan, “Articulated Pose Estimation with Flexible
Mixtures-of-parts,” in IEEE Conference on Computer Vision and Pattern
Recognition, ser. CVPR. IEEE Computer Society, 2011, pp. 1385–1392.
[6] B. Kwolek and M. Kepski, “Human fall detection on embedded platform
using depth maps and wireless accelerometer,” Computer Methods and
Programs in Biomedicine, vol. 117, pp. 489 – 501, 2014.
[7] S. Kalita, A. Karmakar, and S. M. Hazarika, “Efficient extraction of
spatial relations for extended objects vis-à-vis human activity recognition
in video,” Applied Intelligence, vol. 48, no. 1, pp. 204–219, 2018.
[8] ——, “A temporal activity graph kernel for human activity classifica-
tion,” in 11th Indian Conference on Computer Vision, Graphics, and
Image Processing, 2018.
[9] J. Aggarwal and M. Ryoo, “Human Activity Analysis: A Review,” ACM
Computing Surveys, vol. 43, no. 3, pp. 16:1–16:43, 2011.
[10] M. Vrigkas, C. Nikou, and I. A. Kakadiaris, “A review of human activity
recognition methods,” Frontiers in Robotics and AI, vol. 2, p. 28, 2015.
[11] Y. Nizam, M. N. H. Mohd, and M. M. A. Jamil, “A study on
human fall detection systems: Daily activity classification and sensing
techniques,” International Journal of Integrated Engineering, vol. 8,
no. 1, pp. 35 – 43, 2016.
[12] B. Jansen and R. Deklerck, “Context aware inactivity recognition for
visual fall detection,” in Pervasive Health Conference and Workshops,
2006.
[13] N. Thome and S. Miguet, “A hhmm-based approach for robust fall de-
tection,” in 2006 9th International Conference on Control, Automation,
Robotics and Vision, 2006, pp. 1–8.
[14] C. Rougier, J. Meunier, A. St-Arnaud, and J. Rousseau, “Fall detection
from human shape and motion history using video surveillance,” in Pro-
ceedings of the 21st International Conference on Advanced Information
Networking and Applications Workshops - Volume 02, ser. AINAW ’07.
IEEE Computer Society, 2007, pp. 875–880.
[15] K. S. R. Dubba, M. Bhatt, F. Dylla, D. C. Hogg, and A. G. Cohn, “In-
terleaved Inductive-Abductive Reasoning for Learning Complex Event
Models,” in International Conference on Inductive Logic Programming
(ILP), ser. LNCS, vol. 7207. Springer, 2012, pp. 113–129.
[16] M. Sridhar, A. G. Cohn, and D. C. Hogg, “Benchmarking Qualitative
Spatial Calculi for Video Activity Analysis,” in IJCAI Workshop Bench-
marks and Applications of Spatial Reasoning, 2011, pp. 15–20.
[17] A. G. Cohn and S. M. Hazarika, “Qualitative spatial representation and
reasoning: An overview,” Fundamenta Informaticae, vol. 46, no. 1-2,
pp. 1–29, 2001.
[18] D. A. Randell, Z. Cui, and A. Cohn, “A Spatial Logic Based on Regions
and Connection,” in 3rd International Conference on Principles of
Knowledge Representation and Reasoning (KR’92), B. Nebel, C. Rich,
and W. Swartout, Eds. Morgan Kaufmann, 1992, pp. 165–176.
[19] S. Skiadopoulos and M. Koubarakis, “On the consistency of cardinal
directions constraints,” Artificial Intelligence, vol. 163, pp. 91 – 135,
2005.
[20] T. Bittner and M. Donnelly, “A formal theory of qualitative size
and distance relations between regions,” in 21st Annual Workshop on
Qualitative Reasoning (QR07), 2007.
[21] A. G. Cohn, J. Renz, and M. Sridhar, “Thinking Inside the Box: A
Comprehensive Spatial Representation for Video Analysis,” in the 13th
International Conference on Principles of Knowledge Representation
and Reasoning (KR2012). AAAI Press, 2012, pp. 588 – 592.
[22] Y. Yun and I. Y. Gu, “Human fall detection via shape analysis on
Riemannian manifolds with applications to elderly care,” in 2015 IEEE
International Conference on Image Processing (ICIP), 2015, pp. 3280–3284.
