
A Study on the Effect of Lighting Conditions on the Accuracy of Heart-Rate Estimation From Facial Video

Vishay Raina

Abstract— Estimating the heart rate (HR) of a subject from a facial video is an interesting problem. A typical approach is to first use face detection to extract the facial region from each frame of the video, average the pixel intensities in each such region to obtain the Red, Green and Blue (RGB) intensity contours, and then run Independent Component Analysis (ICA) on these contours. In this way, estimates of the underlying source signals are obtained, whose spectral peaks are used to predict the HR in every analysis window. Here, an effort has been made to study the effect of lighting conditions on the accuracy of the estimated heart rate. Using two different DC-operated LED lights, we illuminate the faces of the subjects in all sixteen possible permutations of the four modes of operation of the lights, and record one-minute videos in each configuration. We then process the videos to obtain the intensity contours and finally use the vICA scheme proposed by Raseena et al. [1] to obtain the estimated heart rate and the mean absolute error.

I. INTRODUCTION

Non-contact methods of heart rate measurement are very important in various situations, such as:
• A burn victim, on whose body the ECG electrodes either cannot be placed, or placing them would cause unbearable pain and discomfort.
• An athlete whose heart rate needs to be measured while performing dynamic exercises, in order to determine their maximum heart rate.
• A patient whose heart rate needs to be monitored overnight, and who turns from side to side during sleep rather than staying in one position all night.
A very good, non-invasive way to measure heart rate is photoplethysmography (PPG), in which a light source illuminates the subject's skin and the amount of light reflected back is measured. Here we use two dedicated light sources, arrange them in different permutations based on their illumination levels, and record facial video to measure the amount of light reflected back. Such a setup falls under the methodology of imaging photoplethysmography (iPPG).
iPPG is a broad problem, and good solutions have been proposed for its various sub-problems. In their paper [2], Poh et al. provided a way to automate the iPPG process and also handle motion artifacts. Their approach was based on automatic face tracking along with blind source separation of the color channels into independent components. However, in this approach the HR is estimated independently in each analysis window of the data, making it less robust to artifacts in the video. Gaonkar et al. [3] proposed sparse spectral peak tracking (SSPT) to address this drawback. In SSPT, a sparse representation of the spectrum of each source signal is obtained using its top few significant peaks, and these are used for HR estimation by exploiting the slowly varying nature of HR.
Raseena et al. [1] propose a maximum likelihood formulation to optimally select a source signal in each window such that the predicted HR trajectory not only corresponds to the most likely spectral peaks but also ensures a realistic heart rate variability (HRV) across analysis windows. The likelihood function is efficiently optimized using dynamic programming, in a manner similar to Viterbi decoding. The proposed scheme for HR estimation is denoted vICA.
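To make the dynamic programming step concrete, the following is a minimal sketch, not the implementation of [1]: it assumes each analysis window contributes a few candidate HR values (spectral peak locations, in bpm) with associated peak strengths, and it replaces the HRV likelihood with a simple linear penalty on HR jumps between adjacent windows, recovering the best trajectory by backtracking as in Viterbi decoding.

import numpy as np

def track_hr(candidates, scores, jump_penalty=0.05):
    """Pick one HR candidate per window, trading peak strength
    against HR jumps between adjacent windows (Viterbi-style).

    candidates: list of 1-D arrays of candidate HR values (bpm) per window
    scores: list of 1-D arrays of positive peak strengths (same shapes)
    jump_penalty: assumed, tunable cost per bpm of change between windows
    """
    best = np.log(scores[0])   # cumulative score per candidate
    back = []                  # back-pointers for backtracking
    for t in range(1, len(candidates)):
        # Transition cost between every candidate pair of windows t-1 and t.
        jump = np.abs(candidates[t][None, :] - candidates[t - 1][:, None])
        total = best[:, None] + np.log(scores[t])[None, :] - jump_penalty * jump
        back.append(np.argmax(total, axis=0))
        best = np.max(total, axis=0)
    idx = int(np.argmax(best))           # best final candidate
    path = [idx]
    for ptr in reversed(back):           # walk the back-pointers
        idx = int(ptr[idx])
        path.append(idx)
    path.reverse()
    return np.array([candidates[t][i] for t, i in enumerate(path)])

In [1] the transition term is derived from an explicit likelihood model of HRV rather than the fixed linear penalty assumed above.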
II. EFFECT OF LIGHTING CONDITION ON HR ESTIMATION

In this project we study the effect of lighting conditions on HR estimation.

A. Lighting conditions used in previous work

The various light conditions used in previous experiments are summarized in Fig. 1. The most popular choices were found to be ambient light and white LEDs. A few researchers have also used studio lights and LEDs of specific wavelengths ranging from 650 nm to 875 nm. For instance, a dual-wavelength array of LED light sources was adopted for pulse oximetry and HR measurement in [8].

Fig. 1. Summary of lighting conditions used in previous work.

B. Previous work aligned with the project topic

In the paper by Prakash et al., five different illumination intensities were studied, based on a combination of sunlight and artificial light; notably, different portions of the face were illuminated. The five settings were as follows:
• Subjects were not exposed to the sunlight. The constant ambient (room) light was the only source of illumination.
• Subjects were backlit by the sunlight, so insufficient illumination was projected on the subject's face (the face was darker).
• Front lighting (illumination by the sunlight) was used, so sufficient illumination was projected on the subject's face (the face was brighter and the subject was over-exposed).
• Subjects were seated perpendicular to the window. Hence, half of the face was exposed to the sunlight (over-exposed) while the other half was less exposed (under-exposed).
• Subjects were seated diagonally to the window. Hence, almost three quarters of the face was exposed to the sunlight (over-exposed) while the remaining quarter was less exposed (under-exposed).

In another paper, by Liu et al., LEDs attached to the lens of a digital camera were used as the light source, supplying different intensities and wavelengths across the visible spectrum. The light intensity was controlled via pulse-width modulation (PWM) of the current. Extensive experiments were conducted using lights of wavelengths 430 nm, 450 nm, 470 nm, 490 nm, 505 nm, 525 nm, 535 nm, 545 nm, 570 nm, 590 nm, 600 nm, 610 nm, 625 nm and 660 nm, as well as white light, at different intensities.

C. Challenges in real life scenarios due to variability in the ambient light conditions

In real life scenarios, the subject's facial orientation may not be optimal for detection, or the lighting conditions may not be suitable. The light intensity may be too low, e.g. in the evening or in a darker place; it could equally be too high, e.g. when sitting in front of a screen or in direct sunlight. Sometimes only part of the face may be illuminated, depending on the orientation of the subject's face with respect to the light source. In our recorded dataset we try to model all such challenges by using two LED lights of variable intensity and recording videos in different configurations that simulate these real life scenarios.

D. Strategic and systematic variation of lighting conditions

The lighting conditions were varied to simulate the following real life scenarios:
• Subjects were not exposed to the LED light. The constant ambient (room) light was the only source of illumination.
• Subjects' faces were partly lit by the LED light: almost three quarters of the face was exposed to the LED light (over-exposed) while the remaining quarter was less exposed (under-exposed), simulating a sub-optimal orientation of the subject's face with respect to the light source.
• Step 2 was repeated while varying the difference in light intensity between the over-exposed and under-exposed portions of the face, for both the left and the right sides of the face.
• Both sides of the subject's face were exposed to the LED lights, so that sufficient illumination was projected on the subject's face.

The different modes of operation of the lights were numbered as shown in Fig. 3.

Fig. 3. The numbers given to the different modes of operation, along with their intensity values.
The light variations were also carried out very systematically, as shown in Fig. 4. Here L1 is the LED light to the left of the subject and L2 is the LED light to the right of the subject.

Fig. 2. The various configurations of lights, numbered 1 to 16.

Fig. 4. The sequence in which the videos were captured.
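As an illustration of the resulting grid of conditions, with each light having four modes (zero, low, medium, high), the sixteen (L1, L2) configurations can be enumerated directly; the mapping of configuration numbers to mode pairs below is an assumption for illustration, not necessarily the exact numbering used in Fig. 2.

from itertools import product

MODES = ["zero", "low", "medium", "high"]  # modes of operation of each light

# All 16 (L1, L2) combinations; the numbering 1-16 is illustrative only.
for config, (l1, l2) in enumerate(product(MODES, MODES), start=1):
    print(f"configuration {config:2d}: L1={l1:6s} L2={l2}")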
III. THE DATASET

A. Equipment Used

Two camera phones (Asus and Lenovo) and one Sony Cybershot camera were used to capture the video recordings of the subjects' faces. These were supported on tripod stands, and their height was adjusted until the camera was directly in line with the subject's face, so that no skew was observed in the captured videos. The videos were captured at a resolution of 3840 x 2160 for the Asus, 1920 x 1080 for the Lenovo and 1280 x 720 for the Sony, all at a frame rate of 30 fps. Battery-operated LED lights with three different levels of illumination (low, medium, high) were used to illuminate the face of the subject.

Using two such lights with four modes of operation each (zero, low, medium and high illumination), videos could be recorded in 16 different configurations of lighting conditions. A commercial pulse oximetry sensor attached to the subject's fingertip was used to determine the true heart rate values. A video of the LED screen of the oximeter was captured using a Samsung phone; this video was processed and fed to a Convolutional Neural Network to obtain annotated values of the heart rate recorded by the oximeter.
B. Recording Setup

The three cameras (Asus, Lenovo, Sony) were placed along the 1 ft or 1.5 ft arc on a table at angles of 33 degrees, -33 degrees and 0 degrees from the center line, where counter-clockwise is considered positive and clockwise negative. The Sony Cybershot camera was placed in the center, with the Asus and Lenovo cameras to the left and right of the subject. The light sources were placed 1.31 ft in front of the subject and 45 cm to either side of the center line, at an angle of 48 degrees. The oximeter was fixed on the table towards the right side of the subject, at the position where the subject's hand would lie if the arm were allowed to rest on the arm of the chair. The Samsung camera was fixed to a tripod directly above the oximeter screen to capture its video and thereby obtain the ground truth values. The subject was requested to relax, look forward and keep their head movements natural (non-jerky).

Fig. 5. The arrangement of the various devices on the table.

Fig. 6. Schematic diagram of the setup.

C. Recording Procedure

The subject's finger was placed in the oximeter and its readings were allowed to stabilize. All the cameras were then switched on one by one, and a tone of 1000 Hz frequency and 5 second duration was played, so that the microphones of the camera devices would record it and the recordings could later be synchronized to the same starting point. The cameras were then allowed to record for one minute and turned off. This was repeated for the sixteen lighting conditions and for two distances, 1 ft and 1.5 ft, giving a total of 32 videos per subject. In total, 30 subjects were recorded, of which 23 were male and 7 were female.
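The 1000 Hz tone provides a common reference for aligning the recordings. Below is a minimal sketch of one way to locate the tone onset, assuming each camera's audio track has already been extracted to a mono array (for example with ffmpeg); the band-pass design and the half-peak threshold are assumed heuristics rather than the procedure actually used.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def tone_onset(audio, fs, f0=1000.0, bw=50.0):
    """Return the sample index at which the 1 kHz sync tone starts."""
    # Narrow band-pass around the tone frequency.
    sos = butter(4, [f0 - bw, f0 + bw], btype="bandpass", fs=fs, output="sos")
    envelope = np.abs(sosfiltfilt(sos, audio))
    # First sample whose envelope exceeds half of the peak (assumed threshold).
    return int(np.argmax(envelope > 0.5 * envelope.max()))

# Subtracting tone_onset(track, fs) / fs for each camera's track gives the
# offsets needed to trim all videos to the same starting point.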
D. Recorded Data and Arrangement

The data was arranged so that the DATA folder has 30 sub-folders, each named after a subject. Each of these contains 4 folders for the 4 different cameras, and each of those contains 2 folders for the 2 distances. The distance folders contain the corresponding videos, along with a folder named Features that stores the .mat files of the RGB intensity contours. A folder named delay stores the .mat files obtained after the videos are synchronized to the same starting point.

Fig. 8. Images of all 30 subjects recorded for the dataset.
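For illustration, the described hierarchy can be created as follows. Only the names DATA, Features and delay are taken from the text; the subject, camera and distance folder names, and the exact position of Features and delay within the tree, are assumptions.

import os

SUBJECTS = [f"subject{i:02d}" for i in range(1, 31)]  # names assumed
CAMERAS = ["asus", "lenovo", "sony", "samsung"]       # names assumed
DISTANCES = ["1ft", "1.5ft"]                          # the two distances

for subject in SUBJECTS:
    for camera in CAMERAS:
        for distance in DISTANCES:
            # the distance folders hold the corresponding videos
            os.makedirs(os.path.join("DATA", subject, camera, distance),
                        exist_ok=True)
    # .mat files of the RGB contours, and their synchronized versions
    os.makedirs(os.path.join("DATA", subject, "Features"), exist_ok=True)
    os.makedirs(os.path.join("DATA", subject, "delay"), exist_ok=True)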


IV. ANNOTATIONS

To make the annotations easy, a Convolutional Neural Network (CNN) was trained for digit recognition using around three hundred thousand images of digits from 0 to 9. This data was generated from videos of the oximeter captured with the same camera, which were manually annotated and then segmented into individual digits. The digit images were converted to grayscale and fed to the CNN to train it to recognize the different digits.

A. Convolutional Neural Network

After the digits were segregated, each image was rescaled to 28x28 and fed to the CNN for recognition. The CNN architecture consists of two convolutional layers (conv1 and conv2), two max pooling layers (mpool1 and mpool2) and two fully-connected layers (fc1 and out). The first convolutional layer has 32 filters with a kernel size of 5; the second has 64 filters with a kernel size of 3; both use ReLU activations and unit stride. Both max pooling layers perform down-sampling with a stride of 2 and a kernel size of 2. The data is then flattened to a 1-D vector for the fully connected layer fc1, which has 1024 neurons. To reduce over-fitting, a dropout layer with a dropout rate of 0.75 was inserted between the two fully connected layers. The output of fc1 is passed to a 10-way softmax, which produces a probability distribution over the ten labels (i.e. 0-9).

Fig. 7. Architecture of the CNN used for annotations.
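As a sketch, the described architecture maps onto the following Keras model. The conv-pool-conv-pool ordering is assumed, since the text lists the convolutional and pooling layers separately, and the dropout argument follows the stated rate of 0.75 (in older TensorFlow APIs a value of 0.75 could instead denote the keep probability).

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                   # grayscale digit crops
    layers.Conv2D(32, 5, strides=1, activation="relu"),  # conv1: 32 filters, 5x5
    layers.MaxPooling2D(pool_size=2, strides=2),         # mpool1
    layers.Conv2D(64, 3, strides=1, activation="relu"),  # conv2: 64 filters, 3x3
    layers.MaxPooling2D(pool_size=2, strides=2),         # mpool2
    layers.Flatten(),                                    # to a 1-D vector
    layers.Dense(1024, activation="relu"),               # fc1: 1024 neurons
    layers.Dropout(0.75),                                # dropout rate as stated
    layers.Dense(10, activation="softmax"),              # out: 10-way softmax
])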
B. Training Data

Fig. 9. Training data examples.

The training data was generated using a thresholding-based ground truth generation algorithm, described as follows (a sketch of the thresholding step is given after the list):
• Manually annotate the HR value displayed by the oximeter in one particular frame.
• Choose a rectangle around the HR values in the frame.
• For each frame, find the Frobenius norm of the difference between the two adjacent frames.
• Plot these difference values and choose a threshold that determines where the heart rate changes between two adjacent frames.
• Manually input the heart rate value at such changes; otherwise carry over the previous value.
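A minimal sketch of the thresholding step, assuming frames holds equally sized grayscale crops of the chosen rectangle; the threshold itself is picked by inspecting the plotted differences, as described above.

import numpy as np

def change_frames(frames, threshold):
    """Indices of frames where the displayed HR value likely changes."""
    diffs = [np.linalg.norm(frames[i + 1].astype(float) - frames[i].astype(float))
             for i in range(len(frames) - 1)]  # Frobenius norms of differences
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]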
C. Testing Data

Ten thousand image-value pairs from the training data were kept aside for testing. With a learning rate of 0.001 and a batch size of 128 images, an accuracy of 98.43% was achieved.
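These settings correspond to a training call such as the one below, continuing the Keras sketch of Section IV-A; the optimizer, loss and epoch count are assumptions, since only the learning rate and batch size are stated.

# Continuing the Section IV-A sketch: x_train/y_train are digit crops and
# integer labels, with 10,000 (x_test, y_test) pairs held out for testing.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",  # integer labels assumed
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=5,
          validation_data=(x_test, y_test))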
D. Getting Predictions

To annotate the HR values displayed by the oximeter in the videos recorded by the Samsung camera phone, the following steps were followed (a sketch of this pipeline is given after the list):
• Read the image frames of a video file.
• Extract the portion of the image containing the HR digits.
• Determine whether it is a two- or three-digit number using template matching.
• Segregate the individual digits and write a text file containing their locations.
• Feed this text file to the predict function, which feeds the images into the CNN.
• Put the digits together to form a two- or three-digit number.
• Append this number to a list and save the file.
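A sketch of this pipeline using OpenCV is given below. The crop coordinates and the segment_digits helper (standing in for the template matching, digit segregation and text-file bookkeeping) are hypothetical placeholders, and model is the CNN of Section IV.

import cv2
import numpy as np

def annotate_video(path, model, crop, segment_digits):
    """Predict the displayed HR value for every frame of an oximeter video.

    crop: (y0, y1, x0, x1) of the HR digits in the frame (assumed fixed).
    segment_digits: hypothetical helper returning 28x28 digit crops.
    """
    hr_values = []
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        y0, y1, x0, x1 = crop
        gray = cv2.cvtColor(frame[y0:y1, x0:x1], cv2.COLOR_BGR2GRAY)
        digits = segment_digits(gray)            # two or three digit images
        batch = np.stack(digits)[..., None] / 255.0
        preds = model.predict(batch, verbose=0).argmax(axis=1)
        hr_values.append(int("".join(map(str, preds))))  # e.g. [7, 2] -> 72
    cap.release()
    return hr_values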
V. TECHNIQUES

In this section the major steps of the heart rate estimation methodology are explained. First, the facial region of interest (ROI) is detected in every frame of the video, the pixel intensities are averaged over the ROI, and a 3xN matrix containing the R, G and B intensity contours is generated, where N is the number of frames. Then, source signal separation is performed by applying Independent Component Analysis (ICA) to the intensity contours. The spectra of all source signals are analysed to estimate the heart rate from each underlying signal, and maximum likelihood estimation is then carried out to obtain the target heart rate variation. The details of each step are explained below.
A. ROI Detection

The facial region is identified in each frame of the video using a built-in implementation of the Viola-Jones face detection algorithm. The average of the RGB values over all pixels within the facial ROI is calculated for each frame and concatenated across frames to obtain the pixel intensity contours for the Red, Green and Blue channels, where the sample index t corresponds to the frame number.
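A sketch of this step using OpenCV's Haar cascade implementation of Viola-Jones detection is shown below; whether the original study used OpenCV or another built-in detector is not stated, and keeping only the largest detection per frame is an assumed simplification.

import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def rgb_contours(video_path):
    """Return a 3xN matrix of mean R, G, B values over the facial ROI."""
    cap = cv2.VideoCapture(video_path)
    means = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            continue                     # assumed: skip undetected frames
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
        roi = frame[y:y + h, x:x + w]
        b, g, r = roi.reshape(-1, 3).mean(axis=0)  # OpenCV stores BGR
        means.append([r, g, b])
    cap.release()
    return np.array(means).T             # shape (3, N)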
B. Extracting BVP from facial ROI data

ICA is a technique for extracting underlying independent signals from a set of observations that are linear combinations of the independent components. It is used here to obtain the blood volume pulse (BVP) signal from the intensity contours. Let y1, y2, y3 be the intensity contours and x1, x2, x3 the underlying source signals. ICA finds a demixing matrix W that approximates the inverse of the original mixing matrix A in the formulation

y = Ax, (1)

where y = [y1, y2, y3]', x = [x1, x2, x3]' and A is the mixing matrix. Thus

x* = W y (2)

is an estimate of the vector x(t) containing the underlying source signals.

C. Frequency Domain Analysis of Extracted Sources

After extracting the underlying source signals using ICA, a frequency domain analysis of each source signal is carried out to find the frequency with the maximum amplitude in the range 0.75 Hz to 4 Hz, which corresponds to heartbeat frequencies of 45 beats per minute (bpm) to 240 bpm, respectively.
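A sketch combining this step with the frequency domain analysis of Section V-C, using scikit-learn's FastICA as a stand-in for the unspecified ICA implementation:

import numpy as np
from sklearn.decomposition import FastICA

def estimate_hr(contours, fps=30.0):
    """contours: 3xN matrix of R, G, B means; returns one HR (bpm) per source."""
    # x* = W y: estimate the three underlying source signals.
    sources = FastICA(n_components=3, random_state=0).fit_transform(contours.T).T
    hrs = []
    for s in sources:
        spectrum = np.abs(np.fft.rfft(s - s.mean()))
        freqs = np.fft.rfftfreq(s.size, d=1.0 / fps)
        band = (freqs >= 0.75) & (freqs <= 4.0)   # 45 to 240 bpm
        peak = freqs[band][np.argmax(spectrum[band])]
        hrs.append(60.0 * peak)                    # Hz -> bpm
    return hrs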

REFERENCES

[1] Raseena K. T. and Prasanta Kumar Ghosh, "A maximum likelihood formulation to exploit heart rate variability for robust heart rate estimation from facial video."
[2] Ming-Zher Poh, Daniel J. McDuff and Rosalind W. Picard, "Non-contact, automated cardiac pulse measurements using video imaging and blind source separation."
[3] Aditya Gaonkar P., Bhuthesh R., Dipanjan Gope and Prasanta Kumar Ghosh, "Robust real-time pulse rate estimation from facial video using sparse spectral peak tracking."
[4] Xuan Yang and Jing Pu, "MDig: Multi-digit recognition using convolutional neural network on mobile."
[5] Wim Verkruysse, Lars O. Svaasand and J. Stuart Nelson, "Remote plethysmographic imaging using ambient light," Optics Express, vol. 16, no. 26, pp. 21434-21445, 2008.
[6] Yong-Poh Yu, Raveendran Paramesran and Chern-Loon Lim, "Video based heart rate estimation under different light illumination intensities."
[7] He Liu, Yadong Wang and Lei Wan, "The effect of light conditions on photoplethysmographic image acquisition using a commercial camera."
[8] K. Humphreys et al., "Noncontact simultaneous dual wavelength photoplethysmography: A further step toward noncontact pulse oximetry," Rev. Sci. Instrum., vol. 78, no. 4, pp. 044304-1-044304-6, 2007.
[9] M.-Z. Poh, D. J. McDuff and R. W. Picard, "Advancements in noncontact, multiparameter physiological measurements using a webcam," IEEE Transactions on Biomedical Engineering, vol. 58, no. 1, pp. 7-11, 2011.