Vishay Report
Abstract— Estimating the heart rate (HR) of a subject from a facial video is an interesting problem. A typical approach is to first use face detection to extract the facial region from each frame of the video, average the pixel intensities in each such region to obtain the red, green and blue (RGB) intensity contours, and then run Independent Component Analysis (ICA) on these contours. In this way, estimates of the underlying source signals are obtained, whose spectral peaks are used to predict HR in every analysis window. Here an effort has been made to study the effect of lighting conditions on the accuracy of the estimated heart rate. Using two different DC-operated LED lights, we illuminate the faces of the subjects in all sixteen possible permutations of the four modes of operation of the lights, and record one-minute videos in each configuration. We then process the videos to obtain the intensity contours, and finally use the vICA scheme proposed by Raseena et al. [1] to obtain the estimated heart rate and the mean absolute error.

I. INTRODUCTION

Non-contact methods of heart rate measurement are very important in various situations, such as:
• A burn victim, on whose body the electrodes for ECG either cannot be placed, or placing them causes unbearable pain and discomfort.
• An athlete whose heart rate needs to be measured while performing dynamic exercises, in order to measure their maximum heart rate.
• A patient whose heart rate needs to be monitored overnight, while they sleep, turn to their side and do not stay in one position all night.

A very good, non-invasive way to measure heart rate is photoplethysmography (PPG), where a source of light illuminates the subject's skin and the amount of light reflected back is measured. Here we use two dedicated light sources, arrange them in different permutations based on their illumination levels, and record facial video to measure the amount of light reflected back. Such a setup comes under the methodology of imaging photoplethysmography (iPPG).

iPPG is a vast problem, with many good solutions proposed for its various sub-problems. In their paper [2], Poh et al. provided a way to automate the iPPG process and also handle motion artifacts. Their approach was based on automatic face tracking along with blind source separation of the color channels into independent components. In this approach, however, HR is estimated independently in each analysis window of the data, making it less robust to artifacts in the video. Gaonkar et al. [3] proposed sparse spectral peak tracking (SSPT) to address this drawback. In SSPT, a sparse representation of the spectrum of each source signal is obtained using the top few significant peaks, and these are used for HR estimation by exploiting the slowly varying nature of HR.

Raseena et al. [1] propose a maximum likelihood formulation to optimally select a source signal in each window, such that the predicted HR trajectory not only corresponds to the most likely spectral peaks but also ensures realistic HR variability (HRV) across analysis windows. The likelihood function was efficiently optimized using dynamic programming, in a manner similar to Viterbi decoding. The proposed scheme for HR estimation is denoted vICA.

II. EFFECT OF LIGHTING CONDITION ON HR ESTIMATION

In this project we study the effect of lighting condition on HR estimation.

A. Lighting conditions used in previous work

The various lighting conditions used in previous experiments have been tabulated below. The most popular choices were found to be ambient light and white LEDs. A few researchers have also used studio lights and LEDs of specific wavelengths ranging from 650 nm to 875 nm. For instance, a dual-wavelength array of LED light sources has been adopted for pulse oximetry and HR measurements in [8].

Fig. 1. Summary of lighting conditions used in previous work

Fig. 2. The various configurations of lights shown from 1 to 16.
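The Viterbi-style dynamic programming behind vICA, as described in the introduction, can be sketched as follows. This is a minimal illustration, not the likelihood of [1]: each analysis window contributes a few candidate spectral peaks, and the score below (log peak magnitude minus a penalty on the HR jump between consecutive windows, with an assumed `jump_penalty` weight) stands in for the actual likelihood terms.

```python
import numpy as np

def track_hr(peak_freqs, peak_mags, jump_penalty=0.5):
    """Viterbi-style selection of one spectral peak per analysis window.

    peak_freqs[t], peak_mags[t]: candidate peak frequencies (Hz) and
    magnitudes for window t. The scoring (log magnitude minus a
    frequency-jump penalty) is an illustrative stand-in for the
    likelihood optimized in [1]; `jump_penalty` is an assumed weight.
    """
    T = len(peak_freqs)
    score = [np.log(peak_mags[0])]           # best log-score per candidate
    back = []                                # backpointers per transition
    for t in range(1, T):
        prev = score[-1][:, None]            # shape (K_prev, 1)
        jump = np.abs(peak_freqs[t][None, :] - peak_freqs[t - 1][:, None])
        total = prev + np.log(peak_mags[t])[None, :] - jump_penalty * jump
        back.append(np.argmax(total, axis=0))
        score.append(np.max(total, axis=0))
    # Backtrack the highest-scoring trajectory of peaks
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    # Convert the selected frequencies (Hz) to beats per minute
    return np.array([peak_freqs[t][k] * 60.0 for t, k in enumerate(path)])
```

Without the jump penalty, a spuriously strong peak in one window would be chosen greedily, producing an unrealistic HR jump; the smoothness term keeps the trajectory consistent with slowly varying HR.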
B. Previous work aligned with the project topic

In the paper by Prakash et al., five different illumination settings were studied based on a combination of sunlight and artificial light. It must also be noted that different portions of the face were illuminated. The five settings are described below:
• Subjects were not exposed to the sunlight. The constant ambient (room) light was the only source of illumination.
• Subjects were backlit by the sunlight. Insufficient illumination was projected on the subject's face (the face was darker).
• Front lighting (illuminated by the sunlight) was used. Sufficient illumination was projected on the subject's face (the face was brighter and the subject was over-exposed).
• Subjects were seated perpendicular to the window. Hence, half of the face was exposed to the sunlight (over-exposed) while the other half was less exposed (under-exposed).
• Subjects were seated diagonally to the window. Hence, almost three quarters of the face was exposed to the sunlight (over-exposed) while the remaining quarter was less exposed (under-exposed).

In another paper, by Liu et al., LEDs attached to the lens of a digital camera were used as the light source, supplying different light intensities and wavelengths across the visible spectrum. Light intensity was controlled by Pulse-Width Modulation (PWM) of the current. Extensive experiments were conducted using lights of wavelengths 430 nm, 450 nm, 470 nm, 490 nm, 505 nm, 525 nm, 535 nm, 545 nm, 570 nm, 590 nm, 600 nm, 610 nm, 625 nm and 660 nm, as well as white light, at different intensities.

C. Challenges in real life scenarios due to variability in the ambient light conditions

In real life scenarios, the subject's facial orientation may not be optimal for detection, or the lighting condition may not be suitable. The light intensity may be too low, e.g. in the evening or in a darker place. Similarly, it could be too high, e.g. when sitting in front of a screen or in direct sunlight. Sometimes only a part of the face might be illuminated, depending on the orientation of the subject's face with respect to the light source. In our recorded dataset we try to model all such challenges by using two different LED lights of variable intensity, and recording videos in different configurations that simulate such real life scenarios.

D. Strategic and systematic variation of lighting conditions

The lighting conditions were varied to simulate the following real life scenarios:
• Subjects were not exposed to the LED light. The constant ambient (room) light was the only source of illumination.
• Subjects' faces were partly lit by the LED light. Hence, almost three quarters of the face was exposed to the LED light (over-exposed) while the remaining quarter was less exposed (under-exposed). This simulates a sub-optimal orientation of the subject's face with respect to the light source.
• Step 2 was repeated while varying the difference in light intensity between the over-exposed and under-exposed portions of the face. This was done for both the left and right sides of the face.
• Both sides of the subject's face were exposed to the LED lights, with sufficient illumination projected on the face.

The different modes of operation of the lights were numbered as:

Fig. 3. The numbers were given to different modes along with intensity values
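Since each light independently takes one of the four modes, the sixteen recording configurations are simply the Cartesian product of the two lights' modes. A small sketch (the 1-16 numbering here is illustrative and may not match the ordering of Fig. 2):

```python
from itertools import product

MODES = ["zero", "low", "medium", "high"]   # four modes of operation per light

# One recording configuration per (L1 mode, L2 mode) pair; the 1-16
# numbering below is illustrative and may differ from Fig. 2.
configs = {i + 1: {"L1": l1, "L2": l2}
           for i, (l1, l2) in enumerate(product(MODES, repeat=2))}
```

With four modes for each of the two lights, this yields the 4 × 4 = 16 configurations recorded per subject and distance.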
The light variations were carried out systematically, as explained in the following table. Here L1 is the LED light to the left of the subject and L2 is the LED light to the right of the subject.

Fig. 4. The sequence in which videos were captured

III. THE DATASET

A. Equipment Used

Two different camera phones (Asus and Lenovo) and one Sony Cybershot camera were used to capture the video recordings of the faces of the subjects. These were supported on tripod stands, and their height was adjusted until the camera was directly in line with the face of the subject, such that no skewness was observed in the captured videos. The videos were captured at a resolution of 3840 x 2160 for the Asus, 1920 x 1080 for the Lenovo and 1280 x 720 for the Sony. All videos were recorded at a frame rate of 30 fps. Battery-operated LED lights with three different levels of illumination (low, medium, high) were used to illuminate the face of the subject. Using two such lights with four different modes of operation (zero, low, medium and high levels of illumination), the videos could be recorded in 16 different configurations of lighting conditions. A commercial pulse oximetry sensor attached to the finger tips of the subject was used to determine the true heart rate values. A video of the LED screen of the oximeter was captured using a Samsung phone, which was processed and then fed to a Convolutional Neural Network to obtain the annotated values of the heart rate captured by the oximeter.

B. Recording Setup

The three cameras (Asus, Lenovo, Sony) were placed along a 1 ft or 1.5 ft arc on a table, at angles of 33 degrees, -33 degrees and 0 degrees from the center line, where counterclockwise is considered positive and clockwise negative. The Sony Cybershot camera was placed in the center, with the Asus and Lenovo cameras to the left and right of the subject. The light sources were placed 1.31 ft in front of the subject and 45 cm to either side of the center line, at an angle of 48 degrees. The oximeter was fixed on the table towards the right side of the subject, where the subject's hand would lie if his/her arm was allowed to rest on the arm of the chair. The Samsung camera was fixed on a tripod right above the oximeter screen in order to capture its video and thereby obtain the ground truth values. The subject was requested to relax, look forward and keep their head movements natural (non-jerky).

Fig. 5. The arrangement of various devices on the table

Fig. 6. The schematic diagram of the setup

C. Recording Procedure

The subject's finger was placed in the oximeter, and the readings in the oximeter were allowed to stabilize. Then all the cameras were switched on one by one, and a tone of 1000 Hz frequency and 5 second duration was played. This was done so that the microphones in each of the camera devices would record the audio, which could later be used to synchronize the videos to the same starting point. The cameras were then allowed to record for one minute and were turned off. This was repeated for the sixteen different lighting conditions and for two different distances, i.e. 1 ft and 1.5 ft, making a total of 32 videos per subject.
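The 1000 Hz tone gives a common reference point for aligning the videos from the different cameras. A minimal sketch of onset detection on one audio track, measuring per-frame energy at 1 kHz with a single-bin DFT; the frame size and threshold here are illustrative assumptions, not values used in the project:

```python
import numpy as np

def tone_onset(audio, fs, tone_hz=1000.0, frame=256, thresh=0.1):
    """Return the time (s) at which the sync tone first appears.

    Scans the track in short frames and measures energy at `tone_hz`
    with a single-bin DFT. The frame size and threshold are
    illustrative choices, not values from the recordings.
    """
    n = np.arange(frame)
    probe = np.exp(-2j * np.pi * tone_hz * n / fs)   # 1 kHz analysis signal
    for start in range(0, len(audio) - frame, frame):
        seg = audio[start:start + frame]
        power = np.abs(np.dot(seg, probe)) / frame   # normalized bin magnitude
        if power > thresh:
            return start / fs
    return None                                      # tone not found
```

Each video can then be trimmed so that its detected tone onset becomes the common starting point for all three cameras.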
Fig. 7. Architecture of the CNN used for annotations.
30 subjects were recorded, of which 23 were male and 7 were female.

D. Recorded Data and Arrangement

The data is arranged such that the DATA folder has 30 sub-folders, each named after a subject. Each of those folders contains 4 folders for the 4 different cameras, and each of those contains 2 folders based on the 2 distances. The distance folders contain the corresponding videos and another folder, named Features, for saving the RGB intensity contours as .mat files. The folder delay saves the .mat files after the videos have been synchronized to the same starting point.

A. Convolutional Neural Network

After the digits were segregated, the image was rescaled to 28x28. The resized images were fed into a CNN for recognition. The CNN architecture consists of two convolutional layers (conv1 and conv2), two max pooling layers (mpool1 and mpool2) and two fully-connected layers (fc1 and out). The output of fc1 is passed to a 10-way softmax, which produces a probability distribution over the ten labels (i.e. 0-9).

The first convolutional layer has 32 filters with a kernel size of 5, using the ReLU activation function. The second convolutional layer has 64 filters with a kernel size of 3, again with ReLU activation. Both convolutional layers use a unit stride. Both max pooling layers perform down-sampling with a stride of 2 and a kernel size of 2. The data is then flattened to a 1-D vector for the fully connected layer fc1, which has 1024 neurons.

To reduce over-fitting, a dropout layer with a dropout rate of 0.75 was inserted between the two fully connected layers.
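Tracing the tensor shapes through the layers above shows where the input size of fc1 comes from. The report does not state the convolution padding, so this sketch assumes 'same' padding (the convolutions preserve height and width); with 'valid' padding the flattened size would differ:

```python
def shapes(h=28, w=28):
    """Trace tensor shapes through the annotation CNN.

    Assumes 'same' padding for both convolutions (an assumption, since
    the padding is not stated); only the pooling layers shrink H and W.
    """
    s = [("input", (h, w, 1))]
    s.append(("conv1 5x5x32, stride 1", (h, w, 32)))   # 'same' keeps H, W
    h, w = h // 2, w // 2                              # 2x2 max pool, stride 2
    s.append(("mpool1", (h, w, 32)))
    s.append(("conv2 3x3x64, stride 1", (h, w, 64)))
    h, w = h // 2, w // 2
    s.append(("mpool2", (h, w, 64)))
    s.append(("flatten", (h * w * 64,)))
    s.append(("fc1", (1024,)))
    s.append(("out (softmax)", (10,)))
    return s
```

Under this assumption, fc1 sees a flattened 7 x 7 x 64 = 3136-dimensional input from mpool2.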
B. Training Data