WESAD

Introduction:

The objective of this use case is to predict the stress level of a person and categorize it
as amusement, baseline or stress.

Explanation of the dataset:

WESAD stands for WEarable Stress and Affect Detection. As the name indicates, the
data is used to detect the stress level of a person. The data was collected by conducting an
experiment on 17 subjects. Of the 17, 2 subjects (S1 and S12) were discarded due to
sensor malfunction. For each subject, there are five files:

File          Contents                                                              Role
SX_readme     Preliminary questions and the answers given by the subjects           x (features)
SX_respiban   Data from the RespiBAN device                                         x (features)
SX_E4_Data    Data from the Empatica E4 device                                      x (features)
SX_Quest      Self-report questionnaires and the answers given by the subjects      y (label)
SX.pickle     Synchronised data comprising the data from SX_respiban and SX_E4_Data x (features)
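
As an illustration of how the synchronised file might be read, here is a minimal sketch. It assumes the layout commonly described for this dataset: a dictionary with the keys "subject", "signal" (holding "chest" and "wrist" sub-dictionaries) and "label", loaded with latin1 decoding. The helper names load_subject and describe are hypothetical, and the layout should be checked against the dataset's own readme.

```python
import pickle
from pathlib import Path


def load_subject(path):
    """Load one SX.pickle file and return (subject_id, signals, labels).

    Assumed layout (verify against the dataset readme):
    {"subject": ..., "signal": {"chest": {...}, "wrist": {...}}, "label": ...}
    """
    with Path(path).open("rb") as handle:
        # Assumption: the file needs latin1 decoding, as is usual for
        # pickles that were written by Python 2.
        data = pickle.load(handle, encoding="latin1")
    return data["subject"], data["signal"], data["label"]


def describe(signals):
    """Map 'device/channel' to the number of samples in that channel."""
    return {
        f"{device}/{channel}": len(values)
        for device, channels in signals.items()
        for channel, values in channels.items()
    }
```

Calling describe on the loaded signals is a quick way to see that the channels have different lengths, which is why the resampling step described later is needed.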

Explanation of Columns:
Columns             Explanation
net_acc_mean        Mean of the net accelerometer signal
net_acc_std         Standard deviation of the net accelerometer signal
net_acc_min         Minimum of the net accelerometer signal
net_acc_max         Maximum of the net accelerometer signal
ACC_x_mean          Mean of the accelerometer X signal
ACC_x_std           Standard deviation of the accelerometer X signal
ACC_x_min           Minimum of the accelerometer X signal
ACC_x_max           Maximum of the accelerometer X signal
ACC_y_mean          Mean of the accelerometer Y signal
ACC_y_std           Standard deviation of the accelerometer Y signal
ACC_y_min           Minimum of the accelerometer Y signal
ACC_y_max           Maximum of the accelerometer Y signal
ACC_z_mean          Mean of the accelerometer Z signal
ACC_z_std           Standard deviation of the accelerometer Z signal
ACC_z_min           Minimum of the accelerometer Z signal
ACC_z_max           Maximum of the accelerometer Z signal
BVP_mean            Mean of the BVP signal
BVP_std             Standard deviation of the BVP signal
BVP_min             Minimum of the BVP signal
BVP_max             Maximum of the BVP signal
EDA_mean            Mean of the EDA signal
EDA_std             Standard deviation of the EDA signal
EDA_min             Minimum of the EDA signal
EDA_max             Maximum of the EDA signal
EDA_phasic_mean     Mean of the EDA phasic signal
EDA_phasic_std      Standard deviation of the EDA phasic signal
EDA_phasic_min      Minimum of the EDA phasic signal
EDA_phasic_max      Maximum of the EDA phasic signal
EDA_smna_mean       Mean of the EDA smna signal
EDA_smna_std        Standard deviation of the EDA smna signal
EDA_smna_min        Minimum of the EDA smna signal
EDA_smna_max        Maximum of the EDA smna signal
EDA_tonic_mean      Mean of the EDA tonic signal
EDA_tonic_std       Standard deviation of the EDA tonic signal
EDA_tonic_min       Minimum of the EDA tonic signal
EDA_tonic_max       Maximum of the EDA tonic signal
Resp_mean           Mean of the Respiration signal
Resp_std            Standard deviation of the Respiration signal
Resp_min            Minimum of the Respiration signal
Resp_max            Maximum of the Respiration signal
TEMP_mean           Mean of the temperature signal
TEMP_std            Standard deviation of the temperature signal
TEMP_min            Minimum of the temperature signal
TEMP_max            Maximum of the temperature signal
BVP_peak_freq       Peak frequency of the BVP signal
TEMP_slope          Slope of the temperature signal
subject             Subject ID
label               The target variable: 0 (Amusement), 1 (Baseline/neutral), 2 (Stress)
age                 Age of the subject
height              Height of the subject
weight              Weight of the subject
gender_ female      Contains 1 if the subject is female
gender_ male        Contains 1 if the subject is male
coffee_today_YES    Contains 1 if the subject had coffee today
sport_today_YES     Contains 1 if the subject did sports today
smoker_NO           Contains 1 if the subject is not a smoker
smoker_YES          Contains 1 if the subject is a smoker
feel_ill_today_YES  Contains 1 if the subject feels ill on the current day

Feature Engineering (Preprocessing):


The raw data is not in a form to which ML models can be applied, so it must be cleaned first.
Since the raw data contains signal values from 2 different devices sampled at different
frequencies, the signals are converted to a common frequency and then to per-second values,
and all the converted data is collected in a csv file.
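
One simple way to bring signals with different sampling rates onto a common per-second time base is block averaging, sketched below. The source does not say which resampling method was used, so this is only an illustration; the function name to_per_second and the two sample rates are assumptions.

```python
from statistics import fmean


def to_per_second(samples, fs):
    """Average every `fs` consecutive samples into one value per second.

    A trailing partial second is dropped so each output covers a full second.
    """
    if fs <= 0:
        raise ValueError("fs must be a positive integer sampling rate in Hz")
    n_seconds = len(samples) // fs
    return [fmean(samples[i * fs:(i + 1) * fs]) for i in range(n_seconds)]


# Two made-up channels covering the same 3 seconds at different rates.
fast_4hz = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0]
slow_2hz = [30.0, 32.0, 31.0, 31.0, 33.0, 35.0]

fast_1hz = to_per_second(fast_4hz, fs=4)
slow_1hz = to_per_second(slow_2hz, fs=2)
```

After this step both channels have one value per second and can be placed side by side in the same table.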

The data from SX_readme is also added to the file. The data in SX_Quest is added to
the file as well and denotes the label, i.e. the value to be predicted.
The WESAD dataset is inherently unbalanced because its experimental protocol dictated a
different duration for each condition. The SMOTE upsampling technique was used to deal
with the unbalanced data.
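
The core idea of SMOTE can be sketched in a few lines: a synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The source does not say which implementation was used; in practice a library implementation would normally be preferred. The function name smote_oversample and its parameters are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` inside the minority class (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up minority class (e.g. amusement windows) with two features each.
amusement = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
extra = smote_oversample(amusement, n_new=5)
```

Because every synthetic point lies between two real minority samples, the new rows stay inside the region the minority class already occupies.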
EDA is controlled by the sympathetic nervous system (SNS), and hence it is particularly
sensitive to high arousal states. First, a 5 Hz lowpass filter was applied to the raw EDA
signal. Then, statistical features were computed (e.g. mean, standard deviation, dynamic
range). Furthermore, the raw EDA signal consists of a tonic component (referred to as skin
conductance level (SCL)) and a phasic component (skin conductance response (SCR)).
The cvxEDA library was used to process the EDA biosignal (reference: lciti/cvxEDA,
Algorithm for the analysis of electrodermal activity (EDA) using convex optimization).
On the raw ACC signal, different statistical features, e.g. the mean µacc,i and standard
deviation σacc,i, were computed. These features were computed both for each axis separately
(i ∈ {x, y, z}) and for the net accelerometer signal (the net_acc_* columns).
On the raw TEMP signal common statistical features (mean, standard deviation, min, max,
etc.) were computed.
Statistical features of the biosignals, such as min, max, standard deviation and quartiles,
were calculated. The signals were recorded at varying frequencies, so each was converted to
a per-second time base. A window size of 30 seconds was used. Each 30 seconds of data,
consisting of the statistical features of the biosignals, was flattened to form a single row.
This is how we created 1177 rows, each row describing the processed biosignals for a
30-second period. We saved the result in the cleantable.csv file. The final WESAD data set
contains 59 columns.
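
The windowing step can be sketched as follows, assuming per-second signals and non-overlapping 30-second windows (the source does not state whether windows overlap). The helper names window_features and merge_rows are hypothetical, and the population standard deviation is used here as an assumption.

```python
from statistics import fmean, pstdev


def window_features(name, per_second, window=30):
    """Summarise each full `window`-second block of a 1 Hz signal as one dict."""
    rows = []
    for start in range(0, len(per_second) - window + 1, window):
        block = per_second[start:start + window]
        rows.append({
            f"{name}_mean": fmean(block),
            f"{name}_std": pstdev(block),  # assumption: population std
            f"{name}_min": min(block),
            f"{name}_max": max(block),
        })
    return rows


def merge_rows(*per_signal_rows):
    """Flatten the per-signal dicts of the same window into a single row."""
    merged = []
    for parts in zip(*per_signal_rows):
        row = {}
        for part in parts:
            row.update(part)
        merged.append(row)
    return merged


# 60 seconds of made-up data give two 30-second rows.
eda = [float(i % 5) for i in range(60)]
temp = [30.0 + 0.01 * i for i in range(60)]
rows = merge_rows(window_features("EDA", eda), window_features("TEMP", temp))
```

Each element of rows corresponds to one line of the final table, with column names following the pattern used in the column explanation above.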
Idea for processing shown in block diagram:

Feature Selection:

The correlation matrix is plotted for the dataset, and the following columns are selected
because they are highly correlated. The last entry, label, is the target variable rather than
a feature.

BVP_mean BVP_std EDA_phasic_mean EDA_phasic_min

EDA_tonic_mean Resp_mean Resp_std TEMP_mean

TEMP_std TEMP_slope BVP_peak_freq age

height weight EDA_smna_min label
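
A correlation-based selection of this kind might look like the sketch below. It assumes that features are ranked by the absolute Pearson correlation with the label column and kept above a cut-off; the source does not state the exact criterion, and the threshold of 0.5, the function names and the toy table are all illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_features(table, target="label", threshold=0.5):
    """Keep columns whose absolute correlation with `target` reaches `threshold`."""
    scores = {
        name: pearson(values, table[target])
        for name, values in table.items()
        if name != target
    }
    return sorted(name for name, r in scores.items() if abs(r) >= threshold)


# Made-up table: one informative column, one uninformative column.
toy = {
    "EDA_tonic_mean": [0.1, 0.2, 0.9, 1.0, 0.5, 0.6],
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "label": [0, 0, 2, 2, 1, 1],
}
selected = select_features(toy)
```

Pearson correlation only captures linear relationships, so a feature dropped by this filter may still be useful to a tree-based model.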

Model Summary:

The extracted features, detailed above, serve as input for the classification step. Five machine
learning algorithms were applied and compared within our benchmark: Decision Tree (DT),
Random Forest (RF), AdaBoost (AB), Linear Discriminant Analysis (LDA), and k-Nearest
Neighbour (kNN). As the entire data processing chain was implemented in Python, we used
the scikit-learn implementation of the aforementioned classifiers.
The benchmark uses a large number of well-known features (extracted from physiological
and motion signals) together with these common machine learning methods.
The subjects (n = 15) were exposed to different affective stimuli (stress and amusement).
Finally, the data of each subject is linked to several self-reports, which represent the
subjective experience during an affective stimulus.
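
A comparison of the five classifiers could be set up roughly as follows with scikit-learn. This is a sketch on synthetic data, not the project's actual pipeline: the data, the hyperparameters and the leave-one-subject-out evaluation are assumptions (the latter is one reasonable choice because windows from the same subject are correlated).

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature table: 3 classes, 15 features.
X, y = make_classification(
    n_samples=150, n_features=15, n_informative=6,
    n_classes=3, random_state=0,
)
# Hypothetical subject IDs: 5 subjects with 30 windows each.
groups = [i % 5 for i in range(len(y))]

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "AB": AdaBoostClassifier(n_estimators=50, random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}

# Leave-one-subject-out: every fold tests on a subject unseen in training.
scores = {
    name: cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut()).mean()
    for name, clf in classifiers.items()
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean accuracy {score:.3f}")
```

Grouping the folds by subject avoids the optimistic scores that arise when windows from the same person appear in both the training and the test split.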
Input & Output

Input:

1. Data from the 3-axis accelerometer sensor
2. Data from the electrocardiogram sensor
3. Data from the electromyography sensor
4. Data from the electrodermal activity sensor
5. Data from the temperature sensor
6. Respiration rate
7. Answers to some preliminary questions, such as:
● Age
● Height
● Weight
● Gender
● Dominant hand
● Did you drink coffee today?
● Did you drink coffee within the last hour?
● Did you do any sports today?
● Are you a smoker?
● Did you smoke within the last hour?
● Do you feel ill today?

Output:

The output is one of 3 types:
1 baseline/neutral
2 stress
3 amusement

A graph is also shown for:
● Electrodermal activity (EDA)
● Blood Volume Pulse (BVP)

A pie chart depicts the proportion of baseline, stress and amusement.

Reason for the selection of algorithm:

Logistic Regression:
Logistic Regression estimates its parameters by maximum likelihood rather than least
squares, which gives better results, better here meaning estimates that are, for large
samples, unbiased and of lower variance.

Naive Bayes:
It is easy and fast to predict the class of the test data set. It also performs well in
multi-class prediction.

Decision Tree:
A significant advantage of a decision tree is that it forces the consideration of all
possible outcomes of a decision and traces each path to a conclusion.

Random Forest:
Random Forest combines many decision trees, which increases predictive power and also
helps prevent overfitting.

AdaBoost:
AdaBoost can be used to boost the performance of any machine learning algorithm. It is
best used with weak learners. These are models that achieve accuracy just above random
chance on a classification problem.

GradBoost:
Gradient boosting creates a predictive model by fitting a set of additive trees, each one
fitted to the errors left by the trees before it.

UI:

The UI is designed so that anyone who uses a smartphone or laptop can understand it. The
user has to sign up when using the website for the first time, or log in otherwise.
The main page displays the button “check your stress level”. When the user clicks the button,
the browser navigates to a new page where the user fills in some preliminary details and
presses the button “connect the sensor”. The website then shows an emoji and text
indicating the stress level of the user. For further understanding, the website also displays a
graph.
The user can also access the information provided by the website by buying access to the API.
UI depiction through block model:

Some snapshots of UI (API and website):


Conclusion:

40% of workers reported their job was very or extremely stressful, and 25% view their jobs as
the number one stressor in their lives. Through our project we tried to take an initiative to
bring these percentages down by detecting stress at the earliest.
The data was cleaned and handled carefully, and the models were built for prediction. We are
able to categorize a person's state as stress, amusement or baseline. Finally, an API concept
is included in order to attract users.
