
electronics

Article
Research on a Non-Invasive Hemoglobin Measurement System
Based on Four-Wavelength Photoplethysmography
Zhencheng Chen 1, Huishan Qin 1, Wenjun Ge 1, Shiyong Li 2,* and Yongbo Liang 1,*

1 School of Life and Environmental Sciences, Guilin University of Electronic Technology, Guilin 541004, China
2 School of Electronic Engineering and Automation, Guilin University of Electronic Technology,
Guilin 541004, China
* Correspondence: lishiyong@guet.edu.cn (S.L.); liangyongbo@guet.edu.cn (Y.L.)

Abstract: Hemoglobin is an essential parameter in human blood. This paper proposes a non-invasive
method for measuring hemoglobin concentration based on the characteristic parameters of four-wavelength
photoplethysmography (PPG) signals combined with machine learning. A data acquisition system built
from the DCM08 sensor and the NRF52840 chip was used to collect fingertip photoelectric volumetric
pulse wave signals from 58 subjects. A total of 160 feature parameters were constructed and extracted
from the four-wavelength PPG signals. The features were screened with three feature selection
methods: reliefF, Chi-square score, and information gain. The top 10, 20, and 30 screened features
were used as inputs to evaluate the hemoglobin prediction performance of the different feature sets.
The prediction models were XGBoost, support vector machines, and logistic regression. The best
performance was obtained with the 30 features screened by the Chi-square test combined with the
XGBoost model, which achieved a coefficient of determination (R2) of 0.997, a root mean square error
(RMSE) of 0.762 g/L, and a mean absolute error (MAE) of 0.325 g/L. The study shows that
four-wavelength PPG feature parameters combined with the XGBoost algorithm can effectively achieve
non-invasive detection of hemoglobin, providing a new measurement method for clinical practice.

Keywords: photoplethysmography; hemoglobin; feature selection; machine learning

Citation: Chen, Z.; Qin, H.; Ge, W.; Li, S.; Liang, Y. Research on a Non-Invasive Hemoglobin
Measurement System Based on Four-Wavelength Photoplethysmography. Electronics 2023, 12, 1346.
https://doi.org/10.3390/electronics12061346

Academic Editors: Radu Ciorap, Jiri Hozman and Jan Vrba

Received: 13 February 2023
Revised: 6 March 2023
Accepted: 9 March 2023
Published: 12 March 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

Electronics 2023, 12, 1346. https://doi.org/10.3390/electronics12061346 https://www.mdpi.com/journal/electronics

1. Introduction
Hemoglobin (Hb) is one of the main components of red blood cells. It consists of four protein
subunits called globin chains, each of which carries a heme group, a central structure with an
embedded iron atom [1]. Hb is a crucial indicator of anemia, blood loss, and other conditions. The
primary function of hemoglobin is to deliver oxygen to the whole body [2]. According to the World
Health Organization (WHO), an estimated 1.6 billion people, approximately 30% of the total
population, suffer from anemia. Groups particularly vulnerable to anemia include pregnant women,
preschool children, and teenagers [3]. The general symptoms of anemia are tiredness, lethargy,
weakness, pale lips, shortness of breath, a smooth tongue, increased heart rate, loss of appetite,
and dizziness [4]. Therefore, the detection of Hb is essential for preventing and diagnosing related
diseases.
Current assays for hemoglobin concentration are mainly invasive or minimally invasive; both require
collecting a blood sample from the subject, which can be painful. These methods also carry a risk of
cross-infection, require trained personnel, and cannot provide real-time measurement. Noninvasive
testing technology offers a better solution to these problems, and noninvasive hemoglobin testing is
currently based mainly on photoplethysmography (PPG). PPG is a signal that reflects the state of the
cardiovascular system and has received a great deal of attention in recent years because it is easy
to acquire, requires only a small sensor, and is non-invasive [5]. The pulse wave contains rich
physiological information and is highly valuable for health monitoring: by acquiring the pulse wave
and processing the PPG signal in various ways, useful physiological information can be extracted,
which is of great significance for detecting related diseases.
PPG can be used not only to assess hemoglobin levels but also to evaluate SpO2 [6], estimate heart
rate [7] and respiratory rate [8], measure blood pressure continuously [9], assess sleep [10], and
detect arrhythmia [11]. Clinical monitoring of PPG and hemoglobin parameters can therefore provide a
timely diagnostic reference and support subsequent studies of methods for assessing various diseases
and physiological states. With the development of machine learning, many researchers have studied
machine-learning-based non-invasive hemoglobin detection. Kavsaoğlu et al. proposed a non-invasive
method for predicting hemoglobin from features of the PPG signal using classification and regression
trees (CART), least squares regression (LSR), support vector regression (SVR), and eight other
machine learning regression methods; the best results were obtained with the RFS feature selection
method combined with SVR (MSE = −0.0027) [12]. Acharya et al. used a multi-model stacking regressor
built from least absolute shrinkage and selection operator (LASSO), Ridge, Elastic Net, and five
other machine learning methods to predict hemoglobin non-invasively. They suggested that this
approach could form the basis of a public health screening tool for detecting and treating maternal
anemia and could complement global health intervention strategies [13]. Lakshmi et al. used PPG
signals and a generalized linear regression technique to monitor hemoglobin levels in pregnant women
and reported an absolute deviation of 0.73 g/dL between the predicted and actual hemoglobin
concentrations [14]. Pinto et al. applied multivariate partial least squares regression (PLSR) to
predict hemoglobin concentration and validated the designed system with Bland–Altman analysis, which
showed good agreement between the predicted and reference hemoglobin values [15].
In this paper, four-wavelength fingertip pulse wave signals were collected with photoelectric
sensors, 160 morphological feature parameters were constructed and extracted from these signals, and
the main feature parameters were screened using reliefF, Chi-square score, and information gain.
Hemoglobin concentration was then predicted with XGBoost, support vector machine regression (SVR),
and logistic regression (LR) models using the screened feature sets as input. Finally, prediction
performance was evaluated using RMSE, R2, and MAE.

2. Materials and Methods


2.1. Four-Wavelength Non-Invasive Hemoglobin Testing System Design
2.1.1. Hardware System
This paper uses the DCM08, a four-wavelength reflective blood oxygen sensor, to detect fingertip
pulse wave signals at 660 nm, 730 nm, 850 nm, and 940 nm. The acquisition system combines the
ADPD4100, a multimodal sensor analog front-end chip from ADI, with the NRF52840, an ultra-low-power
Bluetooth chip from Nordic. As a complete multimodal sensor front end, the ADPD4100 can drive up to
eight light-emitting diodes and measure the return signals on up to eight independent current inputs
while rejecting signal offsets and corruption caused by asynchronous modulated interference, typically
from ambient light, which eliminates the need for filters or externally controlled DC offset
circuits. The acquisition system also runs the Zephyr real-time operating system. Zephyr is a
lightweight open-source operating system for the Internet of Things that aims to provide a small,
tailorable real-time operating system (RTOS) for resource-constrained devices, with a small
footprint and a high-performance, multi-threaded execution environment. The experimental PPG signal
acquisition system is shown in Figure 1. The ADPD4100 provides current excitation for the four LEDs
of the DCM08 and uses four time slots to drive the four wavelengths of the DCM08 in turn; the
resulting data are transmitted over the SPI interface. The raw pulse wave data are first
pre-processed, and the four-wavelength signals are then sent to the host PC over the serial port.
The block diagram of the four-wavelength PPG signal acquisition system is shown in Figure 2.

Figure 1. PPG signal acquisition experimental system.

Figure 2. Block diagram of a four-wavelength PPG signal acquisition system.

2.1.2. Software Design for the Host Computer


In this study, the host computer software was implemented with Qt Creator. It displays and stores
the four pulse wave signals received over the serial port for subsequent processing and calculation.
Figure 3 shows the program workflow diagram.
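The paper does not describe the serial data format. As a minimal Python sketch (the actual host software was written with Qt), assume hypothetically that the firmware sends one ASCII line per sampling instant containing four comma-separated integer samples in the order 660, 730, 850, and 940 nm. Host-side parsing into four stored channels could then look like this; the function name and line format are illustrative assumptions.

```python
from typing import Iterable, List

# Hypothetical channel order; the paper does not specify the frame layout.
WAVELENGTHS_NM = (660, 730, 850, 940)


def parse_ppg_lines(lines: Iterable[str]) -> List[List[int]]:
    """Split hypothetical 'a,b,c,d' text lines into four per-wavelength sample lists.

    Lines with the wrong number of fields or non-integer fields are skipped,
    mimicking a host program that discards corrupted serial frames.
    """
    channels: List[List[int]] = [[] for _ in WAVELENGTHS_NM]
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != len(WAVELENGTHS_NM):
            continue  # incomplete or garbled frame
        try:
            values = [int(f) for f in fields]
        except ValueError:
            continue  # non-numeric field
        for channel, value in zip(channels, values):
            channel.append(value)
    return channels


if __name__ == "__main__":
    demo = ["100,200,300,400\n", "garbage\n", "101,201,301,401\n"]
    print(parse_ppg_lines(demo))
```

Skipping malformed lines rather than raising keeps a long recording usable when a few frames are corrupted in transit.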

Figure 3. The program workflow diagram.

2.2. Data Collection and Preprocessing


2.2.1. Data Collection
The subjects were 58 volunteers recruited for this project who attended the University Hospital of
Guilin University of Electronic Technology for a routine physical examination. They were aged 21 to
27 years, with a male-to-female ratio of about 1:1. The volunteers signed an informed consent form,
and participation was approved by the University's Medical Research Ethics Committee. A
self-designed four-wavelength PPG detection device collected the volumetric pulse signal from each
volunteer's fingertip. Volunteers fasted from 9:00 p.m. the night before until the end of the
experiment the following day. The pulse wave signal was collected at a sampling rate of 200 Hz for
1 min and recorded by the host computer. Immediately after each subject's measurement, a venous blood
sample was drawn and analyzed by the hospital's fully automated hematology analyzer to obtain the
corresponding invasive hemoglobin value, which served as the reference value for constructing the
model.

2.2.2. Data Preprocessing


The PPG signal is low-frequency, generally between 0.2 and 10 Hz. In this paper, a second-order
Butterworth bandpass filter with a passband of 0.25 Hz to 10 Hz was designed and implemented in the
hardware system to filter the original pulse wave signal, removing high-frequency noise, motion
artifacts, and baseline drift; the processed data were then transmitted to the host computer over
the serial port. Before feature extraction, the acquired data must be screened for completeness and
usability, and their signal quality must be assessed. Elgendi et al. proposed the skewness signal
quality index (SSQI) as the best method for evaluating PPG signal quality; it can effectively
distinguish between good, acceptable, and noisy signals [16,17]. The SSQI method is also well suited
to wearable health devices because it supports real-time processing with low computing power.
Therefore, this paper uses the skewness method to analyze and judge the quality of the collected PPG
signals.
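The filtering and quality check described above can be sketched offline in Python with SciPy. This is an illustration under stated assumptions, not the on-device implementation: `order=2` is passed to SciPy's `butter`, which yields a band-pass filter with twice that many poles, and `sosfiltfilt` runs the filter forward and backward (zero phase), whereas a hardware filter is causal. The SSQI is computed as the third standardized moment of a segment; the paper does not state the quality threshold, so none is applied here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.stats import skew

FS_HZ = 200.0  # sampling rate reported in Section 2.2.1


def bandpass_ppg(x, fs=FS_HZ, low_hz=0.25, high_hz=10.0, order=2):
    """Offline zero-phase Butterworth band-pass filter for one PPG channel.

    Assumption: `order` is the prototype order given to scipy's butter(),
    so the resulting band-pass filter has 2 * order poles.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(x, dtype=float))


def skewness_sqi(segment):
    """Skewness signal quality index: third standardized moment of the segment."""
    return float(skew(np.asarray(segment, dtype=float), bias=True))


if __name__ == "__main__":
    t = np.arange(0, 60, 1 / FS_HZ)  # 1 min at 200 Hz
    pulse = np.sin(2 * np.pi * 1.2 * t)  # synthetic ~72 bpm pulse component
    drift = 2.0 * np.sin(2 * np.pi * 0.05 * t)  # slow baseline wander
    filtered = bandpass_ppg(pulse + drift)
    print(len(filtered), skewness_sqi(filtered))
```

The synthetic signal only demonstrates that the 0.05 Hz wander is suppressed while the pulse band passes; real PPG segments would be scored per window with `skewness_sqi`.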

2.3. PPG Signal Feature Extraction and Selection


2.3.1. Feature Extraction
After signal preprocessing, features were extracted, mainly from the morphological and time-domain
properties of the PPG signal. First, the long PPG sequence of each of the four channels was divided
into cycles at the period splitting points, and the feature points were located in each cycle, as
shown in Figure 4. In one PPG cycle, "O" denotes the onset, "S" the systolic peak, "N" the dicrotic
notch, and "D" the diastolic peak; "a1", "b1", "c1", and "d1" denote the waves of the VPG waveform,
and "a2", "b2", "c2", and "d2" the waves of the APG waveform. The characteristic points of the
waveform are defined according to [18]. Feature information was then extracted from the PPG waveform
and from its first-order and second-order derivatives over each complete cycle, as shown in Figure 5.
Forty feature parameters were extracted for each channel (detailed in Table 1), giving 160 features
in total for the four channels.
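The paper does not specify how the cycles were split or how the fiducial points were detected. As a minimal sketch under simple assumptions, each channel can be segmented by taking systolic peaks as local maxima at least about 0.33 s apart (at most about 180 bpm) and the onset "O" as the minimum between consecutive peaks; the VPG and APG are then the first and second numerical derivatives of each cycle. The function names and the `min_beat_s` parameter are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks


def segment_cycles(ppg, fs=200.0, min_beat_s=0.33):
    """Split a filtered PPG channel into onset-to-onset cycles.

    Assumptions: systolic peaks are local maxima at least `min_beat_s` apart,
    and each onset is the minimum between two consecutive systolic peaks.
    Returns a list of (start, stop) sample-index pairs.
    """
    x = np.asarray(ppg, dtype=float)
    peaks, _ = find_peaks(x, distance=max(1, int(min_beat_s * fs)))
    onsets = [p0 + int(np.argmin(x[p0:p1])) for p0, p1 in zip(peaks[:-1], peaks[1:])]
    return list(zip(onsets[:-1], onsets[1:]))


def vpg_apg(cycle, fs=200.0):
    """First (VPG) and second (APG) derivatives of one PPG cycle, per second."""
    vpg = np.gradient(np.asarray(cycle, dtype=float), 1.0 / fs)
    apg = np.gradient(vpg, 1.0 / fs)
    return vpg, apg


if __name__ == "__main__":
    fs = 200.0
    t = np.arange(0, 10, 1 / fs)
    x = np.sin(2 * np.pi * 1.25 * t)  # synthetic 75 bpm signal
    cycles = segment_cycles(x, fs)
    vpg, apg = vpg_apg(x[cycles[0][0]:cycles[0][1]], fs)
    print(len(cycles), len(vpg))
```

Locating the notch, diastolic peak, and the a to d waves within each cycle requires the point definitions of [18] and is not reproduced here.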

Figure 4. The characteristic features obtained from the PPG signal.



Figure 5. The characteristic features acquired from APG and VPG.

Table 1. Features acquired in PPG signal.

Type | Location | Index | Formula | Description
Amplitude | PPG | F1 | x | Height of systolic peak
Amplitude | PPG | F2 | y | Height of diastolic peak
Amplitude | PPG | F3 | z | Height of dicrotic notch
Time span | PPG | F4 | tpi | PPG cycle
Time span | PPG | F9 | t1 | Systolic peak time
Time span | PPG | F10 | t2 | Dicrotic notch time
Time span | PPG | F11 | t3 | Diastolic peak time
Time span | PPG | F12 | ∆T | Time interval between systolic peak and diastolic peak
Time span | PPG | F13 | t1/2 | Half-systolic peak time
Time span | VPG | F21 | ta1 | Time from point d1 to point a1
Time span | VPG | F22 | tb1 | Time from point d1 to point b1
Time span | VPG | F23 | tc1 | Time from point d1 to point c1
Time span | VPG | F24 | td1 | Time from point d1 to the next point d1
Time span | APG | F28 | ta2 | Time from point d2 to the next point a2
Time span | APG | F29 | tb2 | Time from point d2 to the next point b2
Area | PPG | F14 | A2/A1 | Inflection point area ratio
Ratio | PPG | F5 | y/x | Ratio of diastolic peak amplitude to systolic peak amplitude
Ratio | PPG | F6 | (x−y)/x | Alternative augmentation index [19]
Ratio | PPG | F7 | z/x | Ratio of dicrotic notch amplitude to systolic peak amplitude
Ratio | PPG | F8 | (y−x)/x | Negative relative augmentation index [19]
Ratio | PPG | F15 | t1/x | Ratio of systolic peak time to systolic peak amplitude
Ratio | PPG | F16 | y/(tpi−t3) | Diastolic peak downward curve [19]
Ratio | PPG | F17 | t1/tpi | Ratio of systolic peak time to PPG cycle
Ratio | PPG | F18 | t2/tpi | Ratio of dicrotic notch time to PPG cycle
Ratio | PPG | F19 | t3/tpi | Ratio of diastolic peak time to PPG cycle
Ratio | PPG | F20 | ∆T/tpi | Ratio of ∆T to PPG cycle
Ratio | PPG | F40 | V2/V1 | Stress-induced vascular response index [19]
Ratio | VPG | F30 | ta1/tpi | Ratio of ta1 to PPG cycle
Ratio | VPG | F31 | tb1/tpi | Ratio of tb1 to PPG cycle
Ratio | VPG | F32 | tc1/tpi | Ratio of tc1 to PPG cycle
Ratio | VPG | F33 | td1/tpi | Ratio of td1 to PPG cycle
Ratio | APG | F34 | ta2/tpi | Ratio of ta2 to PPG cycle
Ratio | APG | F35 | tb2/tpi | Ratio of tb2 to PPG cycle
Ratio | APG | F36 | (ta1+ta2)/tpi | Ratio of (ta1+ta2) to PPG cycle
Ratio | APG | F37 | (tb1+tb2)/tpi | Ratio of (tb1+tb2) to PPG cycle
Ratio | APG | F38 | (tc1+t2)/tpi | Ratio of (tc1+t2) to PPG cycle
Ratio | APG | F39 | (td1+t3)/tpi | Ratio of (td1+t3) to PPG cycle
Ratio | APG | F25 | b2/a2 | Ratio of b2 to a2
Ratio | APG | F26 | c2/a2 | Ratio of c2 to a2
Ratio | APG | F27 | (b2+c2)/a2 | Ratio of (b2+c2) to a2
Notes: PPG represents the photoplethysmogram signal, VPG represents the velocity of PPG, and APG represents
the acceleration of PPG.
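To make the Table 1 formulas concrete, the sketch below computes a subset of the PPG amplitude, time-span, and ratio features for one onset-to-onset cycle. Two conventions are assumptions, since the paper does not state them: heights are measured relative to the cycle onset value `cycle[0]`, and times are measured from the onset in seconds. The fiducial indices `i_sys`, `i_notch`, and `i_dia` are taken as already known.

```python
def ppg_cycle_features(cycle, i_sys, i_notch, i_dia, fs=200.0):
    """Compute a subset of Table 1 PPG features for one onset-to-onset cycle.

    Assumptions: heights are relative to the onset value cycle[0], times are
    measured from the onset in seconds, and fiducial indices are given.
    """
    base = cycle[0]
    x = cycle[i_sys] - base  # F1: systolic peak height
    y = cycle[i_dia] - base  # F2: diastolic peak height
    z = cycle[i_notch] - base  # F3: dicrotic notch height
    tpi = len(cycle) / fs  # F4: PPG cycle duration
    t1, t2, t3 = i_sys / fs, i_notch / fs, i_dia / fs  # F9, F10, F11
    dt = t3 - t1  # F12: systolic-to-diastolic peak interval
    return {
        "F1": x, "F2": y, "F3": z, "F4": tpi,
        "F5": y / x, "F6": (x - y) / x, "F7": z / x, "F8": (y - x) / x,
        "F9": t1, "F10": t2, "F11": t3, "F12": dt,
        "F16": y / (tpi - t3),
        "F17": t1 / tpi, "F18": t2 / tpi, "F19": t3 / tpi, "F20": dt / tpi,
    }


if __name__ == "__main__":
    cycle = [0.0] * 200  # hypothetical 1 s cycle at 200 Hz
    cycle[40], cycle[80], cycle[100] = 1.0, 0.4, 0.5
    print(ppg_cycle_features(cycle, i_sys=40, i_notch=80, i_dia=100))
```

Applying the same function to each of the four channels and prefixing the keys with the channel number gives names in the style of Table 3 (for example, 3F11).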

2.3.2. Feature Selection


Feature selection is an essential step in model building. In machine learning, the more features
there are, the more irrelevant features tend to exist, and features vary greatly in their
correlation with and importance to the prediction target. Feature importance selection eliminates
irrelevant, redundant, and uninformative features, which reduces the number of features and the
training time and improves model robustness. Therefore, this study used three feature importance
selection methods [20], namely reliefF, Chi-square score, and information gain. The top 10, 20, and
30 features of the full feature set were selected as inputs to the regression models, and the
differences in regression performance between the different numbers of features were analyzed and
discussed.
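Two of the three rankings can be sketched with scikit-learn; reliefF is not part of scikit-learn and is omitted here. The paper does not say how the Chi-square score was applied to a continuous hemoglobin target, so this sketch assumes that features are min-max scaled (the chi-square statistic requires non-negative inputs) and that hemoglobin is binned into quantile classes. Information gain is approximated by scikit-learn's mutual information estimate. Function names and `n_bins` are illustrative.

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_regression
from sklearn.preprocessing import MinMaxScaler


def top_k_chi2(X, y, k=30, n_bins=5):
    """Indices of the k features with the highest chi-square score.

    Assumption: features are min-max scaled and the continuous Hb target is
    binned into `n_bins` quantile classes, since chi2 needs class labels.
    """
    X_scaled = MinMaxScaler().fit_transform(X)
    inner_edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    y_class = np.digitize(y, inner_edges)  # ordinal labels 0 .. n_bins-1
    scores, _ = chi2(X_scaled, y_class)
    scores = np.nan_to_num(scores, nan=0.0)  # constant features give NaN scores
    return np.argsort(scores)[::-1][:k]


def top_k_info_gain(X, y, k=30, seed=0):
    """Indices of the k features with the highest estimated mutual information."""
    mi = mutual_info_regression(X, y, random_state=seed)
    return np.argsort(mi)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.uniform(110, 170, 200)  # synthetic Hb values in g/L
    X = rng.uniform(0, 1, (200, 5))
    X[:, 2] = y + rng.normal(0, 1, 200)  # one informative feature
    print(top_k_chi2(X, y, k=2), top_k_info_gain(X, y, k=2))
```

The selected column indices would then be used to slice the 160-column feature matrix before training each regression model.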

2.4. Hemoglobin Regression Model Selection


In this study, three prediction models based on different regression principles were used: logistic
regression (LR), support vector regression (SVR), and eXtreme Gradient Boosting (XGBoost). The
coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) were
used to evaluate regression performance.
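The three evaluation metrics can be written directly with NumPy. This sketch uses the common definitions RMSE = sqrt(mean(e²)), MAE = mean(|e|), and R2 = 1 − SS_res/SS_tot; the paper does not give its formulas, so these standard forms are an assumption (they match `sklearn.metrics.r2_score`, `mean_squared_error`, and `mean_absolute_error`).

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """RMSE, R2, and MAE for paired reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))  # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "R2": r2, "MAE": mae}


if __name__ == "__main__":
    m = regression_metrics([120, 130, 140, 150], [121, 129, 142, 150])
    print({k: round(v, 3) for k, v in m.items()})
    # {'RMSE': 1.225, 'R2': 0.988, 'MAE': 1.0}
```

In the example the errors are 1, −1, 2, and 0 g/L, so MSE = 6/4 = 1.5, RMSE = √1.5 ≈ 1.225, MAE = 1.0, and R2 = 1 − 6/500 = 0.988.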

2.4.1. LR
Logistic regression models are widely used, have strong explanatory power, and have been used to
describe phenomena in diverse medical and nonmedical research areas. Like other regression models,
logistic regression models are often used to assess predictors and to adjust for confounding and
interactions [21]. Logistic regression adds a function-mapping layer to the mapping from features to
output: the sigmoid function constrains the linear combination of the features to the interval
(0, 1), and the resulting value can be used for binary classification or regression prediction.
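For reference, the standard logistic mapping described above is given below. The second line is a hypothetical rescaling: the paper does not state how a (0, 1) output was mapped back to a hemoglobin concentration, so min-max scaling of the target between assumed bounds Hb_min and Hb_max is shown only as one possibility.

```latex
z = \mathbf{w}^{\top}\mathbf{x} + b, \qquad
\hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1)

% Hypothetical rescaling of the (0, 1) output to a hemoglobin value:
\widehat{\mathrm{Hb}} = \mathrm{Hb}_{\min} + \hat{p}\,\left(\mathrm{Hb}_{\max} - \mathrm{Hb}_{\min}\right)
```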

2.4.2. SVR
Support vector regression (SVR), described by Smola and Schölkopf [22] in 1998, is a machine learning
method based on the Vapnik–Chervonenkis (VC) dimension theory of statistical learning and the
structural risk minimization principle. It generalizes well and can handle practical problems such as
small sample sizes, high dimensionality, strong nonlinearity, and local minima [23]. Furthermore,
unlike other regression methods, SVR chooses the regression function by minimizing only those
observation errors that exceed a tolerance ε [24].
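The ε-insensitive formulation in the tutorial cited as [22] makes the last point explicit: errors smaller than ε cost nothing, and larger errors are penalized linearly through the slack variables. The paper does not report the kernel (feature map φ), C, or ε used, so these remain unspecified symbols.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^{*}}\;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right)

\text{s.t.}\quad
  y_i - \mathbf{w}^{\top}\phi(\mathbf{x}_i) - b \le \varepsilon + \xi_i, \quad
  \mathbf{w}^{\top}\phi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad
  \xi_i,\ \xi_i^{*} \ge 0

L_{\varepsilon}\bigl(y, f(\mathbf{x})\bigr) = \max\bigl(0,\ \lvert y - f(\mathbf{x}) \rvert - \varepsilon\bigr)
```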

2.4.3. XGBoost
The XGBoost algorithm [25] is an ensemble learning algorithm based on boosting. It was developed from
the gradient-boosting decision tree (GBDT) algorithm [26] and improves on it in both speed and
accuracy. In addition, XGBoost adds a regularization term to the cost function to avoid overfitting.
It is an effective and widely used algorithm in machine learning. Furthermore, the development of
specialized medical databases, such as the Medical Information Mart for Intensive Care III
(MIMIC-III) database, facilitates data extraction and analysis for ML models [27].
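The regularized cost function mentioned above takes the following form in the original XGBoost formulation [25]: a training loss l summed over n samples plus a complexity penalty Ω for each of the K trees, where T is the number of leaves and w_j are the leaf weights. Using a second-order expansion of the loss, with g_i and h_i the first and second derivatives of l with respect to the previous prediction for the samples I_j in leaf j, the optimal leaf weight follows in closed form. The values of γ and λ used in this study are not reported.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}

w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i
```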

3. Results and Discussion


In this study, 58 samples were collected, each consisting of 1 min of data containing 60 to 90 PPG
signal cycles. For each sample, 70% of the heartbeat cycles were randomly selected for training and
the remaining 30% were used for testing; the training and testing cycles of all samples were then
pooled to form the training and testing sets, respectively. To compare how the number of features,
the feature importance selection method, and the regression model affect noninvasive hemoglobin
detection, the per-cycle prediction results are compiled in Table 2 for comparative analysis.
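The per-subject split described above can be sketched as follows. The data layout (a dictionary from a hypothetical subject ID to that subject's list of per-cycle feature rows) and the fixed random seed are assumptions for illustration; the paper does not describe its implementation.

```python
import numpy as np


def split_cycles_per_subject(cycles_by_subject, train_frac=0.7, seed=0):
    """Randomly assign each subject's cycles to train/test, then pool across subjects.

    cycles_by_subject: dict mapping a (hypothetical) subject ID to a list of
    per-cycle feature rows. Returns two lists of (subject_id, cycle) pairs.
    """
    rng = np.random.default_rng(seed)
    train, test = [], []
    for subject, cycles in cycles_by_subject.items():
        order = rng.permutation(len(cycles))  # random order of this subject's cycles
        n_train = int(round(train_frac * len(cycles)))
        train += [(subject, cycles[i]) for i in order[:n_train]]
        test += [(subject, cycles[i]) for i in order[n_train:]]
    return train, test


if __name__ == "__main__":
    data = {"s1": list(range(80)), "s2": list(range(100, 160))}
    tr, te = split_cycles_per_subject(data)
    print(len(tr), len(te))
```

Because the split is made within each subject, cycles from the same subject appear in both the training and testing sets; a subject-wise split would instead hold out all cycles of some subjects.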

Table 2 shows that the prediction accuracy of the three models generally increases, and the error
decreases, as the number of features increases. This indicates that the additional feature
parameters improve the prediction accuracy of hemoglobin concentration; for all three feature
selection methods, accuracy is highest with 30 features. The XGBoost regression model achieves higher
prediction accuracy than the other two models, and under the Chi-square score feature selection
method the regression models generally perform better than under the other two feature selection
methods. In addition, the XGBoost regression model achieved the smallest MAE, 0.325 g/L. Overall,
therefore, the best hemoglobin prediction performance was achieved by combining the 30 features
selected by the Chi-square method with the XGBoost regression model.

Table 2. Prediction accuracy of three regression models with three feature selection methods and
different numbers of features.

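Feature Selection Method | Number of Features | Regression Model | RMSE (g/L) | R2 | MAE (g/L)
InfoGain | 10 | LR | 12.756 | 0.288 | 9.663
InfoGain | 10 | SVR | 10.083 | 0.555 | 7.594
InfoGain | 10 | XGBoost | 2.588 | 0.970 | 1.410
InfoGain | 20 | LR | 11.968 | 0.373 | 9.114
InfoGain | 20 | SVR | 8.677 | 0.670 | 6.547
InfoGain | 20 | XGBoost | 2.594 | 0.970 | 1.440
InfoGain | 30 | LR | 11.530 | 0.418 | 8.740
InfoGain | 30 | SVR | 7.252 | 0.769 | 5.450
InfoGain | 30 | XGBoost | 2.495 | 0.972 | 1.467
reliefF | 10 | LR | 14.955 | 0.021 | 7.421
reliefF | 10 | SVR | 11.812 | 0.389 | 8.956
reliefF | 10 | XGBoost | 10.669 | 0.501 | 13.406
reliefF | 20 | LR | 11.704 | 0.400 | 9.179
reliefF | 20 | SVR | 7.902 | 0.726 | 5.864
reliefF | 20 | XGBoost | 2.665 | 0.968 | 1.413
reliefF | 30 | LR | 9.201 | 0.629 | 6.810
reliefF | 30 | SVR | 5.567 | 0.864 | 4.305
reliefF | 30 | XGBoost | 1.960 | 0.983 | 1.091
Chi-square | 10 | LR | 14.561 | 0.072 | 12.129
Chi-square | 10 | SVR | 12.750 | 0.288 | 9.899
Chi-square | 10 | XGBoost | 12.168 | 0.352 | 9.040
Chi-square | 20 | LR | 11.640 | 0.407 | 9.060
Chi-square | 20 | SVR | 7.959 | 0.722 | 5.946
Chi-square | 20 | XGBoost | 2.446 | 0.973 | 1.459
Chi-square | 30 | LR | 10.614 | 0.507 | 8.256
Chi-square | 30 | SVR | 4.776 | 0.900 | 3.870
Chi-square | 30 | XGBoost | 0.762 | 0.997 | 0.325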

The 30 key features selected by the Chi-square method are listed in Table 3.

Table 3. Features selected by the Chi-square method.

Method | Features
Chi-square | 3F11, 2F11, 1F11, 4F11, 3F10, 3F9, 3F13, 3F12, 2F10, 1F10,
           | 1F9, 1F13, 2F12, 2F9, 2F13, 3F21, 3F5, 3F8, 3F6, 3F27,
           | 4F6, 4F5, 4F8, 4F10, 1F12, 3F22, 1F6, 1F5, 1F8, 4F9
Note: 1, 2, 3, and 4 represent wavelength 1, wavelength 2, wavelength 3, and wavelength 4, respectively.

Figure 6 shows a scatter plot of the reference hemoglobin values against the values predicted by the
XGBoost regression model with the 30 key feature parameters. The horizontal axis is the actual
hemoglobin value measured by the fully automated hematology analyzer, and the vertical axis is the
hemoglobin value predicted by the XGBoost regression model. Correlation analysis of the reference and
predicted values gave a slope of 0.993, an R2 of 0.997, and an RMSE of 0.762 g/L.
The Bland–Altman plot is a data plotting method used in biomedicine to assess the difference between
a new procedure and a standard one and to analyze the agreement between two different assays. This
paper uses a Bland–Altman plot to analyze the agreement of the hemoglobin values. The horizontal axis
represents the mean of the two methods' results for each sample, and the vertical axis represents the
difference between the two results. The upper and lower horizontal lines are the upper and lower 95%
limits of agreement, i.e., the mean difference ± 1.96 times the standard deviation of the
differences; the middle solid horizontal line indicates the mean difference, which is close to 0.
The Bland–Altman plot of the XGBoost regression model with 30 key parameters is shown in Figure 7:
most of the sample data lie within the limits of agreement, and the 95% limits of agreement are
(−1.504, 1.486) g/L.
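The quantities plotted in a Bland–Altman diagram can be computed as below. The sign convention (predicted minus reference) is an assumption, since the paper does not state it, and the sample standard deviation (`ddof=1`) is used for the differences.

```python
import numpy as np


def bland_altman(reference, predicted, z=1.96):
    """Bland–Altman statistics for paired measurements.

    Assumption: differences are predicted - reference. Returns per-sample
    means (x-axis), differences (y-axis), the mean difference (bias), and
    the lower and upper 95% limits of agreement.
    """
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    means = (ref + pred) / 2.0
    diffs = pred - ref
    bias = float(diffs.mean())
    sd = float(diffs.std(ddof=1))  # sample standard deviation of the differences
    return means, diffs, bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    _, _, bias, lower, upper = bland_altman([120, 135, 150, 142], [121, 134, 151, 141])
    print(bias, lower, upper)
```

Working backward from the reported limits (−1.504, 1.486), the mean difference is (−1.504 + 1.486)/2 = −0.009 and the standard deviation of the differences is (1.486 + 1.504)/(2 × 1.96) ≈ 0.763.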

Figure 6. Fit between the reference hemoglobin values and the values predicted by the XGBoost
regression model.

This paper provides an approach to noninvasive hemoglobin detection, and Table 4 compares the
proposed method with the existing literature. Ghosal et al. [28] collected conjunctival images of the
right and left eyes of 65 subjects with a smartphone camera and proposed the FANIAD image processing
algorithm to assess hemoglobin levels. The model achieved an accuracy of ±0.32 g/dL, a sensitivity of
89%, and an R2 of 0.8774 for the left eye and 0.8144 for the right eye. Saracoglu et al. [29] used a
Radical-7 Pulse CO-Oximeter (Masimo Corporation, Irvine, CA, USA) to continuously monitor 42 patients
and evaluated the impact of hemoglobin measurement during and after surgery. They concluded that
intraoperative hemoglobin monitoring reduced postoperative bleeding at the surgical site and shortened
the patients' ICU stay; accuracy and coefficient of determination were not reported. Fan et al. [30]
used a smartphone to acquire fingertip images at five wavelengths from 24 subjects in normal and
anemic populations. PPG signals were first extracted from the images, and a multiple linear
regression algorithm was then applied, achieving an R2 of 0.880 and an RMSE of 9.04. Hardyanto et al.
[31] used 660 nm and 940 nm LEDs to acquire PPG signals from nine subjects. Their noninvasive
hemoglobin measurement device achieved an accuracy of 94.2% with a standard deviation of 4.7. Pinto
et al. [32] developed a noninvasive hemoglobin measurement device in which an Arduino Uno embedded
development board controls five light-emitting diodes with wavelengths of 670 nm, 770 nm, 810 nm,
850 nm, and 950 nm. Data from 15 subjects were collected for analysis; after LED power normalization,
the accuracy reached 98.29%, the RMSE was reduced to 0.36 g/dL, and R2 was 0.981. All of these
methods achieved noninvasive detection of hemoglobin, but they evaluated their predictions with
different indicators. As Table 4 shows, this study recruited more volunteers than [29–32], which lends
some reliability to the experimental data. For R2, compared with [28,30,32], the R2 of this study is
closest to 1, indicating that the XGBoost algorithm used in this paper improves the generalization
ability of the model. For RMSE, the RMSE of this study is smaller than that reported in [30] but 0.402
larger than that reported in [32], which indicates that the prediction error of the proposed system
needs to be further reduced; this is a shortcoming of the method. Overall, however, the experimental
results of this paper show improved performance.

Figure 7. Bland–Altman diagram of XGBoost regression model for predicting hemoglobin concentration.

Table 4. Comparison of the proposed methodology with the existing literature.

References | Methodology | Wavelength | Algorithm | Subjects | R2 | RMSE
Our study | PPG | 660 nm, 730 nm, 850 nm, 940 nm | XGBoost | 58 | 0.997 | 0.762
Ghosal et al. [28] | Smartphone + the RGB spectrum | - | FANIAD | 65 | Left eye: 0.8774; Right eye: 0.8144 | -
Saracoglu et al. [29] | Radical-7 Pulse CO-Oximeter | - | - | 42 | - | -
Fan et al. [30] | Smartphone + PPG | 660 nm, 810 nm, 900 nm, 970 nm, 1050 nm | Multiple linear regressor | 24 | 0.88 | 9.04
Hardyanto et al. [31] | PPG | 660 nm, 940 nm | Linear regression | 9 | - | -
Pinto et al. [32] | PPG | 670 nm, 770 nm, 810 nm, 850 nm, 950 nm | Linear regression | 15 | 0.981 | 0.36

4. Conclusions
A PPG acquisition system combining the four-wavelength DCM08 blood oxygen sensor with the ADPD4100
analog front-end chip was designed to study hemoglobin detection at the human fingertip. First, a
hemoglobin prediction model was established by extracting feature parameters from the high-quality
PPG waveforms of the four channels. Then, feature parameters selected by the reliefF, Chi-square, and
InfoGain methods were fed into the different regression models to determine the optimal model and
the key feature parameters. The 30 features selected by the Chi-square method gave the best
prediction results, with an R2 of 0.997 and an RMSE of 0.762 g/L, which indicates that the model has
good generalization ability and accuracy. The experimental results show that the XGBoost-based
noninvasive hemoglobin prediction model established in this paper has a degree of reliability and
research value, can help improve and broaden the application of continuous noninvasive hemoglobin
measurement methods, and may be used for the diagnosis of early anemia. If the XGBoost model were
integrated into the host computer software, hemoglobin prediction would become more convenient and
intelligent; alternatively, the collected data could be transferred to the cloud for processing and
the results returned to the host computer software for display, making the system more versatile.
In addition, the sample size collected in this study is insufficient. In future work, we will
therefore expand the sample size and the range of the sample data, further study the regression
modeling algorithm, and continue training the model to improve its generalization ability, so that
the detection system has more data support and the experimental results are more reliable.

Author Contributions: Y.L. designed the study. Z.C., H.Q., W.G., S.L. and Y.L. conceived the study,
provided directions, feedback, and/or revised the manuscript. Y.L. led the investigation and drafted
the manuscript for submission with revisions and feedback from the contributing authors. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was supported by the Guangxi Innovation Driven Development Project
(Guike AA19254003), the National Natural Science Foundation of China (62101148), the Natural
Science Foundation of Guangxi (2020GXNSFBA297156), the National Major Research Instrument
Development Project of the NSFC (Grant No. 61627807), and the Innovation Project of GUET Graduate
Education (Grant No. 2022YCXS222 and Grant No. 2022YCXB08).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data used in this manuscript can be downloaded from
https://figshare.com/articles/dataset/Hemoglobin_detection_based_on_four-wavelength_PPG_signal_zip/22256143
(accessed on 12 February 2023).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Pintavirooj, C.; Ni, B.; Chatkobkool, C.; Pinijkij, K. Noninvasive portable hemoglobin concentration monitoring system using
optical sensor for anemia disease. Healthcare 2021, 9, 647. [CrossRef]
2. Jensen, F.B.; Fago, A.; Weber, R.E. Hemoglobin structure and function. Fish Physiol. 1998, 17, 1–40.
3. Pinto, C.; Parab, J.; Naik, G. Non-invasive hemoglobin measurement using embedded platform. Sens. Bio-Sens. Res. 2020,
29, 100370. [CrossRef]
4. Munira, L.; Viwattanakulvanid, P. Influencing Factors and Knowledge Gaps on Anemia Prevention among Female Students in
Indonesia. Int. J. Eval. Res. Educ. 2021, 10, 215–221. [CrossRef]
5. Tang, Q.; Chen, Z.; Ward, R.; Menon, C.; Elgendi, M. Subject-Based Model for Reconstructing Arterial Blood Pressure from
Photoplethysmogram. Bioengineering 2022, 9, 402. [CrossRef] [PubMed]
6. Tamura, T. Current progress of photoplethysmography and SPO2 for health monitoring. Biomed. Eng. Lett. 2019, 9, 21–36.
[CrossRef]
7. Kumar, A.; Komaragiri, R.; Kumar, M. A review on computation methods used in photoplethysmography signal analysis for
heart rate estimation. Arch. Comput. Methods Eng. 2022, 29, 921–940.

8. Touw, H.R.; Verheul, M.H.; Tuinman, P.R.; Smit, J.; Thöne, D.; Schober, P.; Boer, C. Photoplethysmography respiratory rate
monitoring in patients receiving procedural sedation and analgesia for upper gastrointestinal endoscopy. J. Clin. Monit. Comput.
2017, 31, 747–754. [CrossRef] [PubMed]
9. El-Hajj, C.; Kyriacou, P.A. A review of machine learning techniques in photoplethysmography for the non-invasive cuff-less
measurement of blood pressure. Biomed. Signal Process. Control 2020, 58, 101870. [CrossRef]
10. Huttunen, R.; Leppänen, T.; Duce, B.; Oksenberg, A.; Myllymaa, S.; Töyräs, J.; Korkalainen, H. Assessment of obstructive
sleep apnea-related sleep fragmentation utilizing deep learning-based sleep staging from photoplethysmography. Sleep 2021,
44, zsab142. [CrossRef] [PubMed]
11. Sardana, H.; Dogra, N.; Kanawade, R. Dynamic time warping based arrhythmia detection using photoplethysmography signals.
Signal Image Video Process. 2022, 16, 1925–1933.
12. Kavsaoğlu, A.R.; Polat, K.; Hariharan, M. Non-invasive prediction of hemoglobin level using machine learning techniques with
the PPG signal’s characteristics features. Appl. Soft Comput. 2015, 37, 983–991. [CrossRef]
13. Acharya, S.; Swaminathan, D.; Das, S.; Kansara, K.; Chakraborty, S.; Kumar, D.; Francis, T.; Aatre, K.R. Non-invasive estimation
of hemoglobin using a multi-model stacking regressor. IEEE J. Biomed. Health Inform. 2019, 24, 1717–1726. [CrossRef]
14. Lakshmi, M.; Manimegalai, P.; Bhavani, S. Non-invasive haemoglobin measurement among pregnant women using
photoplethysmography and machine learning. J. Phys. Conf. Ser. 2020, 1432, 012089. [CrossRef]
15. Pinto, C.; Parab, J.; Sequeira, M.; Naik, G. Development of Altera NIOS II Soft-core system to predict total Hemoglobin using
Multivariate Analysis. J. Phys. Conf. Ser. 2021, 1921, 012039. [CrossRef]
16. Liang, Y.; Chen, Z.; Liu, G.; Elgendi, M. A new, short-recorded photoplethysmogram dataset for blood pressure monitoring in
China. Sci. Data 2018, 5, 1–7. [CrossRef] [PubMed]
17. Orphanidou, C. Quality Assessment for the Photoplethysmogram (PPG). In Signal Quality Assessment in Physiological Monitoring;
Springer: Berlin/Heidelberg, Germany, 2018; pp. 41–63.
18. Liang, Y.; Abbott, D.; Howard, N.; Lim, K.; Ward, R.; Elgendi, M. How effective is pulse arrival time for evaluating blood
pressure? Challenges and recommendations from a study using the MIMIC database. J. Clin. Med. 2019, 8, 337. [CrossRef]
19. Golap, M.A.u.; Raju, S.T.U.; Haque, M.R.; Hashem, M. Hemoglobin and glucose level estimation from PPG characteristics features
of fingertip video using MGGP-based model. Biomed. Signal Process. Control 2021, 67, 102478. [CrossRef]
20. Zhao, Z.; Morstatter, F.; Sharma, S.; Alelyani, S.; Anand, A.; Liu, H. Advancing Feature Selection Research–ASU Feature Selection
Repository. 2010; pp. 1–28. Available online: https://www.researchgate.net/publication/305083748 (accessed on 15 January
2023).
21. Zabor, E.C.; Reddy, C.A.; Tendulkar, R.D.; Patil, S. Logistic regression in clinical studies. Int. J. Radiat. Oncol. Biol. Phys.
2021, 271–277. [CrossRef]
22. Smola, A.; Schölkopf, B. A Tutorial on Support Vector Regression: NeuroCOLT; Technical Report NC-TR-98-030; Royal Holloway
College: London, UK, 1998.
23. Huang, H.; Wei, X.; Zhou, Y. An overview on twin support vector regression. Neurocomputing 2022, 490, 80–92. [CrossRef]
24. Li, Q.; Qin, Z.; Liu, Z. Uncertain support vector regression with imprecise observations. J. Intell. Fuzzy Syst. 2022, 43, 3403–3409.
[CrossRef]
25. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [CrossRef]
26. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 1189–1232. [CrossRef]
27. Wang, X.; Zhu, T.; Xia, M.; Liu, Y.; Wang, Y.; Wang, X.; Zhuang, L.; Zhong, D.; Zhu, J.; He, H.; et al. Predicting the prognosis
of patients in the coronary care unit: A novel multi-category machine learning model using XGBoost. Front. Cardiovasc. Med.
2022, 9, 764629. [CrossRef] [PubMed]
28. Ghosal, S.; Das, D.; Udutalapally, V.; Talukder, A.K.; Misra, S. sHEMO: Smartphone spectroscopy for blood hemoglobin level
monitoring in smart anemia-care. IEEE Sens. J. 2020, 21, 8520–8529. [CrossRef]
29. Saracoglu, A.; Abdullayev, R.; Sakar, M.; Sacak, B.; Girgin Incekoy, F.; Aykac, Z. Continuous hemoglobin measurement during
frontal advancement operations can improve patient outcomes. J. Clin. Monit. Comput. 2022, 36, 1689–1695. [CrossRef]
30. Fan, Z.; Zhou, Y.; Zhai, H.; Wang, Q.; He, H. A Smartphone-Based Biosensor for Non-Invasive Monitoring of Total Hemoglobin
Concentration in Humans with High Accuracy. Biosensors 2022, 12, 781. [CrossRef]
31. Hardyanto, I.; Pambudi, S.; Suyarna, Y.; Ardidarma, A.; Kurniawan, A.; Iskandar, J.; Siskandar, R.; Jenie, R.P.; Alatas, H.; Irzaman.
Non-invasive hemoglobin blood level measurement system. AIP Conf. Proc. 2021, 2320, 050005.
32. Pinto, C.; Parab, J.; Parab, M.; Naik, G. Improving hemoglobin estimation accuracy through standardizing of light-emitting diode
power. Int. J. Electr. Comput. Eng. 2022, 12, 219–228. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.