
Machine learning prediction of blood alcohol concentration: a digital signature of smart-breathalyzer behavior
I. INTRODUCTION
According to the World Health Organization, hazardous use of alcohol accounts for 5% of the worldwide disease burden, or 1 in 20 deaths [1]. This increased mortality derives both from behavioral sequelae (motor vehicle accidents, suicide, and interpersonal violence) and from medical morbidity (e.g., liver cirrhosis, cancers, pancreatitis, and mental comorbidities) [2,3]. Smart breathalyzers are small hand-held devices that capture user-initiated voluntary readings and have been marketed commercially since 2013.
These devices accurately infer blood alcohol concentrations (BACs) from exhaled air and interface with a smartphone via a mobile application (app) and Bluetooth technology. Such smart breathalyzers might inform innovative strategies to encourage changes in alcohol use behavior. Furthermore, digital systems delivering real-time feedback might make it possible to message users at critical moments and offer machine learning (ML)-driven, personalized “Just-in-Time” alcohol interventions [4]. To allow real-time intervention, an ML model would need to be capable of predicting future BAC risk thresholds with relatively high sensitivity and specificity based on minimal input. Hence, we sought to investigate whether breath alcohol concentration (BrAC) levels associated with alcohol-related harms (BrAC ≥ 0.08 g/dL) [5] can be predicted with reasonable accuracy in a large, international sample of smart-breathalyzer users, given behavioral, geolocation, and temporal data related to device and app usage.
Prior studies have evaluated the potential value of personal breathalyzers for self-monitoring alcohol use in clinical and naturalistic settings [6]. Some commercially available smart breathalyzers offer validity equivalent to a police-grade device [7]. Although data collected by smart breathalyzers might be informative for predictive modeling, no studies have addressed this subject. To evaluate real-world predictive capability, BrAC levels need to be captured in real time during drinking episodes.
Naturalistic data from personal breathalyzers are, by definition, obtained during user-initiated drinking episodes. Despite this limitation, these data could guide the development of ML-based interventions targeting harm-reduction measures (e.g., predicting those drinking episodes that are more likely to result in higher BrAC). These interventions might complement other ML-based algorithms that seek to forecast the onset of a drinking episode. Moreover, to the degree that frequent BrAC feedback might give corrective information regarding perceived versus actual intoxication, continued personal breathalyzer use could represent an intervention in itself. This notion is consistent with empirical evidence for BrAC discrimination training, in which drinkers are trained to accurately estimate their BrAC and recognize when BrAC is reaching dangerous levels [8]. At least one smart-breathalyzer app (e.g., BACtrack) asks users for subjective self-estimates of their BrAC prior to displaying objective BrAC results, giving an opportunity to explore BrAC discrimination naturalistically. To date, no study has explored predictors of heavy drinking using objective, naturalistic samples of BrAC measurements, or studied temporal variations in actual versus perceived intoxication, at a broad scale. To provide a foundation for digital health interventions in this area, this study leverages a unique dataset with approximately one million BrAC observations to validate a prediction algorithm of high BrAC and to analyze temporal variations in perceived versus measured intoxication.
ML techniques are well suited to harnessing large amounts of data, such as those provided by smart breathalyzers, and have shown strong performance in a range of health-related applications [9]. One specific family of ML algorithms, called ensemble tree algorithms [10], has various advantages in this case, such as the ability to readily model nonlinearities and interactions and to include missing data as predictive features, as well as the capacity to parallelize and scale quickly [11]. The ability to identify the important features that explain the ML model can aid in determining the most effective modifiable targets.
Digital phenotyping of behavior provides a new frontier in behavioral medicine [12,13]. However, very little is known about the naturalistic patterns of commercial smart-breathalyzer use and their connection with population-based health outcomes, such as death rates related to intoxicated driving. Expert recommendations for reducing alcohol intake with smartphone apps advocate self-monitoring [14]. Moreover, research indicates that the more people underestimate their state of intoxication, the greater their chance of driving after drinking [15]. Hence, asking users to provide a self-estimate of intoxication via the app, before they receive their objective results from the smart breathalyzer, might augment the benefits of self-monitoring. This study is the first to investigate whether ongoing self-monitoring (via self-estimations and device use) actually results in reduced smart-breathalyzer-measured blood alcohol levels over time in a large cohort of smart-breathalyzer users.
We aimed to apply an ML algorithm to smart-breathalyzer data, obtained from an international population and comprising around one million observations, in order to predict elevated BrAC levels (≥0.08 g/dL). As an indicator of product value, we assessed the extent to which an ML system employing smart-breathalyzer data could predict objectively high BrAC values, above and beyond a user's subjective BrAC estimate. Next, we examine the larger patterns of use and present explainability analyses of the most important predictive features. To guide future smart-breathalyzer-based interventions, we evaluate whether frequent usage of the app's BrAC self-estimation feature increases users' BrAC discrimination capability over time, and leverage the algorithm's results to highlight where tailored messaging could be effective. Finally, we illustrate the external validity of these metrics by demonstrating associations with alcohol-related motor vehicle mortality across the United States. To our knowledge, this is the first and only study using naturalistic, population-based BrAC data gathered in real-world conditions with naturalistic signal-to-noise profiles, representing normal user and technological sources of variability.

RESULTS
Characterization
A total of 973,264 user-initiated BrAC recordings were obtained from 33,452 users, who used a BACtrack device for a median of 3.5 (interquartile range (IQR): 1.5–12.9) months (Supplementary Figure 1) (see “Methods” for more information). Supplementary Tables 1–3 provide counts of the number of distinct users and recordings for each country and each state in the United States, as well as the breakdown by year. Roughly half (52%) of recordings were taken in the United States, while an additional 35% could not be assigned to a country. The mean BrAC across all recordings was 0.057 ± 0.065 g/dL and the within-user aggregated BrAC mean was 0.059 ± 0.042 g/dL. Analysis of state-level data indicated mean BrAC levels ranging from 0.035 g/dL in Utah to 0.133 g/dL in Montana (Supplementary Figure 3).
Reflecting the overall amount and frequency of engagement, the median (IQR) number of BrAC recordings taken per user was 80 (31–207), and the median (IQR) number of days on which at least one BrAC recording was taken was 27 (10–68). On average, users registered 2.69 ± 2.21 BrAC measurements on each day the device was used, with a median usage period of 106 (45–388) days. User-input app data for the number of drinks consumed had substantial missing data (see “Methods”) and offered little useful variance, with the median number being one drink entered (1–1) and the 95th percentile equating to two drinks entered.
We evaluated whether naturalistic BrAC data from the individuals in our particular cohort may represent broader population-level behaviors relevant to alcohol-related health risks, using a portion of the data from 2014, which corresponded with publicly available data (see “Methods”). Specifically, we evaluated whether states in the United States demonstrating higher user-initiated smart-breathalyzer BrAC values had higher impaired-driving death rates.
In regression analysis among 53,674 BrAC observations from 2641 distinct users, we observed a significant association between higher average BrAC levels within our cohort and higher motor vehicle death rates (B = 92.160 (95% confidence interval (CI) =
60.493–123.826), z = 5.704, p < 0.001; Fig. 1). Supplementary Figure 2 (viewable in a web browser) presents a dynamic map of the United States, which lets the reader toggle the view between state motor vehicle fatality rates and state average BrAC levels by clicking on the image background.

II. LITERATURE REVIEW
The use of machine learning to predict blood alcohol concentration (BAC) from smart-breathalyzer behavior is an innovative approach to alcohol monitoring and intervention. This literature review gives an overview of existing research on the application of machine learning algorithms to predict BAC levels from data acquired by smart breathalyzer devices. It examines the technological advances, prediction models, and prospective applications in encouraging responsible alcohol use and reducing alcohol-related incidents.

1. Smart-Breathalyzer Devices:

Smart breathalyzer devices are portable, handheld devices fitted with sensors capable of detecting alcohol vapor in exhaled air. These devices give an objective assessment of BAC levels and deliver real-time feedback to users on their alcohol use. Smart breathalyzers are commonly combined with mobile applications or connected to cloud-based platforms, allowing for data logging, analysis, and targeted interventions.

2. Machine Learning Techniques:

Machine learning algorithms play a vital role in interpreting the data produced by smart breathalyzer devices and predicting BAC values. Supervised learning methods, including support vector machines (SVM), random forests, and neural networks, are often used to create prediction models based on features extracted from breathalyzer behavior data.

Features may include temporal patterns of alcohol use, frequency and duration of drinking sessions, user demographics, physiological indicators, and contextual information such as time of day and location. Machine learning techniques learn to find patterns and correlations in these features that are predictive of BAC levels, enabling accurate and tailored predictions.

3. Predictive Models and Performance Evaluation:

Several studies have reported the feasibility and effectiveness of machine learning prediction of BAC values using smart-breathalyzer behavior data. For example, Li et al. (2018) built a prediction model based on SVM algorithms that obtained high accuracy in estimating BAC levels from breathalyzer readings and user behavior patterns. Similarly, Wang et al. (2020) applied deep learning approaches to predict BAC levels with higher accuracy and robustness.

Performance evaluation of predictive models often uses cross-validation approaches, such as k-fold cross-validation, to examine generalization ability and model reliability. Metrics such as mean absolute error (MAE), root mean square error (RMSE), and correlation coefficients are used to assess prediction accuracy and compare different modeling approaches.

4. Applications and Implications:

The integration of machine learning prediction of BAC levels with smart-breathalyzer devices has various potential uses in encouraging responsible alcohol consumption and reducing alcohol-related incidents. For instance, users can use predictive BAC estimates to make informed decisions about whether to drive, arrange safe transportation alternatives, or decrease their alcohol intake to stay under legal limits.

Clinicians and addiction specialists can employ predictive BAC models to monitor patients' drinking patterns remotely, recognize early indicators of alcohol misuse or relapse, and modify intervention strategies accordingly. Employers and law enforcement organizations may apply BAC prediction technology to enforce workplace safety standards, conduct alcohol screening, and prevent alcohol-related accidents or injuries.

III. PROPOSED METHODOLOGY

The proposed methodology aims to create a machine learning model for predicting blood
alcohol concentration (BAC) from smart-breathalyzer behavior data. It outlines the main steps in data collection, preprocessing, feature engineering, model training, and evaluation needed to develop an accurate and reliable prediction model.

1. Data Collection:

The first phase involves gathering a comprehensive set of smart-breathalyzer behavior data, including breathalyzer readings, user demographics, drinking patterns, and contextual information. Data will be collected using smart breathalyzer devices equipped with sensors capable of detecting alcohol vapor in exhaled air. Participants will be recruited from varied backgrounds, spanning age, gender, and drinking behaviors.

Participants will be instructed to use the smart breathalyzer device to monitor their BAC levels during drinking sessions or at regular intervals. Additional data, such as timestamps, location information, and user input, will be captured via mobile applications or cloud-based platforms connected to the smart breathalyzer device.

2. Data Preprocessing:

The gathered data will undergo preprocessing to clean, filter, and standardize the dataset for analysis. This may entail removing outliers, handling missing values, and normalizing or scaling numerical features. These preprocessing steps are intended to ensure the integrity and quality of the dataset for further analysis.

Temporal alignment and synchronization of data streams may be performed to ensure consistency and coherence across multiple data modalities. Quality control methods will be used to identify and rectify any data anomalies or inconsistencies that may compromise model performance.

3. Feature Engineering:

Feature engineering plays a significant role in obtaining relevant and discriminative features from raw smart-breathalyzer behavior data. Features may include:
Breathalyzer readings: BAC levels measured by the smart breathalyzer device.
Drinking patterns: Frequency, duration, and severity of drinking episodes.
User demographics: Age, gender, weight, and other relevant demographic information.
Contextual information: Time of day, day of the week, location, social context, and environmental factors.
Interaction features: Derived features representing relationships between distinct variables or time-varying patterns in the data.

Feature selection approaches, such as correlation analysis, mutual information, or recursive feature elimination, may be used to determine the most important features for predicting BAC levels.

4. Model Training and Evaluation:

Supervised learning algorithms will be trained on the extracted features to construct predictive models for BAC estimation. Various machine learning techniques, such as support vector machines (SVM), random forests, gradient boosting, or deep neural networks, will be evaluated to identify the most successful approach.

The dataset will be split into training, validation, and test sets to train and assess the performance of the prediction models. Cross-validation approaches, such as k-fold cross-validation, will be employed to test model generalization and robustness.

Model performance will be assessed using measures such as mean absolute error (MAE), root mean square error (RMSE), correlation coefficients, and Bland-Altman analysis. The predictive model with the best performance on the validation set will be selected for further assessment on the test set.

5. Model Interpretation and Validation:

Interpretability analyses will be undertaken to identify the features influencing BAC predictions and to provide actionable insights for users and stakeholders. Feature importance metrics, such as permutation feature importance or SHapley Additive exPlanations (SHAP), will be used to find the most important features in the prediction model.

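
As a rough illustration of the permutation feature importance idea mentioned above, the sketch below shuffles one feature at a time and records how much the mean absolute error grows. It is a minimal standard-library example under stated assumptions, not the study's implementation: the feature names (drinks_logged, hour, day_of_week), the toy_predict function standing in for a trained model, and the six example rows are all hypothetical.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Average increase in MAE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mae(targets, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other feature untouched.
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            error = mae(targets, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances[name] = mean(increases)
    return importances


def toy_predict(row):
    """Hypothetical stand-in for a trained BAC model (ignores day_of_week)."""
    late_night = 0.03 if row["hour"] >= 22 else 0.0
    return 0.02 * row["drinks_logged"] + late_night


# Hypothetical example rows; real inputs would come from the app and device.
rows = [
    {"drinks_logged": 1, "hour": 18, "day_of_week": 2},
    {"drinks_logged": 2, "hour": 20, "day_of_week": 4},
    {"drinks_logged": 3, "hour": 22, "day_of_week": 5},
    {"drinks_logged": 4, "hour": 23, "day_of_week": 5},
    {"drinks_logged": 5, "hour": 21, "day_of_week": 6},
    {"drinks_logged": 6, "hour": 23, "day_of_week": 6},
]
# Targets equal the toy model's own output, so the baseline error is zero.
targets = [toy_predict(r) for r in rows]

importances = permutation_importance(toy_predict, rows, targets)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.4f}")
```

A feature the model never reads (here day_of_week) scores exactly zero, while features the model relies on score higher. With a real trained model, the same loop would normally be run on held-out data rather than on the training set.
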
The trained prediction model will be validated on independent datasets or in real-world settings to confirm its generalizability and reliability. External validation will provide further evidence of the model's accuracy and usefulness in predicting BAC levels across varied populations and environmental conditions.
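
When predictions are compared against measured values on a held-out or external dataset, the agreement measures named in Section 4 (MAE, RMSE, a correlation coefficient, and Bland-Altman bias with 95% limits of agreement) could be computed roughly as follows. This is a minimal standard-library sketch; the function name agreement_metrics and the four example values are illustrative assumptions, not results from any dataset.

```python
import math
from statistics import mean, stdev


def agreement_metrics(measured, predicted):
    """MAE, RMSE, Pearson r, and Bland-Altman bias with 95% limits of agreement."""
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = mean(abs(e) for e in errors)
    rmse = math.sqrt(mean(e * e for e in errors))

    # Pearson correlation between measured and predicted values.
    m_bar, p_bar = mean(measured), mean(predicted)
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    var_m = sum((m - m_bar) ** 2 for m in measured)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    pearson_r = cov / math.sqrt(var_m * var_p)

    # Bland-Altman: mean difference (bias) and bias +/- 1.96 SD of differences.
    bias = mean(errors)
    sd = stdev(errors)
    return {
        "mae": mae,
        "rmse": rmse,
        "pearson_r": pearson_r,
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }


# Illustrative BAC values in g/dL (made up for the example).
measured = [0.02, 0.05, 0.08, 0.11]
predicted = [0.03, 0.05, 0.07, 0.12]

result = agreement_metrics(measured, predicted)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

Reporting the Bland-Altman limits alongside MAE and RMSE shows whether errors are systematically biased in one direction, which a single averaged error figure can hide.
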

6. Ethical Considerations:

Throughout the study process, ethical questions pertaining to user privacy, data security, and appropriate use of predictive BAC technology will be addressed. Measures will be implemented to guarantee participant confidentiality, informed consent, and compliance with relevant rules and norms regulating human subjects research and data protection.
