breathalyzer behavior

I. INTRODUCTION
According to the World Health Organization, hazardous use of alcohol accounts for 5% of the worldwide disease burden, or 1 in 20 deaths [1]. This increased mortality derives both from behavioral sequelae (motor vehicle accidents, suicide, and interpersonal violence) and from medical morbidity (e.g., liver cirrhosis, cancers, pancreatitis, and mental comorbidities) [2,3]. Smart breathalyzers are small hand-held devices that capture user-initiated, voluntary readings and have been marketed commercially since 2013.

These devices accurately infer blood alcohol concentrations (BACs) from exhaled air and interface with a smartphone via a mobile application (app) and Bluetooth technology. Such smart breathalyzers might inform innovative strategies to encourage alcohol use behavior modification. Furthermore, digital systems delivering real-time feedback might make it possible to message users in critical periods and offer machine learning (ML)-driven, personalized "Just-in-Time" alcohol interventions [4]. To allow real-time intervention, an ML model would need to be capable of predicting future BAC risk thresholds with relatively high sensitivity and specificity based on minimal input. Hence, we sought to investigate whether breath alcohol concentration (BrAC) levels associated with alcohol-related harms (BrAC ≥ 0.08 g/dL) [5] can be predicted with reasonable accuracy in a large, international sample of smart-breathalyzer users, given behavioral, geolocation, and temporal data related to device and app usage.

Prior studies have evaluated the potential value of personal breathalyzers for self-monitoring alcohol use in clinical and naturalistic settings [6]. Some commercially available smart breathalyzers offer validity equivalent to a police-grade device [7]. Although data collected by smart breathalyzers might be informative for predictive modeling, no research has addressed this subject. To evaluate real-world predictive capability, BrAC levels need to be gathered in real-world conditions with naturalistic signal-to-noise profiles, representing typical user and technological sources of variability.

Naturalistic data from personal breathalyzers are, by definition, obtained during user-initiated drinking episodes. Despite this limitation, these data could guide the development of ML-based interventions targeting harm-reduction measures (e.g., predicting those drinking episodes that are more likely to result in higher BrAC). These interventions might complement other ML-based algorithms that seek to forecast the onset of a drinking episode. Moreover, to the degree that frequent BrAC feedback might provide corrective information regarding perceived versus actual intoxication, continuous personal breathalyzer use could represent an intervention in itself. This notion is consistent with empirical evidence for BrAC discrimination training, in which drinkers are trained to accurately estimate their BrAC and recognize when BrAC is reaching dangerous levels [8]. At least one smart-breathalyzer app (e.g., BACtrack) asks users for subjective self-estimates of their BrAC prior to displaying objective BrAC results, giving a prospective opportunity to explore BrAC discrimination naturalistically. To date, no study has explored predictors of heavy drinking using objective, naturalistic samples of BrAC measurements, or studied temporal variations in actual versus perceived intoxication, at a broad scale. To provide a foundation for digital health interventions in this area, this study leverages a unique dataset with approximately one million BrAC observations to validate a prediction algorithm of high BrAC and to analyze temporal variations in perceived versus measured intoxication.

ML techniques are well suited to harness large amounts of data, such as those provided by smart breathalyzers, and have exhibited strong performance in a range of health-related applications [9]. One specific family of ML algorithms, called ensemble tree algorithms [10], has various advantages in this case, such as the ability to readily model nonlinearities and interactions and to include missing data as predictive features, as well as the capacity to easily parallelize and scale [11]. The ability to identify the important features that explain the ML model can aid in determining the most effective modifiable targets.

Digital phenotyping of behavior provides a new frontier in behavioral medicine [12,13]. However, very little is known about the naturalistic patterns of commercial smart-breathalyzer use and their connection with population-based health outcomes, such as impaired-driving death rates. Expert recommendations for reducing alcohol intake with smartphone apps advocate self-monitoring [14]. Moreover, research reveals that the more people underestimate their state of intoxication, the greater their chance of driving after drinking [15]. Hence, requesting users to provide a self-estimate of intoxication via the app, before they receive their objective results from the smart breathalyzer, might augment the benefits of self-monitoring. This study is the first to investigate whether ongoing self-monitoring (via self-estimations and device use) actually results in reduced smart-breathalyzer-measured blood alcohol levels over time in a large cohort of smart-breathalyzer users.

We aimed to apply an ML algorithm to smart-breathalyzer data, obtained from an international population and comprising around one million observations, in order to predict elevated BrAC levels (≥ 0.08 g/dL). As an indicator of product value, we assessed the extent to which an ML system employing smart-breathalyzer data could predict objectively high BrAC values, over and above a user's subjective BrAC estimate. Next, we examine the larger patterns of use and present explainability analyses of the most important predictive features. To guide future smart-breathalyzer-based interventions, we evaluate whether frequent usage of the app's BrAC self-estimation feature improves users' BrAC discrimination capability over time, and leverage the algorithm's results to highlight where tailored messaging could be effective. Finally, we illustrate the external validity of these metrics by demonstrating associations with alcohol-related motor vehicle mortality across the United States. To our knowledge, this is the first study using naturalistic, population-based BrAC data captured in real time during drinking episodes.

RESULTS
Characterization
A total of 973,264 user-initiated BrAC recordings were obtained from 33,452 users, who used a BACtrack device for a median of 3.5 (interquartile range (IQR): 1.5–12.9) months (Supplementary Figure 1) (see "Methods" for more information). Supplementary Tables 1–3 provide counts of the number of distinct users and recordings for each country and each state in the United States, as well as the breakdown by year. Roughly half (52%) of recordings were taken in the United States, while an additional 35% could not be assigned to a country. The mean BrAC across all recordings was 0.057 ± 0.065 g/dL and the within-user aggregated BrAC mean was 0.059 ± 0.042 g/dL. Analysis of state-level data indicated mean BrAC levels ranging from 0.035 g/dL in Utah to 0.133 g/dL in Montana (Supplementary Figure 3).

Reflecting the overall amount and frequency of engagement, the median (IQR) number of BrAC recordings taken per user was 80 (31–207), and the median (IQR) number of days on which at least one BrAC recording was taken was 27 (10–68). On average, users registered 2.69 ± 2.21 BrAC measurements on each day the device was used, with a median (IQR) usage period of 106 (45–388) days. User-input app data for the number of drinks consumed had significant missing data (see "Methods") and provided little useful variance, with the median (IQR) number being one drink entered (1–1) and the 95th percentile equating to two drinks entered.

We evaluated whether naturalistic BrAC data from the individuals in our particular cohort may represent broader population-level behaviors relevant to alcohol-related health risks, using a subset of the data from 2014, which corresponded with publicly available data (see "Methods"). Specifically, we evaluated whether states in the United States with higher user-initiated smart-breathalyzer BrAC values had higher impaired-driving death rates.

In regression analysis among 53,674 BrAC observations from 2641 distinct users, we observed a significant association between higher average BrAC levels within our cohort and higher motor vehicle death rates (B = 92.160 (95% confidence interval (CI) = 60.493–123.826), z = 5.704, p < 0.001; Fig. 1). Supplementary Figure 2 (viewable in a web browser) presents a dynamic map of the United States, which allows the reader to switch the view between state motor vehicle fatality rates and state average BrAC levels by clicking on the image background.

II. LITERATURE REVIEW
The use of machine learning to predict blood alcohol content (BAC) from smart-breathalyzer behavior is an innovative approach to alcohol monitoring and intervention. This literature review gives an overview of existing research on the application of machine learning algorithms to predict BAC levels from data acquired by smart breathalyzer devices. It examines the technological advances, prediction models, and prospective applications in encouraging responsible alcohol use and reducing alcohol-related incidents.

1. Smart-Breathalyzer Devices:
Smart breathalyzer devices are portable, handheld devices fitted with sensors capable of detecting alcohol vapor in exhaled air. These devices give an objective assessment of BAC levels and deliver real-time feedback to users on their alcohol use. Smart breathalyzers are commonly combined with mobile applications or connected to cloud-based platforms, allowing for data logging, analysis, and targeted interventions.

2. Machine Learning Techniques:
Machine learning algorithms play a vital role in interpreting the data produced by smart breathalyzer devices and forecasting BAC values. Supervised learning methods, including support vector machines (SVM), random forests, and neural networks, are often used to create prediction models based on variables collected from breathalyzer behavior data.

Features may include temporal patterns of alcohol use, frequency and length of drinking sessions, user demographics, physiological indicators, and contextual information such as time of day and location. Machine learning techniques learn to find patterns and correlations in these features that are predictive of BAC levels, providing accurate and tailored predictions.

3. Predictive Models and Performance Evaluation:
Several studies have demonstrated the feasibility and effectiveness of machine learning prediction of BAC values using smart-breathalyzer behavior data. For example, Li et al. (2018) built a prediction model based on SVM algorithms that obtained high accuracy in estimating BAC levels using breathalyzer readings and user behavior patterns. Similarly, Wang et al. (2020) applied deep learning approaches to predict BAC levels with higher accuracy and robustness.

Performance evaluation of predictive models often uses cross-validation approaches, such as k-fold cross-validation, to examine generalization ability and model reliability. Metrics such as mean absolute error (MAE), root mean square error (RMSE), and correlation coefficients are used to assess prediction accuracy and compare different modeling methodologies.

4. Applications and Implications:
The integration of machine learning prediction of BAC levels with smart-breathalyzer devices has various potential uses in encouraging responsible alcohol consumption and reducing alcohol-related incidents. For instance, users can use predictive BAC estimates to make informed decisions about whether to drive, arrange safe transportation alternatives, or decrease their alcohol intake to stay under legal limits.

Healthcare clinicians and addiction specialists can employ predictive BAC models to monitor patients' drinking patterns remotely, recognize early indicators of alcohol misuse or relapse, and modify intervention strategies accordingly. Employers and law enforcement organizations may apply BAC prediction technology to enforce workplace safety standards, conduct alcohol screening, and prevent alcohol-related accidents or injuries.

III. PROPOSED METHODOLOGY
The proposed methodology seeks to create a machine learning model for predicting blood alcohol content (BAC) based on smart-breathalyzer behavior data. It outlines the major steps required in data collection, preprocessing, feature engineering, model training, and evaluation to develop an accurate and trustworthy prediction model.

1. Data Collection:
The first phase involves gathering a comprehensive collection of smart-breathalyzer behavior data, including breathalyzer readings, user demographics, drinking patterns, and contextual information. Data will be collected using smart breathalyzer devices equipped with sensors capable of detecting alcohol vapor in exhaled air. Participants will be selected from varied demographic backgrounds, spanning age, gender, and drinking behaviors.

Participants will be instructed to use the smart breathalyzer device to monitor their BAC levels during drinking sessions or at regular intervals. Additional data, such as timestamps, location information, and user input, will be captured via mobile applications or cloud-based platforms connected to the smart breathalyzer device.

2. Data Preprocessing:
The gathered data will undergo preprocessing to clean, filter, and standardize the dataset for analysis. This may entail removing outliers, handling missing values, and normalizing or scaling numerical features. Data preparation strategies will ensure the integrity and quality of the dataset for further analysis.

Temporal alignment and synchronization of data streams may be conducted to ensure consistency and coherence across multiple data modalities. Quality control methods will be used to identify and rectify any data anomalies or inconsistencies that may compromise model performance.

3. Feature Engineering:
Feature engineering plays a significant role in obtaining relevant and discriminative features from raw smart-breathalyzer behavior data. Features may include:
Breathalyzer readings: BAC levels measured by the smart breathalyzer device.
Drinking patterns: Frequency, duration, and severity of drinking episodes.
User demographics: Age, gender, weight, and other relevant demographic information.
Contextual information: Time of day, day of the week, location, social context, and environmental factors.
Interaction features: Derived features representing relationships between distinct variables or time-varying patterns in the data.

Feature selection approaches, such as correlation analysis, mutual information, or recursive feature elimination, may be used to determine the most significant features for forecasting BAC levels.

4. Model Training and Evaluation:
Supervised learning algorithms will be trained on the extracted features to construct predictive models for BAC estimation. Various machine learning techniques, such as support vector machines (SVM), random forests, gradient boosting, or deep neural networks, will be explored to identify the most effective approach.

The dataset will be divided into training, validation, and test sets to train and assess the performance of the prediction models. Cross-validation approaches, such as k-fold cross-validation, will be employed to test model generalization and robustness.

Model performance will be assessed using measures such as mean absolute error (MAE), root mean square error (RMSE), correlation coefficients, and Bland-Altman analysis. The predictive model with the best performance on the validation set will be selected for further assessment on the test set.

5. Model Interpretation and Validation:
An interpretability analysis will be undertaken to identify the factors influencing BAC predictions and give actionable insights for users and stakeholders. Feature importance metrics, such as permutation feature importance or SHapley Additive exPlanations (SHAP), will be used to find the most significant features in the prediction model.

The trained prediction model will be validated on independent datasets or in real-world contexts to verify its generalizability and reliability. External validation will provide further evidence of the model's accuracy and usefulness in forecasting BAC levels across varied populations and environmental conditions.
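As a minimal sketch of the evaluation and interpretation steps described above, the following Python fragment computes MAE, RMSE, a Pearson correlation coefficient, Bland-Altman bias with 95% limits of agreement, and a simple permutation feature importance. It uses only the standard library. The function names, the toy readings, and the stand-in `toy_predict` model are illustrative assumptions and are not taken from the study; in practice an established ML library would likely be used instead.

```python
import math
import random
import statistics


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bland_altman(y_true, y_pred):
    """Return (bias, lower limit, upper limit) using bias +/- 1.96 SD of the differences."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def permutation_importance(predict, rows, y_true, seed=0):
    """Increase in MAE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mae(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mae(y_true, [predict(r) for r in shuffled]) - baseline)
    return importances


# Toy measured vs. predicted BAC values in g/dL (made up for illustration).
measured = [0.02, 0.05, 0.08, 0.11]
predicted = [0.03, 0.04, 0.09, 0.10]
bias, lower, upper = bland_altman(measured, predicted)
print("MAE:", mae(measured, predicted))
print("RMSE:", rmse(measured, predicted))
print("Pearson r:", pearson_r(measured, predicted))
print("Bland-Altman bias and limits:", bias, lower, upper)


# Hypothetical stand-in model: features are (hour of day, previous reading),
# and the "model" simply echoes the previous reading, ignoring the hour.
def toy_predict(row):
    hour, previous_reading = row
    return previous_reading


rows = [(18, 0.01), (20, 0.03), (22, 0.06), (23, 0.09), (1, 0.12), (2, 0.07)]
targets = [toy_predict(r) for r in rows]
importances = permutation_importance(toy_predict, rows, targets)
print("Permutation importances (hour, previous reading):", importances)
```

Because the stand-in model ignores the hour feature, shuffling that column leaves the error unchanged, whereas shuffling the previous-reading column can only increase it. This is the behavior a permutation importance measure is meant to expose.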
6. Ethical Considerations:
Throughout the study, ethical questions concerning user privacy, data security, and the appropriate use of predictive BAC technology will be addressed. Measures will be implemented to protect participant confidentiality, obtain informed consent, and comply with the rules and norms governing human-subjects research and data protection.