2022 Ganguly _ WNL-2022-200480
Results
In the pilot cohort, accuracy of automated detection for individual seizures was modest
(sensitivity 0.50, PPV 0.60). At the record level (did the recording contain seizures or not?),
sensitivity was higher (pilot cohort 0.78, expanded cohort 0.91), PPV was low (pilot cohort
0.40, expanded cohort 0.08), and NPV was high (pilot cohort 0.88, expanded cohort 0.97).
Different software versions (version 12 vs 13) performed similarly. Sensitivity was higher for
records containing focal-onset seizures compared to generalized-onset seizures (0.93 vs 0.85,
p = 0.012).
Discussion
In critical care continuous EEG recordings, automated detection of individual seizures had rates
of both false negatives and false positives that bring into question its utility as a seizure alarm in
clinical practice. At the level of entire EEG records, the absence of automated detections
accurately predicted EEG records without true seizures. The true value of Persyst automated
seizure detection appears to lie in triaging of low-risk EEGs.
Classification of Evidence
This study provides Class II evidence that an automated seizure detection program cannot
accurately identify EEG records that contain seizures.
From the Department of Neurology (T.M.G., C.A.E., K.A.D., B.L., J.P.), and Penn Statistics in Imaging and Visualization Endeavor (PennSIVE) Center of Excellence (D.T., R.T.S.), Center
for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania; Department of Biostatistics, Epidemiology, & Informatics (D.T., R.T.S.) and
Center for Biomedical Image Computing and Analytics (R.T.S.), University of Pennsylvania, Philadelphia.
Go to Neurology.org/N for full disclosures. Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article.
The use of continuous long-term EEG (cEEG) in critical care settings continues to rise, supported by consensus recommendations and emerging evidence that it improves outcomes.1,2 Manually reviewing large volumes of EEG data is labor intensive, necessitating an alternative method of quickly interpreting cEEG. Nearly all neurophysiologists use quantitative EEG, and up to half do not review all pages of EEG.3 Automated seizure detectors have the potential to improve the efficiency of cEEG review by triaging low-risk EEGs and allowing the timely identification and treatment of seizures.3-6 The use of automated seizure detection systems is becoming increasingly widespread, but data proving their reliability and how best to apply them are limited.
Persyst is the most widely used commercially available automated seizure detection software with Food and Drug Administration clearance. It is used by thousands of neurologists at hundreds of hospitals worldwide, including 48 of the 50 U.S. News & World Report’s top-ranked hospitals.7 While Persyst offers quantitative spectral arrays to reflect EEG patterns, these quantitative visual tools have not been thoroughly validated in adults, have demonstrated significant variability, and require appropriate standardization before being applied routinely for clinical decision-making.3,8-11 However, the software does offer an automated seizure detection tool that asserts a yes or no interpretation of whether a sample of EEG is consistent with a seizure. Overall, there has been little independent assessment of the Persyst algorithms; the largest and most rigorous studies have been performed in affiliation with the company itself.12-14 Patient selection for validation is also a potential issue that could influence reports of the software performance: the accuracy of the Persyst automated seizure detection tool has been studied primarily in epilepsy monitoring units (EMUs)4,15 and ambulatory EEGs in adults.16 Critical care patients present an additional challenge for automated seizure detection due to abnormal background rhythms and unusual ictal patterns that may confound automated algorithms.17 Yet, automated seizure detection is arguably most relevant in the intensive care unit (ICU), where seizures are common, rapid treatment is desired, and manual review is often not immediately available.6,18,19
We studied the performance of Persyst automated seizure detection in a large sample of continuous ICU EEG recordings. We measured the performance of automated detections at the level of individual seizures (whether each seizure was correctly detected) and the accuracy of automated detections at the level of EEG records (whether records were correctly identified as containing seizures, even if individual seizures were not accurately detected). The ability of automated detection software to correctly identify individual seizures is critical for use as a real-time alarm, while record-level detection is relevant for triaging studies for manual EEG review. The study presented here provides a systematic and unbiased analysis of the automated seizure detection performance of Persyst in inpatient long-term continuous EEG monitoring. Our primary research question sought to evaluate the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of this tool at both the individual seizure and record levels.
Methods
Standard Protocol Approvals, Registrations, and Patient Consents
This study was approved by the institutional review board at the University of Pennsylvania with waiver of informed consent.
Data Collection
For this study, we considered only 24-hour cEEG recordings performed on inpatients at the Hospitals of the University of Pennsylvania (the Hospital of the University of Pennsylvania, Presbyterian Medical Center, and Pennsylvania Hospital), excluding patients admitted to the EMU. In all cases, cEEG had been clinically requested by the primary inpatient team to evaluate for seizures. EEG electrodes were placed by trained and registered EEG technologists using the international 10-20 system and with eye leads. EEG data were collected at a minimum sampling rate of 256 Hz using Natus (Natus Inc, Pleasanton, CA) XLtek equipment running Natus Neuroworks version 8.5. Persyst version 12 or 13 (Persyst Inc, Solana Beach, CA), henceforth labeled P12 or P13, was run on each EEG, either at the time of data capture or later (as outlined below). EEGs were interpreted by trained epileptologists credentialed to interpret EEGs at the University of Pennsylvania. EEG reports were generated with customized software that stores EEG interpretations in a searchable SQL database. Searchable fields include demographic data and terms from the American Clinical Neurophysiology Society ICU nomenclature (including background amplitude, organization, symmetry, rhythmic and periodic patterns, and details on seizures). Because these fields are required by our EEG system for report generation, there were no missing data. For this retrospective study, 2 EEG datasets were generated.
We first identified a pilot cohort of cEEG recordings that contained seizures. We selected ICU EEG reports coded as containing seizures, recorded from 2015 to 2019, and randomly selected 23 EEG recordings from 23 different patients. Because the prevalence of seizures among inpatient cEEGs
We next identified an expanded cohort to study algorithm performance at scale. We examined all cEEGs recorded between December 1, 2017, and October 30, 2020, and included all cEEGs that were analyzed by P12 or P13 at the time of recording, a total of 7,924 cEEG studies from 2,854 unique patients. The report database was queried for human-reported seizures, and these were compared to the presence or absence of Persyst automated seizure detections at any point in the record. In this dataset, no effort was made to exclude patient disconnections or excessive artifact as a cause of false automated detection, reflecting real-world practice.
To examine the seizure detections across a dataset of this size, an EEG comment extractor was created in collaboration with Natus Neuroworks that allowed us to directly read the EEG files, including both human- and Persyst-generated comments.
Next, we analyzed the accuracy of automated seizure detection in the pilot cohort at the level of the EEG record rather than the individual seizure level. Each EEG record was coded as having ≥1 human-detected seizures (yes/no) and as having ≥1 Persyst-detected seizures (yes/no), regardless of when during the record these detections occurred. Sensitivity, specificity, NPV, and PPV of Persyst-detected seizures compared to human-detected seizures were calculated. Adjustment for clustered data was not performed because each EEG record in this cohort was an independent observation. Exact binomial CIs were calculated with the R package epiR.24
For our expanded cohort of 7,924 EEG records, seizure detection was again analyzed at the level of each entire EEG record. Similar to our methods for pilot cohort-level review, we
a Mann-Whitney U test.
b The χ2 test.
c Fisher exact test.
low-voltage background (Mann-Whitney U test, p = 0.001). No other EEG feature was significantly associated with successful automated detection of individual seizures.
Because detection of some but not all seizures within an EEG record may be adequate for triaging EEG, we assessed the performance of Persyst in identifying seizures at the record level. That is, does the automated seizure detection identify the presence of a seizure anywhere in the record? In our pilot cohort of 85 EEG records, Persyst detected seizures in 18 of 23 records that contained seizures (sensitivity 0.78, 95% CI 0.56, 0.93). Persyst also detected seizures in 27 of 62 records that did not contain seizures (false alarm rate 44%, 95% CI 31%, 57%).
On the basis of our finding that Persyst performed particularly poorly at the individual seizure level for EEGs with low-voltage backgrounds, we performed a post hoc analysis in which we removed the low-voltage EEGs and then repeated the record-level analysis. In the remaining 18 EEGs containing seizures and without low-voltage background, Persyst correctly detected seizures in all 18 of 18 at the record level, corresponding to both a sensitivity and an NPV of 100%.
We expanded these analyses to a dataset of 7,924 EEG records from 2,854 unique individuals to determine the performance of Persyst at the record level at scale. Human-detected seizures were present in 786 of 7,924 records (10%). Automated seizure detections were present in 6,079 of 7,924 records (77%). The accuracy of Persyst at the record level is shown in Figure 1 and Table 3. Results in this expanded cohort overall showed trends similar to our pilot cohort. Persyst detected seizures in 723 of 786 records that contained seizures (adjusted sensitivity 0.91, 95% CI 0.88, 0.93). Persyst also detected seizures in 5,356 of 7,138 records that did not contain seizures (adjusted false alarm rate 0.74, 95% CI 0.72, 0.75). The PPV of Persyst detections was low (0.08, 95% CI 0.07, 0.09), meaning that a human reader would have to read 12.5 EEGs in which automated detections occurred to find 1 record with true seizures. On the other hand, the NPV was high (0.97, 95% CI 0.96, 0.98), meaning that if Persyst did not detect seizures, there was only a 3% chance that true seizures were present. To account for potential bias from repeated EEGs from single individuals, we limited the analysis to only the first EEG from each subject (2,854 unique individuals/EEGs). Results were similar, as demonstrated in eTable 1, links.lww.com/WNL/B938. Assuming that each record in which there were no human or automated seizure detections was ≈24 hours (total 44,280 hours), Persyst accurately identified 42,768 hours of EEG as being seizure-free.
We performed several subgroup analyses in this expanded dataset, using the same GEE model as previously described
Analysis level and cohort | No. | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI)
Seizure level, pilot cohort | 229 | 0.50 (0.34, 0.66) | — | 0.60 (0.42, 0.75) | —
Record level, pilot cohort | 85 | 0.78 (0.56, 0.93) | 0.56 (0.43, 0.69) | 0.40 (0.26, 0.56) | 0.88 (0.73, 0.96)
Record level, expanded cohort | 7,924 | 0.91 (0.88, 0.93) | 0.26 (0.25, 0.28) | 0.08 (0.07, 0.09) | 0.97 (0.96, 0.98)
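As a check on the pilot-cohort record-level estimates, the following Python sketch recomputes sensitivity, specificity, PPV, and NPV from the 2×2 counts given in the text (18 of 23 seizure-containing records flagged; 27 of 62 seizure-free records flagged) and attaches exact binomial intervals. It assumes that the "exact binomial" CIs produced by epiR correspond to the Clopper-Pearson method; the function names are illustrative and are not taken from the study's own code.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) == target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson two-sided CI for x successes out of n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2,
    # i.e. P(X > x) equals 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def record_level_metrics(tp, fn, fp, tn):
    """Map each metric name to its (numerator, denominator) pair."""
    return {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "PPV": (tp, tp + fp),
        "NPV": (tn, tn + fn),
    }


# Pilot cohort, record level (counts taken from the text):
# 23 records with seizures, 18 flagged; 62 without seizures, 27 flagged.
pilot = record_level_metrics(tp=18, fn=5, fp=27, tn=35)

for name, (num, den) in pilot.items():
    low, high = exact_ci(num, den)
    print(f"{name}: {num}/{den} = {num / den:.2f} (95% CI {low:.2f}, {high:.2f})")
```

Under that assumption, the point estimates correspond to the reported 0.78, 0.56, 0.40, and 0.88, and the intervals should fall close to the reported ones. The same raw calculation is not expected to reproduce the expanded-cohort values, which the text reports as adjusted estimates from a GEE model.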
EEGs with low-voltage backgrounds had significantly lower sensitivities of automated seizure detection at the level of individual seizures compared to EEG
records with normal-voltage backgrounds (p = 0.001). All other features we assessed did not significantly affect the sensitivity of automated seizure detection.
LPD = lateralized periodic discharge; PDR = posterior dominant rhythm.
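Persyst version | No. | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI)
12 | 1,310 | 0.91 (0.80, 0.96) | 0.24 (0.21, 0.27) | 0.05 (0.04, 0.07) | 0.98 (0.95, 0.99)
13 | 6,605 | 0.91 (0.88, 0.93) | 0.27 (0.25, 0.28) | 0.09 (0.08, 0.10) | 0.97 (0.96, 0.98)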
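The workload figures quoted for the expanded cohort (about 12.5 flagged records read per record with true seizures, and the 44,280 and 42,768 hour totals) follow from the record counts given in the text. The short Python sketch below retraces that arithmetic; the variable names are illustrative, and the 24-hour record length is the approximation stated in the text rather than a measured duration.

```python
# Record counts reported for the expanded cohort.
total_records = 7924
records_with_seizures = 786        # human-detected seizures present
records_with_detections = 6079     # Persyst detections present
true_positive_records = 723        # detections present, seizures present
false_positive_records = 5356      # detections present, no seizures

# Assumption taken from the text: each record lasts about 24 hours.
HOURS_PER_RECORD = 24

# The two flagged groups should add up to all records with detections.
assert true_positive_records + false_positive_records == records_with_detections

records_without_seizures = total_records - records_with_seizures            # 7138
records_without_detections = total_records - records_with_detections        # 1845
true_negative_records = records_without_seizures - false_positive_records   # 1782

hours_without_detections = records_without_detections * HOURS_PER_RECORD    # 44280
hours_correctly_seizure_free = true_negative_records * HOURS_PER_RECORD     # 42768

# Flagged records a reader must review, on average, to find one record
# with true seizures: the reciprocal of the reported (adjusted) PPV.
reported_ppv = 0.08
records_read_per_true_positive = 1 / reported_ppv   # about 12.5

print(f"Hours with no automated detections: {hours_without_detections:,}")
print(f"Hours correctly identified as seizure-free: {hours_correctly_seizure_free:,}")
print(f"Flagged records read per true-positive record: {records_read_per_true_positive:.1f}")
```

The reciprocal uses the reported adjusted PPV rather than the raw ratio of counts, because that is the value the text's 12.5 figure corresponds to.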
but with a subgroup indicator variable to enable comparisons. In this expanded cohort, the estimated sensitivity of automated detections was not significantly different in EEGs with low-voltage backgrounds than in those with normal voltage (p > 0.05), in contrast to the hypothesis generated by our pilot cohort. We then examined the difference between versions P12 and P13, both of which performed similarly (Table 4). In addition, we examined the difference between focal and generalized seizures. Among the 786 records that contained human-detected seizures, 653 records contained only focal-onset seizures, and 111 records contained only generalized-onset seizures. Sensitivity of automated detection was higher for focal-onset seizures than for generalized-onset seizures (0.93 vs 0.85, p = 0.012, Table 5).
Last, to further account for potential bias from repeated EEGs for some individuals in the cohort, we limited the analysis to only the first EEG recorded for each of 2,854 unique individuals. Results were similar to those of the entire expanded cohort, suggesting that repeated measures were not an important source of bias. This study provides Class II evidence that an automated seizure detection program cannot accurately identify EEG records that contain seizures.
Discussion
In this study, we measured the accuracy of Persyst for automated seizure detection in 24-hour critical care EEG records. At the level of individual seizures, we found modest performance of automated seizure detections, with fewer than half of seizures detected. At the level of EEG records, in a large cohort of 7,924 records, we found that the presence of automated seizure detections was a poor predictor of true seizures in the recording, while the absence of automated seizure detections was an excellent predictor that the record did not contain true seizures.
Automated seizure detection can play at least 2 different roles in clinical care. First, it could serve as a real-time seizure alarm, prompting immediate manual EEG review or treatment of the patient. Our data argue against the use of Persyst for this function in clinical practice. Relying on automated detections would have missed more than half of all true seizures in our pilot cohort. In addition, the false positives pose a challenge, as evidenced by the low PPVs (i.e., the likelihood that an automated detection is a true seizure) at the level of both individual seizures and entire records across our cohorts. An ideal seizure alarm will require a substantially higher sensitivity (i.e., miss very few true seizures) and lower false alarm rate than the values seen here.
A second potential role for automated seizure detection is to reduce the volume of EEG for manual review. For example, EEG records with a high probability of seizures could be reviewed earlier or more often, whereas other records could be reviewed less often or more briefly if the probability of seizures were sufficiently low. The relevant metric here is the NPV, i.e., the probability that the absence of automated detections reflects the absence of true seizures. We found high NPVs at the record level in both our pilot and expanded cohorts, up to 97%. This may be adequate to justify clinically useful triage of records with low probability of seizures based on lack of automated detections. It is important to note that our data support this kind of triaging only at the level of entire EEG records, not at the level of individual automated detections. That is, if no automated detections are present, one can be 97% confident that the record contains no seizures; whereas if automated detections are present, our data suggest that the entire record should be reviewed for accurate seizure detection, rather than limiting the review to only the individual detection events, which would be likely to miss true seizures.
Prior studies on the accuracy of Persyst for automated seizure detection have shown mixed findings. In the ICU setting, there has been exploration of the use of spectral array analysis,3,6,8,11 particularly in the pediatric population, but the
a Wald test.
Our study extends these prior findings in several ways. First, this is the largest study of the automated seizure detection accuracy on inpatient cEEGs outside the EMU, examining performance at both the individual seizure and study levels. Our findings indicate lower performance than has been reported by previous studies in different patient populations. This is important because critical care populations are a major source of continuous EEG recording and the need for automation-assisted interpretation is particularly acute. In subgroup analyses, we found slightly higher automated seizure detection sensitivity for focal rather than generalized seizures, which, to the best of our knowledge, has not been previously reported.11,16
The study has several limitations. Persyst version 14 (P14) was not available at the time of data analysis in this report. However, we did not find significant differences between versions P12 and P13, and neither of those versions achieved adequate performance for fully automated seizure detection. We intend to compare P14 performance to these results in the future. Second, we have not provided specificity or NPVs for seizure-level analyses because true negatives (no human-detected seizure, no Persyst-detected seizure) were not countable events. These were countable at the record level only. Third, the pilot cohort did not contain any EEG patterns classified as ictal-interictal continuum, and we cannot comment on the accuracy of Persyst in that context (although the expanded cohort did contain such EEG patterns, we did not analyze that subgroup specifically). Although the pilot cohort allowed assessment of individual seizures, the expanded cohort was too large for such analyses, particularly for correlation of individual automated markings with individual seizures and for accounting for automated detections during periods of excess artifact. These differences likely accounted for the difference in PPVs between our pilot seizure-level and expanded record-level cohorts. Our study was also limited by the number of manually reviewed EEGs, and a larger pilot cohort sample size may lead to more accurate results. The expanded cohort, although with a much larger sample size, was obtained from a single center. Performing this study across multiple sites may more comprehensively represent
Automated seizure detection in critical care EEG is an important need, and the true value of Persyst lies in triaging low-risk EEGs. We found that its performance at detecting individual seizures was not sufficient for use as a seizure alarm in clinical practice, given a poor PPV. At the level of entire records, we found that the absence of automated seizure detections could be useful to triage studies with low probability of containing seizures. In our dataset, Persyst accurately triaged up to 42,000 hours of EEG recordings as seizure-free. On the basis of these findings, our institution plans to use automated seizure detection as a tool in our morning workflow and triage strategy but not as a seizure detector. Future studies should aim to improve the accuracy of automated seizure detection and to measure its accuracy in different patient populations. It is anticipated that future seizure detection algorithms will close the gaps identified here, but human efforts cannot be minimized at present.
Acknowledgment
B. Litt is supported by the following NIH grant from the National Institute of Neurological Disorders and Stroke: DP1NS122038.
Study Funding
No targeted funding reported.
Disclosure
T.M. Ganguly, C. Ellis, D. Tu, R.T. Shinohara, and K.A. Davis report no disclosures relevant to the manuscript. B. Litt has licensed intellectual property through the University of Pennsylvania in exchange for equity in the following companies: NeuroPace, MC10, and Blackfynn. He is a consultant for 4Catalyzer, including Liminal Neurosciences, Tesseract, Hyperfine, Detect, and AI Therapeutics. None of these entities have sponsored this work and their value is not affected by this research. J. Pathmanathan reports no disclosures relevant to the manuscript. Go to Neurology.org/N for full disclosures.
Publication History
Received by Neurology April 11, 2021. Accepted in final form February 8, 2022.