
IJID Regions 8 (2023) 157–163


Comparison of statistical methods for the early detection of disease outbreaks in small population settings

Adam T. Craig a,b,∗, Robert Neil F. Leong b, Mark W. Donoghoe c, David Muscatello b, Vio Jianu C. Mojica d, Christine Joy M. Octavo d

a School of Public Health, The University of Queensland, Herston, Australia
b School of Population Health, University of New South Wales, Sydney, Kensington, Australia
c Mark Wainwright Analytical Centre, University of New South Wales, Sydney, Kensington, Australia
d Department of Physical Sciences and Mathematics, University of the Philippines, Manila, Philippines

Keywords: Outbreak; Syndromic surveillance; Performance; Evaluation; Simulation; Modelling; Pacific Islands

Abstract

Objectives: This study examines the performance of 6 aberration detection algorithms for the early detection of disease outbreaks in small population settings using syndrome-based early warning surveillance data collected by the Pacific Syndromic Surveillance System (PSSS). Although previous studies have proposed statistical methods for detecting aberrations in larger datasets, there is limited knowledge about how these perform in the presence of small numbers of background cases.

Methods: To address this gap a simulation model was developed to test and compare the performance of the 6 algorithms in detecting outbreaks of different magnitudes, durations, and case distributions.

Results: The study found that while the Early Aberration Reporting System–C1 algorithm developed by Hutwagner et al. outperformed others, no single approach provided reliable monitoring across all outbreak types. Furthermore, aberration detection approaches could only detect very large and acute outbreaks with any reliability.

Conclusion: The findings of this study suggest that algorithm-based approaches to outbreak signal detection perform poorly when applied to settings with small numbers of background cases and should not be relied upon in these contexts. This highlights the need for alternative approaches for accurate and timely outbreak detection in small population settings, particularly those that are resource-constrained.

Introduction

Public health surveillance is the ongoing, systematic collection, analysis, and dissemination of health-related data to provide intelligence that can be used to monitor and improve the health of populations. Used correctly, surveillance outputs can raise risk awareness and guide evidence-based decision-making [1]. One objective of surveillance for communicable diseases is the rapid identification of outbreaks so that control measures can be quickly implemented and the impact can be minimized [1,2].

A commonly used method for the early detection of outbreaks is syndromic surveillance [3]. Syndromic surveillance can be defined as a public health surveillance approach where health staff monitor the signs and symptoms of patients to identify patterns that may signal a potential outbreak [1]. Syndromic surveillance for outbreak detection has 2 essential statistical elements: (i) the estimation of an expected number of cases of an event of interest during a surveillance reporting or observation interval (e.g., latest day or week) and geographical location; and (ii) the comparison of the expected estimate with the observed value during the most recent reporting interval. A statistical method is used to detect or signal an observed recent incidence of illness that is higher than expected (an 'aberration') and is epidemiologically important. The main differences among statistical methods lie in how the expected value is calculated and how the statistical significance of the difference between the expected and observed value is determined [4]. Several aberration detection methods have been proposed, including statistical process control, change-point detection, and spatial statistics methods [5–9]. Most research has been conducted on datasets drawn from large populations. The literature concludes that no single algorithm outperforms others in all contexts and that selecting a syndromic algorithm requires the weighing up of complex factors, including system objective, system design, data quality, and data stability.


Corresponding author: Tel.: +61 (0)7 3346 4922.
E-mail address: adam.craig@uq.edu.au (A.T. Craig).

https://doi.org/10.1016/j.ijregi.2023.08.007
Received 29 May 2023; Received in revised form 10 August 2023; Accepted 12 August 2023
2772-7076/© 2023 The Author(s). Published by Elsevier Ltd on behalf of International Society for Infectious Diseases. This is an open access article under the CC
BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

With many small population states (including small island developing states) dependent on syndromic surveillance for the early detection of outbreaks [3,10,11], there remains a need for performance evaluation based on these contexts, especially in settings where background cases are scarce to begin with. To address this, we performed a simulation study that overlayed authentic baseline data with synthesized outbreak scenarios. We then compared the performance of 6 aberration detection algorithms. Simulation-based analysis methods offer the advantage of allowing the performance of multiple algorithms to be tested under a range of scenarios and with greater control over outbreak scenario parameters [4].

Methods

Setting

We conducted our analysis using syndromic surveillance data collected from 8 small population Pacific Island States and Territories (PICTs). The median estimated 2023 population of these PICTs was 163,400 (interquartile range: 93,950-401,650) [12]. Supplementary file S1 provides population and syndromic surveillance system-related information for each.

The PICTs cover one-third of the earth and are home to approximately 13 million people, of which 9.5 million reside in Papua New Guinea, with the remainder dispersed over thousands of islands and atolls that make up the other 21 PICTs [13]. Rates of infectious diseases are high in the PICTs, as are the number of new and re-emerging disease threats [14]. The ability to detect outbreaks is often hampered by poor health infrastructure, insufficient human resources, lack of advanced diagnostic capacities, geographic isolation, and inadequate communication infrastructure. Recognizing the challenge of conducting early warning outbreak surveillance in low-resource small population island settings, and heeding a call to strengthen outbreak early warning systems as a requirement of the International Health Regulations (2005), in 2010 PICTs designed and implemented a region-wide early warning syndromic surveillance system known as the Pacific Syndromic Surveillance System (PSSS) [10]. The PSSS collects data on a weekly basis for a limited number of syndromes from sentinel sites located across the PICTs. Data are collected, reported, and analyzed weekly by national surveillance officers using a simple threshold-based approach for signal generation.

Algorithms tested

The algorithms tested include the approach used by the PSSS and 5 commonly applied options cited in the literature. These were as follows:

(1) The simple threshold approach used by the PSSS. The algorithm compares a weekly case count value with the 90th percentile of historical weekly values and generates a signal if it is exceeded [15]. We call this method the '90th percentile approach.'

(2) A modified Poisson CUmulative SUMmation (CUSUM) method proposed by Rogerson and Yamada [16] and designed for syndromic surveillance systems with small counts. The test statistic is calculated as St = max(0, St-1 + z - k), where k is a reference value, t is the surveillance week, and z is the standardized score based on the variance of the observed values. A signal is generated if St > h, where h is a threshold parameter. We set h and k to the default values of 1 and 11, respectively.

(3-5) Three variations of the Early Aberration Reporting System (EARS) – C1, C2, and C3 – developed by Hutwagner et al. [16]. The EARS-C1 algorithm signals if the weekly observed value exceeds the mean of the previous 7 weekly reporting periods plus 3 SDs. The EARS-C2 algorithm differs from C1 in that it uses a temporal guard band of 2 reporting periods between the baseline and the reporting period being evaluated. The EARS-C3 algorithm also uses a 2-reporting period guard band; however, it calculates a partial sum over the last 3 reporting periods of the positive deviation of the current value from the mean.

(6) An exponentially weighted moving average (EWMA) approach [17] using a moving 28-reporting period baseline, a 2-reporting period guard band, and a smoothing constant (a) between 0 and 1. The test statistic is calculated as St = a·xt + (1 - a)·St-1. A signal is generated if St > h, where h is set to achieve a pre-defined false-positive rate close to 5% (i.e., Prob[St > h | no outbreak] = 0.05).
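To make these decision rules concrete, the base-R sketch below re-expresses four of them. It is illustrative only: it assumes a weekly count vector 'counts' and an index 't' for the reporting period being evaluated, the function names and defaults are ours rather than the study's implementation, and the CUSUM and EWMA parameters (k, h, a) would in practice be set or calibrated as described under Performance metrics below.

# Illustrative base-R sketches of the signal rules described above (not the study's code).
signal_90th_percentile <- function(counts, t) {
  # Signal if the current weekly count exceeds the 90th percentile of historical counts
  counts[t] > quantile(counts[1:(t - 1)], 0.90, na.rm = TRUE)
}

signal_ears <- function(counts, t, n_baseline = 7, guard = 0, k_sd = 3) {
  # EARS-C1: mean of the previous 7 periods plus 3 SDs (guard = 0);
  # EARS-C2 inserts a 2-period guard band (guard = 2); C3 (not shown) additionally
  # accumulates positive deviations over the last 3 reporting periods
  baseline <- counts[(t - guard - n_baseline):(t - guard - 1)]
  counts[t] > mean(baseline) + k_sd * sd(baseline)
}

signal_cusum <- function(counts, t, k, h, n_baseline = 28, s_prev = 0) {
  # One step of the CUSUM recursion St = max(0, St-1 + z - k), with z the standardized
  # score of the current count; the 28-period baseline used here is our assumption
  baseline <- counts[(t - n_baseline):(t - 1)]
  z <- (counts[t] - mean(baseline)) / sd(baseline)
  s_t <- max(0, s_prev + z - k)
  list(statistic = s_t, signal = s_t > h)
}

signal_ewma <- function(counts, t, a, h, s_prev = 0) {
  # One step of the EWMA recursion St = a * xt + (1 - a) * St-1
  s_t <- a * counts[t] + (1 - a) * s_prev
  list(statistic = s_t, signal = s_t > h)
}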
Reference datasets

We extracted national-level aggregated data from the PSSS database for the period January 08, 2011 to September 17, 2018. Data extracted were for 4 core syndrome categories – diarrhea (DIARR), influenza-like illness (ILI), acute fever and rash (AFR), and prolonged fever (PF). Across all data, the mean and 95% confidence interval (CI) weekly counts for ILI, DIARR, PF, and AFR syndromes were: ILI: 172.7 (CI: 165.0-180.4), DIARR: 73.7 (CI: 70.6-76.9), PF: 17.7 (CI: 16.2-19.1), and AFR: 7.0 (CI: 5.8-8.2), respectively.

To address gaps in the reference data, we replaced reporting periods where no data were recorded with the mean of the 2 preceding and 2 following periods. As we were not able to determine if, and if so when, true outbreaks were reflected in the reference data (given the lack of a 'gold standard' record of outbreaks against which to compare), we applied a robust filtering method devised by Fried et al. (2014) to produce what we term 'trimmed reference datasets' [18]. Robust filtering removed peaks in data while retaining data authenticity.
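A simple base-R illustration of this gap-filling rule follows (the study lists the imputeTS package among its tools; the helper below is our own sketch and sidesteps edge cases such as gaps at the very start or end of a series).

# Replace a missing weekly count with the mean of the 2 preceding and 2 following
# reported values (illustrative only).
fill_gaps <- function(counts) {
  n <- length(counts)
  for (t in seq_len(n)) {
    if (is.na(counts[t])) {
      neighbours <- counts[setdiff(max(1, t - 2):min(n, t + 2), t)]
      counts[t] <- mean(neighbours, na.rm = TRUE)
    }
  }
  counts
}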
Simulations

We used R (The R Project for Statistical Computing v3.5.1, https://www.r-project.org/) and the R packages surveillance [19], imputeTS [20], robfilter [18], purrr [21], dplyr [22], qcc [23], and forcats [24] to create 16 unique scenarios representing ways that outbreaks in the community may alter patient presentation counts to health facilities. Two parameters were varied to generate the scenarios: (i) the total number of additional case presentations (magnitude) and (ii) the temporal distribution of additional case presentations. Four arbitrary magnitudes (× 1.5, × 2.0, × 2.5, and × 3.0) of increase in case presentations above baselines and 4 signal duration/distribution pairs were combined to produce outbreak epidemic curves resembling acute point source, acute point source with tail (as is common for gastrointestinal disease outbreaks), person-to-person outbreak (as is common for respiratory disease outbreaks), and sustained disease transmission patterns (as is common for endemic disease transmission events) (Supplementary file S2). Syndrome-specific baselines for each PICT were estimated by calculating the mean of the 52 reporting periods preceding the start of the simulated outbreak. Supplementary file S3 describes what each of the abovementioned R packages contributed to the model.

Implementing the model

We systematically superimposed each scenario onto each trimmed reference dataset at 500 randomly selected start points. This produced 256,000 unique test datasets (4 syndrome categories × 8 countries × 16 scenarios × 500 runs) for analysis. We then applied the panel of aberration detection algorithms and calculated performance metrics.
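A hedged sketch of how one scenario might be superimposed onto a trimmed reference series is given below. The outbreak-shape weights, the way the magnitude is converted into additional cases, and all function and argument names are illustrative assumptions on our part; the study's actual scenario definitions are in Supplementary file S2.

# Illustrative (not the study's code): add simulated outbreak cases to a reference
# series at a randomly selected start point.
superimpose_outbreak <- function(reference, magnitude, shape_weights, baseline_weeks = 52) {
  # Choose a start point that leaves room for the 52-week baseline and the outbreak
  start <- sample((baseline_weeks + 1):(length(reference) - length(shape_weights)), 1)
  baseline_mean <- mean(reference[(start - baseline_weeks):(start - 1)])
  # Assumed reading of 'magnitude': extra cases average (magnitude - 1) x the baseline
  # mean per outbreak week, distributed across weeks according to shape_weights
  extra <- round((magnitude - 1) * baseline_mean * length(shape_weights) *
                   shape_weights / sum(shape_weights))
  idx <- start:(start + length(shape_weights) - 1)
  reference[idx] <- reference[idx] + extra
  list(series = reference, outbreak_weeks = idx)
}

Repeating this step 500 times for each combination of syndrome, country, and scenario yields the 4 × 8 × 16 × 500 = 256,000 test datasets described above.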


Performance metrics

To measure the performance of each algorithm, we computed the following metrics:

• Probability of detection (POD): defined as the proportion of outbreaks detected at any point during the outbreak's duration. This metric provides a measure of overall sensitivity.
• POD in the first week (POD1): defined as the proportion of outbreaks detected during the first reporting period (regardless of an outbreak's duration).
• Positive predictive value (PPV): defined as the probability that a surveillance signal truly indicates the presence of a simulated outbreak.
• False-positive rate (FPR): defined as the proportion of time an algorithm generated a signal in the absence of a simulated outbreak. One minus FPR provides a measure of the algorithm's specificity.
• Timeliness (T-measure): defined as the proportion of outbreaks lasting more than one reporting period that were detected in the first reporting period.
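These metrics can be assembled from 2 indicator series per test dataset: whether a signal was generated in each reporting period, and whether the simulated outbreak was in progress in that period. The base-R sketch below is our own formulation of the definitions above, not the study's code; the T-measure line follows the Figure 1 wording (proportion of detected multi-reporting period outbreaks detected in the first period).

# Per-dataset indicators; study-level metrics are proportions of these indicators
# over all simulated test datasets (illustrative only).
dataset_indicators <- function(signal, outbreak) {
  first_week <- which(outbreak)[1]
  data.frame(
    detected       = any(signal & outbreak),   # contributes to POD
    detected_week1 = signal[first_week],       # contributes to POD1 and the T-measure
    multi_period   = sum(outbreak) > 1,        # eligible for the T-measure
    true_signals   = sum(signal & outbreak),   # PPV numerator
    total_signals  = sum(signal),              # PPV denominator
    false_rate     = mean(signal[!outbreak])   # contributes to the FPR
  )
}

# With one row per test dataset collected in 'ind':
#   POD  <- mean(ind$detected)
#   POD1 <- mean(ind$detected_week1)
#   PPV  <- sum(ind$true_signals) / sum(ind$total_signals)
#   FPR  <- mean(ind$false_rate)
#   Tm   <- mean(ind$detected_week1[ind$multi_period & ind$detected])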

To allow comparability of algorithms, parameter thresholds for algorithms 2-6 were calibrated to generate FPRs that were not significantly different from the FPR of the 90th percentile approach, which was around 5.4% (range: 5.1-5.8%). To derive 95% CIs, we used the Wilson score method [25] for POD, PPV, and POD1, and the delta method for T-measure [26].
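For reference, the Wilson score interval for a proportion (as used here for POD, PPV, and POD1) can be computed with a small base-R helper; this is the standard textbook formula rather than the study's own code.

# Wilson score confidence interval for a proportion x/n (standard formula).
wilson_ci <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half_width <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = centre - half_width, upper = centre + half_width)
}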
Further, to assess the performance of the 90th percentile approach, we computed 2 versions of extended receiver operating characteristic (ROC) curves that account for timeliness in addition to analyzing sensitivity and FPR, as proposed by Kleinman and Abrams (2006) [27]. The first version was a weighted ROC (WROC) that adapts a conventional two-dimensional ROC to include T-measure. The summary area under the WROC curve (AUC) was calculated using a linear trapezoidal method from the DescTools package in R [28]. The second version was a three-dimensional ROC curve, having the T-measure as an added dimension. The summary volume under the three-dimensional ROC curve (VUC) was calculated using the Delaunay triangulation from the geometry package in R [29].
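As a hedged illustration of the two-dimensional summary, the area under a curve defined by (FPR, timeliness-weighted sensitivity) points can be approximated with the trapezoidal rule in base R. The weighting of sensitivity by the T-measure shown here is one plausible reading of the WROC construction, and the example values are made up; the study itself used the DescTools and geometry packages for the AUC and VUC.

# Trapezoidal area under a curve given x (e.g., FPR) and y (e.g., weighted
# sensitivity) coordinates (illustrative only).
auc_trapezoid <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Hypothetical WROC points: sensitivity at each threshold multiplied by the T-measure
fpr <- c(0, 0.05, 0.25, 0.50, 1)
weighted_sens <- c(0, 0.10, 0.40, 0.70, 1)
auc_trapezoid(fpr, weighted_sens)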
categories; for instance, using EARS-C1, POD1 was 65.3% (CI: 64.9-
Results

General

Across all PICTs, syndromes, and scenarios, EARS-C1 was the most sensitive with a POD of 46.5% (CI: 46.3-46.7), a POD1 of 38.0% (CI: 37.8-38.2), and a PPV of 99.2% (CI: 99.2-99.3). The 90th percentile approach and EWMA were the next most sensitive, with POD, POD1, and PPV of 24.4%, 19.5%, and 98.6% for EWMA and 29.7%, 24.1%, and 98.9% for the 90th percentile approach, respectively. The CUSUM, EARS-C2, and EARS-C3 all had poorer sensitivity measures (Figure 1). All algorithms had high PPV results, indicating that surveillance signals represent 'true' outbreaks (Figure 1).

Figure 1. Comparative performance of 6 aberration detection algorithms. CUSUM = cumulative summation; EARS = early alert and response system; EWMA = exponentially weighted moving average; POD = probability of detection; POD1 = POD in the first reporting period; PPV = positive predictive value; T-measure = proportion of all detected multi-reporting period outbreaks detected in the first reporting period. Results are available in tabular form in Supplementary file S4.

Background case count

In general, we found that the probability of detecting an outbreak was higher for the more common syndromes (Figure 2). For example, the median POD across all algorithms tested for DIARR, which had a mean weekly baseline count of 73.7 (CI: 70.6-76.9) cases per PICT, was 47.2% (CI: 46.7-47.6%), compared to only 11.7% (CI: 11.4-11.9) for PF, which had a mean weekly baseline count of 17.7 (CI: 16.2-19.1).

Size of outbreak

We found that the probability of detecting an outbreak increased as the size of the outbreak increased. For example, the POD for EARS-C1 ranged from as high as 46.7% (CI: 46.3-47.1%) for 'very large' outbreaks (i.e., those with 3 times the number of case presentations as baseline) to as low as 28.4% (CI: 28.0-28.8%) when the outbreak was 'small' (i.e., an additional half the number of case presentations as the baseline). While there were large differences in performance between the methods tested, overall detection was mediocre (Figure 2).

Timeliness

As the timeliness of outbreak detection is a key measure of performance, we calculated the proportion of outbreaks detected in the first reporting period (the POD1), and the proportion of all detected outbreaks that lasted for more than one reporting period that were detected in the first reporting period (T-measure).

By the POD1 measure, we found that EARS-C1 performed best with an overall detection rate of 38.0% (CI: 37.8-38.2), and the 90th percentile approach was second best with an overall detection rate of 24.1% (CI: 23.9-24.2). We found that the POD1 varied greatly across syndrome categories; for instance, using EARS-C1, POD1 was 65.3% (CI: 64.9-65.7) for AFR but only 21.1% (CI: 20.8-21.4) for PF (Figure 3).

By the T-measure, we found CUSUM performed best, with 87.8% (CI: 85.8-89.8) of multi-reporting period outbreaks detected by the algorithm in the first reporting period. The next best-performing algorithms by this measure were EARS-C1 at 75.1% (CI: 74.4-75.8) and the 90th percentile approach at 73.1% (CI: 72.0-74.1%). The performance of the other EARS group of algorithms was notably poorer (Figure 4).

Receiver operating characteristic curve analysis of the 90th percentile approach

For the weighted ROC, the AUC was computed to be 0.7687, and for the three-dimensional ROC, the VUC was computed to be 0.8034 (Figure 5).


Figure 2. Mean proportion of outbreaks detected by 6 algorithms in the first reporting period, by the size of the outbreak. CUSUM = cumulative summation; EARS = early alert and response system; EWMA = exponentially weighted moving average; '×' indicates the factor by which the estimated baseline was multiplied to generate the outbreaks at different magnitudes.

Figure 3. Proportion of outbreaks detected within the first reporting period. AFR = acute fever and rash; CUSUM = cumulative summation; EARS = early alert and response system; EWMA = exponentially weighted moving average; ILI = influenza-like illness; PF = prolonged fever.

Figure 4. Proportion of multi-reporting period outbreaks detected within the first reporting period. AFR = acute fever and rash; CUSUM = cumulative summation; EARS = early alert and response system; EWMA = exponentially weighted moving average; ILI = influenza-like illness; PF = prolonged fever; POD1 = probability of detection in the first reporting period.


Figure 5. Extended ROC curves applied to the 90th percentile approach. (Left) weighted ROC curve; (right) three-dimensional ROC curve. ROC = receiver operating
characteristic.

Both versions led to extended ROC curves with relatively high prediction performance, albeit against a backdrop of very high false positive rates.

Discussion

We present a comparative assessment of the performance of 6 outbreak detection algorithms using simulation scenarios based on parameters derived from national-level data collected from 8 PICTs as part of the PSSS. A simulation-based evaluation method was chosen as there is no reference standard of 'true' outbreaks that could be used for comparison, nor was it feasible to enhance existing surveillance activities and prospectively collect data that might fulfill the requirements of a reference standard.

Our analysis comparing algorithms' performance found that no single approach outperformed others across all dimensions tested (i.e., sensitivity, specificity, and timeliness), suggesting that decision-makers need to consider trade-offs between these parameters when deciding which surveillance signal generation algorithm (or algorithms) to use. Decision-makers must consider both performance needs and contextual realities, including the health sector's tolerance for false alarms and the costs associated with investigating excessive numbers of false signals. As an example, exploration of the 90th percentile approach's performance using the extended ROC curve (Figure 5) suggests that the 90th percentile approach can reliably provide time-sensitive outbreak predictions if a high FPR (of around 50%, or one-in-two signals generated being false alarms) is tolerable. If a lower FPR is required (say 5%, or one-in-twenty signals generated being false), time-sensitive outbreak predictions fall to levels that have limited outbreak intelligence-related public health value. Further, our analysis was executed using aggregated national data. As analysis moves toward the sub-national/sentinel site level, with the primary aim of identifying outbreaks locally and quickly, the volume (and stability) of data will dwindle, resulting in a higher probability of encountering false positives. This jeopardizes the integrity and credibility of surveillance efforts, as frequent false alarms will both waste precious resources and erode trust in the system, risking alerts being inconsequential in driving public health actions. These findings reinforce the need for future work to explore if, and if so how, routinely collected data in small population settings may best be used for reliable outbreak signal generation.

Beyond these general observations, we make 4 specific comments. First, while overall the EARS-C1 performed the best of the algorithms tested, no algorithm was found to perform particularly well as a reliable and timely means to detect all types of outbreaks. Interestingly, overall, there was a four-fold difference in sensitivity across the algorithms tested (i.e., 11.1% for CUSUM to 46.5% for EARS-C1) when the FPR was kept relatively constant (between 5.7-6.1%). This is of particular relevance in resource-limited settings where excessive false positives have the potential to overwhelm limited public health capacities, draw human and financial resources away from other surveillance activities and, in doing so, undermine the acceptability of surveillance-based intelligence. Further, we found that the algorithms that performed best – the 90th percentile approach, EARS-C1, and EARS-C2 – could only detect very large and acute outbreaks of syndromes with small baselines (notably AFR and PF) with any degree of reliability. Outbreaks with these dynamics are likely to be obvious and to be detected and reported by clinical staff or the public through event-based reporting mechanisms. Algorithm-based outbreak detection adds the most value if able to detect smaller and less obvious outbreaks (i.e., those that result in low-to-moderate attack rates or manifest with patients presenting over extended periods).

Second, we observed a direct correlation between the size of a syndrome's baseline and algorithm performance. Except for EARS-C's detection of simulated AFR outbreaks, detection performance was markedly poorer when the baseline was small, likely because of high variability in the time series due to few case counts. This finding raised questions about the appropriateness of using algorithmic approaches when the background data they rely on are inadequate to provide a reliable indication of the 'normal' pattern of disease for a given time and place. This finding may inspire further research into developing early warning surveillance approaches specifically designed for use when count data are small.

Third, and consistent with existing literature, across all algorithms tested we found the ability to detect outbreaks was directly proportional to the magnitude of an outbreak and how temporally clustered cases were, suggesting that syndromic surveillance is likely most appropriately used to detect large and explosive events, and perhaps not suitable for the early detection of outbreaks with low reproduction rates. Again, our analysis found that EARS-C1 outperformed other algorithms by this measure.

Finally, POD and FPR represent the detection performance of algorithms from 2 perspectives, which may confuse the interpretation of results and challenge comparative analysis. To overcome this, we fixed the FPR and computed the resulting PODs. We acknowledge that the balance between sensitivity and specificity needs to consider factors such as resource availability and the implications of too high a false alarm rate. To address these, future analysis may consider including a modifying factor to adjust the relative emphasis placed on sensitivity or specificity components to better meet local conditions and system tolerances. Furthermore, given the importance of prompt outbreak detection, researchers may consider how a timeliness dimension can be incorporated into a modified measure. We have demonstrated how this may be done using 2 modified ROC curves (Figure 5); however, computational efficiency must be further optimized.
The following should be considered to improve algorithm-based outbreak detection performance. The health information systems, the processes for capturing and transferring data, and the quality of data exhibit significant variation across the PICTs and are often characterized by fragmentation [30]. Top priority must be given to improving the quality (i.e., the completeness, stability, and timeliness) of the data that are collected and used in analysis. This will require exploring what motivates staff responsible for data collection and entry (typically nurses) to perform surveillance-related tasks with rigor. Data quality may be improved through surveillance nurse training, mentoring, and supervision; follow-up of non-reporting sites; implementation of data quality monitoring; and recognition and rewards to encourage good data collection and reporting practices. Importantly, syndromic case definitions need to be clinically meaningful and distinct. In addition, because some datasets will be too small for meaningful analysis in a small-population context, policymakers should consider whether the case definitions used are too restrictive to be useful (from a data analysis point of view) and hence require modification. To boost the volume of data for statistical analysis, and being pragmatic, aggregation of data collected by multiple sentinel sites in close proximity to each other (i.e., within a mile or 2) before analysis may be considered as a strategy to increase the power of the analysis that is conducted.

As mentioned earlier, our analysis revealed that no single algorithm showed significant superiority over the others tested; therefore, the decision regarding which algorithm to use is not straightforward. It is essential for decision-makers to acknowledge the inherent limitations of syndrome-based early warning surveillance, especially when applied to small populations. Decision-makers should be counseled not to rely entirely on a syndromic strategy to meet all national health protection needs. Instead, syndromic surveillance should be viewed as a complementary strategy that supports other, more robust outbreak detection methods, such as laboratory-based and formalized event-based surveillance mechanisms. We re-emphasize the need for conducting further analysis using localized data to better understand how algorithms perform when applied at the community level.

The results presented in this paper should be interpreted with caution. Most notably, we assume that robust filtering produced outbreak-free reference baselines; some outbreaks may have been missed. Second, we analyzed national-level data, which may have led to missing smaller outbreaks occurring at individual sentinel sites. Third, simulated outbreaks are theoretical and may not represent how outbreaks evolve. Fourth, we tested a limited number of algorithms and set the modifying variables to the default levels recommended by their authors. Despite these limitations, we could retain contextual reality within the model by drawing on authentic baseline data. Our analysis has generated novel policy-relevant insights into the performance of syndromic surveillance when applied to nationally aggregated data in small population contexts and may help shape early warning surveillance system design within these settings. We emphasize that outbreaks need to be detected locally and hence suggest the application of this tool using sentinel site data is appropriate. The modelling tool offers potential utility for the prospective measurement (and monitoring) of surveillance systems' performance at sentinel site and aggregated district or national levels. The methods presented in this paper are broadly transferable and may be used to measure (and monitor) surveillance system performance in other settings.

Conclusion

While having notable limitations in detecting outbreaks, the EARS-C1 approach outperformed all other algorithms tested. Our findings suggest that algorithm-based outbreak signal detection methods perform poorly in the presence of small numbers of background cases and hence, in these settings, should not be relied upon but rather seen as supplementary to more robust surveillance approaches, including hospital-based, point-of-care-test-based, and event-based surveillance strategies. Efforts to increase the volume and stability of the data on which aberration detection relies will likely improve performance.

Declarations of competing interest

The authors have no competing interests to declare.

Funding

Adam Craig is supported by the University of Queensland Health Research Accelerator Initiative and David Muscatello is supported by an Australian National Health and Medical Research Council (NHMRC) Investigator Grant (APP1194109). The contents of the published material are solely the responsibility of the Administering Institution, a Participating Institution or individual authors and do not reflect the views of the NHMRC.

Ethical approval

Given the nature of the research conducted and data used, ethical approval was not required.

Author contributions

ATC, RNFL, and MWD were involved in the manuscript's conceptualisation, analysis, interpretation of data, writing, revision, and review. ATC, RNFL, MWD, VJCM, and CJMO were involved in analysis, interpretation of data, writing, revision, and review. DM was involved in the interpretation of data, revision, and review.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.ijregi.2023.08.007.
References

[1] Buehler JW, Hopkins RS, Overhage JM, Sosin DM, Tong V, CDC Working Group. Framework for evaluating public health surveillance systems for early detection of outbreaks: recommendations from the CDC Working Group. MMWR Recomm Rep 2004;53:1–11.
[2] Wagner M, Tsui F, Cooper G, Espino JU, Harkema H, Levander J, et al. Probabilistic, decision-theoretic disease surveillance and control. Online J Public Health Inform 2011;3:ojphi.v3i.3798. doi:10.5210/ojphi.v3i3.3798.
[3] May L, Chretien JP, Pavlin JA. Beyond traditional surveillance: applying syndromic surveillance to developing settings – opportunities and challenges. BMC Public Health 2009;9:242. doi:10.1186/1471-2458-9-242.
[4] Bédubourg G, Le Strat Y. Evaluation and comparison of statistical methods for early temporal detection of outbreaks: a simulation-based study. PLoS One 2017;12:e0181227. doi:10.1371/journal.pone.0181227.
[5] Shmueli G, Fienberg SE. Current and potential statistical methods for monitoring multiple data streams for biosurveillance. In: Wilson AG, Wilson GD, Olwell DH, editors. Statistical methods for counterterrorism. New York: Springer; 2006. p. 109–40.
[6] Noufaily A, Morbey RA, Colón-González FJ, Elliot AJ, Smith GE, Lake IR, et al. Comparison of statistical algorithms for daily syndromic surveillance aberration detection. Bioinformatics 2019;35:3110–18. doi:10.1093/bioinformatics/bty997.
[7] Yuan M, Boston-Fisher N, Luo Y, Verma A, Buckeridge DL. A systematic review of aberration detection algorithms used in public health surveillance. J Biomed Inform 2019;94:103181. doi:10.1016/j.jbi.2019.103181.
[8] Alsentzer E, Ballard SB, Neyra J, Vera DM, Osorio VB, Quispe J, et al. Assessing 3 outbreak detection algorithms in an electronic syndromic surveillance system in a resource-limited setting. Emerg Infect Dis 2020;26:2196–200. doi:10.3201/eid2609.191315.
[9] Yeng PK, Woldaregay AZ, Solvoll T, Hartvigsen G. Cluster detection mechanisms for syndromic surveillance systems: systematic review and framework development. JMIR Public Health Surveill 2020;6:e11512. doi:10.2196/11512.
[10] Craig AT, Kama M, Samo M, Vaai S, Matanaicake J, Joshua C, et al. Early warning epidemic surveillance in the Pacific island nations: an evaluation of the Pacific Syndromic Surveillance System. Trop Med Int Health 2016;21:917–27. doi:10.1111/tmi.12711.


[11] Vilain P, Maillard O, Raslan-Loubatie J, Ahmed Abdou M, Lernout T, Filleul L. Usefulness of syndromic surveillance for early outbreak detection in small islands: the case of Mayotte. Online J Public Health Inform 2013;5. doi:10.5210/ojphi.v5i1.4503.
[12] Pacific Community. Pacific datahub: population projections, https://sdd.spc.int/dataset/df_pop_proj; 2023 [accessed 27 July 2023].
[13] Craig AT, Kool J, Nilles EJ. The Pacific experience: supporting small island countries and territories to meet their 2012 International Health Regulations (2005) commitments. Western Pac Surveill Response J 2013;4:14–18. doi:10.5365/WPSAR.2012.3.4.007.
[14] Roth A, Mercier A, Lepers C, Hoy D, Duituturaga S, Benyon E, et al. Concurrent outbreaks of dengue, chikungunya and Zika virus infections - an unprecedented epidemic wave of mosquito-borne viruses in the Pacific 2012–2014. Euro Surveill 2014;19:20929. doi:10.2807/1560-7917.es2014.19.41.20929.
[15] World Health Organization, Pacific Community. A practical guide to implementing syndromic surveillance in Pacific island countries and territories. Geneva: World Health Organization; 2010.
[16] Rogerson PA, Yamada I. Approaches to syndromic surveillance when data consist of small regional counts. MMWR 2004;53(Suppl):79–85.
[17] Salmon M, Schumacher D, Höhle M. Monitoring count time series in R: aberration detection in public health surveillance. J Stat Soft 2016;70:1–35. doi:10.18637/jss.v070.i10.
[18] Fried R. Robust filtering of time series with trends. J Nonparametric Stat 2004;16:313–28. doi:10.1080/10485250410001656444.
[19] Meyer S, Held L, Höhle M. Spatio-temporal analysis of epidemic phenomena using the R package surveillance. J Stat Soft 2017;77. doi:10.18637/jss.v077.i11.
[20] Moritz S, Bartz-Beielstein T. imputeTS: time series missing value imputation in R. R J 2017;9:207–18. doi:10.32614/RJ-2017-009.
[21] Wickham H, Henry L. purrr: functional programming tools. R Foundation for Statistical Computing, https://purrr.tidyverse.org/; 2019 [accessed 13 February 2023].
[22] Wickham H, Francois R, Henry L, Muller K, Vaughan D. dplyr: a grammar of data manipulation. R Foundation for Statistical Computing, https://dplyr.tidyverse.org; 2023 [accessed 13 February 2023].
[23] Scrucca L, Snow G, Bloomfield P. qcc: an R package for quality control charting and statistical process control. R News 2004;4:11–17.
[24] Wickham H. forcats: tools for working with categorical variables (factors). R Foundation for Statistical Computing, https://forcats.tidyverse.org/, https://github.com/tidyverse/forcats; 2022 [accessed 21 March 2023].
[25] Fagerland MW, Lydersen S, Laake P. Recommended confidence intervals for two independent binomial proportions. Stat Methods Med Res 2015;24:224–54. doi:10.1177/0962280211415469.
[26] Franz VH. Ratios: a short guide to confidence limits and proper use. arXiv, 10 October 2007 [accessed 21 March 2023].
[27] Kleinman KP, Abrams AM. Assessing surveillance using sensitivity, specificity and timeliness. Stat Methods Med Res 2006;15:445–64. doi:10.1177/0962280206071641.
[28] Signorell A, Aho K, Alfons A, Anderegg N, Aragon T, et al. DescTools: tools for descriptive statistics, https://cran.r-project.org/web/packages/DescTools/index.html; 2023 [accessed 21 March 2023].
[29] Roussel R-L, Barber CB, Habel K, Grasman R, Gramacy R, Mozharovskyi P, et al. geometry: mesh generation and surface tessellation, https://davidcsterratt.github.io/geometry/; 2023 [accessed 21 March 2023].
[30] World Health Organization. Pacific Health Information Network Regional Meeting on Strengthening Health Information Systems and Digital Health, Manila, Philippines. Geneva: World Health Organization; 2023.

