Comparison - Two Biomech Measure Systems
PII: S0021-9290(23)00075-1
DOI: https://doi.org/10.1016/j.jbiomech.2023.111506
Reference: BM 111506
Please cite this article as: D. Koska, D. Oriwol and C. Maiwald, Comparison of statistical models
for characterizing continuous differences between two biomechanical measurement systems.
Journal of Biomechanics (2023), doi: https://doi.org/10.1016/j.jbiomech.2023.111506.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
Comparison of statistical models for characterizing continuous differences between two biomechanical measurement systems
Daniel Koska^a,∗, Doris Oriwol^a, Christian Maiwald^a
^a Chemnitz University of Technology, Thüringer Weg 11, 09126 Chemnitz, Germany
Abstract
Most biomechanical processes are continuous in nature. Measurement systems record this continuous behavior as curve data, which is often treated inappropriately in validation studies. The current paper compares different statistical models for analyzing the agreement of curves from two measurement systems. All models were evaluated in various error scenarios (simulated and real-world data). Excellent results were obtained using a functional method, with coverage probabilities close to the desired level in all data sets. Pointwise constructed bands had a lower coverage probability, but still contained most of the curve points and may thus be an option in scenarios where assumptions of functional models are violated (e.g., when curves are much noisier than those presented here, or in the presence of drift). Models that account for within-subject variation showed a higher coverage probability and less uncertainty about the variation of band limits. We hope this study, along with the provided research code, will inspire researchers to use methods for curve data more frequently and appropriately.
∗ Corresponding author: Daniel Koska, Email: daniel.koska@hsw.tu-chemnitz.de, Tel: +49 371 531-32024, Fax: +49 371 531-832024
Journal Pre-proof
1. Introduction
The validity of biomechanical measurement systems is typically evaluated based on simultaneous recordings of two measurement systems (new vs. gold standard). In most studies, the resulting differences between these systems are examined using statistical methods that were originally developed for discrete data. For instance, the Limits of Agreement approach (LoA) by Altman and Bland (1983) was developed in a medical context for scalar variables such as blood pressure. The LoA model describes a symmetric uncertainty interval around the mean difference in which 95% of the normally distributed differences are expected to lie. LoAs are simple and intuitive by design, and thus are presumably the most widely used statistical method in
the process value along a discrete set of time points. The term "discrete" is used throughout the manuscript to refer to non-functional univariate quantities rather than to discrete values such as counts or categories.
Reducing curves to single points or discrete variables, such as local extrema (Blair et al., 2018), rates of change (Kluitenberg et al., 2012), or ranges
2. Time-dependent variations of the measurement error are not captured.
3. Statements derived from discrete variables run the risk of being based on points that have little relevance for the system under investigation (Donoghue et al., 2008; Pataky et al., 2008; Richter et al., 2014; Park et al., 2017).
is to apply discrete statistics to several or all curve points (Schwartz et al., 2004; Pini et al., 2019). This discrete or "pointwise" approach is characterized by the fact that each point is treated independently of the others. For curve data, the coverage probability is typically defined as the probability that all points of the curve are contained in the band simultaneously. Pointwise bands, however, will almost surely be too narrow (Duhamel et al., 2004; Cutti et al., 2014; Degras, 2017), resulting in lower than desired coverage probabilities. This is because they ignore the local correlation structure of continuous curves, which implies that the actual number of independent processes is less than the number of sampling points (Pataky, 2010). Therefore, pointwise bands often lead to a multiple comparisons problem.
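The effect described above can be illustrated with a minimal simulation sketch in Python (not the authors' R code). It assumes hypothetical smooth curves built from three random Fourier harmonics and, for simplicity, uses the known pointwise standard deviation instead of an estimate. All function names and parameter values are illustrative assumptions.

```python
import math
import random

N_HARMONICS = 3  # assumed smoothness of the hypothetical curves


def smooth_curve(rng, n_points=101):
    """Draw one zero-mean smooth curve as a random Fourier series."""
    coefs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N_HARMONICS)]
    curve = []
    for t in range(n_points):
        phase = 2 * math.pi * t / (n_points - 1)
        curve.append(sum(a * math.sin((k + 1) * phase) + b * math.cos((k + 1) * phase)
                         for k, (a, b) in enumerate(coefs)))
    return curve


def coverage(n_curves=2000, seed=1):
    """Return (share of points inside, share of whole curves inside) a pointwise band."""
    rng = random.Random(seed)
    # Each harmonic adds sin^2 + cos^2 = 1 to the variance at every point.
    half_width = 1.96 * math.sqrt(N_HARMONICS)
    point_hits = curve_hits = n_points = 0
    for _ in range(n_curves):
        inside = [abs(v) <= half_width for v in smooth_curve(rng)]
        point_hits += sum(inside)
        n_points += len(inside)
        curve_hits += all(inside)
    return point_hits / n_points, curve_hits / n_curves


pointwise, simultaneous = coverage()
print(f"share of points inside: {pointwise:.3f}")
print(f"share of entire curves inside: {simultaneous:.3f}")
```

Under these assumptions, the share of individual points inside the band stays near the nominal 95%, whereas the share of curves lying entirely inside the band is markedly lower.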
A variety of statistical methods are available to address the problem of multiple comparisons in curve data, with a majority rooted in either random field theory (Adler and Taylor, 2007; Friston et al., 2007) or functional data analysis frameworks (Ramsay and Silverman, 2005). Lenhoff et al. (1999) investigated a method of approximating continuous curves from a discretely observed set of points using Fourier series and bootstrapping to create simultaneous prediction bands (Sutherland et al., 1988; Olshen et al., 1989). Using a set of joint angle curves, they showed that their functional prediction bands achieved better coverage probability (86% at 90% nominal coverage level) than pointwise constructed Gaussian bands (54% coverage probability). Prediction bands are defined to estimate a range in which a future observation (i.e., a curve) will fall with a certain probability. They are therefore well suited for method comparisons.
Røislien et al. (2012) developed an approach to extend the LoA for curve data. Similar to Lenhoff et al. (1999), they use Fourier series to create functional bands. Their method, however, must be regarded as pointwise, since the actual calculation of the band limits is performed separately and independently for each point of the curve (which somewhat contradicts the original intention).
Røislien et al. (2012) also highlight another problem: Many methods,
number of subjects is tested, which is a common phenomenon in validation studies, e.g., N = 6 in Morrow et al. (2017), N = 7 in Røislien et al. (2012),
age probability of pointwise vs. functional prediction bands?, (ii) What is the amount of uncertainty regarding the variation of band limits?, and (iii) How does including information about within-subject variation (via multiple curves per subject) affect the parameters in (i) and (ii)? We compare the following models:
• Pointwise LoA including multiple curves per subject (Bland and Altman, 1999, 2007)
fore all models are analyzed using simulated and real joint angle curves in different measurement error scenarios.
2. Methods
2.1. Data sets
Four data sets containing joint angle curves from two measurement systems, a gold standard (GOLD) and a new system (NEW), were used to evaluate the models in section 2.2: Three simulated (GAUSS, NONGAUSS, XSHIFT) and one with real movement data (REAL). The data sets represent a broad range of error characteristics typically encountered when comparing biomechanical measurement systems. Fig. 1 shows the original curves of two measurement systems in each of the four data sets. Fig. 3 contains the associated difference curves. The following list presents an overview of the data sets. A more detailed description of the simulated models and an
1. GAUSS (Fig. 1A): The shape of the curves in both measurement systems is similar. The observed differences are the result of small,
• This represents the ideal case in which the output of both measurement systems hardly differs.
2. NONGAUSS (Fig. 1B): In this scenario, the curves in NEW have a
• This scenario may occur as a result of sensor-to-segment alignment artifacts, e.g., when determining joint angles using inertial
3. XSHIFT (Fig. 1C): The amplitude of the curves in NEW differs less from the reference system than in NONGAUSS, but a shift in x-axis direction (i.e., a temporal shift) is introduced. The differences are non-Gaussian as well, but display a different distribution than in NONGAUSS.
4. REAL (Fig. 1D): This data set contains real-world sagittal plane hip joint angle curves from healthy subjects walking on a treadmill without gradient. Data were simultaneously recorded using an optical motion capture system (GOLD) and an inertial measurement unit. The observed differences display more complex behavior than the simulated curves (Fig. 3), which manifests in, e.g., larger differences in the within-subject variation of different subjects.
All data sets contain N = 220 curves from two measurement systems. This corresponds to 110 pointwise difference curves Dij(t) (NEW - GOLD), where i = 1, ..., 11 indexes the subjects, j = 1, ..., 10 indexes the curves per subject, and t = 1, ..., 101 indexes the curve points. All curves are of equal length. The number of curves was deliberately kept small to reflect the typical sample size of many validation studies. Systematic offsets between the two measurement systems were not modeled, since this paper focuses on random error components.
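The layout of the difference curves described above can be sketched as a nested structure indexed by subject, curve and curve point. The following Python sketch is only an illustration of that layout; the curve values are synthetic placeholders, not the data used in the paper, and all names are hypothetical.

```python
import math

N_SUBJECTS, N_CURVES, N_POINTS = 11, 10, 101


def difference_curves(new, gold):
    """Return D[i][j][t] = NEW - GOLD for nested lists indexed [subject][curve][point]."""
    return [[[n - g for n, g in zip(new_curve, gold_curve)]
             for new_curve, gold_curve in zip(new_subject, gold_subject)]
            for new_subject, gold_subject in zip(new, gold)]


def phase(t):
    return 2 * math.pi * t / (N_POINTS - 1)


# Placeholder curves: a sine for GOLD, and GOLD plus a small cosine term for NEW.
gold = [[[30 * math.sin(phase(t)) for t in range(N_POINTS)]
         for _ in range(N_CURVES)] for _ in range(N_SUBJECTS)]
new = [[[value + 0.5 * math.cos(phase(t)) for t, value in enumerate(curve)]
        for curve in subject] for subject in gold]

D = difference_curves(new, gold)
n_difference_curves = sum(len(subject) for subject in D)
print(n_difference_curves, len(D[0][0]))
```

Keeping the subject level explicit in the structure makes it straightforward to draw either one curve per subject or all curves of a subject, which the models in section 2.2 require.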
2.2. Prediction band models
Prediction bands were constructed from difference curves between two measurement systems for each of the four data sets. The bands were created using three different methods: two pointwise methods and one functional method. The significance level α was set to 0.05 in the following description.
This model was presented as an extension of the basic LoA model, using
The upper (u(t)) and lower (l(t)) limits for 1 − α = 0.95 bands are calculated as:
\[ [l(t), u(t)] = \bar{d}(t) \pm 1.96 \cdot SD(D(t)), \tag{1} \]
where $\bar{d}(t)$ and $SD(D(t))$ denote the time-dependent mean and standard deviation of the difference curves. The model assumes iid curves; therefore, only one (random) curve per subject is drawn each time the model is fit.
2. POINT: Pointwise LoA including multiple curves per subject (Bland and Altman, 1999)
assumed that differences can be represented as the sum of the mean difference and two variance components:
\[ [l(t), u(t)] = \bar{d}(t) \pm \sqrt{\sigma_{db}^{2}(t) + \sigma_{dw}^{2}(t)}, \tag{2} \]
where $\sigma_{db}^{2}(t)$ is the between-subject and $\sigma_{dw}^{2}(t)$ the within-subject variance at each independent point t of the difference curves. Both variance components can be estimated from a one-way analysis of variance (Bland and Altman, 2007).
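The pointwise calculation with two variance components can be sketched as follows in Python (a sketch, not the authors' R implementation). It assumes a balanced design and the usual one-way random-effects ANOVA estimators, with negative between-subject estimates truncated at zero. The multiplier z = 1.96 is carried over from Eq. 1 as an assumption, and all function names are hypothetical.

```python
import math


def variance_components(values):
    """values[i][j]: difference of subject i, curve j at one time point (balanced)."""
    n = len(values)       # number of subjects
    m = len(values[0])    # curves per subject
    subject_means = [sum(row) / m for row in values]
    grand_mean = sum(subject_means) / n
    ms_within = sum((x - mu) ** 2
                    for row, mu in zip(values, subject_means)
                    for x in row) / (n * (m - 1))
    ms_between = m * sum((mu - grand_mean) ** 2 for mu in subject_means) / (n - 1)
    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / m)  # truncate at zero
    return grand_mean, var_between, var_within


def pointwise_band(diffs, z=1.96):
    """diffs[i][j][t] -> (lower, upper), computed independently for each t."""
    lower, upper = [], []
    for t in range(len(diffs[0][0])):
        at_t = [[curve[t] for curve in subject] for subject in diffs]
        mean, var_between, var_within = variance_components(at_t)
        half_width = z * math.sqrt(var_between + var_within)
        lower.append(mean - half_width)
        upper.append(mean + half_width)
    return lower, upper


# Tiny hypothetical example: 3 subjects, 2 curves each, 2 curve points.
diffs = [[[1, 0], [3, 0]], [[2, 0], [4, 0]], [[6, 0], [8, 0]]]
lower, upper = pointwise_band(diffs)
print(lower, upper)
```

Because every time point is processed on its own, the sketch also makes the pointwise character of the model explicit: no information is shared between neighboring points.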
2.2.2. Functional (simultaneous) bands
tions K was set to 50 to avoid unintended smoothing effects. The method originally proposes the use of Fourier series, but is not restricted to them. Other smoothing methods, such as B-splines, can also be used to extend the method's applicability to a wider range of signals, including non-periodic curves and curves with underlying trends. The method is fully functional, i.e., band limits are calculated simultaneously using all curve points, rather than independently for each point:
where $\hat{f}(t)$ represents the mean Fourier curve and $\hat{\sigma}_{\hat{f}}(t)$ the standard deviation of the Fourier curves. The constant C is determined by repeatedly bootstrapping the original sample and calculating the maximum normalized deviation of the original curves ($\hat{f}_i(t)$) from the bth bootstrap mean ($\hat{f}^{b}(t)$). The deviation is normalized to the standard deviation of the bth sample, $\hat{\sigma}^{b}_{\hat{f}^{b}}(t)$. C is chosen to make Eq. 4 approximately equal to the desired coverage probability 1 − α:
\[ \frac{1}{B}\sum_{b=1}^{B}\left[\frac{1}{j}\sum_{i=1}^{j} I\left(\max_{t}\left\{\frac{|\hat{f}_i(t)-\hat{f}^{b}(t)|}{\hat{\sigma}^{b}_{\hat{f}^{b}}(t)}\right\}\le C\right)\right]\approx 1-\alpha, \tag{4} \]
The applied (naïve) bootstrap method assumes iid curves; therefore, as in RØISLIEN, only a single random curve per subject is drawn each time the model is fit (BOOTiid). To address research question (iii), we implemented a second, modified version of the BOOT method, in which multiple curves per subject are accounted for (BOOTrep). To this end, BOOTrep includes the two-stage bootstrap process described in Davison and Hinkley (1997), in which subjects (including all of their curves) are sampled with replacement in the first stage, and one curve per subject is drawn without replacement in the second stage. This way, both within- and between-subject variation are accounted for.
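The two-stage resampling and the choice of C can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the Fourier smoothing step is omitted and raw curves are used directly, all curves of all subjects are treated as the "original curves", the band is assumed to take the form mean ± C · SD, and the data are synthetic. All names are hypothetical.

```python
import math
import random
import statistics


def pointwise_mean_sd(curves):
    """Pointwise mean and sample SD over a list of equally long curves."""
    columns = list(zip(*curves))
    return [statistics.fmean(c) for c in columns], [statistics.stdev(c) for c in columns]


def two_stage_resample(by_subject, rng):
    """Stage 1: subjects with replacement. Stage 2: one random curve per drawn subject."""
    drawn = rng.choices(by_subject, k=len(by_subject))
    return [rng.choice(curves) for curves in drawn]


def bootstrap_constant(by_subject, alpha=0.05, n_boot=200, seed=0):
    """Smallest C for which the averaged indicator in Eq. 4 reaches 1 - alpha."""
    rng = random.Random(seed)
    originals = [curve for curves in by_subject for curve in curves]
    max_devs = []
    for _ in range(n_boot):
        mean_b, sd_b = pointwise_mean_sd(two_stage_resample(by_subject, rng))
        for curve in originals:
            # Maximum over t of the deviation normalized by the bootstrap SD.
            max_devs.append(max(
                (abs(x - m) / s for x, m, s in zip(curve, mean_b, sd_b) if s > 0),
                default=0.0,
            ))
    max_devs.sort()
    index = min(len(max_devs) - 1, math.ceil((1 - alpha) * len(max_devs)) - 1)
    return max_devs[index]


def toy_curve(offset, rng, n_points=21):
    """Synthetic difference curve: subject offset plus a random shift and sine term."""
    shift, amp = rng.gauss(0, 0.5), rng.gauss(0, 0.5)
    return [offset + shift + amp * math.sin(2 * math.pi * t / (n_points - 1))
            for t in range(n_points)]


data_rng = random.Random(42)
by_subject = []
for _ in range(11):                                   # 11 subjects
    offset = data_rng.gauss(0, 1.0)
    by_subject.append([toy_curve(offset, data_rng) for _ in range(10)])  # 10 curves each

C = bootstrap_constant(by_subject)
mean, sd = pointwise_mean_sd(two_stage_resample(by_subject, random.Random(1)))
lower = [m - C * s for m, s in zip(mean, sd)]         # assumed band form: mean - C * SD
upper = [m + C * s for m, s in zip(mean, sd)]         # assumed band form: mean + C * SD
print(f"C = {C:.2f}")
```

Taking C as an empirical quantile of the pooled maximum deviations is one direct way to satisfy Eq. 4. Since only one curve is drawn per resampled subject, drawing with or without replacement in the second stage is equivalent in this sketch.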
The coverage probability in all methods was evaluated using leave-one-out cross validation (LOOCV). For this, the data set is split into training and test data, where the test data set consists of exactly one difference curve d(t). The remaining D − 1 training curves are used to calculate upper (u(t)) and lower (l(t)) prediction band limits and determine how many points of d(t) are contained in the resulting band. This process is repeated D = 110 times, so that every curve is left out once. For each iteration, the percentage of covered points (PCP) is calculated as:
\[ PCP(c) = \frac{1}{T}\sum_{t=1}^{T} I\{l(t) \le d_c(t) \le u(t)\} \cdot 100, \tag{5} \]
where t = 1, ..., T indexes the curve points, c = 1, ..., D is the curve index, and I is an indicator function which returns 1 if the tth curve point is within the limits and 0 otherwise.
The actual coverage probability is determined as the mean percentage of D LOOCV bands that contain at least x = 100(1 − α)% of the curve points:
\[ P_{x\%} = \frac{1}{D}\sum_{c=1}^{D} I\{PCP(c) \ge x\} \tag{6} \]
with x = {100%, 95%, 90%, 50%} representing different percentages of curve points contained in the band. For instance, P50% = 0.9 means that 90% of the LOOCV prediction bands cover at least 50% of the test curve. P100% refers to the coverage for entire curves and corresponds to the conventional definition of coverage probability.
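Eqs. 5 and 6 and the leave-one-out loop can be sketched in a few lines of Python. The band model used here (pointwise mean ± 1.96 · SD of the training curves) is only a placeholder so that the example runs; any of the models from section 2.2 could be passed in instead. Function names and the toy curves are hypothetical.

```python
import statistics


def pcp(lower, upper, curve):
    """Eq. 5: percentage of curve points inside [lower, upper]."""
    inside = [lo <= d <= up for lo, d, up in zip(lower, curve, upper)]
    return 100.0 * sum(inside) / len(inside)


def coverage_at(pcp_values, x):
    """Eq. 6: share of LOOCV iterations whose PCP is at least x percent."""
    return sum(p >= x for p in pcp_values) / len(pcp_values)


def gaussian_band(training, z=1.96):
    """Placeholder band model: pointwise mean +/- z * SD of the training curves."""
    columns = list(zip(*training))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return ([m - z * s for m, s in zip(means, sds)],
            [m + z * s for m, s in zip(means, sds)])


def loocv_pcp(curves, make_band):
    """Leave each curve out once, fit the band on the rest, and score the left-out curve."""
    values = []
    for c, test_curve in enumerate(curves):
        training = curves[:c] + curves[c + 1:]
        lower, upper = make_band(training)
        values.append(pcp(lower, upper, test_curve))
    return values


# Toy data: four similar curves and one outlying curve (two points each).
curves = [[0, 0], [1, 1], [2, 2], [3, 3], [10, 10]]
pcp_values = loocv_pcp(curves, gaussian_band)
p100 = coverage_at(pcp_values, 100)
p50 = coverage_at(pcp_values, 50)
print(pcp_values, p100, p50)
```

Passing the band model as a function keeps the evaluation identical across models, so differences in P_x% can be attributed to the band construction rather than to the scoring.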
2.4. Uncertainty estimation
The presented models differ with regard to their approach for estimating the amount of curve variance and the underlying sampling strategies. They are therefore likely to result in different degrees of uncertainty about the width of the band limits, i.e., the degree to which the upper and lower band limits are unknown. We analyzed two primary sources of uncertainty:
1. The uncertainty resulting from repeated, random sampling ('Monte Carlo variability'). This applies to all methods except POINT, where there is no random sampling from the original sample.
2. The uncertainty about an inference (i.e., the uncertainty about the population of band limits based on a random sample from that population).
where U and L are matrices containing k × 300 repetitions = 3300 band limits each. The overall uncertainty, denoted as Area of Uncertainty (AU, see Fig.
The lower the AU, the lower the variation of the band limits and therefore the uncertainty regarding the band width. We expect models that account for within-subject variation (POINT, BOOTrep) to yield lower AU values. POINT is likely to have the lowest AU value, since the band limits remain unchanged when the method is calculated several times using the same sample. Thus, only the leave-one-subject-out variation is reflected in the results of POINT, which limits the comparability with the AU values of the other methods.
All methods and data sets were implemented in R (v4.0.5) (R Core Team, 2021) using RStudio (RStudio Team, 2022). The code to reproduce the analysis is provided at https://zenodo.org/badge/latestdoi/334994253.
3. Results
Fig. 3 displays difference curves and prediction bands in the four data sets. The respective coverage probabilities are summarized in Table 1. Table 2 contains the uncertainty areas for the band limits.
[Figure 3 about here.]
Table 1: Leave-one-out cross validated (LOOCV) prediction band coverage probabilities of four models (POINT, RØISLIEN, BOOTrep, BOOTiid) across data sets. Coverage probabilities (P) were calculated for different percentages of curve points contained in the band: 100%, 95%, 90%, 50%. For instance, P95% = 0.9 indicates that 90% of the LOOCV prediction bands cover at least 95% of the points of the test curve. P100% refers to the coverage for entire curves.
Table 2: Area of Uncertainty (AU) for the distribution of band limits across methods and data sets. AUs were calculated using repeated k-fold cross validation. Higher values represent more scattered band limits. The results in POINT differ from the remaining methods in that no variation of the band limits occurs when calculating the bands repeatedly with the same sample. Therefore, only the variation across k cross validation folds is reflected in the results of POINT.
Regarding the coverage probability for entire curves (Table 1: P100%), the BOOTrep bands achieved nominal coverage across all data sets, while the BOOTiid bands showed a slightly lower coverage probability in three out of four data sets. The prediction bands of the pointwise methods (POINT, RØISLIEN) were noticeably narrower (Fig. 3) and achieved a lower coverage probability. Both pointwise prediction bands were similar in quality, but the coverage probability in RØISLIEN was lower for all data sets.
The high coverage probabilities at the remaining coverage levels (Table 1: P50%−95%) show that all methods covered the vast majority of curve points. For both BOOTiid and BOOTrep, 95% of the curve points were covered with a probability close to 1. For POINT, the same probability was still ≈ 0.8. Here, again, the RØISLIEN method had the lowest coverage probability in
yielded considerably higher AU values (Table 2). BOOTiid exhibited the greatest uncertainty, with the bands tending to be very wide in some cases
4. Discussion
tems. We compared different models in various error scenarios with regard to their coverage probability and the amount of uncertainty about the variation of band limits.
the pointwise bands still contain the majority of curve points and represent the course of the difference curves rather well. This is further confirmed by a visual inspection of the band limits (Fig. 3), which are plausible even in the presence of severe violations of parametric model assumptions. Although not investigated in this paper due to our focus on random error components,
The coverage probabilities in our study were higher for models that include multiple curves per subject (POINT, BOOTrep) in all data sets. This corresponds to our expectations, since the additional source of variation leads to wider bands and thus increases the likelihood of covering a future observation. However, unlike the pointwise methods, BOOTiid can be expected to achieve nominal coverage when a larger number of curves is included.
4.2. Uncertainty about band limits
The use of multiple curves per subject also reduces the uncertainty about the variation of band limits. This is best demonstrated using the two BOOT models. The interpretation of the uncertainty values in POINT is somewhat limited, since the prediction bands in POINT are calculated from the entire original sample, while the other models each draw a random subsample. Therefore, an important source of uncertainty is missing in POINT.
curves with more variation between curves. This, in turn, results in less variation of the band width across different subsamples.
The difference between BOOTiid and BOOTrep largely depends on the size of the within-subject variation relative to the overall variance of the curves in the data set. In GAUSS, this relative within-subject variation is smaller than in XSHIFT and REAL. Accordingly, the ratio of the uncertainty areas of BOOTiid and BOOTrep (AUiid/rep) is larger in GAUSS (AUiid/rep = 3.6) than in XSHIFT (AUiid/rep = 1.4) and REAL (AUiid/rep = 2.1). Therefore, and in general, we recommend including within-subject variation when investigating measurement errors. In addition to improving the coverage performance and reducing the uncertainty regarding the variation of the band limits, it is informative in its own right (see Discussion section 4.4). Of course, the dependency of multiple curves of the same subject should be accounted for in the model to avoid falsely narrow bands (Montenij et al., 2016).
Generally, it seems advisable to err on the side of caution and draw larger samples than typically encountered in validation studies to limit the uncertainty about the random measurement error. Since the width of prediction bands, unlike that of confidence bands, does not converge to zero as n → ∞, drawing unnecessarily large samples is less of a concern when calculating prediction bands. This, of course, does not include ethical aspects (Altman, 1980) and cost considerations. To determine whether the chosen sample is large enough, one may study the convergence of band limits, either a priori
4.3. Pointwise vs. functional models
aspects affect the results. Pataky et al. (2015) suggest that the model of randomness (i.e., pointwise vs. functional analysis) is more important than the distinction between parametric and non-parametric methods. This is reflected in our results as well: Parametric model assumptions were fulfilled in GAUSS, but the difference between the coverage probabilities of the parametric (POINT, RØISLIEN) and non-parametric (BOOT) models was slightly larger in GAUSS than in the other data sets. We interpret this as further evidence for the superiority of functional methods.
It is possible, however, to imagine scenarios in which the use of pointwise bands may be justified. This may be the case when the mean error function varies over time, e.g., in the presence of drift. Drift occurs in measurement systems such as force plates or gyroscopes and causes a violation of the assumption of stationarity, a major assumption in many time series models including BOOT. Pointwise methods may further be useful when curves are less smooth than in our data sets (e.g., EMG data). In such cases, it is hardly possible to fit a mathematical function that adequately represents the signal. A detailed analysis of non-smooth curves, however, was not part of this paper and smoothness-related issues may further be addressed by prior
ping, methods such as the block bootstrap have been established, in which curves are reduced to a few presumably independent regions (blocks) (Kunsch, 1989). It could be assumed, e.g., that 10 nearby points form a block and that the correlation structure of the entire curve can be adequately represented with these blocks. Another aspect that deserves attention is the design of asymmetric bands to describe the contribution of the respective measurement systems to the random error, e.g., using percentiles instead of symmetric bands around a measure of central tendency.
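The block idea mentioned above can be illustrated with a minimal Python sketch of a moving-block resampling step for a single series. It assumes a block length of 10 points, as in the example in the text, and overlapping candidate blocks; how such blocks would be combined with the prediction band models of this paper is not specified here, so the sketch only shows the resampling itself. The function name and the toy series are hypothetical.

```python
import random


def moving_block_bootstrap(series, block_len=10, rng=None):
    """Resample a series by concatenating randomly chosen blocks of consecutive points."""
    rng = rng or random.Random()
    n = len(series)
    starts = range(n - block_len + 1)      # every admissible block start
    resampled = []
    while len(resampled) < n:
        start = rng.choice(starts)
        resampled.extend(series[start:start + block_len])
    return resampled[:n]                   # trim to the original length


series = list(range(100))                  # toy series; the values equal their positions
resampled = moving_block_bootstrap(series, rng=random.Random(3))
print(resampled[:20])
```

Within each block the original ordering, and therefore the local correlation structure, is preserved, while dependence across block boundaries is deliberately broken.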
This paper focuses on bootstrap methods for constructing functional prediction bands, but there are several other methods in the literature on functional data analysis (Goldsmith et al., 2012; Degras, 2017), some of which have desirable small sample properties (Telschow and Schwartzman, 2022), or offer other advantages (Liebl and Reimherr, 2019). However, those methods may be difficult to apply in scenarios such as the ones presented in this paper, where bands are computed for differences and multiple curves per subject are present.
All models in this paper include the between-subject variation. There are situations, however, in which these bands are too wide, since only within-subject effects, e.g., in intra-individual pre-post interventions, are of interest. Bland and Altman (1999) therefore suggest calculating prediction intervals based solely on the within-subject standard deviation in such cases. These are narrower and therefore reduce the risk of false-negative results. In cases where measurement systems are validated without a clearly defined application area, we recommend using different statistical intervals for within- and between-subject designs.
5. Conclusion
The findings of this paper suggest that there are methods that allow for an adequate characterization of difference curves in various error scenarios. The relevance of this work, however, goes far beyond the mere comparison of measurement systems and concerns any biomechanical study in which two groups of curves are compared. If possible, a functional approach should be chosen to account for the problem of multiple comparisons in pointwise models. Pointwise bands have a lower coverage probability, but still contain most of the curve points. They may thus be an option in scenarios where functional models are bound to fail, e.g., when curves are much noisier than in our examples. Any model should account for within-subject variation, i.e., multiple curves per subject should be included to increase the coverage probability and reduce the uncertainty about the width of the band limits.
The construction of prediction bands is accompanied by an increased degree of complexity in comparison with discrete statistical methods. This, in turn, requires at least some programming experience and statistical background. We suspect that the low prevalence of adequate models for curve data in validation studies is directly related to a lack of such experience and background. To lower these obstacles, we provide the associated R code in addition to the paper (https://zenodo.org/badge/latestdoi/334994253).
Acknowledgements
The authors would like to thank Lisa Peterson for linguistic correction of this paper as a native speaker. We would also like to thank the reviewers for their helpful comments.
References
Adler, R., Taylor, J., 2007. Random Fields and Geometry. Springer New York. doi:10.1007/978-0-387-48116-6.
Altman, D.G., 1980. Statistics and ethics in medical research: III how large a sample? BMJ 281, 1336–1338. doi:10.1136/bmj.281.6251.1336.
Altman, D.G., Bland, J.M., 1983. Measurement in medicine: The analysis of
Blair, S., Duthie, G., Robertson, S., Hopkins, W., Ball, K., 2018. Concurrent validation of an inertial measurement system to quantify kicking biomechanics in four football codes. Journal of Biomechanics 73, 24–32. doi:10.1016/j.jbiomech.2018.03.031.
Bland, J.M., Altman, D.G., 1999. Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8, 135–160. doi:10.1177/096228029900800204.
Bland, J.M., Altman, D.G., 2007. Agreement between methods of measurement with multiple observations per individual. Journal of Biopharmaceu-
Cutti, A., Parel, I., Raggi, M., Petracci, E., Pellegrini, A., Accardo, A., Sacchetti, R., Porcellini, G., 2014. Prediction bands and intervals for the scapulo-humeral coordination based on the bootstrap and two gaussian methods. Journal of Biomechanics 47, 1035–1044. doi:10.1016/j.jbiomech.2013.12.028.
Davison, A.C., Hinkley, D.V., 1997. Bootstrap Methods and their Application. Cambridge University Press. doi:10.1017/cbo9780511802843.
Degras, D., 2017. Simultaneous confidence bands for the mean of functional data. Wiley Interdisciplinary Reviews: Computational Statistics 9, e1397. doi:10.1002/wics.1397.
Donoghue, O.A., Harrison, A.J., Coffey, N., Hayes, K., 2008. Functional data analysis of running kinematics in chronic achilles tendon injury. Medicine & Science in Sports & Exercise 40, 1323–1335. doi:10.1249/mss.0b013e31816c4807.
Duhamel, A., Bourriez, J., Devos, P., Krystkowiak, P., Destée, A., Derambure, P., Defebvre, L., 2004. Statistical tools for clinical gait analysis. Gait and Posture 20, 204–212. doi:10.1016/j.gaitpost.2003.09.010.
Friston, K., Ashburner, J., Nichols, T., Penny, W., 2007. Statistical Parametric Mapping: The analysis of functional brain images. Academic Press.
Fusca, M., Negrini, F., Perego, P., Magoni, L., Molteni, F., Andreoni, G., 2018. Validation of a wearable IMU system for gait analysis: Protocol and application to a new system. Applied Sciences 8, 1167. doi:10.3390/app8071167.
Goldsmith, J., Greven, S., Crainiceanu, C., 2012. Corrected confidence bands for functional data using principal components. Biometrics 69, 41–51. doi:10.1111/j.1541-0420.2012.01808.x.
Kluitenberg, B., Bredeweg, S.W., Zijlstra, S., Zijlstra, W., Buist, I., 2012. Comparison of vertical ground reaction forces during overground and treadmill running. a validation study. BMC Musculoskeletal Disorders 13. doi:10.1186/1471-2474-13-235.
Koska, D., Gaudel, J., Hein, T., Maiwald, C., 2018. Validation of an inertial measurement unit for the quantification of rearfoot kinematics during run-
Kunsch, H.R., 1989. The jackknife and the bootstrap for general stationary observations. The Annals of Statistics 17. doi:10.1214/aos/1176347265.
Lenhoff, M.W., Santner, T.J., Otis, J.C., Peterson, M.G., Williams, B.J., Backus, S.I., 1999. Bootstrap prediction and confidence bands: a superior statistical method for analysis of gait data. Gait & Posture 9, 10–17. doi:10.1016/s0966-6362(98)00043-5.
Liebl, D., Reimherr, M., 2019. Fast and fair simultaneous confidence bands for functional parameters arXiv:1910.00131.
Ludbrook, J., 2010. Confidence in altman-bland plots: A critical review of the method of differences. Clinical and Experimental Pharmacology and Physiology 37, 143–149. doi:10.1111/j.1440-1681.2009.05288.x.
Montenij, L., Buhre, W., Jansen, J., Kruitwagen, C., de Waal, E., 2016. Methodology of method comparison studies evaluating the validity of cardiac output monitors: a stepwise approach and checklist. British Journal of Anaesthesia 116, 750–758. doi:10.1093/bja/aew094.
Morrow, M.M., Lowndes, B., Fortune, E., Kaufman, K.R., Hallbeck, M.S., 2017. Validation of inertial measurement units for upper body kinematics. Journal of Applied Biomechanics 33, 227–232. doi:10.1123/jab.2016-0120.
Olshen, R.A., Biden, E.N., Wyatt, M.P., Sutherland, D.H., 1989. Gait analysis and the bootstrap. The Annals of Statistics 17. doi:10.1214/aos/1176347372.
Park, J., Seeley, M.K., Francom, D., Reese, C.S., Hopkins, J.T., 2017.
Functional vs. traditional analysis in biomechanical gait data: An alternative
statistical approach. Journal of Human Kinetics 60, 39–49.
doi:10.1515/hukin-2017-0114.
Pataky, T.C., Caravaggi, P., Savage, R., Parker, D., Goulermas, J.Y.,
Sellers, W.I., Crompton, R.H., 2008. New insights into the plantar
pressure correlates of walking speed using pedobarographic statistical
parametric mapping (pSPM). Journal of Biomechanics 41, 1987–1994.
doi:10.1016/j.jbiomech.2008.03.034.
Pini, A., Markström, J.L., Schelin, L., 2019. Test–retest reliability measures
for curve data: an overview with recommendations and supplementary
R Core Team, 2021. R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing. Vienna, Austria. URL:
https://www.R-project.org/.
Raimondo, G.D., Vanwanseele, B., van der Have, A., Emmerzaal, J.,
Willems, M., Killen, B.A., Jonkers, I., 2022. Inertial sensor-to-segment
calibration for accurate 3D joint angle calculation for use in OpenSim.
Sensors 22, 3259. doi:10.3390/s22093259.
Ramsay, J.O., Graves, S., Hooker, G., 2021. fda: Functional Data Analysis.
URL: https://CRAN.R-project.org/package=fda. R package version
5.5.1.
Ramsay, J.O., Silverman, B.W., 2005. Functional Data Analysis. Springer
New York. doi:10.1007/b98888.
Richter, C., O’Connor, N.E., Marshall, B., Moran, K., 2014. Comparison
of discrete-point vs. dimensionality-reduction techniques for describing
performance-related aspects of maximal vertical jumping. Journal of
Biomechanics 47, 3012–3017. doi:10.1016/j.jbiomech.2014.07.001.
Røislien, J., Rennie, L., Skaaret, I., 2012. Functional limits of agreement:
A method for assessing agreement between measurements of gait curves.
Gait & Posture 36, 495–499. doi:10.1016/j.gaitpost.2012.05.001.
Schwartz, M.H., Trost, J.P., Wervey, R.A., 2004. Measurement and management
of errors in quantitative gait data. Gait & Posture 20, 196–203.
doi:10.1016/j.gaitpost.2003.09.011.
Sutherland, D., Olshen, R., Biden, E., Wyatt, M., 1988. Development of
mature walking. Mac Keith Press.
Telschow, F.J., Schwartzman, A., 2022. Simultaneous confidence bands for
functional data using the Gaussian kinematic formula. Journal of Statistical
Figure Captions
Figure 1: Curves from two measurement systems (gold standard (black)
and new measurement system) in four data sets (A - GAUSS, B - NONGAUSS,
C - XSHIFT, D - REAL).
Figure 2: Distribution of band limits across 300 calculations in four different
models (A - BOOTiid (yellow), B - BOOTrep (blue), C - POINT (grey),
D - RØISLIEN (pink)) in the GAUSS data set. Areas of uncertainty are
displayed as colored ribbons next to the actual differences.
(pink), BOOTrep (blue) and BOOTiid (yellow) in four data sets (A - GAUSS,
B - NONGAUSS, C - XSHIFT, D - REAL). Difference curves are displayed
as black curves in the background. Note that the bands in RØISLIEN,
BOOTrep, and BOOTiid represent a single random subsample and therefore
do not necessarily reflect the coverage probabilities in Table 1. E.g., the
orange band (BOOTiid) in D is wider than the blue one (BOOTrep), even
though the coverage probability of BOOTiid is lower.
Figure 1
Figure 2
Figure 3
Doris Oriwol: Conceptualization, Methodology, Formal analysis, Writing – Review & Editing
Christian Maiwald: Conceptualization, Methodology, Formal analysis, Resources, Writing –
Review & Editing, Supervision
Conflict of Interest Statement
The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.