Lins Et Al 2021

Applied Ergonomics 93 (2021) 103357
Contents lists available at ScienceDirect
Applied Ergonomics
journal homepage: http://www.elsevier.com/locate/apergo
OWAS inter-rater reliability

Christian Lins a, c, *, Sebastian Fudickar c, Andreas Hein b, c
a
Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Marie-Curie-Straße 2, 26129, Oldenburg, Germany
b
OFFIS – Institute for Information Technology, Escherweg 2, 26121, Oldenburg, Germany
c
Carl von Ossietzky University Oldenburg, Ammerländer Heerstr. 140, 26129, Oldenburg, Germany
A R T I C L E I N F O A B S T R A C T
Keywords: The Ovako Working posture Assessment System (OWAS) is a commonly used observational assessment method
OWAS for determining the risk of work-related musculoskeletal disorders. OWAS claims to be suitable in the application
Working postures for untrained persons but there is not enough evidence for this assumption. In this paper, inter-rater (inter-
Risk assessment
observer) reliability (agreement) is examined down to the level of individual postures and categories. For this
Reliability analysis
purpose, the postures of 20 volunteers have been observed by 3 varying human raters in a laboratory setting and
the inter-rater agreement against reference values was determined. A high agreement of over 98% (κ = 0.98) was
found for the postures of the arms but lower agreements were found for posture classification of the legs
(66 − 97%, κ = 0.85) and the upper body (80 − 96%, κ = 0.85). No significant difference was found between
raters with and without intense prior training in physical therapy. Consequently, the results confirm the general
reliability of the OWAS method especially for raters with non-specialized background but suggests weaknesses in
the reliable detection of a few particular postures.
1. Introduction notes on standardized observation-sheets include REBA (Hignett and

McAtamney, 2000), RULA (Dockrell et al., 2012), and EAWS (Schaub
Work-related musculoskeletal disorders (WMSD) constitute a sig et al., 2013).
nificant cause of lost work and early retirement among employees One of these established methods is the Ovako Working posture
(Punnett and Wegman, 2004). In particular, workers in physically Analyzing System (OWAS) (Karhu et al., 1977, 1981). OWAS, as initially
demanding occupations, such as industrial workers are affected (Kjell described by Karhu et al. (1977), seems widely used (van der Beek et al.,
berg et al., 2016). The total annual cost of lost productivity due to 2005) and is a versatile method of assessing physical risk factors for
WMSD is estimated as 2% of the GDP in Europe (Bevan, 2015). WMSDs musculoskeletal disorders. In OWAS, subject postures are evaluated by a
are often result from constrained or deviated postures that are repeated human rater at a fixed time interval (usually between 30s and 5min
for an extended time-period (Hoy et al., 2010; Amell and Kumar, 2001; (Brandl et al., 2017)), in three (or four) categories (arms, legs, and back
Matsui et al., 1997). Typical WMSD are chronic back pain or injuries of separately and sometimes load/force (Mattila and Vilkki, 1999)). A
knees or shoulders. In order to take targeted and individualized pre detailed description of the OWAS method follows in Section 2.1. The
ventive measures or customized, personalized interventions such as efficacy of the WMSD assessment has been already demonstrated for
rearrangements of the workplace structures and schedules, it is essential back pain using OWAS. Burdorf et al. (1991) have confirmed a weak
to get a precise overview of the type and frequency of the non-neutral correlation between OWAS ratings and later prevalence for back pain,
postures. For such assessments of postures and movements during which is a common WMSD. OWAS was explicitly designed for practi
work, both pen-and-paper and technology-supported assessment meth tioners and states that it does not require physiotherapeutic or occupa
odologies e.g. with video cameras or motion detection systems (Plantard tional health education from human raters (Karhu et al., 1977).
et al., 2016; Haggag et al., 2013; Diego-Mas and Alcaide-Marzal, 2014) Therefore, the reliability of the method in heterogeneous rater groups is
are applicable. Commonly used pen-and-paper assessment methods crucial. Reliability is defined here as the agreement between repeated
(Takala et al., 2010; Lowe et al., 2019) in which human observers take observations of the same subject.
* Corresponding author. Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Marie-Curie-Straße 2, 26129,
Oldenburg, Germany.
E-mail addresses: christian.lins@idmt.fraunhofer.de (C. Lins), sebastian.fudickar@uol.de (S. Fudickar), andreas.hein@uol.de (A. Hein).
https://doi.org/10.1016/j.apergo.2021.103357
Received 29 July 2019; Received in revised form 1 October 2020; Accepted 4 January 2021
Available online 30 January 2021
0003-6870/© 2021 Elsevier Ltd. All rights reserved.
C. Lins et al. Applied Ergonomics 93 (2021) 103357
The reliability of assessment methods for the analysis of postures has individual categories include the postures shown in Table 1. In each of
already been investigated in a number of studies. Kazmierczak et al. the three categories, the rater (observer) selects the partial posture that
(2006) have investigated the inter- and intra-rater reliability for most closely corresponds to the actual posture of the subject. Each
assessing working postures. However, their study was based on video partial posture is assigned a corresponding numerical code. Together,
recordings and did not use any well-known pen-and-paper method such the codes of back, arms, and legs form a three-digit code that describes
as OWAS. The study by Dartt et al. (2009) also used video recordings, the posture. For this OWAS code there are 72 possible combinations (if
here to determine the inter- and intra-rater reliability in the assessment one excludes leg position 7).
of upper limb postures using Multimedia Video Task Analysis (MVTA) —
a software that supports the manual assessment via RULA and OWAS. 2.2. Study design
Bao et al. (2009) also used video recordings but assessed the reliability
of the observations not on the basis of a specific method, but with regard In the study 20 participants were separately observed by three
to the flexion and abduction angles of the individual upper limbs. Trask human raters while posing in each of the 72 (excluding lower limbs
et al. (2017) investigated (again using video recordings) the influence of posture 7 ‘walking’) OWAS postures (see Table 1). Two groups of raters
partly or fully visible limbs on the reliability of the observations. Rhén were selected, the first group being students of the graduating class of a
and Forsman (2020) used video clips that were presented to professional physiotherapy school, the second group being employees
OHS-ergonomists to assess the inter- and intra-rater reliability for the and students of the University of Oldenburg without such a previous
OCRA checklist method. Oliv et al. (2019) have determined the education (see Fig. 1).
inter-rater reliability for both the total score as well as individual items The three raters were equipped with pens and structured sheets (see
of the Quick Exposure Check (QEC) assessment method. Appendix Fig. 5) and viewed each participant from one viewing-angle
Karhu et al. (1977) have given inter- and intra-rater agreement (see Fig. 2). The raters sat next to each other at a distance of around
values (23–99% and 70–100% respectively) for OWAS but for the whole 1 m among each other, and were requested to remain seated, to ensure
body assessment. De Bruijn et al. have already investigated inter- and the independent evaluation of the raters. Due to the resulting position-
intra-rater reliability of OWAS in more detail (De Bruijn et al., 1998). offset, the viewing angle from the outer rater seats on the participants
Their study also examined the differentiability of individual OWAS was not frontal, but slightly shifted. However, raters were allowed to
categories (back, arms, legs), i.e. how confident the raters could shift their viewing angle while remaining seated to aid visual perception
distinguish between the individual categories. Pictures of people in
different postures were presented to the raters in this study for evalua Table 1
tion. However, the different postures belonging to a specific category OWAS categories and postures.
(back, arms, legs) are assigned to different action classes, i.e. their
Category Code Posture Description
associated risk is varying. Therefore, a statement about the possibility of
differentiating the individual categories is not sufficient. It is crucial to Back 1xx Straight
know the degree to which raters can distinguish between the individual
postures of a category. Since this has not yet been investigated, in this 2xx Bent
paper we present the results of a controlled study that examined
inter-rater reliability down to the level of individual postures of real
persons and not only pictures of them. 3xx Twisted
The OWAS method is a comparatively simple assessment method and

thus predestined for use by personnel who have not been explicitly 4xx Bent and twisted
trained in ergonomics. Weir et al. (2011) have shown that for
video-based observations the classification accuracy of raters with and
without ergonomics training is not significantly different. For direct Arms x1x Both arms below shoulder level
observation, i.e. not video recordings, this has not yet been shown, so we
have also investigated this in this study. A part of our raters were stu
x2x One arm at or above shoulder level
dents of a physiotherapist school in their last year of training, so that one
can assume a solid education in physical therapy and knowledge of the
musculoskeletal system. x3x Both arms at or above shoulder level
Based on the above, this paper examines the following two research
questions:
Legs xx1 Sitting
1. How strong is the inter-rater agreement for the different postures of
the OWAS assessment method?
2. How do OWAS assessments differ from observations with and xx2 Standing on both straight legs
without prior training in physical therapy (physiotherapeutic

background)?
xx3 Standing on one straight leg
This work is structured as follows: in Section 2 the OWAS method,

the study design, and the statistical analysis are described in detail. The xx4 Standing on both knees bent
results are presented with confusion matrices in Section 3, and discussed
in Section 4. The article is concluded in Section 5.
xx5 Standing on one knee bent
2. Material and methods

xx6 Kneeling on one or both knees
2.1. OWAS method
xx7 Walking
The OWAS classification system classifies the possible postures into
three categories (representing the body part): back, arms, and legs. The
2
Fig. 3. Screenshot of the study software. The text below the figures are a tex
tual explanation of the part posture (in German).
OWAS method. The constellation of the rater-group was altered per

participant.
The Institutional Review Board (IRB) of the University of Oldenburg
approved the study design (Drs. 24/2017).
Fig. 1. Study participant groups.
2.2.1. Process of one trial
1. A custom software (see Fig. 3) generates a random OWAS code using

a Pseudo-RNG initialized with the participant number. Within one
trial the generated OWAS code can only occur once. The OWAS
posture is displayed to the participant modeling the posture, but not
to the raters (see Figs. 2 and 3).
2. The participant interprets the posture and performs it independently.
The raters can observe the participant while he or she models the
posture.
3. The three raters are given up to 30s to rate the posture independently
(no communication between the raters was allowed) using a paper
sheet. They are not allowed to stand up, but they may move their
upper body and adjust their view of the participant.
4. Steps 1–3 are repeated for every one of the 72 posture combinations.
2.3. Statistical analysis
After the study was completed, the paper-based observation sheets

were digitized manually.
With regard to the first research question, the inter-rater reliability
was first determined using Fleiss’ kappa. This allows some conclusions
to be drawn about how the three raters agree on a rating. The com
Fig. 2. Top view of the study setup. Circles represent raters and study parison of the raters with each other, however, reveals no evidence as to
participants.
whether their rating is also correct in relation to a reference value. The
raters could, for example, all be incorrect in their assessment.
For both the calculation of the kappa values and the χ 2 -tests
and lower the influence of positioning offset. LibreOffice 7.0 was used (α = 0.05).
The participants wore authentic industrial workers’ workwear,
typically made of robust and wide cut fabric, which realistically com 2.3.1. Kappa calculation
plicates the recognition of postures. The participant was asked to model For the calculation of agreement between two raters, the kappa
the postures independently and to align himself or herself frontally with values (Cohen’s kappa) was applied as calculated in accordance to
the raters. The study director explicitly did not correct the postures in (Kvålseth, 1989; Cohen, 1960) via:
order not to influence the raters. po − pc
Since neither participants nor raters have been previously trained in κC = (1)
1 − pc
OWAS, both participants and raters received a short introduction in the
The kappa value is the fraction of the agreement that remains after
3
the random agreement pc has been removed from the overall agreement Table 3
po . Here, the kappa calculation is used for the agreement between the Confusion matrix for reference against rater votes for the back postures.
human raters and the ground truth reference. Reference
Cohen’s kappa is defined for two raters, so the generalization Fleiss’
1 2 3 4
kappa for n raters and N participants is used as well (Fleiss, 1971).
Consider the following notations from Fleiss’ work: Rater 1 (Straight) 96% 1% (13) 2% (25) 0% (4)
(1049)
2 (Bent) 1% (16) 93% 0% (5) 6% (60)
1 ∑N
P= Pi (2) (999)
N i=1 3 (Twisted) 1% (8) 1% (11) 86% 12%
(893) (125)
Pi in Equation (2) is the fraction of agreement of all rater pairs (here: 4 (Bent & 0% (1) 6% (63) 14% 80%
Twisted) (153) (891)
rater 1 and 2, rater 1 and 3, rater 2 and 3) regarding participant i. So P is
the average agreement over all participants.
∑
k
Table 4
Pe = p2j (3)
j=1
Confusion matrix for reference against rater vote for arms postures.
Reference
pj in Equation (3) is the fraction of all ratings to category j. Pe is the mean 1 2 3
agreement that can be expected by random ratings.
Rater 1 (Arms below shoulders) 98% 0% (7) 1% (16)
(1428)
P − Pe
κF = (4) 2 (One arm above shoulder) 0% (3) 98% 1% (19)
1 − Pe (1412)
3 (Both arms above 0% (7) 1% (9) 99%
In Equation (4) Fleiss’ kappa κF is the normalized agreement between
shoulder) (1417)
n raters minus the random agreement.
In this paper, Fleiss’ kappa is used to compare the consistency of the
ratings between the three raters observing each posture, and Cohen’s
Table 5
kappa to compare the consistency of the posture ratings made by the
Confusion matrix for reference against rater votes for legs postures. 1: Sitting, 2:
human raters with the known reference posture.
Standing on straight legs, 3: Standing on one straight leg, 4: Standing on both
Therefore, in the second step, the individual ratings of the raters
knees bent, 5: Standing on one knee bent, 6: Kneeling on one or both knees.
were treated as single ratings when compared to the reference posture.
Reference
The postures, which the study software selects at random and then
displays to the participant modelling the posture, serve as reference. 1 2 3 4 5 6
When comparing the observations against a reference that is inde Rater 1 98% 1% (4) 0% (3) 1% (4) 0% (2) 0% (0)
pendently interpreted by the individual modeling the posture, a misin (707)
terpretation of the instructions may cause additional errors or a bias. To 2 0% (2) 94% 1% (5) 4% (31) 1% (5) 0% (0)
(693)
cancel out this reference interpretation bias, a majority vote was
3 0% (4) 2% (19) 66% 1% (8) 30% 1% (5)
calculated, i.e. when two of three raters share the same vote there is high (648) (291)
probability that the third rater’s observation is incorrect. Cohen’s kappa 4 0% (2) 1% (6) 0% (3) 94% 3% (19) 1% (8)
was used to assess this agreement, which is intended for two raters (in (669)
our case the agreement between the majority vote for observations and 5 0% (0) 0% (1) 13% 1% (3) 81% 4% (20)
(61) (384)
the reference values). 6 0% (1) 0% (0) 0% (0) 1% (6) 1% (9) 97%
With regard to the second research question, the two rater type (681)
groups were considered separately and Fleiss’ kappa values were
calculated for each group.
3. Results
2.3.2. Confusion matrices
A confusion matrix visualizes the error (therefore sometimes called In total, 32 raters participated (8 of 32 trained in physical therapy) in
error matrix) that observations can have in comparison with a reference. the study. As a result, we have a total of 4320 individual posture ob
Here the possible categories are listed in the rows and columns, where servations (20 participants times 72 possible OWAS postures times 3
the rows refer to recognized categories of the raters and the columns raters). The individual observations at the level of the three categories
represent the “true” categories of the reference. Therefore, the diagonal back, arms, and legs were compared. Tables 3–5 shows the confusion
elements of the matrix contain the true positives, i.e. the correct matches matrix for the reference against rater votes for the back, arms, and legs
between raters and reference, whereas the remaining elements contain postures.
errors, i.e. no matches, of the raters (see Table 2).
3.1. Rater votes compared with reference
For the back postures (Table 3) the results show a high agreement of
Table 2
Example confusion matrix for reference against raters for two possible
96% and 93% for the OWAS codes 1 and 2 and significantly1 lower
categories. agreements for codes 3 (86%) and 4 (80%). The kappa for the back
postures is κ = 0.85. The differences between all four category ratings
Reference
on the diagonal are significant (p = 0.02).
1 2 The overall agreement for the arms postures (Table 4) is greater 98%
Rater 1 Rater is correct Rater is wrong (sees 1 but 2 is
correct)
2 Rater is wrong (sees 2 but 1 is Rater is correct 1 2
χ -test of OWAS codes 1/2 against 3/4 ratings with α = 0.05 results in p =
correct)
0.472.
4
(κ = 0.98) with only few not significant disagreements. The differences Table 8
between the three diagonal values are not significant (p = 0.98). Confusion matrix for reference against rater (PT-trained) votes for arms
The results for the legs postures (Table 5) are mixed: there was a high postures.
agreement for sitting posture (98%), standing on one straight leg (94%), Reference
standing on both knees bent (94%), and kneeling (97%). Least agree 1 2 3
ment was for the standing on one straight leg (66%) which was often
Rater 1 (Arms below shoulders) 98% (570) 1% (3) 1% (8)
confused with standing on one knee bent (81%). The differences between
2 (One arm above shoulder) 0% (2) 98% (558) 2% (9)
all six category ratings are significant (p = 7.92⋅10− 14 ). 3 (Both arms above shoulder) 1% (3) 1% (3) 99% (571)
3.2. Raters groups (PT-trained vs. PT-untrained) compared with

reference Table 9
Confusion matrix for reference against rater (PT-untrained) votes for arms
3.2.1. Back postures postures.
Table 6 and Table 7 show the agreement of both rater groups for the Reference
back postures. The differences between all six category ratings of the PT- 1 2 3
trained raters are not significant (p = 0.62) whereas the differences in
Rater 1 (Arms below shoulders) 99% (858) 0% (4) 1% (8)
the ratings of the PT-untrained raters are significant (p = 0.03). 2 (One arm above shoulder) 0% (1) 99% (854) 1% (10)
3 (Both arms above shoulder) 0% (4) 1% (6) 99% (846)
3.2.2. Arms postures
Table 8 and Table 9 show the agreement for both rater groups for
arms postures. The differences in the agreement of the three categories Table 10
Confusion matrix for reference against rater (PT-trained) votes for legs pos
are not significant for both PT-trained (p = 0.95) and PT-untrained (p =
tures. 1: Sitting, 2: Standing on straight legs, 3: Standing on one straight leg, 4:
0.98) raters.
Standing on both knees bent, 5: Standing on one knee bent, 6: Kneeling on one or
both knees.
3.2.3. Legs postures
Reference
Table 10 and Table 11 show the agreement for both rater groups for
legs postures. The differences in the agreements of the six categories are 1 2 3 4 5 6
significant for both PT-trained (p = 0.01) and PT-untrained (p = 4.60⋅ Rater 1 96% 1% (4) 0% (0) 1% (4) 1% (2) 0% (0)
10− 12 ) raters. (281)
2 0% (1) 94% 1% (2) 4% (10) 0% (1) 0% (0)
(269)
3.2.4. Summary 3 1% (2) 3% (10) 73% 0% (1) 22% 1% (2)
Table 12 summarizes the different kappa values and results of the (272) (81)
χ 2 -test (α = 0.05) for both groups (with and without prior training in 4 0% (0) 1% (2) 0% (0) 94% 3% (9) 0% (1)
physical therapy). (269)
5 0% (0) 0% (0) 7% (14) 1% (2) 89% 0% (1)
For the χ 2 -test the hypotheses were H0 : no difference between both (186)
groups, H1 : difference between both groups. 6 0% (1) 0% (0) 0% (0) 1% (4) 2% (7) 94%
(282)
Table 6 3.3. 2-Rater compared with 3rd rater agreement

Confusion matrix for reference against rater (PT-trained) votes for back
postures. In this evaluation, if two raters agreed, their votes were compared
Reference with those of the third rater. The following tables show the agreement
1 2 3 4
for the individual posture categories, an overall summary is given in
Table 16 (both absolute and relative concordance, kappa values and
Rater 1 (Straight) 99% 1% (3) 0% (2) 0% (0)
results of the χ 2 -test).
(415)
2 (Bent) 2% (9) 92% 0% (2) 5% (21) Table 13 shows the agreements for the postures of the back. The
(391) differences in the agreements of the four categories are significant (p =
3 (Twisted) 0% (2) 0% (1) 89% 10% (43) 0.09).
(372) Table 14 shows the agreements for the postures of the arms. The
4 (Bent & 0% (0) 9% (43) 11% (52) 79%
Twisted) (368)
differences in the agreements of the three categories are not significant
(p = 0.96).
Table 15 shows the agreement for the postures of the legs. The dif
ferences in the agreements of the six categories are significant (p =
Table 7
9.85⋅10− 10 ).
Confusion matrix for reference against rater (PT-untrained) votes for back
postures. For the χ 2 -test (see Table 16) the hypotheses were H0 : no difference
between comparison with reference and comparison two 2-raters
Reference
concordance, H1 : difference between both comparison approaches.
1 2 3 4
Rater 1 (Straight) 94% 1% (10) 3% (23) 1% (4) 4. Discussion

(634)
2 (Bent) 1% (7) 93% 0% (3) 6% (39)
(608) 4.1. Interpretation of the results
3 (Twisted) 1% (6) 2% (10) 84% 13% (82)
(521) The results of our investigation indicate a substantial agreement
4 (Bent & 0% (1) 3% (20) 16% 81% between the raters validated against the reference for almost all pos
Twisted) (101) (523)
tures. Especially the classification of the arms postures agreed excellent
5
Table 11 incorrectly considered low-risk (risk class 1–2), although it should

Confusion matrix for reference against rater (PT-untrained) votes for legs pos correctly be considered high-risk (risk class 3–4).
tures. 1: Sitting, 2: Standing on straight legs, 3: Standing on one straight leg, 4: There are two ways to deal with the low rater agreement in postures
Standing on both knees bent, 5: Standing on one knee bent, 6: Kneeling on one or 3/4 of the arms and 3/5 of the legs. It would be conceivable to sum
both knees. marize the postures which are difficult to distinguish and to evaluate
Reference them as a combined posture. This would simplify the assessments.
1 2 3 4 5 6 However, such a combination would only be permissible if the associ
ated action class is at least comparable. According to Mattila et al.
Rater 1 99% 0% (0) 1% (3) 0% (0) 0% (0) 0% (0)
(426) (Mattila and Vilkki, 1999) and our results, however, this is usually not
2 0% (1) 93% 1% (3) 5% (21) 1% (4) 0% (0) possible, especially for postures “standing on one straight leg” and
(424) “standing on one bent leg” the associated risk classes are almost con
3 0% (2) 1% (9) 62% 1% (7) 34% 0% (3) trary. The second way to deal with a low rater agreement is to increase
(376) (210)
4 0% (2) 1% (4) 1% (3) 93% 2% (10) 2% (7)
the accuracy of observation. This could be done with additional obser
(400) vations or raters from multiple angles, or with technical motion capture
5 0% (0) 0% (0) 17% 0% (1) 73% 7% (19) (MoCap) systems (e.g. Xsens MVN, SIRKA (Lins et al., 2015; Lins et al.,
(47) (198) 2018)) that, if correctly calibrated, have higher spatial precision than
6 0% (0) 0% (0) 0% (0) 0% (2) 0% (2) 98%
human raters from one angle.
(399)
The OWAS method is designed to be carried out by non-specialized
raters, i.e. such which have not experienced occupational medical or
physiotherapeutic training. Regarding the comparison of the different
Table 12
groups of observers (with and without previous training in physical
Different kappa values for both rater groups.
therapy) the results show no significant difference in the ratings of the
Group Back Arms Legs two groups. This indicates that a corresponding training in physical
Trained in physical κ = 0.86 κ = 0.98 κ = 0.89 therapy is not necessary for a high inter-rater reliability. The design of
therapy OWAS was based on the practical and universal applicability of the
Untrained in physical κ = 0.84 κ = 0.98 κ = 0.83 method, which is confirmed by our results. Notably, the χ 2 -test value for
therapy
1.849 < 7.815 0.179 < 5.991 10.987 < 11.070
the posture of the legs is close to the significance threshold value, but
χ2 -test
p = 0.604 p = 0.915 p = 0.052
still within the significance range of the null hypothesis.
It was also examined whether – if two evaluators agree – this
agreement could be used as a reference to cancel out the interpretation
bias of the participant modelling the posture. However, the results of
Table 13 this article show a significant inter-rater deviation for the postures of the
Confusion matrix for 2-raters vs. 3rd-rater votes for back postures. legs. At least for the postures of the legs it would not be permissible to
Two raters (as reference) use the agreement of two evaluators as a substitute reference.
1 2 3 4
4.2. Comparison with related work
3rd 1 (Straight) 91% 3% (13) 6% (21) 0% (0)
Rater (341)
2 (Bent) 2% (8) 91% 1% (3) 6% (23) In general, our results are consistent with the inter-rater agreements
(326) found by previous studies. The early work of Karhu et al. (1977) gives an
3 (Twisted) 2% (7) 2% (7) 75% 21% (76) inter-rater agreement of 69% (median) for raters without ergonomics
(266)
4 (Bent & 0% (1) 3% (10) 14% (48) 83%
background for the complete poses. Our study showed gives a median
Twisted) (280) inter-rater agreement of 71% for complete poses. Also the
category-specific inter-rater agreements of De Bruijn’s study (De Bruijn
et al., 1998) (arms = 87%, legs = 93%, back = 87%) correspond with
our results (arms = 99%, legs = 88%, back = 89%), even though our
Table 14
Confusion matrix for 2-raters vs. 3rd-rater votes for arms postures. study found a slightly increased agreement for the arms and a decreased
agreement for the legs. These variations of the inter-rater agreement
Two raters (as reference)
among both studies might be a consequence of the applied work-wear
1 2 3 and varying individual training differences of the rater: in the study
3rd 1 (Arms below shoulders) 97% 2% (8) 1% (6) by De Bruijn et al. the raters had 30 s time for evaluation but were only
rater (471) allowed to look at the slide which depicted the posture for 3 s. In
2 (One arm above shoulder) 1% (5) 96% 3% (14) contrast, the raters in our study had the full 30 s to observe and evaluate
(461)
3 (Both arms above 1% (6) 1% (7) 97%
the posture. This allowed the raters in our study to examine the par
shoulder) (460) ticipants’ postures more closely and may explain the differences from
previous studies. Additionally, in this study here the raters were able to
slightly change their view (as long as they remained seated) to aid visual
among both groups. Since it is sometimes difficult to distinguish be perception to get an improved sight on the participant.
tween “twisted” and “twisted and bent” postures, both postures are
regularly confused. In over 40% of cases, ratings of the lower extremities 4.3. Limitations
do not agree on whether the person is standing on a straight or a slightly
bent leg. Fig. 4 shows examplarily both postures to clarify the similarity In our study the raters were able to slightly change their view to aid
among both postures indicating the causality of the varying raters’ visual perception. And although participants were asked to face the
opinion. It can be seen, that the knees were hidden under the wide raters frontally, there was no technical validation to ensure the proper
trousers, so it is difficult to decide whether the knee is bent or not. This alignment. Whether the viewing angle of observations significantly af
has a significant impact on the risk assessment of posture since, fects intra-rater agreement has yet to be investigated.
depending on posture, the posture of the lower extremities may be The study was conducted under controlled laboratory conditions.
6
Table 15
Confusion matrix for 2-rater vs. 3rd-rater votes for legs postures. 1: Sitting, 2: Standing on straight legs, 3: Standing on one straight leg, 4: Standing on both knees bent,
5: Standing on one knee bent, 6: Kneeling on one or both knees.
Two raters (as reference)
1 2 3 4 5 6
3rd rater 1 97% (231) 2% (4) 0% (0) 0% (1) 1% (2) 0% (0)

2 1% (2) 87% (216) 4% (9) 7% (17) 0% (1) 0% (0)
3 1% (2) 3% (10) 76% (247) 1% (3) 17% (54) 2% (5)
4 0% (1) 3% (7) 3% (8) 87% (209) 1% (2) 3% (8)
5 0% (0) 0% (0) 33% (55) 2% (3) 53% (89) 10% (16)
6 0% (1) 0% (0) 0% (1) 1% (2) 1% (3) 94% (207)
Table 16
Agreements summarized for 2-raters vs. 3rd rater (see Section 3.3) and all raters
vs. reference (see Section 3.1) agreements together with χ 2 -test results.
Back Arms Legs
Reference comparison κ = 0.85 (89%) κ = 0.98 (99%) κ = 0.85 (88%)

2-raters vs. 3rd-rater κ = 0.80 (85%) κ = 0.95 (97%) κ = 0.81 (82%)
χ2 -test (α = 0.05) 1.179 < 7.815 0.046 < 5.991 13.748 > 11.070
p = 0.758 p = 0.977 p = 0.017
Although the participants’ work clothes were realistically chosen, the

postures are posed and may not correspond to the postures to be adopted
in reality. This is a compromise, as such a controlled study could hardly
have been carried out in an industrial company.
It remains to be examined whether an occupational medical back
ground in contrast to training in physical therapy has an advantageous Fig. 4. Different, but difficult to distinguish leg postures. Left picture depicts
effect on the OWAS observation agreement. Likewise, it was not exam OWAS leg posture code 3 (standing on one straight leg), right posture code 5
ined how the agreement of the raters develops over time. It would be (standing on one bent leg).
plausible that the reliability of very experienced OWAS raters increases
over time. However, these questions will be subject to future studies.
5. Conclusion and usefulness of OWAS as simple and easy to use assessment method for
postures. This is also shown by several ergonomics studies carried out
In the conducted study, the inter-rater (inter-observer) agreement of with the help of the OWAS method and its wide usage throughout the
the OWAS method was examined down to the level of the individual community (Diego-Mas and Alcaide-Marzal, 2014; Hellig et al., 2018;
postures. With a few exceptions, a very high agreement was found for White and Lee Kirby, 2003; Li and Lee, 1999; Mattila et al., 1993). Thus,
the individual postures. Comparatively low agreements were found for the OWAS method remains a practical and easy-to-use assessment
the back (“twisted” and “bent and twisted”) as well as for the lower method for postures in work environments.
limbs (“load on a straight leg” and “load on bent leg”). This might be
caused by confusion in terms of perception among raters. The elusive
postures cannot be summarized either, because in the literature different Declaration of competing interest
action classes are given for them (Mattila and Vilkki, 1999). For the
practical application of OWAS, these results show that OWAS users have The authors declare that they have no known competing financial
to pay special attention to these elusive postures of back and legs. interests or personal relationships that could have appeared to influence
Training instructions should take this into account. the work reported in this paper.
The influence of prior training in physical therapy on the inter-rater
agreement was also investigated. No significant variations of the inter- Acknowledgements
rater agreement among both considered rater groups (with and
without prior training in physical therapy) have been found, even This work was supported by the funding initiative Niedersächsisches
though non-trained raters performed with slightly lower agreements. Vorab of the Volkswagen Foundation and the Ministry of Science and
This supports the claim of OWAS to be practical usable for everyone Culture of the Lower Saxony State (MWK) as a part of the Interdisci
without the requirement of specific training. plinary Research Centre on Critical Systems Engineering for Socio-
Although our study indicates that it is difficult to differentiate be Technical Systems II.
tween distinct partial postures (leg postures standing on one straight leg The authors would like to thank both the editor and the anonymous
and standing on one leg bent), this does not reduce the overall suitability reviewers for their valuable comments and suggestions.
7
Appendix
Fig. 5 shows the checklist that was used throughout the study to manually assess the postures of the participants. One row is for one posture.
Fig. 5. OWAS checklist for up to 26 posture samples (see rows). Posture images are from (Diego-Mas and Alcaide-Marzal, 2014).
References Limb Assessment (RULA) as a method of assessment of children’s computing posture.

Appl. Ergon. 43 (3), 632–636. https://doi.org/10.1016/j.apergo.2011.09.009.
Fleiss, J.L., 1971. Measuring nominal scale agreement among many raters. Psychol. Bull.
Amell, T., Kumar, S., 2001. Work-related musculoskeletal disorders: design as a
76 (5), 378. https://doi.org/10.1037/h0031619.
prevention strategy. A review. J. Occup. Rehabil. 11 (4), 255–265. https://doi.org/
Haggag, H., Hossny, M., Nahavandi, S., Creighton, D., 2013. Real time ergonomic
10.1023/A:1013344508217.
assessment for assembly operations using kinect. In: Proceedings - UKSim 15th
Bao, S., Howard, N., Spielholz, P., Silverstein, B., Polissar, N., 2009. Interrater reliability
International Conference on Computer Modelling and Simulation, vol. 2013. UKSim,
of posture observations. Hum. Factors 51 (3), 292–309. https://doi.org/10.1177/
p. 495. https://doi.org/10.1109/UKSim.2013.105.
0018720809340273 arXiv:NIHMS150003.
Hignett, S., McAtamney, L., 2000. Rapid entire body assessment (REBA). Appl. Ergon. 31
Bevan, S., 2015. Economic impact of musculoskeletal disorders (MSDs) on work in
(2), 201–205. https://doi.org/10.1016/S0003-6870(99)00039-3.
Europe. Best Pract. Res. Clin. Rheumatol. 29 (3), 356–373. https://doi.org/10.1016/
Hoy, D., Brooks, P., Blyth, F., Buchbinder, R., 2010. The Epidemiology of low back pain.
j.berh.2015.08.002.
Best Pract. Res. Clin. Rheumatol. 24 (6), 769–781. https://doi.org/10.1016/j.
Brandl, C., Mertens, A., Schlick, C.M., 2017. Effect of sampling interval on the reliability
berh.2010.10.002.
of ergonomic analysis using the Ovako working posture analysing system (OWAS).
Karhu, O., Kansi, P., Kuorinka, I., 1977. Correcting working postures in industry: a
Int. J. Ind. Ergon. 57, 68–73. https://doi.org/10.1016/j.ergon.2016.11.013.
practical method for analysis. Appl. Ergon. 8 (4), 199–201. https://doi.org/10.1016/
Burdorf, A., Govaert, G., Elders, L., 1991. Postural load and back pain of workers in the
0003-6870(77)90164-8.
manufacturing of prefabricated concrete elements. Ergonomics 34 (7), 909–918.
Karhu, O., Härkönen, R., Sorvali, P., Vepsäläinen, P., 1981. Observing working postures
https://doi.org/10.1080/00140139108964834.
in industry: examples of OWAS application. Appl. Ergon. 12 (1), 13–17. https://doi.
Cohen, J., 1960. A Coefficient of Agreement for Nominal Scales, Educational and
org/10.1016/0003-6870(81)90088-0.
Psychological Measurement XX (1), 37–46. https://doi.org/10.1177/
Kazmierczak, K., Erik, S., Neumann, P., Winkel, J., 2006. Observer reliability of
001316446002000104.
industrial activity analysis based on video recordings. Int. J. Ind. Ergon. 36,
Dartt, A., Rosecrance, J., Gerr, F., Chen, P., Anton, D., Merlino, L., 2009. Reliability of
275–282. https://doi.org/10.1016/j.ergon.2005.12.006.
assessing upper limb postures among workers performing manufacturing tasks. Appl.
Kjellberg, K., Lundin, A., Falkstedt, D., Allebeck, P., Hemmingsson, T., 2016. Long-term
Ergon. 40 (3), 371–378. https://doi.org/10.1016/j.apergo.2008.11.008.
physical workload in middle age and disability pension in men and women: a follow-
De Bruijn, I., Engels, J.A., Van Der Gulden, J.W.J., 1998. A simple method to evaluate the
up study of Swedish cohorts. Int. Arch. Occup. Environ. Health 89 (8), 1239–1250.
reliability of OWAS observations. Appl. Ergon. 29 (4), 281–283. https://doi.org/
https://doi.org/10.1007/s00420-016-1156-0.
10.1016/S0003-6870(97)00051-3.
Kvålseth, T.O., 1989. Note on Cohen’s kappa. Psychol. Rep. 65 (1), 223–226. https://doi.
Diego-Mas, J.A., Alcaide-Marzal, J., 2014. Using Kinect sensor in observational methods
org/10.2466/pr0.1989.65.1.223.
for assessing postures at work. Appl. Ergon. 45 (4), 976–985. https://doi.org/
White, H.A., Lee Kirby, R., 2003. Folding and unfolding manual wheelchairs: an
10.1016/j.apergo.2013.12.001.
ergonomic evaluation of health-care workers. Appl. Ergon. https://doi.org/10.1016/
Dockrell, S., O’Grady, E., Bennett, K., Mullarkey, C., Mc Connell, R., Ruddy, R.,
S0003-6870(03)00079-6.
Twomey, S., Flannery, C., 2012. An investigation of the reliability of Rapid Upper
8
Hellig, T., Mertens, A., Brandl, C., 2018. The interaction effect of working postures on Plantard, P., Shum, H.P., Le Pierres, A.-S., Multon, F., 2016. Validation of an ergonomic
muscle activity and subjective discomfort during static working postures and its assessment method using Kinect data in real workplace conditions. Appl. Ergon. 1–8.
correlation with OWAS. Int. J. Ind. Ergon. https://doi.org/10.1016/j. https://doi.org/10.1016/j.apergo.2016.10.015.
ergon.2018.06.006. Punnett, L., Wegman, D.H., 2004. Work-related musculoskeletal disorders: the
Lins, C., Eichelberg, M., Rölker-Denker, L., Hein, A., 2015. SIRKA: Sensoranzug zur epidemiologic evidence and the debate. J. Electromyogr. Kinesiol. 14 (1), 13–23.
individuellen Rückmeldung körperlicher Aktivität, Dokumentationsband zur 55. https://doi.org/10.1016/j.jelekin.2003.09.015.
DGAUM–Jahrestagung, pp. 301–303. https://doi.org/10.13140/RG.2.1.4269.5128. Rhén, I.-M., Forsman, M., 2020. Inter- and intra-rater reliability of the ocra checklist
Lins, C., Fudickar, S., Gerka, A., Hein, A., 2018. A wearable vibrotactile interface for method in video-recorded manual work tasks. Appl. Ergon. 84, 103025. https://doi.
unfavorable posture awareness warning. In: Proceedings of 4th International org/10.1016/j.apergo.2019.103025. http://www.sciencedirect.com/science/artic
Conference on Information and Communication Technologies for Aging Well and E- le/pii/S0003687019302339.
Health (ICT4AWE). INSTICC, Funchal. https://doi.org/10.5220/ Schaub, K., Caragnano, G., Britzke, B., Bruder, R., 2013. The European Assembly
0006734901780183. Worksheet. Theor. Issues Ergon. Sci. 14 (6), 616–639. https://doi.org/10.1080/
Lowe, B.D., Dempsey, P.G., Jones, E.M., 2019. Ergonomics assessment methods used by 1463922X.2012.678283.
ergonomics professionals. Appl. Ergon. 81 (June), 102882. https://doi.org/10.1016/ Takala, E.-P., Pehkonen, I., Forsman, M., Hansson, G.-Å., Mathiassen, S.E., Neumann, W.
j.apergo.2019.102882. P., Sjøgaard, G., Veiersted, K.B., Westgaard, R.H., Winkel, J., 2010. Systematic
Matsui, H., Maeda, A., Tsuji, H., Naruse, Y., 1997. Risk Indicators of Low Back Pain evaluation of observational methods assessing biomechanical exposures at work.
Among Workers in Japan. Association of Familial and Physical Factors with Low Scand. J. Work. Environ. Health 36 (1), 3–24. https://doi.org/10.5271/sjweh.2876.
Back Pain. https://doi.org/10.1097/00007632-199706010-00015. http://www.nc Trask, C., Erik, S., Rostami, M., Heiden, M., 2017. Observer variability in posture
bi.nlm.nih.gov/pubmed/9201863. assessment from video recordings : the effect of partly visible periods. Appl. Ergon.
Mattila, M., Karwowski, W., Vilkki, M., 1993. Analysis of working postures in hammering 60, 275–281. https://doi.org/10.1016/j.apergo.2016.12.009.
tasks on building construction sites using the computerized OWAS method. Appl. van der Beek, A.J., Mathiassen, S.E., Windhorst, J., Burdorf, A., 2005. An evaluation of
Ergon. 24 https://doi.org/10.1016/0003-6870(93)90172-6. methods assessing the physical demands of manual lifting in scaffolding. Appl.
Mattila, M., Vilkki, M., 1999. OWAS methods. In: The Occupational Ergonomics Ergon. 36 (2), 213–222. https://doi.org/10.1016/j.apergo.2004.10.012. http://
Handbook. CRC Press LLC, pp. 447–459. www.sciencedirect.com/science/article/pii/S0003687004001474.
Li, K.W., Lee, C.L., 1999. Postural analysis of four jobs on two building construction sites: Weir, P.L., Andrews, D.M., Wyk, P.M.V., Callaghan, J.P., 2011. The influence of training
an experience of using the OWAS method in Taiwan. J. Occup. Health. https://doi. on decision times and errors associated with classifying trunk postures using video-
org/10.1539/joh.41.183. based posture assessment methods. Ergonomics 54 (2), 197–205. https://doi.org/
Oliv, S., Gustafsson, E., Baloch, A.N., Hagberg, M., Sandén, H., 2019. The Quick Exposure 10.1080/00140139.2010.547603.
Check (QEC) Inter-rater reliability in total score and individual items. Appl. Ergon.
76, 32–37. https://doi.org/10.1016/j.apergo.2018.11.005. February 2018.

Lins Et Al 2021

Uploaded by

Copyright:

Available Formats

You might also like

Lins Et Al 2021

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lins Et Al 2021

Uploaded by

Copyright:

Available Formats

Applied Ergonomics 93 (2021) 103357

Contents lists available at ScienceDirect

OWAS inter-rater reliability

1. Introduction notes on standardized observation-sheets include REBA (Hignett and

The OWAS method is a comparatively simple assessment method and

without prior training in physical therapy (physiotherapeutic

This work is structured as follows: in Section 2 the OWAS method,

2. Material and methods

OWAS method. The constellation of the rater-group was altered per

1. A custom software (see Fig. 3) generates a random OWAS code using

2.3. Statistical analysis

After the study was completed, the paper-based observation sheets

3.2. Raters groups (PT-trained vs. PT-untrained) compared with

Table 6 3.3. 2-Rater compared with 3rd rater agreement

Rater 1 (Straight) 94% 1% (10) 3% (23) 1% (4) 4. Discussion

Table 11 incorrectly considered low-risk (risk class 1–2), although it should

3rd rater 1 97% (231) 2% (4) 0% (0) 0% (1) 1% (2) 0% (0)

Reference comparison κ = 0.85 (89%) κ = 0.98 (99%) κ = 0.85 (88%)

Although the participants’ work clothes were realistically chosen, the

References Limb Assessment (RULA) as a method of assessment of children’s computing posture.

You might also like