
Expert Systems With Applications 207 (2022) 117925


Deep transfer learning in sheep activity recognition using accelerometer data

Natasa Kleanthous a,*, Abir Hussain a,b, Wasiq Khan a, Jennifer Sneddon c, Panos Liatsis d

a School of Computer Science and Mathematics, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, United Kingdom
b Department of Electrical Engineering, University of Sharjah, Sharjah, UAE
c School of Biological and Environmental Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, United Kingdom
d Department of Electrical Engineering and Computer Science, Khalifa University of Science and Technology, Abu Dhabi, UAE

A R T I C L E  I N F O

Keywords: Accelerometer sensors; Animal activity recognition; Convolutional neural networks; Deep learning; Sheep behavior; Transfer learning

A B S T R A C T

Machine learning and sensor devices, combined in agricultural systems, can efficiently provide real-time knowledge of animal behavior without the need for intensive human observation, which is time-consuming and labor-demanding. In this study, we propose an intelligent system to classify three important activities of sheep, namely the "grazing", "active", and "inactive" states. We acquired primary data from Hebridean ewes using two types of sensors to capture the required activities. To address the problem of sensor heterogeneity in the data and of sensor orientation and placement, we use convolutional neural networks (CNN) in conjunction with hand-crafted features to improve model generalization, specifically in terms of sensor orientation and position. Additionally, we utilize transfer learning (TL) for model generalization, which indicates substantial potential for future studies concerning animal activity recognition. More specifically, we use TL to enable the reuse of a pre-trained model on entirely unseen data without performing model training and data labelling, which are highly time-consuming tasks. We performed experiments on datasets using a CNN with automatically learned features, and on datasets with additional hand-crafted features. Our method obtained an overall accuracy of 98.55% on the source data (i.e., the dataset captured via the first sensor) and 96.59% on the target data (i.e., the dataset captured via the second sensor) when using the datasets comprising the supplementary features. On the other hand, when the CNN was applied to the raw datasets without any additional features, accuracies of 97.46% and 94.79% were obtained on the source and target datasets, respectively. This study is the first of its kind to propose convolutional neural network-based TL for sheep activity recognition and to demonstrate the significance of the proposed approach in the context of data capturing, data labelling, and heterogeneity of sensor devices.

1. Introduction

Advancements in the fields of computer science and electronics engineering have paved the way for renewed interest in the use of Internet of Things (IoT) systems in efficient and automated agriculture (Rutter, 2017). Machine intelligence has been deployed in this domain to provide real-time information specifically related to animal activities. For instance, automation in real-time information processing has been used to identify the current position of animals, their activities and current state, where they mostly graze, and what their nutritional habits are during the course of a day (Anderson, Estell, Holechek, Ivey, & Smith, 2014). For this purpose, smart devices are essential in the collection, processing, and analysis of real-time behavioral data to efficiently monitor and control animal behavior and distribution in pastures. Having this vast amount of data, decisions about animal health, animal spatial distribution, and land utilization (Norton, Barnes, & Teague, 2013) can help prevent soil erosion and contamination, water pollution, and the spread of animal diseases (Rutter, 2017), thereby facilitating animal welfare. Furthermore, automated monitoring of animals allows early detection of illness, particularly lameness, which is present in an estimated 80% of flocks (Winter, 2008). Lameness in sheep can be identified based on behavioral changes in animals (Al-Rubaye, Al-Sherbaz, McCormick, & Turner, 2016; Barwick et al., 2018; Gougoulis, Kyriazakis, & Fthenakis, 2010).

* Corresponding author.
E-mail addresses: natasa.kleanthous@ku.ac.ae (N. Kleanthous), A.Hussain@ljmu.ac.uk, abir.hussain@sharjah.ac.ae (A. Hussain), W.Khan@ljmu.ac.uk (W. Khan),
J.C.Sneddon@ljmu.ac.uk (J. Sneddon), panos.liatsis@ku.ac.ae (P. Liatsis).

https://doi.org/10.1016/j.eswa.2022.117925
Received 29 November 2020; Received in revised form 27 May 2022; Accepted 18 June 2022
Available online 26 June 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.

Additionally, evidence has shown that reduced animal activity or food intake may be an indicator of disease (Gougoulis et al., 2010). Therefore, machine intelligence-based monitoring of animals in real time is in demand in sheep production systems.

Grazing is of high interest in the sheep industry, representing the act of feeding on growing grasses and herbage while standing or walking with the head down. In contrast, the 'inactive' state of animals includes lying down or standing still. Animal inactivity is an indicator of welfare, as it is a sign of relaxation. Moreover, when animals are inactive (e.g., standing or resting), they take time to ruminate, which is considered necessary and vital for animal welfare (Giovanetti et al., 2017). In this study, the active state is defined as the behavior when the animal is walking with the head up or scratching. The majority of existing studies focus on the commercialization of smart devices, mainly for cattle; there are only a limited number of studies on the commercialization of sheep monitoring devices, owing to the lack of sufficient research on sheep (Barwick et al., 2020). Some of these studies considered the classification of sheep behavior, mostly using accelerometers; however, there are no clear recommendations on the optimal deployment of such devices to identify sheep behavior, which remains an open challenge (Radeski & Ilieski, 2017).

More importantly, sensor devices are often upgraded, yet there is no evidence that signal processing and classification algorithms designed and applied to data captured from one type of sensor can be reused on data from different types of sensors. Therefore, the implementation of a reusable technique is essential for the efficient use of time and effort, which are mainly required for data gathering and labelling (i.e., annotation). In this study, we propose a system for the intelligent monitoring of sheep activity using accelerometer data that is robust to accelerometer specifications, position, and orientation. To address the problem of heterogeneity of accelerometer sensors, we used the transfer learning (TL) approach along with convolutional neural networks (CNN) for sheep activity classification. Recently, TL has gained popularity and has been deployed in interdisciplinary domains. This is mainly because the construction of annotated datasets (specifically, in the context of big data and deep learning) is highly time-consuming. The availability of sufficient data to train an ML model is a major challenge in several domains (Tan, Sun, Kong, Zhang, Yang, & Liu, 2018). TL addresses this challenge by enabling the exploitation of knowledge from existing datasets to produce pre-trained models for use across different application contexts, thereby improving processing efficiency and generalization, enabling quick adaptation, and saving the time that traditional approaches need for initial training. Detailed discussions of TL and its applications can be found elsewhere (Tan et al., 2018). In this study, the aim is to use the pre-trained model obtained from the source dataset (acquired via sensor 1) and test it on an entirely unseen target dataset (acquired via sensor 2) using the learned features and weights. This allowed the validation of the hypothesis in relation to model generalization in animal behavior classification, independent of the type and orientation of the accelerometer device.

In sheep activity recognition, few studies have used CNNs to classify the activities of animals using accelerometer signals. This technique has, however, been widely used in human activity recognition (HAR) problems with significant success. HAR is an adjacent domain where diverse applications have been introduced, mainly employing deep learning (DL) algorithms to leverage the abundance of data available through various means. For example, Ronao and Cho (2016) proposed HAR based on smartphone sensors and DL to classify various activities (such as walking, sitting, standing, and lying). In a similar study, Hassan, Uddin, Mohamed, and Almogren (2018) proposed HAR using DL and smartphone sensor data. Their study trains deep belief networks on features extracted from smartphone inertial sensors to recognize multiple activities, including standing, sitting, and walking, together with transitional activities (e.g., lying to sitting, standing to sitting, etc.). The reported outcomes indicate reliable recognition of human activities in a given context. Chen and Xue (2016) used a CNN trained on raw data from a single accelerometer to classify eight different activities (such as falling, jumping, walking, and running). The study also reported that DL outperformed conventional ML models that utilized hand-crafted features extracted from raw data. On the other hand, Thakur and Biswas (2021) used a CNN with a fusion of both handcrafted and automatically extracted features to enhance the performance of HAR. The results demonstrated that the proposed feature-fusion-based HAR model provides higher accuracy compared with state-of-the-art HAR models. Recently, Wan, Qi, Xu, Tong, and Gu (2020) proposed HAR using smartphone sensor data and DL, where the study utilized multiple models, including CNN, LSTM, and conventional ML algorithms (i.e., SVM and MLP), to recognize multiple activities (lying, sitting, walking, etc.). The outcomes of this study indicate the superiority of DL, mainly in processing raw data without performing time-consuming feature extraction tasks. An up-to-date and detailed review of HAR through DL algorithms that use data from mobile and wearable sensor networks is presented in (Nweke, Teh, Al-garadi, & Alo, 2018). Likewise, Chen, Zhang, Yao, Guo, and Yu (2021) recently presented a comprehensive overview of HAR employing DL and sensor-based data, with applications, challenges, and potential research gaps. Both review studies clearly highlight the usefulness and superiority of DL across a variety of applications and diverse data properties in the domain of HAR. Topham, Khan, Al-Jumeily, and Hussain (2022) presented a comprehensive review of human pose estimation and gait identification using DL, along with a variety of potential applications.

The contributions of this research are as follows:

(1) A real-time data-driven approach for animal activity recognition is proposed, comprising a) grazing, b) active, and c) inactive states, using a composite of CNN and hand-crafted features, which significantly improves the classification performance.
(2) Two primary datasets were acquired using two different types of sensors and were made publicly available to the research community.
(3) We utilized deep TL over the data acquired from the two different sensors, located on the collar of sheep in a non-fixed orientation. This introduces variability in the dataset (i.e., varying orientation and sensor characteristics), and therefore supports the evaluation of the generalization properties of the proposed sheep activity recognition approach. To the best of our knowledge, the proposed TL approach has been used for the first time in the context of animal behavior recognition.

The remainder of this paper is organized as follows. Section 2 provides an overview of state-of-the-art methods related to machine learning and deep learning for animal activity recognition using motion sensors. Section 3 describes the proposed activity recognition algorithm and presents a description of the datasets and the system methodology. Section 4 presents the experimental design of the study. Section 5 presents the results and discussion, followed by Section 6, which summarizes the conclusions of this study and proposes avenues for future work.

2. Review of data-driven animal activity recognition methods

In the agricultural industry, various studies have explored the identification of animal activities based on accelerometer data (Barwick et al., 2020; González, Bishop-Hurley, Handcock, & Crossman, 2015; Kamminga, 2017; le Roux, Wolhuter, & Niesler, 2017). Wearable devices using accelerometers have been commonly used with ML techniques to recognize the behavior of cattle (Andriamandroso et al., 2017; Dutta et al., 2015; González et al., 2015; Gou et al., 2019; Rahman et al., 2018; Riaboff et al., 2019; Robert, White, Renter, & Larson, 2009; Smith et al., 2016; Vázquez Diosdado et al., 2015; Werner et al., 2019). Furthermore, ML was used to identify the activities of horses (Gutierrez-Galan et al., 2018), sharks (Hounslow, Brewster, Lear, Guttridge, Daly, Whitney, & Gleiss, 2019), and seals (Ladds, Thompson, Kadar, Slip, Hocking, & Harcourt, 2017),

as well as goats (Kamminga et al., 2018; Navon, Mizrach, Hetzroni, & Ungar, 2013) and other domesticated or wild animals.

Although many studies have addressed the classification of animal behavior, specifically in sheep, there is still a need for further investigation into the optimization of the devices and techniques used (Barwick et al., 2020; Kleanthous et al., 2022). Various studies have proposed diverse models, setups, and devices. For example, Umstätter, Waterhouse, and Holland (2008) used tilt sensors to distinguish between the active and inactive states of sheep using 30 s windows and achieved predictions of over 90% using linear discriminant analysis (LDA), decision trees (DT), and threshold-based decision trees. On the other hand, Nadimi et al. used accelerometer data acquired from the collar of sheep to classify behavior and achieved accuracies of 83.8%, 83.2%, 73.8%, 71.8%, and 68.5% for grazing, lying, walking, standing, and others, respectively (Nadimi, Jørgensen, Blanes-Vidal, & Christensen, 2012). Both (Umstätter et al., 2008) and (Nadimi et al., 2012) were concerned with animal behavior on pasture; however, their focus was very different, from the type of sensor device used to the behaviors of interest. Furthermore, similar behaviors were investigated by Marais et al. (2014), where accelerometer data were gathered from the collar of sheep at a sample rate of 100 Hz. The authors extracted 10 features using a 5.12 s window. This study analyzed lying, standing, walking, running, and grazing behaviors using LDA and quadratic discriminant analysis (QDA), achieving overall accuracies of 87.1% and 89.7% for LDA and QDA, respectively. LDA was also used by Solomon et al. to identify the standing, walking, grazing, running, and lying down activities of sheep using a 5.3 s window, achieving an overall accuracy of 82.40% (le Roux, Marias, Wolhuter, & Niesler, 2017). McLennan et al. used a 25 s window to identify low activity (i.e., lying, lying ruminating), medium activity (i.e., standing, standing ruminating, grazing), and high activity (i.e., walking) (McLennan et al., 2015). The authors gathered accelerometer data from the collars of sheep and performed statistical analysis. The overall accuracies were 59.09%, 3.37%, and 74.56% for the high-, medium-, and low-activity behaviors, respectively. Furthermore, (McLennan et al., 2015) investigated active and inactive behaviors and reported classification accuracies of 79.98% and 74.56%, respectively, for the active and inactive states.

Alvarenga et al. gathered accelerometer data from sheep while placing the sensor at a fixed orientation and position on the halter under the jaw of the animals (Alvarenga et al., 2016). In their study, they examined the performance of DT using various setups. For example, they tested the algorithm using windows of 3, 5, and 10 s. Additionally, the accelerometer data were gathered at three different sample rates: 5 Hz, 10 Hz, and 25 Hz. Five mutually exclusive behaviors were examined, including grazing, lying, running, standing, and walking, using five features. The study reported best results of a kappa score of 0.7935 and an accuracy of 85.5%.

The same sensor placement was used by Giovanetti et al. (2017), who applied stepwise discriminant analysis (SDA), canonical discriminant analysis (CDA), and discriminant analysis (DA) using 60 s windows to identify grazing, ruminating, and resting; DA obtained an overall accuracy of 93%. Decandia et al. presented an evaluation of several window sizes to identify grazing, ruminating, and other sheep behaviors (Decandia et al., 2018). CDA and DA were applied on 5, 10, 30, 60, 120, 180, and 300 s windows at a sample rate of 62.5 Hz. The authors extracted 15 parameters from accelerometer data gathered from a sensor placed under the jaw of the animals, and the best accuracy of 89.7% was achieved using DA and a 30 s window. Kamminga et al. gathered accelerometer, gyroscope, and magnetometer signals from sheep and goats to classify stationary, foraging, walking, trotting, and running behaviors using one type of sensor (Kamminga et al., 2017). The authors measured the complexity, in terms of memory and CPU usage, of numerous ML algorithms and noted that the DNN classifier was the most promising in terms of complexity versus performance, reporting an accuracy of 94%.

All the above studies attempt to address open challenges, such as variable sampling rate, sliding window size, sensor orientation, and data gathering procedures. Despite the wider applicability of TL within interdisciplinary application domains, and specifically in the fields of object recognition (Xiao et al., 2020) and human activity recognition (Akbari & Jafari, 2019), very limited attempts have been reported on the use of TL for animal behavior and activity recognition (Kamminga et al., 2017; Kleanthous et al., 2022). Oquab, Bottou, Laptev, and Sivic (2014) proposed the use of TL to extract mid-level features from the ImageNet dataset (Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009) and reuse the representations on smaller datasets. Similarly, Xia et al. (2019) proposed ensemble concepts of multiple transfer CNNs (TCNN) to improve model generalization by introducing three ensemble TCNNs. The authors used several datasets, reported enhanced accuracy, and demonstrated better generalization than CNNs and a single TCNN.

The latter research works demonstrate the importance of TL in transferring knowledge acquired from one domain to another, which saves time and supports the application of powerful models, specifically in the case of limited-size datasets. Additionally, in the domain of animal activity recognition and wearable devices in general, it provides opportunities to explore a variety of sensor devices and easily adapt to new sensor configurations, building upon the time and effort spent on the base work. In summary, TL supports the adoption of new sensors in the problem domain without the need for exhaustive training when developing new predictive models. This research is the first of its kind to propose CNN-based TL for animal activity recognition, specifically in sheep (Kleanthous et al., 2022), and highlights the benefits of such an approach in the context of sensor devices, that is, sensor heterogeneity, variations in sensor orientation, and data gathering.

3. Methodology

Two types of accelerometer sensors (MetaMotionR and SenseHat) were placed on the sheep collar to capture the primary data at a sample rate of 12.5 Hz. Let $D_S$ represent the data captured from the MetaMotionR, which is considered the source data, whereas $D_T$ represents the data acquired through the Raspberry Pi (with the SenseHat board attached) and is used to validate the reusability of TL, that is, the target data. Both datasets were labeled manually and normalized using z-scores. The CNN model was used to identify sheep activities while trained over the raw data as well as over the $D_{S+}$ and $D_{T+}$ datasets, which include the supplementary time- and frequency-domain features (as illustrated in Table 1) for the second experiment.

It is well known that DL models can extract discriminating features from data; however, features derived from devices with limited resources are not as discriminatory as predefined handcrafted features (Ravi, Wong, Lo, & Yang, 2016). DL models on wearable devices must include fewer layers to reduce the computational cost and save battery life. Thus, models with fewer layers may be unable to find a complete set of discriminatory features, which may result in poor performance. Additionally, DL might not be able to generalize over the modalities of sensor data (i.e., location, orientation, noise, and amplitude) for classification (Ravi et al., 2017). Consequently, in this study, we investigated the use of CNN-learned features with complementary information from handcrafted features for classification, to test whether improved performance can be achieved.

Temporal and spectral features were extracted using a sliding window of 2 s with a 50% overlap, resulting in two additional datasets from the MetaMotionR and SenseHat sensors, which are referred to as $D_{S+}$ and $D_{T+}$, respectively. It should be noted that $D_{S+}$ and $D_{T+}$ relate to the augmented datasets, which include the handcrafted features, whereas $D_S$ and $D_T$ consist only of the x, y, and z accelerometer values and their magnitude.
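To make the segmentation step concrete, the following is a minimal sketch of the 2 s, 50%-overlap windowing described above, assuming 12.5 Hz data held in NumPy arrays; the function and variable names are illustrative and not taken from the original implementation.

```python
import numpy as np
from collections import Counter

def sliding_windows(samples, labels, fs=12.5, win_sec=2.0, overlap=0.5):
    """Segment an (n, 4) array of x, y, z, magnitude samples into
    overlapping windows; each window takes its most frequent label."""
    win = int(fs * win_sec)            # 25 samples per 2 s window
    step = int(win * (1.0 - overlap))  # 50% overlap -> step of 12 samples
    windows, window_labels = [], []
    for start in range(0, len(samples) - win + 1, step):
        windows.append(samples[start:start + win])
        # A window spanning two behaviors gets the majority label
        seg_labels = labels[start:start + win]
        window_labels.append(Counter(seg_labels).most_common(1)[0][0])
    return np.stack(windows), np.array(window_labels)
```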


Table 1
List of extracted features.

1. Mean: $\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i$
2. Standard deviation: $s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (a_i - \bar{a})^2}$
3. Skewness: $\mathrm{skewness} = \frac{1}{n}\sum_{i=1}^{n} \frac{(a_i - \bar{a})^3}{s^3}$
4. Kurtosis: $\mathrm{kurtosis} = \frac{1}{n}\sum_{i=1}^{n} \frac{(a_i - \bar{a})^4}{s^4}$
5. RMS: $\mathrm{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^2}$
6. RMS velocity: $\mathrm{rmsV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \mathrm{diffinv}(a_i)^2}$, where diffinv() is the inverse function of diff()
7. Sum of changes: $\mathrm{soc} = \sum_{i=1}^{n} |\mathrm{diff}(a_i)|$, where diff() computes the consecutive differences of the vector
8. Mean of changes: $\mathrm{moc} = \frac{1}{n}\sum_{i=1}^{n} |\mathrm{diff}(a_i)|$
9. Integrals: $\mathrm{Integrals} = \int_{t=0}^{T} |a_1|\,dt + \int_{t=0}^{T} |a_2|\,dt + \int_{t=0}^{T} |a_3|\,dt$
10. Squared integrals: $\mathrm{Integrals}^2 = \left(\int_{t=0}^{T} |a_1|\,dt\right)^2 + \left(\int_{t=0}^{T} |a_2|\,dt\right)^2 + \left(\int_{t=0}^{T} |a_3|\,dt\right)^2$
11. Madogram: $\gamma_p(u) = \frac{1}{2} E\left[\,|a_i - a_{i+u}|\,\right]$, where u = lag and E[·] = expectation
12. Energy: $\mathrm{energy} = \sum_{i=1}^{n} a_i^2$
13. Peak frequency: $\mathrm{pf} = f_{\max} = \frac{f_s}{n}\,\arg\max_{0 \le i \le n-1} P(i)$, where $f_s$ = sampling frequency and P(i) = power of the spectrum

Here a denotes an accelerometer signal x, y, or z (a1 = x, a2 = y, a3 = z), and n is the number of rows in the signal window.
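As an illustration of how the Table 1 statistics might be computed per window and per axis, the sketch below implements a representative subset (mean, standard deviation, RMS, energy, sum/mean of changes, and the lag-1 madogram); the names are illustrative, and the remaining features follow the same pattern.

```python
import numpy as np

def window_features(a):
    """Compute a subset of the Table 1 features for one axis (or the
    magnitude) of a single window, given as a 1-D array a."""
    d = np.diff(a)  # consecutive differences, i.e. diff()
    return {
        "mean": a.mean(),
        "std": a.std(),
        "rms": np.sqrt(np.mean(a ** 2)),
        "energy": np.sum(a ** 2),
        "soc": np.sum(np.abs(d)),   # sum of changes
        "moc": np.mean(np.abs(d)),  # mean of changes
        # madogram at lag 1: half the mean absolute increment
        "madogram": 0.5 * np.mean(np.abs(d)),
    }
```

Applying such a function to the x, y, z, and magnitude channels of every window, for all 13 features, yields the 52-dimensional feature set described in Section 3.3.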

Extensive simulation experiments were conducted on six CNN models trained using $D_S$ and $D_{S+}$. The outcome of the simulations was the selection of the top-performing CNN configuration, based on the accuracy obtained on the test sets. These models were then stored, and TL was applied for $D_T$ and $D_{T+}$. Fig. 1 illustrates the overall procedure, including the application of TL.

3.1. Datasets

In this work, we collected two primary datasets comprising accelerometer measurements from a flock of 9 Hebridean ewes (35 ± 5 kg, 9 ± 5 years old) at a farm located in Shotwick, Cheshire (OS location 333781, 371970). Ethical approval to collect the datasets and conduct the experiments was obtained from Liverpool John Moores University (Ref: AH_NKO/2018-13). To acquire the data, two types of devices were mounted on the collar of the animals. The first device was mounted in a fixed position (at 270°) and orientation; the second device was mounted in a non-fixed manner (at 0°, 90°, or 180°) to test the performance of the proposed methodology independent of the sensor orientation and position. The MetaMotionR (MBIENTLAB INC, 2018) was used to collect motion and environmental data, whereas the Raspberry Pi with the SenseHat board collected temperature, humidity, pressure, and motion measurements. Both sensor outputs were logged, and the data were stored at a sampling rate of 12.5 Hz. In total, recordings of over 65 h of activity were obtained, resulting in a dataset of 2,925,000 samples.

The datasets collected from the ewes were loaded into the ELAN 5.7 AVFX freeware tool (ELAN - The Language Archive, n.d.). Behaviors such as running, fighting, and shaking were very limited and were, therefore, excluded from the datasets. Walking and scratching behaviors were merged and considered as a unified behavior, labelled as 'active'. Likewise, standing and resting were combined and labelled as 'inactive'. Chewing with the animal's head down while walking or standing was labelled as 'grazing'.

The dataset acquired with the MetaMotionR ($D_S$) contained 1,048,575 samples, whereas the dataset captured via the Raspberry Pi ($D_T$) comprised 762,860 samples. We refer to these datasets as the source and target datasets. Let $D_k$ represent the joint dataset, where $D_k = \{t_i, x_i, y_i, z_i, c_i\}$ for $i = 1, \ldots, n$, n is the number of observations, and $k \in \{S, T\}$. Parameter t relates to the timestamp; x, y, and z are the accelerometer measurements in the x-, y-, and z-axes, respectively; and c is the label variable, where $c \in \{\text{active}, \text{grazing}, \text{inactive}\}$. Fig. 2 shows the distribution of the three activities within the datasets. The charts clearly indicate the imbalanced distribution of activities within both datasets, as expected, owing to the nature of the study and the activities considered.

3.2. Data pre-processing

The magnitude of the accelerometer measurements was calculated, resulting in an additional feature in $D_k = \{t_i, x_i, y_i, z_i, mag_i, c_i\}$, calculated using Equation (1):

$mag = \sqrt{x^2 + y^2 + z^2}$   (1)

where x, y, and z represent the 3-dimensional accelerometer data. The dataset was normalized to zero mean (μ) and a standard deviation (σ) of 1. The normalized datasets were then partitioned into training, validation, and testing sets, with a ratio of 50%, 25%, and 25%, respectively. Overlapping windows with a ratio of 50% were used to enable real-time classification, which has been shown to be effective in previous activity recognition studies (Bao & Intille, 2004). Using a larger window size in real-life classification could lead to mislabeling, since animals may exhibit more than one behavior in a short time interval. Thus, a 2 s window was considered sufficient, while not compromising the device battery life. Where animals exhibited more than one behavior within the same time window, the window was labelled with the most frequently occurring activity.

3.3. Feature extraction

We performed temporal and spectral analyses of the overlapping window data to extract meaningful information from the datasets that may support behavior recognition. The features were selected based on our previous findings (Kleanthous et al., 2018, 2019), as well as state-of-the-art research on feature importance using gait information in human and animal activity recognition (Bouten, Westerterp, Verduin, & Janssen, 1994; Gneiting, Ševčíková, & Percival, 2012; González et al., 2015). A total of 13 features were calculated for each of the x, y, and z accelerometer axes and for the magnitude of the acceleration signal for each activity, resulting in a 52-dimensional feature set, as illustrated in Table 1.

We utilized the Boruta algorithm (Kursa, Jankowski, & Rudnicki, 2010) to explore and rank the level of importance of the extracted feature set and to confirm that all features contribute meaningful information, as shown in Fig. 3. It can be concluded that the mean x-axis acceleration, the skewness and kurtosis of the acceleration magnitude, the fractal dimension of the z-axis acceleration, and the skewness of the y-axis acceleration are the most significant features. On the contrary, the energy, integrals, rms, peak frequency, and squared integrals of the y-axis acceleration are the features ranked lowest. However, all features contributed to the discrimination of the three activities, which was confirmed by the Boruta algorithm. Therefore, feature elimination was not performed.

The features presented in Table 1 were extracted from the $D_S$ and $D_T$ datasets, which included the x, y, z, and magnitude of the accelerometer signals, resulting in the newly created datasets $D_{S+}$ and $D_{T+}$. Then, $D_{S+}$ and $D_{T+}$, which include the manually created features, are taken as input to the CNN. An illustration of the procedure is shown in Fig. 4.
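The Boruta ranking step described above can be reproduced in Python with the BorutaPy port of the algorithm; this is a hedged sketch under that assumption (the paper may have used the original R implementation), with synthetic data standing in for the real 52-dimensional feature matrix.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy  # pip install Boruta (assumed dependency)

# Synthetic stand-ins: X would be the (n_windows, 52) feature matrix
# and y the integer-coded activity labels from the windowing step.
rng = np.random.default_rng(42)
X = rng.random((500, 52))
y = rng.integers(0, 3, size=500)

rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)
boruta = BorutaPy(rf, n_estimators="auto", random_state=42)
boruta.fit(X, y)  # compares real features against shuffled "shadow" features
print(boruta.ranking_)  # rank 1 = confirmed important
```

The shadowMin/shadowMean/shadowMax entries visible in Fig. 3 are exactly the shuffled reference features that Boruta creates internally to calibrate this ranking.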


Fig. 1. System methodology.

Fig. 2. Duration of the activities for source and target domains.


Fig. 3. Feature Importance illustration from the Boruta Algorithm for DS+. Note that features shadowMin, shadowMean, shadowMax are created by the Boruta
algorithm in order to rank the original features. More information can be found in (Kursa et al., 2010).

Fig. 4. The proposed CNN method using manually extracted features as input to CNN.

3.4. Convolutional neural networks and transfer learning

The recorded measurements and associated features were used in the proposed DL models to classify the target activities, as well as during TL. A CNN is a hierarchical feed-forward neural network that is widely used in image classification tasks because of its ability to perform well on very large and complex datasets (Gu et al., 2018; Wu, 2017). In recent years, the CNN has also become popular in activity recognition problems (San, Kakar, Li, Krishnaswamy, Yang, & Nguyen, 2017; Yang, Nguyen, San, Li, & Krishnaswamy, 2015). A CNN includes three types of layers, i.e., convolutional, pooling, and fully connected (dense). The convolutional layers are composed of filters and feature maps. The convolutional input layer is a 3-dimensional tensor to which filters are applied. A filter (F) is a matrix that slides over the input and generates dot products to identify patterns. Once the features are extracted, an activation function is applied to the weighted sum $\sum_{i=1}^{m} w_i x_i$ to learn and make nonlinear decisions. We used the ReLU function, which is the most commonly used activation function for similar problems. The pooling layer in the CNN reduces blocks of data, whereas the fully connected layer is responsible for the classification. For the fully connected layers to be stacked, the data must first be flattened into a one-dimensional vector. Finally, the output layer computes the probability distribution of the predictions by using a SoftMax layer. To reduce the generalization error of the model, regularization techniques such as dropout, early stopping, and weight decay are utilized in the proposed model. Dropout has been proven to reduce and prevent overfitting in neural networks (Srivastava, Hinton, Krizhevsky, & Salakhutdinov, 2014) and has been considerably effective in image classification and speech recognition (Cai et al., 2019; Garbin, Zhu, & Marques, 2020; Krizhevsky, Sutskever, & Hinton, 2017; Li, Chen, Hu, & Yang, 2019). On the other hand, Ioffe and Szegedy (2015) demonstrated batch normalization (BN), which improved the training speed and stability of many architectures and has therefore been implemented in numerous network structures (Hu, 2022; Zhang, Zhou, Lin, & Sun, 2017).
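For reference, a restatement of the activations just described: each unit applies ReLU to its weighted sum, and the output layer turns the three class scores $z_k$ into posterior probabilities via SoftMax:

```latex
\mathrm{ReLU}(u) = \max(0, u), \qquad u = \sum_{i=1}^{m} w_i x_i,
\qquad
P(c_k \mid \mathbf{x}) = \frac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}},
\quad c_k \in \{\text{active},\ \text{grazing},\ \text{inactive}\}.
```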


Both dropout and BN are powerful techniques; however, combining them in a network structure sometimes leads to worse performance (Li et al., 2019). Therefore, for our experiments, we conducted various pre-tests using dropout, batch normalization, and a combination of the two, following a trial-and-error method to identify which structure is more suitable. Based on the results, we found that dropout alone gave better performance, and we therefore used dropout in our CNN architectures. Further details about CNNs can be found in (Aloysius & Geetha, 2018; Wu, 2017).

The main idea behind TL is to gain knowledge from a dataset (source domain $D_S$) and then transfer that knowledge to a new dataset (target domain $D_T$) in order to improve learning in the target domain (Weiss, Khoshgoftaar, & Wang, 2016). Thus, we define a source domain $D_S: X_S \to Y_S$, with feature space $X_S$ and label set $Y_S$, such that $D_S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where n is the number of observations in the dataset, $x_i \in X_S$, and $y_i \in Y_S$. Additionally, we have a target domain $D_T: X_T \to Y_T$, with feature space $X_T$ and label space $Y_T$, such that $D_T = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, where $x_j \in X_T$ and $y_j \in Y_T$ for $j = 1, \ldots, m$, and m is the number of observations in $D_T$. Furthermore, a "task" T consists of a label space Y and a predictive function f(·), and is denoted as $T = \{Y, f(\cdot)\}$; the function can be learned from the training data and used to predict the labels of unseen vectors.

In the case where $X_S \neq X_T$, i.e., the source and target datasets may come from different domains, including different marginal/predictive distributions and feature/label spaces, TL is described as heterogeneous. Alternatively, when $X_S = X_T$, TL is defined as homogeneous. In the current study, homogeneous TL was used, because the feature space and domain characteristics of both the source and target domains were the same. Variations between the source and target datasets in this study include the type of accelerometer sensors used, as well as the orientation and position of the sensors, which are considered challenging problems in this application domain (Qiu et al., 2022; Ribani & Marengoni, 2019; Tan et al., 2018). In addition, the motion measurements of the second sensor device exhibited increased levels of noise, which introduces additional challenges for the proposed TL model. Finally, the size of the second dataset is comparatively small, which is useful for validating the appropriate use of TL in this context. As complex problems require larger datasets, which, in turn, imply increased time resources for data collection, the problem becomes even more challenging owing to the common requirement for manual annotation to establish the ground truth. Specifically, in the case of labelling animal behavior, the process is extremely time-consuming, as it requires substantial amounts of time for observing the animals, video recording, time-synchronizing the data with the recordings, and labelling the behaviors per second, or even per millisecond. Considering these challenges, utilizing TL not only improves the model's generalization and reliability but also reduces the time required for training the DL model in the target domain. In other words, the pre-trained model from the source domain acts as a base (i.e., starting point) in the target domain, where the obtained knowledge is fine-tuned, which avoids solving two separate problems from scratch (Ribani & Marengoni, 2019).

4. Experimental design

To set the baseline for our experiments, a number of classification trials were conducted to investigate the performance of the proposed methodology and to configure the deep learning models. For the CNN, we used datasets $D_S$ and $D_{S+}$, which included the original measurements and the handcrafted features, with 50%, 25%, and 25% ratios for training, validation, and testing, respectively, for both datasets.

Six CNN configurations (models A to F) were utilized, where the number of convolutional layers and the dropout rate were varied (in the range of [0.1, 0.5]). In trial-and-error testing, we chose the number of convolutional layers in the range of 2-4, as too many layers allow the model to learn too much information and might result in overfitting, whereas too few layers can limit the learning ability of the model and might result in underfitting. The selection of the number of convolutional layers and the choice of hyperparameters in this study were based on the literature's best practices and empirical observations (Cruciani, Vafeiadis, Nugent, Cleland, & McCullagh, 2019; Dahl, Sainath, & Hinton, 2013; Kingma & Ba, 2014; Ravi et al., 2017; Srivastava et al., 2014).

Each CNN model consists of convolutional layers defined with a filter size in the range of 9 to 128 (9, 16, 32, 64, 128), used in various combinations, as there is no one-size-fits-all solution. A kernel size of 2 was used because it proved sufficient in previous research on signal measurements (Cruciani et al., 2019; Ravi et al., 2017). The structure of each model is presented in Table 2. The common configuration for all the models is as follows:

a) For each layer, the ReLU activation function was applied. The ReLU function is widely used and is the most popular activation function in the deep learning field, as it has proved highly successful in improving training speed compared to the sigmoid function (Dahl et al., 2013) while supporting discriminative performance (Krizhevsky et al., 2017).
b) Smaller dropouts were applied to the convolutional layers, whereas larger dropouts were applied to the fully connected layers. Specifically, a dropout rate of 0.5 was shown to be very effective in reducing overfitting in fully connected layers (Krizhevsky et al., 2017).

Table 2
Configuration summary of the CNN models evaluated on $D_S$ and $D_{S+}$ (F = filter, k = kernel size; all layers use ReLU except the final SoftMax output).

Model A: Conv2D (F=16, k=(2,2)) - Dropout 10% - Conv2D (F=32, k=(2,2)) - Dropout 20% - Flatten - Dense (64) - Dropout 50% - Dense (3, SoftMax)
Model B: Conv2D (F=32, k=(2,2)) - Dropout 10% - Conv2D (F=32, k=(2,2)) - Dropout 10% - Conv2D (F=32, k=(2,2)) - Dropout 10% - Conv2D (F=32, k=(2,2)) - Dropout 10% - Flatten - Dense (128) - Dropout 50% - Dense (64) - Dropout 50% - Dense (3, SoftMax)
Model C: Conv2D (F=9, k=(2,2)) - Dropout 10% - Conv2D (F=16, k=(2,2)) - Dropout 10% - Conv2D (F=32, k=(2,2)) - Dropout 20% - Conv2D (F=128, k=(2,2)) - Dropout 20% - Flatten - Dense (128) - Dropout 50% - Dense (64) - Dropout 50% - Dense (3, SoftMax)
Model D: Conv2D (F=16, k=(2,2)) - Dropout 10% - Conv2D (F=32, k=(2,2)) - Dropout 20% - Conv2D (F=32, k=(2,2)) - Dropout 20% - Flatten - Dense (64) - Dropout 50% - Dense (3, SoftMax)
Model E: Conv2D (F=16, k=(2,2)) - Dropout 10% - Conv2D (F=16, k=(2,2)) - Dropout 10% - Flatten - Dense (64) - Dropout 50% - Dense (3, SoftMax)
Model F: Conv2D (F=16, k=(2,2)) - Dropout 10% - Conv2D (F=32, k=(2,2)) - Dropout 20% - Conv2D (F=32, k=(2,2)) - Dropout 20% - Flatten - Dense (128) - Dropout 50% - Dense (64) - Dropout 50% - Dense (3, SoftMax)


c) For all six CNN models, training was performed for a maximum of 500 epochs. Early stopping was applied when there was no decrease in the error over a period of 20 consecutive epochs (patience = 20).
d) In terms of model optimization, the Adam optimizer (Kingma & Ba, 2014) was applied with a learning rate of 0.001, and the loss function was set to categorical cross-entropy (Wu, 2017). The Adam optimizer is simple and computationally efficient, that is, it requires low memory resources while demonstrating robustness in a broad range of ML optimization problems (Kingma & Ba, 2014).
e) A SoftMax layer was used to classify the target activities (active, grazing, and inactive). The SoftMax activation function is preferred for multiclass problems (Jiang & Yin, 2015). It converts the 3-length output vector into estimated posterior probabilities for the three classes.
f) During training, we used weight regularization with an L2 vector norm of [0.1, 0.01, 0.001, 0.0001, 0.00001] to identify the best configuration. The vector norm was applied only to the fully connected layers of the models.

The performance of the models for each configuration is listed in Table 3. A graphical representation of the accuracy obtained from all the models is illustrated in Fig. 5.

Table 3
CNN accuracy on $D_S$ and $D_{S+}$ for each model using weight decay with the L2 norm.

Model | L2 | $D_S$ training set | $D_S$ test set | $D_{S+}$ training set | $D_{S+}$ test set
A | 1.00E-01 | 0.9600 | 0.9574 | 0.9564 | 0.9576
A | 1.00E-02 | 0.9701 | 0.9652 | 0.9622 | 0.9624
A | 1.00E-03 | 0.9753 | 0.9700 | 0.9716 | 0.9690
A | 1.00E-04 | 0.9776 | 0.9700 | 0.9787 | 0.9768
A | 1.00E-05 | 0.9759 | 0.9746 | 0.9892 | 0.9855
B | 1.00E-01 | 0.9750 | 0.9670 | 0.9799 | 0.9792
B | 1.00E-02 | 0.9807 | 0.9715 | 0.9850 | 0.9821
B | 1.00E-03 | 0.9813 | 0.9684 | 0.9853 | 0.9816
B | 1.00E-04 | 0.9800 | 0.9685 | 0.9832 | 0.9809
B | 1.00E-05 | 0.9778 | 0.9677 | 0.9847 | 0.9806
C | 1.00E-01 | 0.9403 | 0.9365 | 0.8696 | 0.8697
C | 1.00E-02 | 0.9606 | 0.9566 | 0.9578 | 0.9608
C | 1.00E-03 | 0.9641 | 0.9607 | 0.9691 | 0.9700
C | 1.00E-04 | 0.9713 | 0.9659 | 0.9689 | 0.9702
C | 1.00E-05 | 0.9744 | 0.9658 | 0.9796 | 0.9796
D | 1.00E-01 | 0.9535 | 0.9548 | 0.9622 | 0.9605
D | 1.00E-02 | 0.9498 | 0.9479 | 0.9752 | 0.9733
D | 1.00E-03 | 0.9576 | 0.9545 | 0.9804 | 0.9781
D | 1.00E-04 | 0.9552 | 0.9501 | 0.9863 | 0.9824
D | 1.00E-05 | 0.9664 | 0.9580 | 0.9897 | 0.9841
E | 1.00E-01 | 0.7904 | 0.7932 | 0.9493 | 0.9485
E | 1.00E-02 | 0.9193 | 0.9210 | 0.9683 | 0.9675
E | 1.00E-03 | 0.9449 | 0.9453 | 0.9774 | 0.9743
E | 1.00E-04 | 0.9183 | 0.9187 | 0.9830 | 0.9798
E | 1.00E-05 | 0.9584 | 0.9532 | 0.9873 | 0.9821
F | 1.00E-01 | 0.9739 | 0.9710 | 0.9826 | 0.9789
F | 1.00E-02 | 0.9790 | 0.9716 | 0.9877 | 0.9822
F | 1.00E-03 | 0.9795 | 0.9726 | 0.9871 | 0.9815
F | 1.00E-04 | 0.9734 | 0.9721 | 0.9833 | 0.9810
F | 1.00E-05 | 0.9753 | 0.9730 | 0.9851 | 0.9829

From Table 3, we observe that Model A has the highest test accuracy when L2 = 0.00001, and it was therefore selected for further use with TL (the results and procedure are presented in Section 5). Model A consists of two convolutional layers, one fully connected layer, and an output layer. First, the data are fed into the first convolutional layer, which consists of 16 convolutional filters with a convolution kernel of size 2. In the second convolutional layer, 32 filters are applied to perform deeper feature extraction; the kernel size of the second layer is also 2. The two convolutional layers are followed by 10% and 20% dropout, respectively, to stabilize learning. A fully connected layer with 64 neurons follows the two convolutional layers; the fully connected layer is in turn followed by a 50% dropout, as this rate was shown to be successful in reducing overfitting (Krizhevsky et al., 2017). The ReLU activation function was used for all layers apart from the output layer, and the Adam optimizer with a learning rate of 0.001 was selected. Finally, the output layer uses the SoftMax classifier to classify the active, grazing, and inactive behaviors, yielding a probability distribution over the three classes. The CNN was trained for a maximum of 500 epochs, subject to an early stopping criterion (refer to Fig. 6).

5. Results and discussions

We conducted two sets of experiments using CNN model A, because it achieved the best results, while partitioning the datasets into 50%, 25%, and 25% for training, validation, and testing purposes, respectively. The first experiment used the original source and target datasets ($D_S$ and $D_T$), while the second experiment followed the same procedure as the first, but this time using the datasets ($D_{S+}$ and $D_{T+}$) comprising the hand-crafted features. The purpose is to investigate whether the addition of the time- and frequency-domain features has a positive impact on the performance of the model and on the generalization of the algorithm, because the direction and placement of the accelerometers differ between the two sensor setups. For each experiment, several statistical metrics, including precision, recall, F1 score, and accuracy, were used to evaluate the performance of DL and TL over different combinations of training and testing datasets. Fig. 6 illustrates the architecture of the proposed model and the TL procedure.

5.1. Experiment A: Transfer learning from $D_S$ to $D_T$

Model A was trained on the training set $D_S^{tr}$, validated on $D_S^{val}$, and tested on $D_S^{ts}$. For transfer learning, the trained model was stored such that it could later be reused in the target domain, that is, $D_T$. The only trainable layers during transfer learning on $D_T^{tr}$ were the fully connected layers, which were responsible for the classification of the target activities. The results obtained from both experiments are listed in Table 4. Figs. 7 and 8 illustrate the accuracy and loss, respectively, of the model per epoch.

Table 4 indicates that the overall accuracy obtained was 97.46% for $D_S^{ts}$ and 94.79% for $D_T^{ts}$. The highest precision, recall, and F1 score on both test sets are observed for the 'inactive' behavior, with scores above 99.35%. The lowest recall, 88.98%, was obtained for the 'active' behavior on $D_T^{ts}$, which is similar to $D_S^{ts}$, where the corresponding recall is 89.42%. On the other hand, the precision of the active behavior on both test sets was higher than that of the grazing behavior, with 95.05% on $D_S^{ts}$ and 93.22% on $D_T^{ts}$. The recall results for the grazing behavior were 97.45% for $D_S^{ts}$ and 94.17% for $D_T^{ts}$, respectively. The best predictive rate was achieved for the 'inactive' behavior. The F1 scores for $D_S^{ts}$ were 92.15%, 95.34%, and 99.49% for the active, grazing, and inactive behaviors, respectively. For $D_T^{ts}$, the F1 scores for the active and grazing behaviors were 91.05% and 91.96%, respectively, which are lower than the associated scores on $D_S^{ts}$; however, the inactive behavior on $D_T^{ts}$ had an F1 score of 99.63%. Overall, it can be observed that the model performed better on the source dataset in all cases, and the accuracy decreased when the model was transferred to the target dataset. The reason for this decrease may be that the orientation of the sensor on the second device was not fixed, producing different patterns.

On the other hand, multiple factors may contribute to the biased performance of the model in the case of the inactive behavior (i.e., higher precision, recall, and F1 score than the other classes). First, grazing behavior can easily be misclassified as walking or scratching, as these behaviors exhibit similar movements in some cases. Likewise, inactive behavior can be easily classified, in contrast to active and grazing, since its pattern does not exhibit changes and remains stable due to the motionless behavior. Finally, the distribution of the data samples for the inactive behavior was comparatively larger than that of the other two classes. Thus, class imbalance may be one of the major causes of the model performing better in the identification of inactive behaviors.
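A hedged sketch of the transfer step just described: reload the stored source model, freeze everything except the fully connected layers, and retrain only the classification head on the target training set. The filename and the data variables are placeholders, and the callback settings mirror the reported patience of 20 and cap of 500 epochs.

```python
from tensorflow import keras

source_model = keras.models.load_model("model_a_source.h5")  # assumed path

# Freeze the convolutional feature extractor; keep dense layers trainable
for layer in source_model.layers:
    layer.trainable = isinstance(layer, keras.layers.Dense)

source_model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                     loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(patience=20,
                                           restore_best_weights=True)
# X_tr_T, y_tr_T, X_val_T, y_val_T: target-domain splits (placeholders)
source_model.fit(X_tr_T, y_tr_T, validation_data=(X_val_T, y_val_T),
                 epochs=500, callbacks=[early_stop])
```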
can be easily classified, in contrast to active and grazing since the


Fig. 5. Graphical representation of accuracy obtained from the CNN models trained on $D_S^{tr}$, $D_{S+}^{tr}$, and tested on $D_S^{ts}$, $D_{S+}^{ts}$.

5.2. Experiment B: Transfer learning from $D_{S+}$ to $D_{T+}$

Experiment B was identical to Experiment A, except for the datasets, which included the handcrafted features (i.e., $D_{S+}$ and $D_{T+}$). Model A was trained on $D_{S+}^{tr}$ and validated on $D_{S+}^{val}$, and then transfer learning was performed on the target data, that is, $D_{T+}^{tr}$. The final model was then tested on the unseen data from $D_{S+}^{ts}$ and $D_{T+}^{ts}$. As in Experiment A, only the fully connected layers were trained. The results obtained from both tests are presented in Table 5. Figs. 9 and 10 illustrate the accuracy and loss, respectively, of the model per epoch.

Table 5 presents the model performance over the datasets with the handcrafted features, which indicates overall accuracies of 98.55% and 96.59% for $D_{S+}^{ts}$ and $D_{T+}^{ts}$, respectively. These outcomes also align with the results from Experiment A, where the accuracy decreases when the model is transferred to the target domain. Likewise, the classification accuracy of the inactive behavior was comparatively better, with precision and recall of 99.87% and 99.94%, respectively, for $D_{S+}^{ts}$, and 99.17% and 99.49%, respectively, for $D_{T+}^{ts}$. Precision and recall for the active behavior in the source domain reached 94.98% and 94.22%, respectively. There was a noticeable increase of 4.8% in the recall of the active behavior in Experiment B compared with Experiment A, which achieved a recall of 89.42% in the source domain. The results across all behaviors for Experiment B in the source domain were comparatively balanced. On the other hand, the recall of the active behavior in the target domain was 87.12%, a slight decrease (i.e., by 1.8%) compared with the recall on the target domain in Experiment A. The F1 scores obtained on the source domain were 94.60%, 96.57%, and 99.91% for the active, grazing, and inactive behaviors, respectively, which are also slightly higher than the F1 scores obtained in Experiment A on the source domain. On the other hand, testing the model on the target domain showed a slight decrease in the F1 score (i.e., 1.05%) for the active behavior, but an increase of 2.01% for the grazing behavior. In both experiments, the F1 scores for the inactive behavior were above 99.33% in the target domain. Regarding the grazing behavior, the precision and recall in the source domain were 96.46% and 96.69%, respectively. The precision of the grazing behavior on the target dataset was 92.48%, while the recall was 95.51%.

5.3. Comparison of classification results on raw datasets vs datasets comprising hand-crafted features

The experimental results using the source dataset comprising the raw features ($D_S$) and the results for the source dataset with supplementary features ($D_{S+}$) are presented in Fig. 11. The CNN using the handcrafted features of the source dataset outperformed the CNN using only the automatically learned features from the raw dataset.

Fig. 6. CNN Architecture for the proposed model.

This signifies that the manually extracted features are more relevant for identifying the overall activities of sheep. From the chart in Fig. 11, it can be observed that the inactive class has precision and recall above 99.35% in both situations ($D_S$ and $D_{S+}$). On the other hand, the precision of the active class was 0.07% higher (a precision of 95.05%) when applying the CNN to $D_S$, but the recall was 89.42% (4.8% lower). The precision of the grazing activity when using the dataset with the hand-crafted features was 96.46%, which is 3.24% higher than the result obtained from the raw dataset with the deep-learning features. The recall of the grazing activity was approximately the same for both datasets, with less than a 1% difference in performance.

Table 4
CNN model A classification results on $D_S^{ts}$, and using transfer learning on $D_T^{ts}$.

Activities | $D_S^{ts}$ (Accuracy: 0.9746): Precision / Recall / F1 score | $D_T^{ts}$ (Accuracy: 0.9479): Precision / Recall / F1 score
Active | 0.9505 / 0.8942 / 0.9215 | 0.9322 / 0.8898 / 0.9105
Grazing | 0.9332 / 0.9745 / 0.9534 | 0.8986 / 0.9417 / 0.9196
Inactive | 0.9963 / 0.9935 / 0.9949 | 0.9982 / 0.9943 / 0.9963
Average | 0.9600 / 0.9541 / 0.9566 | 0.9430 / 0.9419 / 0.9421

In Fig. 12, the precision and recall of the target dataset with and without supplementary features are presented. The precision of the active class using the raw dataset ($D_T$) and the dataset comprising the handcrafted features ($D_{T+}$) is approximately 93% in both situations, showing no improvement. However, similar to the source dataset, the grazing activity has a precision 2.62% higher when using $D_{T+}$ instead of $D_T$. Again, the inactive class has precision and recall in the range of 99.17%-99.82%.

A summary of the obtained results, showing the overall accuracy and F1 score using the raw datasets and the datasets with manually extracted features, is presented in Tables 6 and 7. From the overall results, it is observed that the CNN achieved the best accuracy on both the source and target domains when using the datasets with handcrafted features. The use of supplementary features resulted in a performance improvement of approximately 1% and 2% on the source and target datasets, respectively. Furthermore, the F1 score also improved for all three classes in the source dataset, with an increase of 2.45% for active, 1.23% for grazing, and 0.42% for inactive.
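The per-class precision, recall, and F1 values of the kind reported in Tables 4-7 can be reproduced from model predictions with scikit-learn; the labels below are an illustrative stand-in for a real test split.

```python
from sklearn.metrics import classification_report

# Integer-coded labels: 0 = active, 1 = grazing, 2 = inactive (illustrative)
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
print(classification_report(y_true, y_pred,
                            target_names=["active", "grazing", "inactive"],
                            digits=4))
```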

Fig. 7. Experiment A: CNN model A training on source domain $D_S^{tr}$.


Fig. 8. Experiment A: CNN model A transfer learning on target domain DtrT .

In contrast, the F1 score of the active behavior decreased by 1% when the model was tested in the target domain; however, the grazing behavior classification increased by 2%. Regarding training time, CNN training stopped at 140 epochs and started from 88% accuracy when using the raw data; in comparison, the CNN trained on the dataset comprising the extra features stopped at 70 epochs and started with an accuracy of 96.59%.

Table 5
CNN model A classification results on $D_{S+}^{ts}$, and using transfer learning on $D_{T+}^{ts}$.

Activities | $D_{S+}^{ts}$ (Accuracy: 0.9855): Precision / Recall / F1 score | $D_{T+}^{ts}$ (Accuracy: 0.9659): Precision / Recall / F1 score
Active | 0.9498 / 0.9422 / 0.9460 | 0.9309 / 0.8712 / 0.9000
Grazing | 0.9646 / 0.9669 / 0.9657 | 0.9248 / 0.9551 / 0.9397
Inactive | 0.9987 / 0.9994 / 0.9991 | 0.9917 / 0.9949 / 0.9933
Average | 0.9710 / 0.9695 / 0.9703 | 0.9491 / 0.9404 / 0.9443

The experiments also indicated high F1 scores for the inactive behavior, in the range of 99.33%-99.91%. The lowest F1 score (90.00%) was achieved for the active behavior when TL was applied to $D_{T+}^{ts}$. However, when conducting Experiment B on $D_{S+}^{ts}$, F1 scores of 96.57% for grazing and 94.60% for active behaviors were obtained. These outcomes indicate the superiority of the proposed model compared with previously published studies. For instance, (Fogarty, Swain, Cronin, Moraes, & Trotter, 2020) indicated precision of 69.8% and 45.2% for the lying and standing behaviors, respectively, compared with 99.87% and 99.17%, respectively, in the proposed study for the source and target domains when classifying inactivity (lying, standing). Likewise, (Vázquez-Diosdado et al., 2019) and (Decandia et al., 2018) reported overall accuracies of 85.18% and 89.7%, respectively, which is significantly lower than that of the proposed model, which achieved accuracies in the range of 96.59%-98.55%. However, it is important to note that the number of activities differed in these studies compared with the current work.

As mentioned, active behavior is comparatively complex, comprising overlapping behaviors such as running, shaking, scratching, and walking, which exhibit more complex movements. For instance, Barwick reported poor classification (54%) for the standing activity using collar data (Barwick, Lamb, Dobos, Welch, & Trotter, 2018). Likewise, (Fogarty et al., 2020) indicated limited performance, with accuracies of 69.8%, 45.2%, and 25.1% for lying, standing, and walking, respectively.

Fig. 9. Experiment B: CNN model A training on source domain $D_{S+}^{tr}$.

Fig. 10. Experiment B: CNN model A transfer learning on target domain $D_{T+}^{tr}$.


Fig. 11. Classification results on source dataset. Comparison of precision and recall on raw dataset (DS) vs dataset comprising the hand-crafted features (DS+) for
active, grazing, and inactive classes.

Fig. 12. Classification results on target dataset. Comparison of precision and recall on raw dataset ($D_T$) vs dataset comprising the hand-crafted features ($D_{T+}$) for active, grazing, and inactive classes.

Table 6
Overall results on DtsS and DtsS+ for CNN (automatic vs hand-crafted features datasets, respectively).

Activity             Experiment A (DtsS)    Experiment B (DtsS+)
Active F1 score      92.15%                 94.60%
Grazing F1 score     95.34%                 96.57%
Inactive F1 score    99.49%                 99.91%
Accuracy             97.46%                 98.55%

Table 7
Results from transfer learning on DtsT and DtsT+ for CNN (automatic vs hand-crafted features datasets, respectively).

Activity             Experiment A (DtsT)    Experiment B (DtsT+)
Active F1 score      91.05%                 90.00%
Grazing F1 score     91.96%                 93.97%
Inactive F1 score    99.63%                 99.33%
Accuracy             94.79%                 96.59%
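To make the distinction between the raw datasets (DS, DT) and the augmented ones (DS+, DT+) concrete, the sketch below computes a handful of typical temporal and spectral descriptors for a single window of tri-axial accelerometer samples. The particular features and the 12.5 Hz sampling rate are illustrative assumptions only; the paper's full hand-crafted feature set is considerably larger.

# Sketch: typical temporal and spectral features for one accelerometer window.
# Feature choices and the sampling rate (fs) are illustrative assumptions.
import numpy as np

def window_features(window: np.ndarray, fs: float = 12.5) -> dict:
    """window: array of shape (n_samples, 3) with x, y, z acceleration."""
    feats = {}
    magnitude = np.linalg.norm(window, axis=1)  # orientation-robust signal
    for name, sig in [("x", window[:, 0]), ("y", window[:, 1]),
                      ("z", window[:, 2]), ("mag", magnitude)]:
        # Temporal descriptors.
        feats[f"{name}_mean"] = float(sig.mean())
        feats[f"{name}_std"] = float(sig.std())
        feats[f"{name}_range"] = float(sig.max() - sig.min())
        # Spectral descriptors from the one-sided FFT of the centred signal.
        spectrum = np.abs(np.fft.rfft(sig - sig.mean()))
        freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
        feats[f"{name}_spectral_energy"] = float(np.sum(spectrum ** 2) / len(sig))
        feats[f"{name}_dominant_freq"] = float(freqs[np.argmax(spectrum)])
    return feats

# Example on a simulated 2-second window (25 samples at 12.5 Hz).
rng = np.random.default_rng(0)
print(window_features(rng.normal(size=(25, 3)))["mag_dominant_freq"])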

Studies have also indicated performance variations with respect to sensor attachment for data collection. For instance, Barwick et al. (2020) reported accuracies of 67%–88% for collar-based data, as compared to 86%–95% for ear-tag sensors. Indeed, Barwick et al. (2018) obtained better results when the sensor was attached to the ear tag of the animal, with prediction accuracies of 94%, 96%, and 99% for grazing, standing, and walking behaviors, respectively.

The statistical results indicate the robustness of the CNN in terms of generalization on unseen data, which supports its use in real-life applications.


In real-life applications and real-time decision making, the CNN model is very useful because of its ability to automatically extract features from raw sensor data while producing robust results, as demonstrated in our experiments. When using CNN with hand-crafted features in experiment B, we achieved improved results and showed that the transfer learning application is more robust when compared with experiment A. In other words, the use of CNN with hand-crafted features supports real-time operation in real-life scenarios of animal monitoring and warning generation.
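As a hedged sketch of what such an automatic feature extractor can look like, the snippet below defines a small 1D CNN that maps raw tri-axial accelerometer windows to the three activity classes. The window length and layer sizes are assumptions made for illustration; this is not the exact "CNN model A" architecture. Batch normalization and dropout appear here because both are standard regularizers in this setting (Ioffe & Szegedy, 2015; Srivastava et al., 2014).

# Sketch: a small 1D CNN over raw accelerometer windows (3 channels, 3 classes).
# Window length and layer sizes are illustrative, not the paper's "CNN model A".
import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW_LEN, N_CHANNELS, N_CLASSES = 64, 3, 3  # assumed window length

def build_cnn() -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(WINDOW_LEN, N_CHANNELS)),
        layers.Conv1D(32, kernel_size=5, activation="relu"),  # local motion patterns
        layers.BatchNormalization(),
        layers.MaxPooling1D(2),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling1D(2),
        layers.GlobalAveragePooling1D(),
        layers.Dropout(0.5),
        layers.Dense(N_CLASSES, activation="softmax"),  # active / grazing / inactive
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

source_model = build_cnn()
source_model.summary()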
In addition to reliable and efficient performance, the use of transfer learning is advantageous because of the reusability and generalization of pre-trained models from other applications within similar domains on unseen datasets. Furthermore, as CNNs perform better with larger datasets, transfer learning can be leveraged to provide a cost-effective solution (with regard to time and resources) when reusing them on limited-size datasets. In this way, a new dataset can be used in transfer learning while exploiting the knowledge acquired from a model trained on larger datasets.
larger datasets. Alvarenga, F. A. P., Borges, I., Palkovič, L., Rodina, J., Oddy, V. H., & Dobos, R. C.
(2016). Using a three-axis accelerometer to identify and classify sheep behaviour at
pasture. Applied Animal Behaviour Science, 181, 91–99. https://doi.org/10.1016/j.
6. Conclusion and future directions applanim.2016.05.026
Anderson, D. M., Estell, R. E., Holechek, J. L., Ivey, S., & Smith, G. B. (2014). Virtual
herding for flexible livestock management - a review. The Rangeland Journal, 36(3),
In this study, the problem of sheep activity recognition was consid­
205–221. https://doi.org/10.1071/RJ13092
ered in the context of two sensor types and configurations. The first Andriamandroso, A. L. H., Lebeau, F., Beckers, Y., Froidmont, E., Dufrasne, I.,
sensor (MetamotionR) was placed in a fixed orientation and was char­ Heinesch, B., … Bindelle, J. (2017). Development of an open-source algorithm based
acterized by a lower noise density, whereas the second sensor had on inertial measurement units (IMU) of a smartphone to detect cattle grass intake
and ruminating behaviors. Computers and Electronics in Agriculture, 139, 126–137.
varying orientation and higher noise density. Our research investigated https://doi.org/10.1016/J.COMPAG.2017.05.020
the use of CNN and transfer learning in two problem settings. First, we Bao, L., & Intille, S. S. (2004). Activity recognition from user-annotated acceleration
applied a CNN and transfer learning to the original datasets. Next, we data. Pervasive Computing, 3001, 1–17. https://doi.org/10.1007/978-3-540-24646-6_
1
extracted a large number of temporal and spectral features, resulting in Barwick, J., Lamb, D., Dobos, R., Schneider, D., Welch, M., & Trotter, M. (2018).
two augmented datasets. The simulation studies and associated analysis Predicting lameness in sheep activity using tri-axial acceleration signals. Animals, 8
indicated that CNN and transfer learning can generate high accuracy in (1), 1–16. https://doi.org/10.3390/ani8010012
Barwick, J., Lamb, D. W., Dobos, R., Welch, M., Schneider, D., & Trotter, M. (2020).
terms of classifying three sets of activities: active, grazing, and inactive Identifying sheep activity from tri-axial acceleration signals using a moving window
behaviors. High-quality classification results were achieved in terms of classification model. Remote Sensing, 12(4), 646. https://doi.org/10.3390/
accuracy, F1 score, precision, and recall quality measures when rs12040646
Barwick, J., Lamb, D. W., Dobos, R., Welch, M., & Trotter, M. (2018). Categorising sheep
benchmarked with other results in the literature. It was observed that activity using a tri-axial accelerometer. Computers and Electronics in Agriculture, 145,
the inclusion of handcrafted features improved the performance of both 289–297. https://doi.org/10.1016/J.COMPAG.2018.01.007
the employed CNN and transfer learning solutions. Furthermore, the Bouten, C. V., Westerterp, K. R., Verduin, M., & Janssen, J. D. (1994). Assessment of
energy expenditure for physical activity using a triaxial accelerometer. Medicine and
simulation results demonstrated the advantage of using deep learning in
Science in Sports and Exercise, 26(12), 1516–1523.
terms of generalization, indicating its reusability when datasets are S. Cai Y. Shu G. Chen B.C. Ooi W. Wang M. Zhang Effective and Efficient Dropout for
limited in animal behavior recognition. Deep Convolutional Neural Networks 2019 https://doi.org/10.48550/
Future research will consider the development of a multifunctional arxiv.1904.03392.
Chen, Y., & Xue, Y. (2016). A Deep Learning Approach to Human Activity Recognition
virtual fencing system to control the spatial distribution of the animals Based on Single Accelerometer. In Proceedings - 2015 IEEE International Conference on
among the land they graze on using acoustic intermittent cues for the Systems, Man, and Cybernetics, SMC 2015 (pp. 1488–1492). IEEE. https://doi.org/
manipulation of their location. In this case, real-time monitoring and 10.1109/SMC.2015.263.
K. Chen D. Zhang L. Yao B. Guo Z. Yu Y. Liu January 21) 2021 Overview, challenges, and
smart border fencing have been proposed to automatically monitor the opportunities. ACM Computing Surveys Deep learning for sensor-based human
well-being and welfare of animals. activity recognition 10.1145/3447744.
CRediT authorship contribution statement

Natasa Kleanthous: Conceptualization, Methodology, Visualization, Investigation, Writing – original draft. Abir Hussain: Supervision, Writing – review & editing. Wasiq Khan: Writing – review & editing. Jennifer Sneddon: Writing – review & editing. Panos Liatsis: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to thank The Douglas Bomford Trust and Liverpool John Moores University for funding this study. We would also like to thank Dr. Jenny Sneddon for permitting access to her residential property, Willow Cottage South, located in Shotwick Village (Chester), and allowing the use of her 9 Hebridean ewes for the experiments in this study.

Availability of data

The raw datasets DS and DT of this article are available in the GitHub repository, https://github.com/nkleanthous2015/Sheep_activity_Data.

References

Akbari, A., & Jafari, R. (2019). Transferring activity recognition models for new wearable sensors with deep generative domain adaptation (pp. 85–96). New York, NY, USA: ACM Press. https://doi.org/10.1145/3302506.3310391
Aloysius, N., & Geetha, M. (2018). A review on deep convolutional neural networks (pp. 588–592). IEEE. https://doi.org/10.1109/ICCSP.2017.8286426
Al-Rubaye, Z., Al-Sherbaz, A., McCormick, W. D., & Turner, S. J. (2016). The use of multivariable wireless sensor data to early detect lameness in sheep. School of Science and Technology Annual Research Conference.
Alvarenga, F. A. P., Borges, I., Palkovič, L., Rodina, J., Oddy, V. H., & Dobos, R. C. (2016). Using a three-axis accelerometer to identify and classify sheep behaviour at pasture. Applied Animal Behaviour Science, 181, 91–99. https://doi.org/10.1016/j.applanim.2016.05.026
Anderson, D. M., Estell, R. E., Holechek, J. L., Ivey, S., & Smith, G. B. (2014). Virtual herding for flexible livestock management - a review. The Rangeland Journal, 36(3), 205–221. https://doi.org/10.1071/RJ13092
Andriamandroso, A. L. H., Lebeau, F., Beckers, Y., Froidmont, E., Dufrasne, I., Heinesch, B., … Bindelle, J. (2017). Development of an open-source algorithm based on inertial measurement units (IMU) of a smartphone to detect cattle grass intake and ruminating behaviors. Computers and Electronics in Agriculture, 139, 126–137. https://doi.org/10.1016/J.COMPAG.2017.05.020
Bao, L., & Intille, S. S. (2004). Activity recognition from user-annotated acceleration data. Pervasive Computing, 3001, 1–17. https://doi.org/10.1007/978-3-540-24646-6_1
Barwick, J., Lamb, D., Dobos, R., Schneider, D., Welch, M., & Trotter, M. (2018). Predicting lameness in sheep activity using tri-axial acceleration signals. Animals, 8(1), 1–16. https://doi.org/10.3390/ani8010012
Barwick, J., Lamb, D. W., Dobos, R., Welch, M., Schneider, D., & Trotter, M. (2020). Identifying sheep activity from tri-axial acceleration signals using a moving window classification model. Remote Sensing, 12(4), 646. https://doi.org/10.3390/rs12040646
Barwick, J., Lamb, D. W., Dobos, R., Welch, M., & Trotter, M. (2018). Categorising sheep activity using a tri-axial accelerometer. Computers and Electronics in Agriculture, 145, 289–297. https://doi.org/10.1016/J.COMPAG.2018.01.007
Bouten, C. V., Westerterp, K. R., Verduin, M., & Janssen, J. D. (1994). Assessment of energy expenditure for physical activity using a triaxial accelerometer. Medicine and Science in Sports and Exercise, 26(12), 1516–1523.
Cai, S., Shu, Y., Chen, G., Ooi, B. C., Wang, W., & Zhang, M. (2019). Effective and efficient dropout for deep convolutional neural networks. https://doi.org/10.48550/arxiv.1904.03392
Chen, Y., & Xue, Y. (2016). A deep learning approach to human activity recognition based on single accelerometer. In Proceedings - 2015 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2015 (pp. 1488–1492). IEEE. https://doi.org/10.1109/SMC.2015.263
Chen, K., Zhang, D., Yao, L., Guo, B., Yu, Z., & Liu, Y. (2021). Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities. ACM Computing Surveys. https://doi.org/10.1145/3447744
Cruciani, F., Vafeiadis, A., Nugent, C., Cleland, I., McCullagh, P., Votis, K., … Hamzaoui, R. (2019). Comparing CNN and human crafted features for human activity recognition (pp. 960–967). IEEE. https://doi.org/10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00190
Dahl, G. E., Sainath, T. N., & Hinton, G. E. (2013). Improving deep neural networks for LVCSR using rectified linear units and dropout. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8609–8613). IEEE. https://doi.org/10.1109/ICASSP.2013.6639346
Decandia, M., Giovanetti, V., Molle, G., Acciaro, M., Mameli, M., Cabiddu, A., … Dimauro, C. (2018). The effect of different time epoch settings on the classification of sheep behaviour using tri-axial accelerometry. Computers and Electronics in Agriculture, 154, 112–119. https://doi.org/10.1016/j.compag.2018.09.002
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248–255).
Dutta, R., Smith, D., Rawnsley, R., Bishop-Hurley, G., Hills, J., Timms, G., & Henry, D. (2015). Dynamic cattle behavioural classification using supervised ensemble classifiers. Computers and Electronics in Agriculture, 111, 18–28. https://doi.org/10.1016/j.compag.2014.12.002
ELAN - The Language Archive. (n.d.). Retrieved December 16, 2019, from https://tla.mpi.nl/tools/tla-tools/elan/
Fogarty, E. S., Swain, D. L., Cronin, G. M., Moraes, L. E., & Trotter, M. (2020). Behaviour classification of extensively grazed sheep using machine learning. Computers and Electronics in Agriculture, 169, Article 105175. https://doi.org/10.1016/j.compag.2019.105175


Garbin, C., Zhu, X., & Marques, O. (2020). Dropout vs. batch normalization: An empirical study of their impact to deep learning. Multimedia Tools and Applications, 79(19–20), 12777–12815. https://doi.org/10.1007/s11042-019-08453-9
Giovanetti, V., Decandia, M., Molle, G., Acciaro, M., Mameli, M., Cabiddu, A., … Dimauro, C. (2017). Automatic classification system for grazing, ruminating and resting behaviour of dairy sheep using a tri-axial accelerometer. Livestock Science, 196, 42–48. https://doi.org/10.1016/j.livsci.2016.12.011
Gneiting, T., Ševčíková, H., & Percival, D. B. (2012). Estimators of fractal dimension: Assessing the roughness of time series and spatial data. Statistical Science, 247–277.
González, L. A., Bishop-Hurley, G. J., Handcock, R. N., & Crossman, C. (2015). Behavioral classification of data from collars containing motion sensors in grazing cattle. Computers and Electronics in Agriculture, 110, 91–102. https://doi.org/10.1016/j.compag.2014.10.018
González, S., Sedano, J., Villar, J. R., Corchado, E., Herrero, Á., & Baruque, B. (2015). Features and models for human activity recognition. Neurocomputing, 167, 52–60. https://doi.org/10.1016/j.neucom.2015.01.082
Gou, X., Tsunekawa, A., Peng, F., Zhao, X., Li, Y., & Lian, J. (2019). Method for classifying behavior of livestock on fenced temperate rangeland in Northern China. Sensors (Switzerland), 19(23), 1–15. https://doi.org/10.3390/s19235334
Gougoulis, D. A., Kyriazakis, I., & Fthenakis, G. C. (2010). Diagnostic significance of behaviour changes of sheep: A selected review. Small Ruminant Research, 92(1), 52–56. https://doi.org/10.1016/j.smallrumres.2010.04.018
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., … Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377. https://doi.org/10.1016/J.PATCOG.2017.10.013
Gutierrez-Galan, D., Dominguez-Morales, J. P., Cerezuela-Escudero, E., Rios-Navarro, A., Tapiador-Morales, R., Rivas-Perez, M., … Linares-Barranco, A. (2018). Embedded neural network for real-time animal behavior classification. Neurocomputing, 272, 17–26. https://doi.org/10.1016/j.neucom.2017.03.090
Hassan, M. M., Uddin, M. Z., Mohamed, A., & Almogren, A. (2018). A robust human activity recognition system using smartphone sensors and deep learning. Future Generation Computer Systems, 81, 307–313. https://doi.org/10.1016/j.future.2017.11.029
Hounslow, J. L., Brewster, L. R., Lear, K. O., Guttridge, T. L., Daly, R., Whitney, N. M., & Gleiss, A. C. (2019). Assessing the effects of sampling frequency on behavioural classification of accelerometer data. Journal of Experimental Marine Biology and Ecology, 512, 22–30. https://doi.org/10.1016/j.jembe.2018.12.003
Hu, Z. (2022). A web application for crowd counting by building parallel and direct connection-based CNN architectures. Cognitive Systems and Signal Processing in Image Processing, 47–82. https://doi.org/10.1016/B978-0-12-824410-4.00012-X
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In F. Bach & D. Blei (Eds.), 32nd International Conference on Machine Learning, ICML 2015 (Vol. 1, pp. 448–456). Lille, France: PMLR.
Jiang, W., & Yin, Z. (2015). Human activity recognition using wearable sensors by deep convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia - MM '15 (pp. 1307–1310). https://doi.org/10.1145/2733373.2806333
Kamminga, J. W. (2017). Generic online animal activity recognition on collar tags. Data Archiving and Networked Services (DANS).
Kamminga, J. W., Bisby, H. C., Le, D. V., Meratnia, N., & Havinga, P. J. M. (2017). Generic online animal activity recognition on collar tags (pp. 597–606). New York, NY, USA: ACM. https://doi.org/10.1145/3123024.3124407
Kamminga, J. W., Le, D. V., Meijers, J. P., Bisby, H., Meratnia, N., & Havinga, P. J. M. (2018). Robust sensor-orientation-independent feature selection for animal activity recognition on collar tags. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(1), 1–27. https://doi.org/10.1145/3191747
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. https://doi.org/10.48550/arxiv.1412.6980
Kleanthous, N., Hussain, A., Mason, A., Sneddon, J., Shaw, A., Fergus, P., … Al-Jumeily, D. (2018). Machine learning techniques for classification of livestock behavior. In Lecture Notes in Computer Science (Vol. 11304 LNCS, pp. 304–315). https://doi.org/10.1007/978-3-030-04212-7_26
Kleanthous, N., Hussain, A., Mason, A., & Sneddon, J. (2019). Data science approaches for the analysis of animal behaviours. In Lecture Notes in Computer Science (Vol. 11645 LNAI, pp. 411–422). https://doi.org/10.1007/978-3-030-26766-7_38
Kleanthous, N., Hussain, A. J., Khan, W., Sneddon, J., Al-Shamma'a, A., & Liatsis, P. (2022). A survey of machine learning approaches in animal behaviour. Neurocomputing. https://doi.org/10.1016/j.neucom.2021.10.126
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
Kursa, M. B., Jankowski, A., & Rudnicki, W. R. (2010). Boruta – A system for feature selection. Fundamenta Informaticae, 101(4), 271–285. https://doi.org/10.3233/FI-2010-288
Ladds, M. A., Thompson, A. P., Kadar, J.-P., Slip, D. J., Hocking, D. P., & Harcourt, R. G. (2017). Super machine learning: Improving accuracy and reducing variance of behaviour classification from accelerometry. Animal Biotelemetry, 5(1), 8. https://doi.org/10.1186/s40317-017-0123-1
le Roux, S., Wolhuter, R., & Niesler, T. (2017). An overview of automatic behaviour classification for animal-borne sensor applications in South Africa. https://doi.org/10.1145/3132711.3132716
le Roux, S. P., Marias, J., Wolhuter, R., & Niesler, T. (2017). Animal-borne behaviour classification for sheep (Dohne Merino) and Rhinoceros (Ceratotherium simum and Diceros bicornis). Animal Biotelemetry, 5, 25. https://doi.org/10.1186/s40317-017-0140-0
Li, X., Chen, S., Hu, X., & Yang, J. (2019). Understanding the disharmony between dropout and batch normalization by variance shift. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2677–2685). https://doi.org/10.1109/CVPR.2019.00279
Marais, J., le Roux, S., Wolhuter, R., & Niesler, T. (2014). Automatic classification of sheep behaviour using 3-axis accelerometer data. Pattern Recognition Association of South Africa.
MBIENTLAB INC. (2018). MetaMotionR – MbientLab. Retrieved December 15, 2019, from https://mbientlab.com/metamotionr/
McLennan, K. M., Skillings, E. A., Rebelo, C. J. B., Corke, M. J., Pires Moreira, M. A., Morton, A. J., & Constantino-Casas, F. (2015). Technical note: Validation of an automatic recording system to assess behavioural activity level in sheep (Ovis aries). Small Ruminant Research, 127, 92–96. https://doi.org/10.1016/j.smallrumres.2015.04.002
Nadimi, E. S., Jørgensen, R. N., Blanes-Vidal, V., & Christensen, S. (2012). Monitoring and classifying animal behavior using ZigBee-based mobile ad hoc wireless sensor networks and artificial neural networks. Computers and Electronics in Agriculture, 82, 44–54. https://doi.org/10.1016/j.compag.2011.12.008
Navon, S., Mizrach, A., Hetzroni, A., & Ungar, E. D. (2013). Automatic recognition of jaw movements in free-ranging cattle, goats and sheep, using acoustic monitoring. Biosystems Engineering, 114(4), 474–483. https://doi.org/10.1016/j.biosystemseng.2012.08.005
Norton, B. E., Barnes, M., & Teague, R. (2013). Grazing management can improve livestock distribution: Increasing accessible forage and effective grazing capacity. Rangelands, 35(5), 45–51. https://doi.org/10.2111/RANGELANDS-D-13-00016.1
Nweke, H. F., Teh, Y. W., Al-garadi, M. A., & Alo, U. R. (2018). Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges. Expert Systems with Applications. https://doi.org/10.1016/j.eswa.2018.03.056
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1717–1724. https://doi.org/10.1109/CVPR.2014.222
Qiu, S., Zhao, H., Jiang, N., Wang, Z., Liu, L., An, Y., … Fortino, G. (2022). Multi-sensor information fusion based on machine learning for real applications in human activity recognition: State-of-the-art and research challenges. Information Fusion, 80, 241–265. https://doi.org/10.1016/J.INFFUS.2021.11.006
Radeski, M., & Ilieski, V. (2017). Gait and posture discrimination in sheep using a tri-axial accelerometer. Animal, 11(7), 1249–1257. https://doi.org/10.1017/S175173111600255X
Rahman, A., Smith, D. V., Little, B., Ingham, A. B., Greenwood, P. L., & Bishop-Hurley, G. J. (2018). Cattle behaviour classification from collar, halter, and ear tag sensors. Information Processing in Agriculture. https://doi.org/10.1016/j.inpa.2017.10.001
Ravi, D., Wong, C., Lo, B., & Yang, G. Z. (2016). Deep learning for human activity recognition: A resource efficient implementation on low-power devices. In BSN 2016 - 13th Annual Body Sensor Networks Conference. https://doi.org/10.1109/BSN.2016.7516235
Ravì, D., Wong, C., Lo, B., & Yang, G.-Z. (2017). A deep learning approach to on-node sensor data analytics for mobile or wearable devices. IEEE Journal of Biomedical and Health Informatics, 21(1), 56–64. https://doi.org/10.1109/JBHI.2016.2633287
Riaboff, L., Aubin, S., Bédère, N., Couvreur, S., Madouasse, A., Goumand, E., … Plantier, G. (2019). Evaluation of pre-processing methods for the prediction of cattle behaviour from accelerometer data. Computers and Electronics in Agriculture, 165, Article 104961. https://doi.org/10.1016/j.compag.2019.104961
Ribani, R., & Marengoni, M. (2019). A survey of transfer learning for convolutional neural networks (pp. 47–57). IEEE. https://doi.org/10.1109/SIBGRAPI-T.2019.00010
Robert, B., White, B. J., Renter, D. G., & Larson, R. L. (2009). Evaluation of three-dimensional accelerometers to monitor and classify behavior patterns in cattle. Computers and Electronics in Agriculture, 67(1–2), 80–84. https://doi.org/10.1016/j.compag.2009.03.002
Ronao, C. A., & Cho, S. B. (2016). Human activity recognition with smartphone sensors using deep learning neural networks. Expert Systems with Applications, 59, 235–244. https://doi.org/10.1016/j.eswa.2016.04.032
Rutter, S. M. (2017). Advanced livestock management solutions. In D. M. Ferguson, C. Lee, & A. Fisher (Eds.), Advances in Sheep Welfare (pp. 245–261). Woodhead Publishing. https://doi.org/10.1016/B978-0-08-100718-1.00013-3
San, P. P., Kakar, P., Li, X. L., Krishnaswamy, S., Yang, J. B., & Nguyen, M. N. (2017). Deep learning for human activity recognition. In Big Data Analytics for Sensor-Network Collected Intelligence (pp. 186–204). https://doi.org/10.1016/B978-0-12-809393-1.00009-X
Smith, D., Rahman, A., Bishop-Hurley, G. J., Hills, J., Shahriar, S., Henry, D., & Rawnsley, R. (2016). Behavior classification of cows fitted with motion collars: Decomposing multi-class classification into a set of binary problems. Computers and Electronics in Agriculture, 131, 40–50.
Srivastava, N., Hinton, G., Krizhevsky, A., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15.
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., & Liu, C. (2018). A survey on deep transfer learning. In Lecture Notes in Computer Science (Vol. 11141 LNCS, pp. 270–279). https://doi.org/10.1007/978-3-030-01424-7_27
Thakur, D., & Biswas, S. (2021). Feature fusion using deep learning for smartphone based human activity recognition. International Journal of Information Technology (Singapore), 13(4), 1615–1624. https://doi.org/10.1007/s41870-021-00719-6
Topham, L., Khan, W., Al-Jumeily, D., & Hussain, A. J. (2022). Human body pose estimation for gait identification: A comprehensive survey of datasets and models. ACM Computing Surveys. https://doi.org/10.1145/3533384
Umstätter, C., Waterhouse, A., & Holland, J. P. (2008). An automated sensor-based method of simple behavioural classification of sheep in extensive systems. Computers and Electronics in Agriculture, 64(1), 19–26. https://doi.org/10.1016/j.compag.2008.05.004
Vázquez Diosdado, J. A., Barker, Z. E., Hodges, H. R., Amory, J. R., Croft, D. P., Bell, N. J., & Codling, E. A. (2015). Classification of behaviour in housed dairy cows using an accelerometer-based activity monitoring system. Animal Biotelemetry, 3(1), 15. https://doi.org/10.1186/s40317-015-0045-8
Vázquez-Diosdado, J. A., Paul, V., Ellis, K. A., Coates, D., Loomba, R., & Kaler, J. (2019). A combined offline and online algorithm for real-time and long-term classification of sheep behaviour: Novel approach for precision livestock farming. Sensors (Switzerland), 19(14). https://doi.org/10.3390/s19143201
Wan, S., Qi, L., Xu, X., Tong, C., & Gu, Z. (2020). Deep learning models for real-time human activity recognition with smartphones. Mobile Networks and Applications, 25(2), 743–755. https://doi.org/10.1007/s11036-019-01445-x
Weiss, K., Khoshgoftaar, T. M., & Wang, D. D. (2016). A survey of transfer learning. Journal of Big Data, 3(1), 9. https://doi.org/10.1186/s40537-016-0043-6
Werner, J., Umstatter, C., Leso, L., Kennedy, E., Geoghegan, A., Shalloo, L., … O'Brien, B. (2019). Evaluation and application potential of an accelerometer-based collar device for measuring grazing behavior of dairy cows. Animal, 13(9), 1–10. https://doi.org/10.1017/S1751731118003658
Winter, A. C. (2008). Lameness in sheep. Small Ruminant Research, 76(1–2), 149–153. https://doi.org/10.1016/j.smallrumres.2007.12.008
Wu, J. (2017). Introduction to convolutional neural networks, 1–31.
Xia, S., Xia, Y., Yu, H., Liu, Q., Luo, Y., Wang, G., & Chen, Z. (2019). Transferring ensemble representations using deep convolutional neural networks for small-scale image classification. IEEE Access, 7, 168175–168186. https://doi.org/10.1109/ACCESS.2019.2912908
Xiao, G., Wu, Q., Chen, H., Da, D., Guo, J., & Gong, Z. (2020). A deep transfer learning solution for food material recognition using electronic scales. IEEE Transactions on Industrial Informatics, 16(4), 2290–2300. https://doi.org/10.1109/TII.2019.2931148
Yang, J. B., Nguyen, M. N., San, P. P., Li, X. L., & Krishnaswamy, S. (2015). Deep convolutional neural networks on multichannel time series for human activity recognition. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3995–4001).
Zhang, X., Zhou, X., Lin, M., & Sun, J. (2017). ShuffleNet: An extremely efficient convolutional neural network for mobile devices. https://doi.org/10.48550/arxiv.1707.01083

