Neha Sharma
Amlan Chakrabarti
Valentina Emilia Balas
Alfred M. Bruckstein Editors
Data Management,
Analytics and
Innovation
Proceedings of ICDMAI 2021, Volume 2
Lecture Notes on Data Engineering
and Communications Technologies
Volume 71
Series Editor
Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data
technologies and communications. It will publish the latest advances in the engineering
task of building and deploying distributed, scalable and reliable data infrastructures
and communication systems.
The series will have a prominent applied focus on data technologies and
communications, with the aim of bridging fundamental research on
data science and networking with data engineering and communications that lead to
industry products, business knowledge and standardisation.
Indexed by SCOPUS, INSPEC, EI Compendex.
All books published in the series are submitted for consideration in Web of Science.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
and Innovation, Brazil; Yogesh Kulkarni, Principal Architect (CTO Office), Icertis,
Pune; Dr. Aloknath De, Corporate Vice President of Samsung Electronics, South
Korea, and Chief Technology Officer of Samsung R&D Institute India, Bangalore;
Sourabh Mukherjee, Vice President, Data and Artificial Intelligence Group, Accenture;
Pallab Dasgupta, Professor, Department of Computer Science and Engineering,
IIT Kharagpur; and Alfred M. Bruckstein, Technion—Israel Institute of Technology,
Faculty of Computer Science, Israel. The pre-conference sessions were conducted by
Dipanjan (DJ) Sarkar, Data Science Lead at Applied Materials; Usha Rengaraju, Polymath
and India’s first woman Kaggle Grandmaster; Avni Gupta, Senior Data Analyst—IoT,
Netradyne; Kranti Athalye, Sr. Manager, University Relations, IBM; Sonali
Dey, Business Operations Manager, IBM; Amol Dhondse, Senior Technical Staff
Member, IBM; and Vandana Verma Sehgal, Security Solutions Architect, IBM. All
the experts took the participants through various perspectives on data and analytics.
The driving force behind ICDMAI 2021 was the general chair, Dr. P. K.
Sinha, Vice-Chancellor and Director, IIIT, New Raipur; Prof. Amol Goje, President,
S4DS; Prof. Amlan Chakrabarti, Vice President, S4DS; Dr. Neha Sharma,
Secretary, S4DS; the Executive Body Members of S4DS (Dr. Inderjit Barara, Dr.
Saptarsi Goswami and Mr. Atul Benegiri); and all the superactive volunteers. There
was strong support from our technical partner, IBM; knowledge partner, Wizer;
academic partners, IIT Guwahati and NIT Durgapur; and publication partner,
Springer. Through this conference, we could build a strong data science ecosystem.
Our special thanks go to Fatos Xhafa, Technical University of Catalonia,
Barcelona, Spain (Series Editor, Springer, Lecture Notes on Data Engineering
and Communications Technologies) for the opportunity to organize this guest-edited
volume. We are grateful to Springer, especially to Mr. Aninda Bose (Senior
Publishing Editor, Springer India Pvt. Ltd.), for the excellent collaboration, patience
and help during the development of this volume.
We are confident that the volumes will provide state-of-the-art information to
professors, researchers, practitioners and graduate students in the areas of data
management, analytics and innovation, and all will find this collection of papers
inspiring and useful.
Contents
Track I
Simulation of Lotka–Volterra Equations Using Differentiable Programming in Julia
Ankit Roy
Feature Selection Strategy for Multi-residents Behavior Analysis in Smart Home Environment
John W. Kasubi and D. H. Manjaiah
A Comparative Study on Self-learning Techniques for Securing Digital Devices
Dev Kumar, Shruti Kumar, and Vidhi Khathuria
An Intelligent, Geo-replication, Energy-Efficient BAN Routing Algorithm Under Framework of Machine Learning and Cloud Computing
Annwesha Banerjee Majumder, Sourav Majumder, Somsubhra Gupta, and Dharmpal Singh
New Credibilistic Real Option Model Based on the Pessimism-Optimism Character of a Decision-Maker
Irina Georgescu, Jani Kinnunen, and Mikael Collan
Analysis of Road Accidents in India and Prediction of Accident Severity
Sajal Jain, Shrivatsa Krishna, Saksham Pruthi, Rachna Jain, and Preeti Nagrath
Mining Opinion Features and Sentiment Analysis with Synonymy Aspects
Sourya Chatterjee and Saptarsi Goswami
Track II
Fake News Detection: Experiments and Approaches Beyond Linguistic Features
Shaily Bhatt, Naman Goenka, Sakshi Kalra, and Yashvardhan Sharma
Object Recognition and Classification for Robotics Using Virtualization and AI Acceleration on Cloud and Edge
Aditi Patil and Nida Sahar Rafee
Neural Networks Application in Predicting Stock Price of Banking Sector Companies: A Case Study Analysis of ICICI Bank
T. Ananth Narayan
Epilepsy Seizure Classification Using One-Dimensional Convolutional Neural Networks
Gautam Manocha, Harit Rustagi, Sang Pri Singh, Rachna Jain, and Preeti Nagrath
Syntactic and Semantic Knowledge-Aware Paraphrase Detection for Clinical Data
Sudeshna Jana, Abir Naskar, Tirthankar Dasgupta, and Lipika Dey
Enhanced Behavioral Cloning-Based Self-driving Car Using Transfer Learning
Uppala Sumanth, Narinder Singh Punn, Sanjay Kumar Sonbhadra, and Sonali Agarwal
Early Detection of Parkinson’s Disease Using Computer Vision
Sabina Tandon and Saurav Verma
Sense the Pulse: A Customized NLP-Based Analytical Platform for Large Organization—A Data Maturity Journey at TCS
Chetan Nain, Ankit Dwivedi, Rishi Gupta, and Preeti Ramdasi
Track III
Fact-Finding Knowledge-Aware Search Engine
Sonam Sharma
Automated Data Quality Mechanism and Analysis of Meteorological Data Obtained from Wind-Monitoring Stations of India
Y. Srinath, Krithika Vijayakumar, S. M. Revathy, A. G. Rangaraj, N. Sheelarani, K. Boopathi, and K. Balaraman
Track IV
A Survey on Energy-Efficient Task Offloading and Virtual Machine Migration for Mobile Edge Computation
Vaishali Joshi and Kishor Patil
Quantitative Study on Barriers of Adopting Big Data Analytics for UK and Eire SMEs
M. Willetts, A. S. Atkins, and C. Stanier
Post-quantum Cryptography
Sawan Bhattacharyya and Amlan Chakrabarti
A Comprehensive Study of Security Attack on VANET
Shubha R. Shetty and D. H. Manjaiah
Developing Business-Business Private Block-Chain Smart Contracts Using Hyper-Ledger Fabric for Security, Privacy and Transparency in Supply Chain
B. R. Arun Kumar
Data-Driven Frameworks for System Identification of a Steam Generator
Nivedita Wagh and S. D. Agashe
Track V
An Efficient Obstacle Detection Scheme for Low-Altitude UAVs Using Google Maps
Nilanjan Sinhababu and Pijush Kanti Dutta Pramanik
Estimating Authors’ Research Impact Using PageRank Algorithm
Arpan Sardar and Pijush Kanti Dutta Pramanik
Research Misconduct and Citation Gaming: A Critical Review on Characterization and Recent Trends of Research Manipulation
Joyita Chakraborty, Dinesh K. Pradhan, and Subrata Nandi
About the Editors
Neha Sharma works with Tata Consultancy Services and is the Founder
Secretary of the Society for Data Science. Previously, she was Director of a
premier institute in Pune that runs postgraduate courses such as MCA and MBA. She
is an alumna of a premier College of Engineering and Technology in Bhubaneshwar
and completed her PhD at the prestigious Indian Institute of Technology, Dhanbad.
She is an ACM Distinguished Speaker, a Senior IEEE member and Secretary of the
IEEE Pune Section. She is the recipient of the “Best PhD Thesis Award” and the “Best
Paper Presenter at International Conference Award” at the national level. Her areas of
interest include Data Mining, Database Design, Analysis and Design, Artificial
Intelligence, Big Data, Cloud Computing, Blockchain and Data Science.
Prof. Amlan Chakrabarti is a Full Professor in the School of I.T. at the University
of Calcutta. He was a Post-Doctoral Fellow at Princeton University, USA, during
2011–2012. He has almost 20 years of experience in engineering education and
research. He is the recipient of the prestigious DST BOYSCAST fellowship award in
Engg. Science (2011), the JSPS Invitation Research Award (2016), the Erasmus Mundus
Leaders Award from the EU (2017) and the Hamied Visiting Professorship from the
University of Cambridge (2018). He is an Associate Ed. of Elsevier Journal of Computers
and Electrical Engg. and Guest Ed. of Springer Nature Journal in Applied Sciences. He
is a Sr. Member of IEEE and ACM, IEEE Comp. Society Distinguished Visitor,
Distinguished Speaker of ACM, Secretary of IEEE CEDA India Chapter and Vice
President of Data Science Society.
Professor Alfred M. Bruckstein holds a BSc and an MSc in EE from the Technion IIT, Haifa,
Israel, and a PhD in EE from Stanford University, Stanford, California, USA. He is a
Technion Ollendorff Professor of Science in the Computer Science Department there
and a Visiting Professor at NTU, Singapore, in the SPMS. He has done research
on Neural Coding Processes, Stochastic Point Processes, Estimation Theory,
Scattering Theory, Signal and Image Processing Topics, Computer Vision and
Graphics, and Robotics. Over the years he has held visiting positions at Bell
Laboratories, Murray Hill, NJ, USA (1987–2001) and TsingHua University, Beijing, China
(2002–2023), and has made short visits to many universities and research centers
worldwide. At the Technion, he was the Dean of the Graduate School and is currently
the Head of the Technion Excellence Program.
Track I
Simulation of Lotka–Volterra Equations Using Differentiable Programming in Julia
Ankit Roy
1 Introduction
A. Roy (✉)
Westfield High School, Chantilly, VA, USA
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
N. Sharma et al. (eds.), Data Management, Analytics and Innovation,
Lecture Notes on Data Engineering and Communications Technologies 71,
https://doi.org/10.1007/978-981-16-2937-2_1
can turn into an extremely expensive process given a few hundred parameters,
differentiable programming instead allows us to take a pseudo-walk around parameter
space to find a good set of parameters. As a result, differentiable programming in deep
learning lets you not only recast heavily parameterized models as
much simpler structures, but also reduce the run time and increase the efficiency
of a program. Additionally, differentiable programming sits at the intersection
of programming and calculus; it is a technique, with languages and libraries built
around it, that is well suited to optimizing models based on differential equations.
Popular existing languages for artificial intelligence models, such
as Python, also lack efficient intrinsic parallelism, meaning that a program
is not efficient at running two or more tasks at the same time. In
differentiable programming, speed is necessary because the program itself is
differentiated, so many quick, small tasks have to be run. Though existing Python
libraries such as PyTorch or TensorFlow are fast at running standard models, such as
a CNN or an RNN, they lack the speed to execute networks built up of smaller
operations. As a result, languages such as Swift and Julia have become popular for their
differentiable programming implementations. In this paper, we aim to show the
efficiency of differentiable programming by running simulations of the Lotka–Volterra
equations.
$$\frac{dx}{dt} = \alpha x - \beta x y$$
$$\frac{dy}{dt} = \delta x y - \gamma y$$
We use the Flux library for differentiable programming in Julia to
simulate the differential equations. The Lotka–Volterra equations are a pair of
coupled first-order differential equations, and we aim to use differentiable
programming to simulate both equations simultaneously.
2 Dataset
3 Approach Overview
$$\frac{dx}{dt} = \alpha x - \beta x y$$
$$\frac{dy}{dt} = \delta x y - \gamma y$$
where x is the current number of prey, y is the current number of predators, dx/dt
and dy/dt represent the change in prey and predators over time, respectively, and α, β, γ,
δ are parameters that describe the biological interactions between the two species. We
first set up these equations in Julia:
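As an illustration of the same setup in Python rather than Julia, a minimal sketch of the right-hand side might look like the following. This is not the author's code: the function name `lotka_volterra`, the argument order `(u, p, t)` and the parameter values are assumptions chosen only to mirror the two equations above.

```python
def lotka_volterra(u, p, t):
    """Right-hand side of the Lotka-Volterra system, du/dt = f(u, p, t).

    u = (x, y): prey and predator populations.
    p = (alpha, beta, gamma, delta): interaction parameters.
    t is unused because the system is autonomous; it is kept so that the
    signature matches the general ODE form f(u, p, t).
    """
    x, y = u
    alpha, beta, gamma, delta = p
    dx = alpha * x - beta * x * y    # prey growth minus predation
    dy = delta * x * y - gamma * y   # predator growth minus predator death
    return (dx, dy)


# Hypothetical parameter values, chosen only to exercise the function.
rates = lotka_volterra((1.0, 1.0), (1.5, 1.0, 3.0, 1.0), 0.0)
print(rates)
```

Returning both derivatives from one function is what lets a solver advance the two equations together, since each depends on both x and y.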
After setting up the equations, we use the ordinary differential equation (ODE)
solver from existing Julia libraries. The solver handles differential equations of the form
$$\frac{du}{dt} = f(u, p, t)$$
where u is the state, p represents the parameters and t represents time over a given
interval. In Julia, instead of passing a single differential equation to the ODE solver,
we pass the entire Lotka–Volterra model, which lets us work on both equations
simultaneously. Tsit5() denotes the algorithm used, the Tsitouras 5/4 Runge–Kutta
method.
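To make the solver step concrete, here is a minimal Python sketch that integrates the whole system at once. It swaps in the classical fixed-step fourth-order Runge–Kutta method, which is simpler than the adaptive Tsitouras 5/4 method named above and is not what the paper uses. The helper names (`rk4_solve`, `invariant`), the step size, the initial state and the parameter values are all assumptions for illustration.

```python
import math


def lotka_volterra(u, p, t):
    x, y = u
    alpha, beta, gamma, delta = p
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4_solve(f, u0, p, t_end, h):
    """Integrate du/dt = f(u, p, t) from t = 0 to t_end with fixed step h."""
    n_steps = round(t_end / h)
    t, u = 0.0, tuple(u0)
    trajectory = [(t, u)]
    for _ in range(n_steps):
        k1 = f(u, p, t)
        k2 = f(tuple(ui + 0.5 * h * ki for ui, ki in zip(u, k1)), p, t + 0.5 * h)
        k3 = f(tuple(ui + 0.5 * h * ki for ui, ki in zip(u, k2)), p, t + 0.5 * h)
        k4 = f(tuple(ui + h * ki for ui, ki in zip(u, k3)), p, t + h)
        # Weighted average of the four slopes, applied to both equations at once.
        u = tuple(ui + h / 6.0 * (a + 2 * b + 2 * c + d)
                  for ui, a, b, c, d in zip(u, k1, k2, k3, k4))
        t += h
        trajectory.append((t, u))
    return trajectory


def invariant(u, p):
    """Quantity that the exact Lotka-Volterra flow keeps constant."""
    x, y = u
    alpha, beta, gamma, delta = p
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


params = (1.5, 1.0, 3.0, 1.0)   # hypothetical alpha, beta, gamma, delta
start = (1.0, 1.0)              # hypothetical initial prey and predator counts
trajectory = rk4_solve(lotka_volterra, start, params, t_end=10.0, h=0.01)
drift = max(abs(invariant(u, params) - invariant(start, params))
            for _, u in trajectory)
print(f"final state: {trajectory[-1][1]}, invariant drift: {drift:.2e}")
```

The drift of the conserved quantity gives a quick sanity check on the numerical solution: if it stays small, the closed predator-prey cycles are being reproduced rather than spiralling in or out.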
With the differential equations set up and represented as an ODE problem, we turn to
the Flux library to solve and represent the models. We first set up the parameters α, β,
γ, δ, explained above, with Flux.
Finally, we set up the model to train in Flux to generate the Lotka–Volterra graphs.
We use the ADAM optimizer and train the model using Flux, passing the loss function,
parameters, data, optimizer and a function to display data.
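The training step can be sketched in plain Python under several loudly stated assumptions. The text does not say which loss was used, so the loss below (squared distance of both populations from 1) is purely illustrative. Central finite differences stand in for the automatic differentiation that Flux provides, a fixed-step Runge–Kutta solver stands in for Tsit5, and the starting parameters and the Adam hyperparameters are hypothetical. The display callback mentioned above is omitted.

```python
import math


def lotka_volterra(u, p):
    x, y = u
    alpha, beta, gamma, delta = p
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def solve(p, u0=(1.0, 1.0), t_end=10.0, h=0.1):
    """Fixed-step RK4 solution, one state per step of size h."""
    def shifted(u, k, s):
        return (u[0] + s * k[0], u[1] + s * k[1])

    u, states = u0, [u0]
    for _ in range(round(t_end / h)):
        k1 = lotka_volterra(u, p)
        k2 = lotka_volterra(shifted(u, k1, 0.5 * h), p)
        k3 = lotka_volterra(shifted(u, k2, 0.5 * h), p)
        k4 = lotka_volterra(shifted(u, k3, h), p)
        u = (u[0] + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             u[1] + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        states.append(u)
    return states


def loss(p):
    # Assumed loss: squared distance of both populations from 1.
    return sum((x - 1.0) ** 2 + (y - 1.0) ** 2 for x, y in solve(p))


def gradient(p, eps=1e-6):
    # Central finite differences stand in for automatic differentiation.
    grads = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up) - loss(down)) / (2.0 * eps))
    return grads


def adam(p, steps=100, lr=0.1, b1=0.9, b2=0.999, tiny=1e-8):
    """Standard Adam update with bias-corrected moment estimates."""
    p = list(p)
    m = [0.0] * len(p)
    v = [0.0] * len(p)
    for step in range(1, steps + 1):
        g = gradient(p)
        for i in range(len(p)):
            m[i] = b1 * m[i] + (1.0 - b1) * g[i]
            v[i] = b2 * v[i] + (1.0 - b2) * g[i] ** 2
            m_hat = m[i] / (1.0 - b1 ** step)
            v_hat = v[i] / (1.0 - b2 ** step)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + tiny)
    return p


initial = [2.2, 1.0, 2.0, 0.4]   # hypothetical alpha, beta, gamma, delta
fitted = adam(initial)
initial_loss, final_loss = loss(initial), loss(fitted)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Finite differences need two solves per parameter per iteration, which is exactly the cost that grows quickly with the number of parameters; differentiating through the solver, as a differentiable programming framework does, avoids that.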
4 Results
After training our model in Flux, a graph is auto-generated showing the relationship
between the two differential equations. In the Lotka–Volterra model, we expect a
semi-inverse relationship between predator and prey. As stated before, the model
describes the relationship between the populations of two species. The equations
predict that a decrease in the number of “predators” (shown as u2) leads to
an eventual increase in the number of “prey” (shown as u1). This eventual increase
in prey leads to an increase in predators, which produces the fluctuations and cycles
seen in the graph. The graph shows the power of differentiable programming for
mathematical modeling: it can simulate two differential equations
simultaneously (Fig. 3).
5 Conclusion
Acknowledgements The author would like to thank Dr. Himadri Nath Saha for
his help in developing the idea for the paper and for his support throughout its writing.
References
Feature Selection Strategy for Multi-residents Behavior Analysis in Smart Home Environment
John W. Kasubi and D. H. Manjaiah
Abstract Feature selection (FS) plays a vital role in reducing the computational complexity
that irrelevant features add to a model, with the aim of developing
better predictive models. The process involves selecting significant features for model
building in machine learning, removing redundant features and creating new ones.
The study focused on developing a predictive model that performs best for activities of
daily living (ADLs) using the Activity Recognition with Ambient Sensing (ARAS) dataset.
We used feature importance, univariate selection and a correlation matrix to prepare
the ARAS dataset before modeling. Logistic Regression (LR), Support Vector Machine
(SVM) and K-Nearest Neighbors (KNN) were then used to assess the accuracy of the
selected features. The results show that SVM outperformed the other algorithms in
both House A and House B. With univariate feature selection, SVM performed best
with 10 features rather than 5, reaching 100% accuracy in both houses; with feature
importance selection, SVM performed best with 5 features rather than 10, reaching
99% accuracy in House A and 100% in House B. Feature selection improved prediction
accuracy on the ARAS dataset compared with previous results, which achieved an
average accuracy of 61.5% in House A and 76.2% in House B.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
N. Sharma et al. (eds.), Data Management, Analytics and Innovation,
Lecture Notes on Data Engineering and Communications Technologies 71,
https://doi.org/10.1007/978-981-16-2937-2_2
1 Introduction
Feature selection (FS) is a very important practice in model development that
greatly affects a model’s efficiency. It offers many benefits: a simpler model,
a better understanding of the data for easier interpretation,
improved accuracy, reduced overfitting and shorter
training time. Selecting the features that contribute most to model performance
can be done automatically or manually; in this study we opted for
automatic techniques to prepare our data before modeling [1].
FS methods fall into three groups: filter-based, wrapper-based and embedded.
Filter-based techniques select features based on a
performance measure, irrespective of the machine learning algorithm that will be
employed later. Wrapper techniques select features by fitting
a specific machine learning algorithm to the dataset. The
embedded approach combines the benefits of filter and wrapper techniques
to perform feature selection. In this study, we used filter techniques
for feature selection because they are user friendly and
give good results; wrapper-based methods can give more accurate results than
filter methods, but they increase the processing cost [2].
Smart homes play a vital role in human activity recognition and, as a result, help
diagnose diseases at an early stage. Beyond health care services,
smart homes use IoT technologies to detect dangerous activities
in the home and to control and monitor energy and water usage.
We can make the home a better place to live by automating whatever we
need to automate; the only limit is our imagination [3, 4]. Human activity
recognition (HAR) is used to explore the different activities performed by people within
a smart home equipped with sensors. It therefore plays a significant role in
monitoring the daily activities of human life, with applications in healthcare, security,
and electricity and water usage [5].
This study was carried out using the ARAS dataset, which involves 27 different
types of activities. The ARAS dataset was collected in smart homes using sensors
installed on household appliances and in rooms, such as the refrigerator, kitchen,
sitting room, bedroom, toilet and laundry, which together generated a large amount
of data (5,184,000 instances) [6].
The research purpose of the ARAS dataset is to enhance the quality of life and
maintain the comfort of residents. A smart home must therefore be able
to capture all behavior changes in residents’ daily activities so
that hidden knowledge and insights can be extracted using machine
learning algorithms. Healthcare experts strongly agree that monitoring for changes
in ADLs is one of the better ways to detect potential health problems before they become
uncontrollable [7]. Recognizing the activities performed by smart home residents,
including their activities of daily living, can significantly help in providing healthcare,
security, grid and water usage management and automation, and more importantly in
improving the quality of human life. Feature selection plays a vital role on the ARAS
dataset: it lets us examine how feature values influence the activity recognition
performance of the models and test the relationship
between different activities and the activity recognition accuracy rate.
This paper contributes by performing feature selection on the ARAS dataset to
prepare the data before modeling [8]. Three machine learning algorithms
were subsequently used to assess the effectiveness of the selected features
in predicting future outcomes, and the results demonstrate that the proposed
techniques produce higher recognition rates.
This work is structured as follows: Sect. 2 briefly explains the relevant works;
Sect. 3 describes the resources and methods applied; Sect. 4 presents the
experimental outcomes and discussion; Sect. 5 presents the conclusions and gives
suggestions for future study.
2 Related Work
This section reviews previous work on smart homes and
feature selection techniques.
Alberdi et al. [9] suggested using feature selection to detect multimodal
symptoms in a smart home. Classification models were developed to recognize
changes in the scores that predict symptoms, and different algorithms
were used to resolve differences between levels. The results show that feature selection
boosted model accuracy and that not all behavioral patterns contribute equally to a
symptom’s prediction.
Shangfeng et al. [10] studied human activity in a smart home, developing
a human activity model with an extreme learning machine (ELM)
on the CASAS dataset. The outcome shows that ELM performance on ADLs improved after
feature selection and that different hidden-layer networks lead to different recognition
accuracy.
Labib et al. [11] studied activity recognition in a smart home based
on the basic activities performed by residents, so that elderly
residents can carry out activities such as cooking meals or watching
TV independently in their own homes. The experimental evaluation used the Kasteren
and CASAS (Kyoto1 and Kyoto7) datasets. The Kyoto7 dataset reached an
accuracy of 77%, while Kasteren and Kyoto1 achieved accuracies of 93% and 97%,
respectively; model accuracy was good after feature
selection was performed on the respective datasets.
Hameed et al. [12] applied feature selection using DT and RFE to remove irrelevant
features from the dataset. An enhanced ELR classifier, LR and MLP
were used for prediction, and the ELR classifier outperformed LR and
MLP after feature selection.
Manoj et al. [13] proposed a hybrid method combining ACO and ANN for feature
selection in order to remove unnecessary and redundant features from the dataset.
After feature selection, the hybrid algorithm outperformed
the previous results.
Liu et al. [14] proposed recognizing ADLs in a smart home by applying feature selection
with PCC; the researchers used three machine learning algorithms to test
the efficiency of the recommended approach for ADL detection. The experimental
results show that the recommended method produces better ADL
recognition.
Tanaka et al. [15] suggested a swarm optimizer to support a feature selection model
for KLR. The proposed technique removes unnecessary and redundant features, and
the experiment demonstrates an increase in the generalization efficiency
of the proposed method.
Fang et al. [16] showed that the BP algorithm performed well with feature selection in a
smart home setting; the study used three machine learning techniques. The results
indicate that, after feature selection on the research dataset, the ADL detection
performance of the NN trained with the BP algorithm is more robust than that of the
NB and HMM models.
Abudalfa et al. [17] evaluated semi-supervised clustering for ADL
detection with different ML approaches. Feature
selection improved performance: the values increased while the confidence level
decreased. The experimental results show that the presented technique provided
remarkable accuracy, and performance improved significantly when
more sophisticated feature selection techniques were applied.
Pablo et al. [18] presented a wrapper-based feature selection method for IoT
that merges RFE and GBTs to pick the most significant
attributes automatically. The experiments show better results than those obtained
before feature selection.
Oukrich [19] conducted research on feature selection for ADL recognition in
smart homes using the Cairo and Aruba datasets. Several commonly used
methods were tested, such as VBN, KNN, Hidden Markov, DT, SVM and CRFs. The
accuracy attained was 90.05% on the Aruba dataset and 88.49% on Cairo. The author
suggested using deep learning techniques, which have proved effective and robust
compared with traditional machine learning algorithms.
Minor and Cook [20] compared the performance of three models (regression
tree, linear regression and SVM) and applied them to CASAS datasets to forecast
the future occurrence of activities. After feature selection was run on the dataset, the
experiments showed that the regression tree produces better activity predictions, with
lower error, faster training time and the capacity to handle more complex datasets
than SVM and linear regression. To improve activity forecasts
with less complexity, the researchers proposed adapting other classifiers and
combining numerical forecasting techniques in future work.
Feature selection is thus the first phase in model development. It
decreases the model’s complexity by selecting the proper features,
computing the value of each feature in the dataset in order to provide a good predictive
model output [21].
3 Methods
This section presents the methods used in this work to develop
the predictive model for sleep behavior in smart homes using machine learning
techniques.
Table 1 gives the characteristics of the ARAS dataset, which covers two
residential houses and 27 different types of activities and was generated
in Turkey in 2013. The activities recorded every day in House A and House B were:
having a shower, toileting, preparing breakfast, having breakfast, preparing
dinner, having dinner, going out, sleeping, having snacks, watching TV, studying
and reading books. Activities that were not performed
every day included washing dishes, napping, laundry, shaving, talking on
the phone, listening to music, having a conversation and having guests. The aggregate
number of activity occurrences is 2177 in House A and 1023 in House B.
Each day has 86,400 data points, each with a time
stamp. In this study, the predictions for both houses show that going out,
sleeping, studying, watching TV and having breakfast were the activities most
frequently performed by the residents [22].
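The following four assessment measures were used to evaluate the
proposed approach, where TP, TN, FP and FN denote the numbers of true positives,
true negatives, false positives and false negatives:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{F1-Measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$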
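Equations (1) to (4) can be computed directly from the four confusion-matrix counts. The short Python sketch below shows one way to do so; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero; a production version would
    need to handle empty classes explicitly.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
scores = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy matters when some activities are rare, because a classifier can reach high accuracy simply by favouring the frequent ones.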
Feature selection was performed using the Python programming language to reduce
the complexity of the predictive model and to understand the influence of individual
features on predictions for the ARAS dataset as a whole. The purpose of the FS process
is to identify the features that give the best score and to remove unnecessary features
that add complexity to the model and may lead to poor
performance [23].
In this study, we used filter techniques for feature selection because they are
easy to use and give good results; wrapper methods can give better results than
filter methods, but at a higher processing cost. Accordingly, we used three filter
techniques, namely univariate selection, feature importance and correlation matrix
analysis, to prepare our data before modeling [24]. To test the accuracy of the
proposed FS techniques on the ARAS dataset, three machine learning algorithms were
used. The results show that the proposed technique provides better outcomes (Fig. 1).
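To make the idea of a filter method concrete, the sketch below ranks features by the absolute Pearson correlation between each feature and the target, then keeps the top k. This is only one possible correlation-based filter and is not taken from the paper, which does not give implementation details; the sensor names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(columns, target, k):
    """Score every feature column against the target and return the k best names."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical binary sensor readings and a "sleeping" label per time step.
columns = {
    "bed_pressure": [1, 1, 0, 0, 1, 0],
    "tv_receiver": [0, 1, 0, 1, 0, 1],
    "constant_sensor": [1, 1, 1, 1, 1, 1],
}
sleeping = [1, 1, 0, 0, 1, 0]
selected, scores = select_top_k(columns, sleeping, k=1)
print(selected, scores)
```

Because the score is computed per feature, independently of any classifier, the cost stays low; this is the trade-off against wrapper methods described above.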
To evaluate the performance of the proposed approaches on the ARAS dataset, we
built LR, SVM and KNN models for both House A and House B and compared the
results obtained for the two houses.
Predictions were made on the ARAS data for House A and House B after applying the
feature selection techniques, and accuracy and the other evaluation metrics were
computed. The results show that SVM performed best in both houses. Univariate
feature selection performed better with 10 features than with 5, reaching an
accuracy of 100% for both House A and House B with SVM. Feature importance
selection, by contrast, performed better with 5 features than with 10, reaching
99% accuracy for House A and 100% for House B with SVM. This implies that, on the
ARAS dataset, SVM performs best with 10 features under univariate feature
selection and with 5 features under feature importance selection (Tables 4, 5
and 6).
Table 7 shows that, after feature selection on the ARAS dataset, model performance
improved relative to previous studies: the current study achieved an average
accuracy of 100% for both House A and House B, whereas the previous results
averaged 61.5% for House A and 76.2% for House B.
5 Conclusion
The prediction results for feature selection in a smart home environment, using
the ARAS dataset from House A and House B, showed that SVM outperformed the other
algorithms, Logistic Regression (LR) and KNN. With univariate feature selection,
SVM performed better with 10 features than with 5, reaching 100% accuracy for both
House A and House B; with feature importance selection, SVM performed better with
5 features than with 10, reaching 99% accuracy for House A and 100% for House B.
Feature selection improved prediction accuracy on the ARAS dataset compared with
previous results, which averaged 61.5% for House A and 76.2% for House B. For
future work, we suggest applying different algorithms and other feature selection
methods, such as wrapper-based and embedded methods, to the ARAS dataset for
comparison and for improvement of accuracy.
Table 4 Prediction results for logistic regression (LR) model for House A and B
                                                House A                               House B
Feature selection technique    No. of features  Acc   Precision  Recall  F1-score    Acc   Precision  Recall  F1-score
Univariate feature selection   10               0.98  0.99       0.98    0.98        0.99  0.98       0.97    0.99
Univariate feature selection   5                0.91  0.91       0.90    0.90        0.99  0.97       0.98    0.99
Importance feature selection   10               0.96  0.96       0.95    0.94        0.98  0.98       0.99    0.97
Importance feature selection   5                0.91  0.91       0.92    0.90        0.99  0.99       0.98    0.99
Correlation                    10               0.93  0.91       0.90    0.91        0.92  0.90       0.90    0.91
Table 5 Prediction results for SVM model for House A and B
                                                House A                               House B
Feature selection technique    No. of features  Acc   Precision  Recall  F1-score    Acc   Precision  Recall  F1-score
Univariate feature selection   10               1.00  1.00       1.00    1.00        1.00  1.00       1.00    1.00
Univariate feature selection   5                0.99  0.99       0.98    0.97        0.99  0.98       0.99    0.98
Importance feature selection   10               0.98  0.97       0.98    0.97        0.99  0.96       0.98    0.99
Importance feature selection   5                0.99  0.98       0.99    0.98        1.00  1.00       1.00    1.00
Table 7 Comparisons of prediction results on ARAS Dataset with previous research work
Research study    Accuracy-average score, House A (%)    Accuracy-average score, House B (%)
Current study     100                                    100
Previous study    61.5                                   76.2
References
1. Brownlee J (2016) Machine learning mastery with Python: understand your data, create accurate
models, and work projects end-to-end. Machine Learning Mastery
2. Raschka S, Mirjalili V (2017) Python machine learning. Packt Publishing Ltd
3. Kwon M-C, Choi S (2018) Recognition of daily human activity using an artificial neural
network and smartwatch. Wirel Commun Mobile Comput 2018
4. Oukrich N, Maach A et al (2019) Human daily activity recognition using neural networks and
ontology-based activity representation. In: Proceedings of the Mediterranean symposium on
smart city applications. Springer, pp 622–633
5. Wang J, Chen Y, Hao S, Peng X, Hu L (2019) Deep learning for sensor-based activity
recognition: a survey. Pattern Recogn Lett 119:3–11
6. Igwe OM, Wang Y, Giakos GC (2018) Activity learning and recognition using margin setting
algorithm in smart homes. In: 2018 9th IEEE annual ubiquitous computing, electronics &
mobile communication conference (UEMCON). IEEE
7. Fang H, Srinivasan R, Cook DJ (2012) Feature selections for human activity recognition in
smart home environments. Int J Innov Comput Inf Control 8:3525–3535
8. Alemdar H, Ersoy C (2017) Multi-resident activity tracking and recognition in smart
environments. J Ambient Intell Humaniz Comput 8(4):513–529
9. Alberdi A et al (2018) Smart home-based prediction of multidomain symptoms related to
Alzheimer’s disease. IEEE J Biomed Health Inf 22(6):1720–1731
10. Chen S, Fang H, Liu Z (2020) Human activity recognition based on extreme learning machine
in smart home. J Phys Conf Ser 1437(1)
11. Fahad LG, Tahir FT (2020) Activity recognition in a smart home using local feature weighting
and variants of nearest-neighbors classifiers. J Ambient Intell Humanized Comput, 1–10
12. Hameed J et al (2020) Enhanced classification with logistic regression for short term price
and load forecasting in smart homes. In: 2020 3rd international conference on computing,
mathematics and engineering technologies (iCoMET). IEEE
13. Manoj RJ, Anto Praveena MD, Vijayakumar K (2019) An ACO–ANN based feature selection
algorithm for big data. Cluster Comput 22(2):3953–3960
14. Liu Y et al (2020) Daily activity feature selection in smart homes based on Pearson correlation
coefficient. Neur Process Lett, 1–17
15. Tanaka K, Kurita T, Kawabe T (2007) Selection of import vectors via binary particle swarm
optimization and cross-validation for kernel logistic regression. In: 2007 international joint
conference on neural networks. IEEE
16. Fang H et al (2014) Human activity recognition based on feature selection in smart home using
back-propagation algorithm. ISA Trans 53(5):1629–1638
17. Abudalfa S, Qusa H (2019) Evaluation of semi-supervised clustering and feature selection for
human activity recognition. Int J Comput Digital Syst 8(6)
18. Rodriguez-Mier P, Mucientes M, Bugarín A (2019) Feature selection and evolutionary rule
learning for Big Data in smart building energy management. Cogn Comput 11(3):418–433
19. Oukrich N (2019) Daily human activity recognition in smart home based on feature selection,
neural network and load signature of appliances. PhD thesis
20. Minor B, Cook DJ (2017) Forecasting occurrences of activities. Pervasive Mobile Comput
38:77–91
21. Zainab A, Refaat SS, Bouhali O (2020) Ensemble-based spam detection in smart home IoT
devices time series data using machine learning techniques. Information 11(7):344
22. Alemdar H et al (2013) ARAS human activity datasets in multiple homes with multiple resi-
dents. In: 2013 7th international conference on pervasive computing technologies for healthcare
and workshops. IEEE
23. Tang S et al (2019) Smart home IoT anomaly detection based on ensemble model learning
from heterogeneous data. In: 2019 IEEE international conference on big data (big data). IEEE
24. Mohammadi M et al (2018) Deep learning for IoT big data and streaming analytics: a survey.
IEEE Commun Surv Tutor 20(4):2923–2960
A Comparative Study on Self-learning
Techniques for Securing Digital Devices
Abstract In the present time, the use of technology in our daily activities is impera-
tive as we employ various technological solutions for activities like banking, commu-
nication, business and e-governance. All of this requires large network infrastruc-
tures, and maintaining the security of the same is still challenging as there is no dearth
of malicious attacks. Our reliance on these systems and the volume of activity that
these systems facilitate makes them more vulnerable to cyber-attacks. Apart from
the traditional firewalls and anti-malware software, tools like the intrusion detection
system (IDS) can be really helpful in increasing the security of our networks and
information systems. There are different techniques by which an intrusion can be
detected in a network. Among these, a number of machine learning solutions can be
used to develop a robust IDS as they are capable of detecting an intrusion in a network
efficiently with the availability of historical data. In this paper, a comparative study
has been conducted between such techniques that have been leveraged to build an
intelligent IDS. The performance of these models is compared using the accuracy
rate, and it is observed that artificial neural networks give the best accuracy rate, i.e.
98% for network intrusion detection. Most of these experiments were conducted by
their respective authors using the KDDCup99 dataset or the improved NSL-KDD
dataset, both of which are relatively old. In order to build our version of intelligent
IDS, we aim to leverage deep learning algorithms along with recently developed
datasets such as the CICIDS2017 dataset. This comparative study will be helpful to
compare and contrast the various techniques available to develop a competent IDS.
1 Introduction
As more and more people get access to the Internet every day, they are introduced
to a vast resource of knowledge and information. But with this comes the threat of
being a victim of a cyber-attack. Security has become an important and unavoidable
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 27
N. Sharma et al. (eds.), Data Management, Analytics and Innovation,
Lecture Notes on Data Engineering and Communications Technologies 71,
https://doi.org/10.1007/978-981-16-2937-2_3
28 D. Kumar et al.
An intrusion detection system (IDS) scans the network for dubious or suspicious
behaviour and notifies the administrator of potential threats. It is a mechanism that
monitors network traffic for data breach or security hazard, if any. While IDS tracks
the network for potentially harmful activities, false alarms are equally possible
and unavoidable. As a result, proper configuration of intrusion detection systems
is needed to understand and differentiate between regular network traffic and any
anomalous or malicious behaviour. Intrusion prevention systems often track packets
entering the network and verify the possibility of anomalies. It further sends warning
alerts to the administrator if any deviation is observed in the packets. IDS can be
IDS technologies are distinguished fundamentally by the entity that they examine
and the methods by which the features are achieved. Broadly, they are categorized
as follows:
(a) Network-Based
Network-based IDS monitor the network traffic for a section of the network. They
also examine network and application protocol activity to identify anomalous
behaviour. Network intrusion detection systems (NIDS) are placed at a suitable
point in a network to monitor the traffic of all users in the network. A NIDS
tracks the traffic on the entire subnet and compares it with previously observed
traffic in order to identify known attacks, that is, attacks that have occurred
before. If a suspicious activity or an attack is detected, it may be brought to
the administrator's notice. For example, a NIDS can be installed on a subnet where
firewalls are also set up, to watch for an intruder trying to penetrate a firewall.
(b) Host-Based
These IDS monitor the host and the events that occur within the host. Host
intrusion detection systems (HIDS) operate on separate servers or computers in a
network. A HIDS tracks the flow of packets only from its own system and warns the
user if any malicious or anomalous behaviour is observed. It records the details
of the files on the current device and compares them with existing records. If the
analytical system files have been tampered with, this is brought to the
administrator's notice. For example, HIDS are used on devices that are vital and
must keep their architecture intact for the smooth functioning of the system.
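The file-comparison step described for HIDS can be sketched as a hash-based integrity check: a baseline of file digests is recorded, and any later snapshot is compared against it. This is an illustrative assumption about how such a comparison might be done, not a description of any particular HIDS product; the paths and file contents are hypothetical and held in memory so that the example is self-contained.

```python
import hashlib


def snapshot(files):
    """Map each path to the SHA-256 digest of its content (files: path -> bytes)."""
    return {path: hashlib.sha256(content).hexdigest()
            for path, content in files.items()}


def changed_files(baseline, current):
    """Return the paths that were modified, added or removed since the baseline."""
    paths = set(baseline) | set(current)
    return sorted(p for p in paths if baseline.get(p) != current.get(p))


# Hypothetical file contents before and after tampering.
baseline = snapshot({"/etc/passwd": b"root:x:0:0", "/bin/login": b"\x7fELF original"})
current = snapshot({"/etc/passwd": b"root:x:0:0", "/bin/login": b"\x7fELF patched"})
alerts = changed_files(baseline, current)
print(alerts)
```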
A signature is a pattern developed from knowledge of known threats. Signatures
can be matched against ongoing activity in the network; if the activity resembles
a signature, it is placed in the same category and thereby labelled as a threat.
In signature-based detection, we compare the signatures of known threats against
the events that are being observed in the network.
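A minimal sketch of signature-based matching follows, assuming that a signature is simply a byte pattern and that an event is a raw payload. Real IDS rule languages are considerably richer; the signature names and patterns here are hypothetical.

```python
# Hypothetical signature database: threat name -> byte pattern.
SIGNATURES = {
    "sql_injection": b"' OR 1=1",
    "path_traversal": b"../../",
}


def match_signatures(payload, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return sorted(name for name, pattern in signatures.items() if pattern in payload)


print(match_signatures(b"GET /index.html"))
print(match_signatures(b"GET /item?id=1' OR 1=1 --"))
```

A payload that matches no signature is treated as benign, which is why this style of detection cannot flag attacks for which no signature exists yet.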
models that have been developed and used specifically for the task of intrusion
detection. In this review, we further look at solutions that have been provided to
develop a robust intrusion detection system and evaluate the best practices available
to us.
3.1 Classification
Anish Halimaa et al. [1] explore the techniques that can be used to develop a
machine learning-based IDS. They emphasize the importance of accuracy as a key
factor in the performance of the system, and aim to propose an approach with
fewer false alarms (false positives) in order to improve the detection rate.
Out of the available machine learning techniques, they apply support vector
machine (SVM) and Naive Bayes to the NSL-KDD knowledge discovery dataset,
which is a refined version of the benchmark KDDCup99 dataset. They have designed
an experiment where they used three different approaches in order to examine the
efficiency of the two algorithms viz. SVM and Naive Bayes. The metrics for the
same are the accuracy rate and the misclassification rate of the model. We discuss
this in detail in the subsequent paragraph.
The first approach consists of using the algorithm itself to build the SVM and
Naive Bayes models for the purpose of intrusion detection. In order to get the best
out of the dataset, they go on to incorporate feature reduction and normalization which
branches out to give the other two approaches. For the second approach, CfsSub-
setEval [2] is adopted for feature reduction—a technique that helps in extracting out
the most relevant attributes. This gives us two new and updated models viz. SVM-
CfsSubsetEval and Naive Bayes-CfsSubsetEval. For the third and final approach,
they apply normalization to the dataset which results in SVM-normalization and
Naive Bayes-normalization.
As a result of this experiment, the authors conclude that models based on SVM
significantly outperform those based on Naive Bayes. This holds true even for the
models obtained after feature reduction and normalization, and it is also evident
when the data obtained from the experiment is examined.
Regression analysis models the relationship between an output variable and a set
of input features. Clustering, on the other hand, means forming clusters, or
groups, of data that exhibit similar features, which can in turn help us to
distinguish between different groups of data. In our setting, we assume that these
groups will be benign and malicious connections or activities in the network.
A similar approach is observed in the paper presented by Dikshant Gupta et al. [3].
The paper uses two techniques, linear regression and K-means clustering, to
develop a model capable of detecting an intrusion in a network. The model was
developed and trained on the NSL-KDD dataset. In their experiment, the authors
first carry out data preprocessing, which includes transforming the nominal
features into numerical inputs suitable for the techniques involved. The dataset
is also preprocessed using the mean normalization method before the algorithms
are applied.
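The two preprocessing steps mentioned above can be sketched as follows. The review does not say how the nominal features were encoded, so integer label encoding is assumed here purely for illustration; mean normalization is taken in its usual form, x' = (x - mean) / (max - min). The feature values are hypothetical.

```python
def encode_nominal(values):
    """Map each distinct nominal value to an integer code (assumed encoding)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], index


def mean_normalize(values):
    """Mean normalization: subtract the mean, divide by the range (max - min)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


# Hypothetical protocol types and connection durations.
protocols = ["tcp", "udp", "icmp", "tcp"]
codes, mapping = encode_nominal(protocols)
durations = [0, 2, 4, 10]
scaled = mean_normalize(durations)
print(codes, mapping)
print(scaled)
```

After mean normalization the values are centred on zero and span a range of exactly one, which keeps features with large raw magnitudes from dominating distance-based methods such as K-means.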
Linear regression gives an accuracy rate of 80.14% (for cost variation alpha =
0.005), which is significant but not satisfactory. K-means clustering, on the
other hand, achieved a tolerable accuracy rate of 67.5%. These results may be
sufficient for experimental purposes, but they may not satisfy industrial
requirements. To achieve a better accuracy rate, we may look to multi-level
hybrid models or other self-learning techniques (some of which are discussed
further in this review).
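For readers unfamiliar with K-means, the sketch below runs the algorithm on one-dimensional data with two clusters, standing in for the benign and malicious groups. It is a toy illustration, not the setup used by Gupta et al.; the data values and the initial centroids are hypothetical, and the initial centroids are fixed so that the run is deterministic.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-means on scalars: assign to nearest centroid, then recompute means."""
    centroids = list(centroids)

    def nearest(p):
        return min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p) for p in points]
    return centroids, labels


# Hypothetical per-connection packet rates.
rates = [1.0, 2.0, 3.0, 50.0, 52.0, 54.0]
centroids, labels = kmeans_1d(rates, centroids=[0.0, 10.0])
print(centroids, labels)
```

K-means only groups the data; deciding which cluster corresponds to malicious traffic still requires labels or domain knowledge.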
version of the KDDCup99 dataset, gave an accuracy rate of 98% which is slightly
higher than that of the proposed LSTM-RNN model. It will probably be interesting to
see what the results would be if a different dataset (probably, the NSL-KDD dataset
or a relatively newer dataset) is used to train the classifier based on the proposed
model.
A genetic algorithm is a series of steps inspired by the process of natural
selection. The process is based on the theory of natural evolution proposed by
Charles Darwin. Genetic algorithms are used for optimization and search within
evolutionary computation [9].
Similar to what happens in the biological evolutionary process where the best of
the genetic information is taken from the pool of information and is transferred to
the next generation, the best features of the data are selected for the new generation
resulting in optimization of information and better computational solutions.
One of the first mentions of this algorithm can be traced back to 1957 when
Fraser [10] recommended that genetic systems be modelled in computers. Like neural
networks, it is also inspired by a natural biological process, and it aims to solve
computational problems by mimicking the processes that nature has enabled us with.
Resende et al. [11] talk about an anomaly based IDS which is adaptive and selects
the appropriate attributes in order to profile the ‘normal’ behaviour for a network
using genetic algorithms. Any activity which deviates from this normal behaviour
can be classified as an anomalous behaviour in the network. As per the author(s), this
process of classification is efficient in detecting the intrusion in a network. The role
of a genetic algorithm in this approach is to help extract relevant attributes necessary
for profiling the normal behaviour in a system.
In the experiments conducted on the CICIDS2017 dataset [12], their approach
gave an accuracy rate of 92.85% and a false positive rate of 0.69%. In order to boost
the effectiveness of the model in a real implementation, they propose evolving the
initial population in a large number of generations (greater than 1000) and over a
dataset consisting of different combinations of attacks.
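The general idea of using a genetic algorithm to select attributes can be sketched as below. This is a generic GA over binary feature masks with selection, one-point crossover, mutation and elitism; it is not the method of Resende et al., and the fitness function, the set of "relevant" features and all parameter values are hypothetical stand-ins for a real detection-quality score.

```python
import random


def evolve_feature_mask(fitness, n_features, pop_size=20, generations=30,
                        mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask over features; returns the fittest mask found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # selection: keep the fitter half
        children = [ranked[0][:]]              # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy fitness: reward selecting the (hypothetical) relevant features,
# with a small penalty per selected feature to favour compact masks.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.1 * sum(mask)


best = evolve_feature_mask(toy_fitness, n_features=8)
print(best, toy_fitness(best))
```

Because the best mask of each generation is copied unchanged into the next, the best fitness never decreases; running more generations, as the authors propose, can therefore only help or leave the result unchanged.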
Fuzzy logic originates from the fuzzy set theory according to which reasoning is
approximate rather than reliably deduced from classical predicate logic. This factor
enables fuzzy techniques to be used in anomaly and/or intrusion detection because
the features to be extracted and examined for solving this problem can behave as
fuzzy variables.
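To show what treating a feature as a fuzzy variable means, the sketch below uses a triangular membership function, one common choice in fuzzy set theory. The feature ("connection rate"), the fuzzy set ("high") and its breakpoints are hypothetical and are not taken from the works reviewed here.

```python
def triangular(x, low, peak, high):
    """Degree of membership in [0, 1] for a triangular fuzzy set (low < peak < high)."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


# Hypothetical fuzzy set "high connection rate" over connections per second.
for rate in (40, 75, 100, 125):
    print(rate, triangular(rate, low=50, peak=100, high=150))
```

Instead of a hard threshold that labels a rate as either normal or anomalous, the membership degree expresses how strongly the observation belongs to the set, which is the approximate reasoning referred to above.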
4 Datasets
This dataset was used for The Third International Knowledge Discovery and Data
Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth
International Conference on Knowledge Discovery and Data Mining [14]. It is built
on the data captured in The Defence Advanced Research Projects Agency’98 IDS
conference. It is tcpdump data recorded over 7 weeks. Each feature in the dataset is
either labelled as an attack feature, normal feature or content feature. There are 41
features which are further classified as follows:-
(a) Denial of Service Attack
In this attack, the memory or computing resources are made unavailable to legitimate
user(s). This is possible by allowing consumption of resources crucial to the system
such as bandwidth and/or memory. The motive behind this attack is to freeze or
Another random document with
no related content on Scribd:
the box is fastened at a height of some five or six feet above the
ground, or hung up (but this is not so common) like a swinging bar
on a stand made for the purpose. This last arrangement is
particularly safe, as affording no access to vermin. As the birds
multiply, the owner adds cylinder to cylinder till they form a kind of
wall. Towards sunset, he or his wife approaches the dovecote, greeted
by a friendly cooing from inside, picks up from the ground a piece of
wood cut to the right size, and closes the opening of the first bark box
with it, doing the same to all the others in turn, and then leaves them
for the night, secure that no wild cat or other marauder can reach
them.
I have found out within the last few days why so few men are to be
seen in my rounds. The settlements here scarcely deserve the name
of villages—they are too straggling for that; it is only now and then
that from one hut one can catch a distant glimpse of another. The
view is also obstructed by the fields of manioc, whose branches,
though very spreading, are not easily seen through on account of the
thickly-growing, succulent green foliage. This and the bazi pea are,
now that the maize and millet have been gathered in, the only crops
left standing in the fields. Thus it may happen that one has to trust
entirely to the trodden paths leading from one hut to another, to be
sure of missing none, or to the guidance of the sounds inseparable
from every human settlement. There is no lack of such noises at
Masasi, and in fact I follow them almost every day. Walking about
the country with Nils Knudsen, I hear what sounds like a jovial
company over their morning drink—voices becoming louder and
louder, and shouting all together regardless of parliamentary rules. A
sudden turn of the path brings us face to face with a drinking-party,
and a very merry one, indeed, to judge by the humour of the guests
and the number and dimensions of the pombe pots which have been
wholly or partially emptied. The silence which follows our
appearance is like that produced by a stone thrown into a pool where
frogs are croaking. Only when we ask, “Pombe nzuri?” (“Is the beer
good?”) a chorus of hoarse throats shouts back the answer—“Nzuri
kabisa, bwana!” (“Very good indeed, sir!”)
As to this pombe—well, we Germans fail to appreciate our
privileges till we have ungratefully turned our backs on our own
country. At Mtua, our second camp out from Lindi, a huge earthen
jar of the East African brew was brought as a respectful offering to us
three Europeans. At that time I failed to appreciate the dirty-looking
drab liquid; not so our men, who finished up the six gallons or so in a
twinkling. In Masasi, again, the wife of the Nyasa chief Masekera
Matola—an extremely nice, middle-aged woman—insisted on
sending Knudsen and me a similar gigantic jar soon after our arrival.
We felt that it was out of the question to refuse or throw away the
gift, and so prepared for the ordeal with grim determination. First I
dipped one of my two tumblers into the turbid mass, and brought it
up filled with a liquid in colour not unlike our Lichtenhain beer, but
of a very different consistency. A compact mass of meal filled the
glass almost to the top, leaving about a finger’s breadth of real, clear
“Lichtenhainer.” “This will never do!” I growled, and shouted to
Kibwana for a clean handkerchief. He produced one, after a
seemingly endless search, but my attempts to use it as a filter were
fruitless—not a drop would run through. “No use, the stuff is too
closely woven. Lete sanda, Kibwana” (“Bring a piece of the shroud!”)
This order sounds startling enough, but does not denote any
exceptional callousness on my part. Sanda is the Swahili name for
the cheap, unbleached and highly-dressed calico (also called bafta)
which, as a matter of fact, is generally used by the natives to wrap a
corpse for burial. The material is consequently much in demand, and
travellers into the interior will do well to carry a bale of it with them.
When the dressing is washed out, it is little better than a network of
threads, and might fairly be expected to serve the purpose of a filter.
I found, however, that I could not strain the pombe through it—a
few scanty drops ran down and that was all. After trying my tea and
coffee-strainers, equally in vain, I gave up in despair, and drank the
stuff as it stood. I found that it had a slight taste of flour, but was
otherwise not by any means bad, and indeed quite reminiscent of my
student days at Jena—in fact, I think I could get used to it in time.
The men of Masasi seem to have got only too well used to it. I am far
from grudging the worthy elders their social glass after the hard work
of the harvest, but it is very hard that my studies should suffer from
this perpetual conviviality. It is impossible to drum up any
considerable number of men to be cross-examined on their tribal
affinities, usages and customs. Moreover, the few who can reconcile
it with their engagements and inclinations to separate themselves for
a time from their itinerant drinking-bouts are not disposed to be very
particular about the truth. Even when, the other day, I sent for a
band of these jolly topers to show me their methods of
basketmaking, the result was very unsatisfactory—they did some
plaiting in my presence, but they were quite incapable of giving in
detail the native names of their materials and implements—the
morning drink had been too copious.
It is well known that it is the custom of most, if not all, African
tribes to make a part of their supply of cereals into beer after an
abundant harvest, and consume it wholesale in this form. This, more
than anything else, has probably given rise to the opinion that the
native always wastes his substance in time of plenty, and is nearly
starved afterwards in consequence. It is true that our black friends
cannot be pronounced free from a certain degree of “divine
carelessness”—a touch, to call it no more, of Micawberism—but it
would not be fair to condemn them on the strength of a single
indication. I have already laid stress on the difficulty which the
native cultivator has of storing his seed-corn through the winter. It
would be still more difficult to preserve the much greater quantities
of foodstuffs gathered in at the harvest in a condition fit for use
through some eight or nine months. That he tries to do so is seen by
the numerous granaries surrounding every homestead of any
importance, but that he does not invariably succeed, and therefore
prefers to dispose of that part of his crops which would otherwise be
wasted in a manner combining the useful and the agreeable, is
proved by the morning and evening beer-drinks already referred to,
which, with all their loud merriment, are harmless enough. They
differ, by the bye, from the drinking in European public-houses, in
that they are held at each man’s house in turn, so that every one is
host on one occasion and guest on another—a highly satisfactory
arrangement on the whole.
My difficulties are due to other causes besides the chronically
bemused state of the men. In the first place, there are the troubles
connected with photography. In Europe the amateur is only too
thankful for bright sunshine, and even should the light be a little
more powerful than necessary, there is plenty of shade to be had
from trees and houses. In Africa we have nothing of the sort—the
trees are neither high nor shady, the bushes are not green, and the
houses are never more than twelve feet high at the ridge-pole. To this
is added the sun’s position in the sky at a height which affects one
with a sense of uncanniness, from nine in the morning till after three
in the afternoon, and an intensity of light which is best appreciated
by trying to match the skins of the natives against the colours in Von
Luschan’s scale. No medium between glittering light and deep black
shadow—how is one, under such circumstances, to produce artistic
plates full of atmosphere and feeling?
For a dark-room I have been trying to use the Masasi boma. This is
the only stone building in the whole district and has been
constructed for storing food so as to prevent the recurrence of famine
among the natives, and, still more, to make the garrison independent
of outside supplies in the event of another rising. It has only one
story, but the walls are solidly built, with mere loopholes for
windows; and the flat roof of beaten clay is very strong. In this
marvel of architecture are already stacked uncounted bags
containing millet from the new crop, and mountains of raw cotton. I
have made use of both these products, stopping all crevices with the
cotton, and taking the bags of grain to sit on, and also as a support
for my table, hitherto the essential part of a cotton-press which
stands forsaken in the compound, mourning over the shipwreck it
has made of its existence. Finally, I have closed the door with a
combination of thick straw mats made by my carriers, and some
blankets from my bed. In this way, I can develop at a pinch even in
the daytime, but, after working a short time in this apartment, the
atmosphere becomes so stifling that I am glad to escape from it to
another form of activity.
[Figure: RAT TRAP]
On one of my first strolls here, I came upon a neat structure which was
explained to me as “tego ya ngunda”—a trap for pigeons. This is a
system of sticks and thin strings, one of which is fastened to a strong
branch bent over into a half-circle. I have been, from my youth up,
interested in all mechanical contrivances, and am still more so in a
case like this, where we have an opportunity of gaining an insight into
the earlier evolutional stages of the human intellect. I therefore, on
my return to camp, called together all my men and as many
local natives as possible, and addressed the assembly to the effect
that the mzungu was exceedingly anxious to possess all kinds of
traps for all kinds of animals. Then followed the promise of good
prices for good and authentic specimens, and the oration wound up
with “Nendeni na tengenezeni sasa!” (“Now go away and make up
your contraptions!”).
How they hurried off that day, and how eagerly all my men have
been at work ever since! I had hitherto believed all my carriers to be
Wanyamwezi—now I find, through the commentaries which each of
them has to supply with his work, that my thirty men represent a
number of different tribes. Most of them, to be sure, are
Wanyamwezi, but along with them there are some Wasukuma and
Manyema, and even a genuine Mngoni from Runsewe, a
representative of that gallant Zulu tribe who, some decades ago,
penetrated from distant South Africa to the present German
territory, and pushed forward one of its groups—these very Runsewe
Wangoni—as far as the south-western corner of the Victoria Nyanza.
As for the askari, though numbering only thirteen, they belong to no
fewer than twelve different tribes, from those of far Darfur in the
Egyptian Sudan to the Yao in Portuguese East Africa. All these
“faithfuls” have been racking their brains to recall and practise once
more in wood and field the arts of their boyhood, and now they come
and set up, in the open, sunny space beside my palatial abode, the
results of their unwonted intellectual exertions.
The typical cultivator is not credited in literature with much skill
as a hunter and trapper; his modicum of intellect is supposed to be
entirely absorbed by the care of his fields, and none but tribes of the
stamp of the Bushmen, the Pygmies and the Australian aborigines
are assumed by our theoretic wisdom to be capable of dexterously
killing game in forest or steppe, or taking it by skilful stratagem in a
cunningly devised trap. And yet how wide of the mark is this opinion
of the schools! Among the tribes of the district I am studying, the
Makua are counted as good hunters, while at the same time they are
like the rest, in the main, typical hoe-cultivators—i.e., people who,
year after year, keep on tilling, with the primitive hoe, the ground
painfully brought under cultivation. In spite of their agricultural
habits their traps are constructed with wonderful ingenuity. The
form and action of these traps are sufficiently evident from the
accompanying sketches; but in case any reader should be entirely
without the faculty of “technical sight,” I may add for his benefit that
all these murderous implements depend on the same principle.
Those intended for quadrupeds are so arranged that the animal in
walking or running forward strikes against a fine net with his muzzle,
or a thin cord with his foot. The net or the string is thereby pressed
forward, the upper edge of the former glides downwards, but the end
of the string moves a little to one side. In either case this movement
sets free the end of a lever, a small stick which has hitherto, in a way
sufficiently clear from the sketch, kept the trap set. It slips
instantaneously round its support, and in so doing releases the
tension of the tree or bent stick acting as a spring, which in its
upward recoil draws a skilfully fixed noose tight round the neck of
the animal, which is then strangled to death. Traps of similar
construction, but still more cruel, are set for rats and the like, and,
unfortunately, equal cunning and skill are applied to the pursuit of
birds. Perhaps I shall find another opportunity of discussing this side
of native life; it certainly deserves attention, for there is scarcely any
department where the faculty of invention to be found in even the
primitive mind is so clearly shown as in this aspect of the struggle for
existence.
It is not very easy to locate my present abode on the map. Masasi and
its exact latitude and longitude have been known to me for years, but
of this strangely named place,[17] where I drove in my tent-pegs a few
days ago, I never even heard before I had entered the area of the
inland tribes.
One trait is common to all Oriental towns: their beauty at a
distance and the disillusionment in store for those who set foot
within their walls. Knudsen has done nothing but rave about
Chingulungulu ever since we reached Masasi. He declared that its
baraza was the highest achievement of East African architecture,
that it had a plentiful supply of delicious water, abundance of all
kinds of meat, and unequalled fruit and vegetables. He extolled its
population, exclusively composed, according to him, of high-bred
gentlemen and good-looking women, and its well-built, spacious
houses. Finally, its situation, he said, made it a convenient centre for
excursions in all directions over the plain. I have been here too short
a time to bring all the details of this highly coloured picture to the
test of actual fact, but this much I have already ascertained, that
neither place nor people are quite so paradisaical as the enthusiastic
Nils would have me believe.
YAO HOMESTEAD AT CHINGULUNGULU
His name, Kofia tule, was at first a puzzle to me. I knew that kofia
means a cap, but, curiously enough, it never occurred to me to look
up tule (which, moreover, I assumed to be a Nyamwezi word) in the
dictionary. That it was supposed to involve a joke of some sort, I
gathered from the general laughter, whenever I asked its meaning. At
last we arrived at the fact that kofia tule means a small, flat cap—in
itself a ridiculous name for a man, but doubly so applied to this black
super-man with the incredibly vacant face.
Kofia tule, then, comes slowly forward, followed by six more
Wanyamwezi, and some local men whom I have engaged as extra
carriers. With him as their mnyampara they are to take my
collections down to the Coast, and get them stored till my return in
the cellars of the District Commissioner’s office at Lindi. The final
instructions are delivered, and then comes the order, “You here, go
to the left,—we are going to the right. March!” Our company takes
some time to get into proper marching order, but at last everything
goes smoothly. A glance northward over the plain assures us that
Kofia tule and his followers have got up the correct safari speed; and
we plunge into the uninhabited virgin pori.
There is something very monotonous and fatiguing about the
march through these open woods. It is already getting on for noon,
and I am half-asleep on my mule, when I catch sight of two black
figures, gun in hand, peeping cautiously round a clump of bushes in
front. Can they be Wangoni?
For some days past we have heard flying rumours that Shabruma,
the notorious leader of the Wangoni in the late rebellion, and the last
of our opponents remaining unsubdued, is planning an attack on
Nakaam, and therefore threatening this very neighbourhood. Just as
I look round for my gun-bearer, a dozen throats raise the joyful shout
of “Mail-carrier!” This is my first experience of the working of the
German Imperial Post in East Africa; I learnt in due course that,
though by no means remunerative to the department, it is as nearly
perfect as any human institution can be. It sounds like an
exaggeration, but it is absolutely true, to say that all mail matter,
even should it be only a single picture post-card, is delivered to the
addressee without delay, wherever he may be within the postal area.
The native runners, of course, have a very different sort of duty to
perform from the few miles daily required of our home functionaries.
With letters and papers packed in a water-tight envelope of oiled
paper and American cloth, and gun on shoulder, the messenger trots
along, full of the importance of his errand, and covers enormous
distances, sometimes, it is said, double the day’s march of an
ordinary caravan. If the road lies through a district rendered unsafe
by lions, leopards, or human enemies, two men are always sent
together. The black figures rapidly approach us, ground arms with
soldierly precision and report in proper form:—Letters from Lindi
for the Bwana mkubwa and the Bwana mdogo—the great and the
little master. As long as Mr. Ewerbeck was with us, it was not easy for
the natives to establish the correct precedence between us. Since they
ranked me as the new captain, they could not possibly call me
Bwana mdogo. Now, however, there is not the slightest difficulty:
there are only two Europeans, and as I am not only the elder but
also the leader of the expedition, there is nothing to complicate the
usual gradation of ranks.