
Data Management, Analytics and Innovation: Proceedings of ICDMAI 2021, Volume 2, 1st Edition
Neha Sharma
Lecture Notes on Data Engineering
and Communications Technologies 71

Neha Sharma
Amlan Chakrabarti
Valentina Emilia Balas
Alfred M. Bruckstein Editors

Data Management,
Analytics and
Innovation
Proceedings of ICDMAI 2021, Volume 2
Lecture Notes on Data Engineering
and Communications Technologies

Volume 71

Series Editor
Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data technologies and communications. It will publish the latest advances on the engineering task of building and deploying distributed, scalable and reliable data infrastructures and communication systems.
The series will have a prominent applied focus on data technologies and communications, with the aim of bridging fundamental research on data science and networking and the data engineering and communications work that leads to industry products, business knowledge and standardisation.
Indexed by SCOPUS, INSPEC, EI Compendex.
All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/15362


Neha Sharma · Amlan Chakrabarti ·
Valentina Emilia Balas · Alfred M. Bruckstein
Editors

Data Management, Analytics
and Innovation
Proceedings of ICDMAI 2021, Volume 2
Editors
Neha Sharma
Analytics and Insights
Tata Consultancy Services
Pune, Maharashtra, India

Amlan Chakrabarti
A. K. Choudhury School of Information Technology
Kolkata, West Bengal, India

Valentina Emilia Balas
Aurel Vlaicu University of Arad
Arad, Romania

Alfred M. Bruckstein
Faculty of Computer Science
Technion—Israel Institute of Technology
Haifa, Israel

ISSN 2367-4512 ISSN 2367-4520 (electronic)
Lecture Notes on Data Engineering and Communications Technologies
ISBN 978-981-16-2936-5 ISBN 978-981-16-2937-2 (eBook)
https://doi.org/10.1007/978-981-16-2937-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

These two volumes constitute the proceedings of the International Conference on Data Management, Analytics and Innovation (ICDMAI 2021), held from 15 to 17 January 2021 on a virtual platform due to the pandemic. ICDMAI is a signature conference of the Society for Data Science (S4DS), a not-for-profit professional association established to create a collaborative platform that brings together technical experts across industry, academia, government laboratories and professional bodies to promote innovation around data science. ICDMAI is committed to creating a forum that brings data science enthusiasts onto the same page, and it envisions its role as advancing the field through collaboration, innovative methodologies and connections throughout the globe.
This year is special, as we have completed 5 years, and it gives us immense satisfaction to put on record that we have successfully created a strong data science ecosystem. In these 5 years, we have brought 50 doyens of data science as keynote speakers, and another 50 technical experts have contributed workshops and tutorials. Besides, we have engaged around 200 experts as reviewers and session chairs. To date, we have received around 2093 papers from 42 countries, of which 361 papers have been presented and published, just 17% of the submitted papers. Coming to the specifics of this year, we had participants from 13 countries, 15 industries and 121 international and Indian universities. A total of 63 papers were selected for oral presentation after a rigorous review process, and Best Paper Awards were given in each track.
We tried our best to offer a bouquet of data science through various workshops, tutorials, keynote sessions, plenary talks, panel discussions and paper presentations by experts at ICDMAI 2021. The chief guest of the conference was Prof. Ashutosh Sharma, Secretary, DST, Government of India, and the guests of honour were Prof. Anupam Basu, Director, NIT Durgapur, and Mr. Ravinder Pal Singh, CEO, Merkhado RHA and GoKaddal. The keynote speakers were top-level experts: Phillip G. Bradford, Director, Computer Science Program, University of Connecticut, Stamford; Sushmita Mitra, IEEE Fellow and Professor, Machine Intelligence Unit, Indian Statistical Institute, Kolkata; Sandeep Shukla, IEEE Fellow and Professor, Department of CSE, Indian Institute of Technology Kanpur, Uttar Pradesh; Regiane Relva Romano, Special Adviser to the Ministry of Science, Technology and Innovation, Brazil; Yogesh Kulkarni, Principal Architect (CTO Office), Icertis, Pune; Dr. Aloknath De, Corporate Vice President of Samsung Electronics, South Korea, and Chief Technology Officer of Samsung R&D Institute India, Bangalore; Sourabh Mukherjee, Vice President, Data and Artificial Intelligence Group, Accenture; Pallab Dasgupta, Professor, Department of Computer Science and Engineering, IIT Kharagpur; and Alfred M. Bruckstein, Technion—Israel Institute of Technology, Faculty of Computer Science, Israel. The pre-conference sessions were conducted by Dipanjan (DJ) Sarkar, Data Science Lead at Applied Materials; Usha Rengaraju, polymath and India's first woman Kaggle Grandmaster; Avni Gupta, Senior Data Analyst, IoT, Netradyne; Kranti Athalye, Sr. Manager, University Relations, IBM; Sonali Dey, Business Operations Manager, IBM; Amol Dhondse, Senior Technical Staff Member, IBM; and Vandana Verma Sehgal, Security Solutions Architect, IBM. All the experts took the participants through various perspectives of data and analytics.
The force behind organizing ICDMAI 2021 was the general chair, Dr. P. K. Sinha, Vice-Chancellor and Director, IIIT, New Raipur; Prof. Amol Goje, President, S4DS; Prof. Amlan Chakrabarti, Vice President, S4DS; Dr. Neha Sharma, Secretary, S4DS; the Executive Body Members of S4DS, Dr. Inderjit Barara, Dr. Saptarsi Goswami and Mr. Atul Benegiri; and all the super-active volunteers. We received strong support from our technical partner, IBM; knowledge partner, Wizer; academic partners, IIT Guwahati and NIT Durgapur; and publication partner, Springer. Through this conference, we were able to build a strong data science ecosystem.
Our special thanks go to Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain (Series Editor, Springer, Lecture Notes on Data Engineering and Communications Technologies), for the opportunity to organize this guest-edited volume. We are grateful to Springer, especially to Mr. Aninda Bose (Senior Publishing Editor, Springer India Pvt. Ltd.), for the excellent collaboration, patience and help during the development of this volume.
We are confident that the volumes will provide state-of-the-art information to
professors, researchers, practitioners and graduate students in the areas of data
management, analytics and innovation, and all will find this collection of papers
inspiring and useful.

Pune, India Neha Sharma
Kolkata, India Amlan Chakrabarti
Arad, Romania Valentina Emilia Balas
Haifa, Israel Alfred M. Bruckstein
Contents

Track I
Simulation of Lotka–Volterra Equations Using Differentiable
Programming in Julia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Ankit Roy
Feature Selection Strategy for Multi-residents Behavior Analysis
in Smart Home Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
John W. Kasubi and D. H. Manjaiah
A Comparative Study on Self-learning Techniques for Securing
Digital Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Dev Kumar, Shruti Kumar, and Vidhi Khathuria
An Intelligent, Geo-replication, Energy-Efficient BAN Routing
Algorithm Under Framework of Machine Learning and Cloud
Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Annwesha Banerjee Majumder, Sourav Majumder, Somsubhra Gupta,
and Dharmpal Singh
New Credibilistic Real Option Model Based
on the Pessimism-Optimism Character of a Decision-Maker . . . . . . . . . . . 55
Irina Georgescu, Jani Kinnunen, and Mikael Collan
Analysis of Road Accidents in India and Prediction of Accident
Severity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Sajal Jain, Shrivatsa Krishna, Saksham Pruthi, Rachna Jain,
and Preeti Nagrath
Mining Opinion Features and Sentiment Analysis with Synonymy
Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Sourya Chatterjee and Saptarsi Goswami
Understanding Employee Attrition Using Machine Learning
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Agnibho Hom Chowdhury, Sourav Malakar, Dibyendu Bikash Seal,
and Saptarsi Goswami

Track II
Fake News Detection: Experiments and Approaches Beyond
Linguistic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Shaily Bhatt, Naman Goenka, Sakshi Kalra, and Yashvardhan Sharma
Object Recognition and Classification for Robotics Using
Virtualization and AI Acceleration on Cloud and Edge . . . . . . . . . . . . . . . . 129
Aditi Patil and Nida Sahar Rafee
Neural Networks Application in Predicting Stock Price of Banking
Sector Companies: A Case Study Analysis of ICICI Bank . . . . . . . . . . . . . 141
T. Ananth Narayan
Epilepsy Seizure Classification Using One-Dimensional
Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Gautam Manocha, Harit Rustagi, Sang Pri Singh, Rachna Jain,
and Preeti Nagrath
Syntactic and Semantic Knowledge-Aware Paraphrase Detection
for Clinical Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Sudeshna Jana, Abir Naskar, Tirthankar Dasgupta, and Lipika Dey
Enhanced Behavioral Cloning-Based Self-driving Car Using
Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Uppala Sumanth, Narinder Singh Punn, Sanjay Kumar Sonbhadra,
and Sonali Agarwal
Early Detection of Parkinson’s Disease Using Computer Vision . . . . . . . . 199
Sabina Tandon and Saurav Verma
Sense the Pulse: A Customized NLP-Based Analytical Platform
for Large Organization—A Data Maturity Journey at TCS . . . . . . . . . . . . 209
Chetan Nain, Ankit Dwivedi, Rishi Gupta, and Preeti Ramdasi

Track III
Fact-Finding Knowledge-Aware Search Engine . . . . . . . . . . . . . . . . . . . . . . . 225
Sonam Sharma
Automated Data Quality Mechanism and Analysis
of Meteorological Data Obtained from Wind-Monitoring
Stations of India . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Y. Srinath, Krithika Vijayakumar, S. M. Revathy, A. G. Rangaraj,
N. Sheelarani, K. Boopathi, and K. Balaraman
Efficient and Secure Storage for Renewable Energy Resource Data
Using Parquet for Data Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
A. G. Rangaraj, A. ShobanaDevi, Y. Srinath, K. Boopathi, and K. Balaraman
Data Processing and Analytics for National Security Intelligence:
An Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
G. S. Mani
Framework of EcomTDMA for Transactional Data Mining Using
Frequent Item Set for E-Commerce Application . . . . . . . . . . . . . . . . . . . . . . 317
Pradeep Ambavane, Sarika Zaware, and Nitin Zaware

Track IV
A Survey on Energy-Efficient Task Offloading and Virtual
Machine Migration for Mobile Edge Computation . . . . . . . . . . . . . . . . . . . . 333
Vaishali Joshi and Kishor Patil
Quantitative Study on Barriers of Adopting Big Data Analytics
for UK and Eire SMEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
M. Willetts, A. S. Atkins, and C. Stanier
Post-quantum Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
Sawan Bhattacharyya and Amlan Chakrabarti
A Comprehensive Study of Security Attack on VANET . . . . . . . . . . . . . . . . 407
Shubha R. Shetty and D. H. Manjaiah
Developing Business-Business Private Block-Chain Smart
Contracts Using Hyper-Ledger Fabric for Security, Privacy
and Transparency in Supply Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
B. R. Arun Kumar
Data-Driven Frameworks for System Identification of a Steam
Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
Nivedita Wagh and S. D. Agashe

Track V
An Efficient Obstacle Detection Scheme for Low-Altitude UAVs
Using Google Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
Nilanjan Sinhababu and Pijush Kanti Dutta Pramanik
Estimating Authors’ Research Impact Using PageRank Algorithm . . . . . 471
Arpan Sardar and Pijush Kanti Dutta Pramanik
Research Misconduct and Citation Gaming: A Critical Review
on Characterization and Recent Trends of Research Manipulation . . . . . 485
Joyita Chakraborty, Dinesh K. Pradhan, and Subrata Nandi
Dynamic Price Prediction of Agricultural Produce for E-Commerce
Business Model: A Linear Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . 493
Tumpa Banerjee, Shreyashee Sinha, and Prasenjit Choudhury
Real-Time Facial Recognition Using SURF-FAST . . . . . . . . . . . . . . . . . . . . 505
Showmik Setta, Shreyashee Sinha, Monalisa Mishra, and Prasenjit Choudhury
Microblog Analysis with Machine Learning for Indic Languages:
A Quick Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
Manob Roy

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535


About the Editors

Neha Sharma is working with Tata Consultancy Services and is also the Founder Secretary of the Society for Data Science. Prior to this, she worked as Director of a premier institute in Pune that runs postgraduate courses such as MCA and MBA. She is an alumna of a premier College of Engineering and Technology, Bhubaneshwar, and completed her PhD at the prestigious Indian Institute of Technology, Dhanbad. She is an ACM Distinguished Speaker, a Senior Member of IEEE and Secretary of the IEEE Pune Section. She has received the “Best PhD Thesis Award” and the “Best Paper Presenter at International Conference Award” at the national level. Her areas of interest include data mining, database design, analysis and design, artificial intelligence, big data, cloud computing, blockchain and data science.

Prof. Amlan Chakrabarti is a Full Professor in the School of I.T. at the University of Calcutta. He was a Post-Doctoral Fellow at Princeton University, USA, during 2011–2012. He has almost 20 years of experience in engineering education and research. He is a recipient of the prestigious DST BOYSCAST Fellowship Award in Engineering Science (2011), the JSPS Invitation Research Award (2016), the Erasmus Mundus Leaders Award from the EU (2017) and the Hamied Visiting Professorship from the University of Cambridge (2018). He is an Associate Editor of the Elsevier journal Computers and Electrical Engineering and a Guest Editor of Springer Nature Journal in Applied Sciences. He is a Senior Member of IEEE and ACM, an IEEE Computer Society Distinguished Visitor, a Distinguished Speaker of ACM, Secretary of the IEEE CEDA India Chapter and Vice President of the Data Science Society.

Prof. Valentina Emilia Balas is currently a Full Professor in the Department of Automatics and Applied Software at the Faculty of Engineering, “Aurel Vlaicu” University of Arad, Romania. She is the author of more than 300 research papers. Her research interests include intelligent systems, fuzzy control, soft computing, smart sensors, information fusion, modeling and simulation. She is the Editor-in-Chief of the IJAIP and IJCSysE journals at Inderscience. She is the Director of the Department of International Relations and Head of the Intelligent Systems Research Centre at Aurel Vlaicu University of Arad.


Professor Alfred M. Bruckstein (BSc and MSc in EE from the Technion IIT, Haifa, Israel, and PhD in EE from Stanford University, Stanford, California, USA) is a Technion Ollendorff Professor of Science in the Computer Science Department there, and a Visiting Professor at NTU, Singapore, in the SPMS. He has done research on neural coding processes, stochastic point processes, estimation theory, scattering theory, signal and image processing, computer vision and graphics, and robotics. Over the years he has held visiting positions at Bell Laboratories, Murray Hill, NJ, USA (1987–2001) and TsingHua University, Beijing, China (2002–2023), and has made short visits to many universities and research centers worldwide. At the Technion, he was the Dean of the Graduate School and is currently the Head of the Technion Excellence Program.
Track I
Simulation of Lotka–Volterra Equations
Using Differentiable Programming
in Julia

Ankit Roy

Abstract In this paper, we explore the use of differentiable programming in computer programs. Differentiable programming allows a program to be differentiated, so that one can specify an objective to be optimized and compute the gradient of that objective with respect to the program's parameters. This capability is powerful, as it lets the author of a program choose which quantities are optimized. Differentiable programming frameworks can also make parallelism easy, allowing independent parts of a program to run together. We use Julia and its Flux library to simulate the Lotka–Volterra equations, also known as the predator–prey equations, in order to show how differentiable programming can simulate two coupled equations simultaneously, and we discuss the benefits of our approach.

A. Roy (B)
Westfield High School, Chantilly, VA, USA

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 3
N. Sharma et al. (eds.), Data Management, Analytics and Innovation,
Lecture Notes on Data Engineering and Communications Technologies 71,
https://doi.org/10.1007/978-981-16-2937-2_1

1 Introduction

Automatic differentiation (AD) is the process by which a computer program computes the derivatives of the functions it evaluates, usually by applying the chain rule to the elementary operations the program performs [1]. AD goes beyond differentiating simple chains of operations: it has to handle whole programs, which makes it a bridge between programming and calculus. As a result, programming techniques built directly to handle AD are needed, and problems arise with existing popular programming languages. When many gradients must be taken, as is common when training neural networks, the program itself needs to be differentiable. As described by Liao, Liu, Wang, and Xiang, the idea of differentiable programming emerged from deep learning, but it can be applied well beyond training neural networks. With differentiable programming, one can compute higher-order derivatives of a program accurately and efficiently using AD. Differentiable programming means that one can specify an objective to optimize, compute the gradient of that objective with respect to the program's parameters, and then adjust the parameters step by step along the negative gradient so that the objective (typically a loss) decreases. Differentiability is what enables deep learning.

A brute-force search can become an extremely expensive process with even a few hundred parameters; differentiable programming instead lets us take a gradient-guided walk through parameter space toward a good set of parameters. As a result, differentiable programming in deep learning not only makes it easier to move heavily parameterized models into much simpler structures, but can also greatly reduce running time and increase a program's efficiency. More broadly, differentiable programming sits at the intersection of programming and calculus; it is a technique that is well suited, among other uses, to optimizing the parameters of differential equations.
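The two ideas above, AD as the chain rule applied to a program's elementary operations and optimization by stepping against the gradient, can be illustrated in a few lines of Python. This is a minimal sketch, not the paper's code: it assumes forward-mode AD with dual numbers, whereas Flux typically relies on reverse-mode AD. `Dual`, `sin`, `derivative` and `gradient_descent` are hypothetical helpers defined only here.

```python
import math
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Dual:
    """A number paired with its derivative with respect to one chosen input."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(x: Union["Dual", float]) -> "Dual":
        # Constants have derivative zero.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f: Callable[[Dual], Dual], x: float) -> float:
    """Evaluate f'(x) by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).der


def gradient_descent(f: Callable[[Dual], Dual], x0: float,
                     lr: float = 0.1, steps: int = 100) -> float:
    """Minimise f by repeatedly stepping against its derivative."""
    x = x0
    for _ in range(steps):
        x -= lr * derivative(f, x)
    return x
```

For example, `gradient_descent(lambda x: (x - 3) * (x - 3), 0.0)` moves x toward the minimizer 3, because every step moves against the derivative 2(x - 3). Forward mode needs one pass per input, which is why reverse mode is usually preferred when there are many parameters.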
Popular languages for building machine learning models, such as Python, also lack efficient built-in parallelism, meaning that a program cannot efficiently run two or more tasks at the same time. Speed matters in differentiable programming because a differentiated program is evaluated many times, so each evaluation should be quick. Although Python libraries such as PyTorch and TensorFlow run standard models such as CNNs and RNNs quickly, they are slower at executing networks built up of many small operations. As a result, languages such as Swift and Julia have become popular for their differentiable programming implementations. In this paper, we aim to show the efficiency of differentiable programming by running simulations of the Lotka–Volterra equations.

$$\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y$$
We use Flux, Julia's library for differentiable programming, to simulate these differential equations. The Lotka–Volterra equations form a system of two coupled first-order differential equations, and we aim to use differentiable programming to simulate both equations simultaneously.

2 Dataset

The dataset is generated automatically and approximates the motion of a pendulum or a sinusoidal curve. Its generation depends on factors such as the percentage of error contributed to the data, the sparsity of the data points and the number of data points. After generating a dataset, we apply differentiable programming techniques to simulate it (Figs. 1 and 2).
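Fig. 1 holds the Julia generator, which is not reproduced in this text. The following Python sketch shows a generator controlled by the same three factors, assuming Gaussian noise whose standard deviation is the error percentage of a unit amplitude, random dropping of points for sparsity, and a sine sampled over one period. `make_dataset` and its parameter names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def make_dataset(n_points: int = 100, noise_pct: float = 5.0, sparsity: float = 0.0,
                 t_max: float = 2 * math.pi, seed: int = 0) -> List[Tuple[float, float]]:
    """Sample a noisy sine curve (a small-angle pendulum approximation).

    n_points  -- number of evenly spaced time samples before thinning
    noise_pct -- noise standard deviation as a percentage of the amplitude (1.0)
    sparsity  -- fraction of samples randomly dropped, in [0, 1)
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    data = []
    for i in range(n_points):
        t = t_max * i / (n_points - 1)
        if rng.random() < sparsity:
            continue  # drop this sample to make the data sparser
        y = math.sin(t) + rng.gauss(0.0, noise_pct / 100.0)
        data.append((t, y))
    return data
```

Raising `sparsity` or `noise_pct` yields harder fitting problems, which is the kind of variation Fig. 2 explores.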

Fig. 1 Julia code generating datasets

3 Approach Overview

We illustrate our approach with the Lotka–Volterra model, a pair of differential equations describing the relationship between predators and prey in a biological ecosystem. The model consists of two first-order differential equations:

$$\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y$$

where x is the current number of prey, y is the current number of predators, dx/dt and dy/dt represent the rates of change of the prey and predator populations over time, respectively, and α, β, γ, δ are parameters that describe the biological interactions between the two species. We first set up these equations in Julia.
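The Julia listing for this step is not part of this text. As a rough Python equivalent, assuming the parameters are packed in the order (α, β, γ, δ) used above and that the right-hand side follows the f(u, p, t) convention an ODE solver expects, the system can be written as below; `lotka_volterra` is a hypothetical name.

```python
from typing import Sequence, Tuple


def lotka_volterra(u: Sequence[float], p: Sequence[float], t: float) -> Tuple[float, float]:
    """Right-hand side du/dt = f(u, p, t) of the Lotka-Volterra system.

    u = (x, y): prey and predator populations
    p = (alpha, beta, gamma, delta): interaction parameters
    t: time (unused, because the system is autonomous)
    """
    x, y = u
    alpha, beta, gamma, delta = p
    dx = alpha * x - beta * x * y   # prey: natural growth minus predation
    dy = delta * x * y - gamma * y  # predators: growth from predation minus death
    return dx, dy
```

At the equilibrium (x, y) = (γ/δ, α/β) both derivatives vanish, which gives a quick sanity check on the parameter ordering.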

Fig. 2 Generation of datasets based on different factors

After setting up the equations, we use the ordinary differential equation (ODE) solver from an existing Julia library. The solver integrates a differential equation of the form

$$\frac{du}{dt} = f(u, p, t)$$

where u is the state, p represents the parameters and t represents time; the solver integrates over a given time interval. Instead of passing a single differential equation to the solver, we pass the entire Lotka–Volterra model as one system with state u = (x, y), which lets the solver advance both equations together at each step. The algorithm used is Tsit5(), the Tsitouras 5/4 Runge–Kutta method.
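The solver step can be sketched in Python. This is a stand-in under stated assumptions, not the paper's Julia code: it swaps Tsit5() (an adaptive Tsitouras 5/4 pair) for the simpler fixed-step classical fourth-order Runge–Kutta method, and its usage example picks the parameters (α, β, γ, δ) = (1.5, 1.0, 3.0, 1.0), which the paper does not report. The state u = [x, y] is advanced as one vector, mirroring how the whole model is passed to the solver. `rk4_solve` and `invariant` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple


def lotka_volterra(u: Sequence[float], p: Sequence[float], t: float) -> List[float]:
    """du/dt for u = [x, y] with p = (alpha, beta, gamma, delta)."""
    x, y = u
    alpha, beta, gamma, delta = p
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]


def rk4_solve(f: Callable, u0: Sequence[float], p: Sequence[float],
              tspan: Tuple[float, float], dt: float) -> List[Tuple[float, List[float]]]:
    """Integrate du/dt = f(u, p, t) over tspan with fixed-step classical RK4.

    Both components of u are advanced together in every step.
    Returns a list of (t, u) pairs including the initial state.
    """
    t0, t1 = tspan
    n_steps = int(round((t1 - t0) / dt))
    t, u = t0, list(u0)
    out = [(t, list(u))]
    for _ in range(n_steps):
        k1 = f(u, p, t)
        k2 = f([ui + 0.5 * dt * ki for ui, ki in zip(u, k1)], p, t + 0.5 * dt)
        k3 = f([ui + 0.5 * dt * ki for ui, ki in zip(u, k2)], p, t + 0.5 * dt)
        k4 = f([ui + dt * ki for ui, ki in zip(u, k3)], p, t + dt)
        u = [ui + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]
        t += dt
        out.append((t, list(u)))
    return out


def invariant(u: Sequence[float], p: Sequence[float]) -> float:
    """Quantity conserved along exact Lotka-Volterra trajectories."""
    x, y = u
    alpha, beta, gamma, delta = p
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)
```

For example, `rk4_solve(lotka_volterra, [1.0, 1.0], (1.5, 1.0, 3.0, 1.0), (0.0, 10.0), 0.001)` returns 10001 states. Because δx − γ ln x + βy − α ln y is conserved by the exact flow, its drift along the computed trajectory is a cheap check on solver accuracy.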

With the differential equations set up as an ODE problem, we turn to the Flux library for training and representing the model. We first set up the parameters α, β, γ, δ, explained above, with Flux.

To set up a trainable problem, we create a predict function that calls the solve() function on the ODE defined earlier, as well as a loss function in Flux that is computed from the output of the predict function. We also use the generated datasets as data on which to train the model later.
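The paper does not show its predict or loss functions. A minimal Python sketch of the same structure follows, assuming a fixed-step RK4 solver in place of solve() with Tsit5(), an initial state of (1, 1), states saved every 0.1 time units, and a sum-of-squared-errors loss. All of these choices, and the names `lv`, `rk4_step`, `predict` and `loss`, are hypothetical.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float]


def lv(u: State, p: Sequence[float]) -> State:
    """Lotka-Volterra right-hand side; p = (alpha, beta, gamma, delta)."""
    x, y = u
    alpha, beta, gamma, delta = p
    return alpha * x - beta * x * y, delta * x * y - gamma * y


def rk4_step(u: State, p: Sequence[float], dt: float) -> State:
    """One classical RK4 step (a stand-in for the Tsit5 solver used in the paper)."""
    k1 = lv(u, p)
    k2 = lv((u[0] + 0.5 * dt * k1[0], u[1] + 0.5 * dt * k1[1]), p)
    k3 = lv((u[0] + 0.5 * dt * k2[0], u[1] + 0.5 * dt * k2[1]), p)
    k4 = lv((u[0] + dt * k3[0], u[1] + dt * k3[1]), p)
    return (u[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            u[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def predict(p: Sequence[float], u0: State = (1.0, 1.0), t_end: float = 10.0,
            dt: float = 0.01, save_every: int = 10) -> List[State]:
    """Solve the ODE for parameters p, saving the state every save_every steps."""
    u = u0
    saved = [u]
    for step in range(1, int(round(t_end / dt)) + 1):
        u = rk4_step(u, p, dt)
        if step % save_every == 0:
            saved.append(u)
    return saved


def loss(p: Sequence[float], data: List[State]) -> float:
    """Sum of squared errors between the predicted and observed trajectories."""
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(predict(p), data))
```

The loss is zero at the parameters that generated the data and grows as they are perturbed; its gradient with respect to p is what an optimizer follows, and in Flux that gradient comes from AD rather than from hand-written code.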

Finally, we train the model in Flux to generate the Lotka–Volterra graphs. We use the ADAM optimizer and train the model with Flux, passing the loss function, the parameters, the data, the optimizer and a callback function that displays the data.
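The ADAM update rule can be written out explicitly. The sketch below implements the standard Adam rule in plain Python; the defaults lr=0.001, beta1=0.9, beta2=0.999 and eps=1e-8 are the commonly used Adam settings, since the paper does not report its own. `adam_minimize` is a hypothetical helper, and the toy quadratic in the usage note stands in for the Lotka–Volterra loss.

```python
import math
from typing import Callable, List, Sequence


def adam_minimize(grad: Callable[[List[float]], Sequence[float]], x0: Sequence[float],
                  lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 1000) -> List[float]:
    """Minimise a function of a parameter vector, given its gradient, using Adam."""
    x = list(x0)
    m = [0.0] * len(x)  # running mean of the gradient (first moment)
    v = [0.0] * len(x)  # running mean of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # correct the bias from zero initialisation
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

For example, minimising (x − 3)² + (y + 1)² with `adam_minimize(lambda v: [2 * (v[0] - 3.0), 2 * (v[1] + 1.0)], [0.0, 0.0], lr=0.1, steps=2000)` approaches (3, −1). Adam's first step moves each parameter by about `lr` regardless of gradient scale, which makes it forgiving when loss magnitudes are hard to predict, as with ODE-based losses.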

Fig. 3 Depiction of the Lotka–Volterra model

4 Results

After the model is trained in Flux, a graph is generated automatically showing the relationship between the two differential equations. In the Lotka–Volterra model, we expect the predator and prey populations to oscillate out of phase, in a roughly inverse relationship. As stated before, the model describes the relationship between the populations of two species. According to the equations, a decrease in the number of predators (shown as u2) eventually leads to an increase in the number of prey (shown as u1). This increase in prey in turn leads to an increase in predators, which produces the fluctuations and cycles seen in the graph. The graph illustrates the power of differentiable programming for mathematical modeling: it can simulate two differential equations simultaneously (Fig. 3).
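The phase lag described above can be checked numerically. This sketch assumes the parameter values (α, β, γ, δ) = (1.5, 1.0, 3.0, 1.0) and initial populations (1, 1), which the paper does not report, and integrates with classical RK4; `simulate` and `peak_times` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def simulate(p: Sequence[float] = (1.5, 1.0, 3.0, 1.0), u0: Tuple[float, float] = (1.0, 1.0),
             t_end: float = 10.0, dt: float = 0.001):
    """Integrate the Lotka-Volterra system with classical RK4.

    p = (alpha, beta, gamma, delta); returns lists of times, prey and predators.
    """
    alpha, beta, gamma, delta = p

    def f(x: float, y: float) -> Tuple[float, float]:
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    x, y = u0
    ts, xs, ys = [0.0], [x], [y]
    for i in range(1, int(round(t_end / dt)) + 1):
        k1 = f(x, y)
        k2 = f(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = f(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        ts.append(i * dt)
        xs.append(x)
        ys.append(y)
    return ts, xs, ys


def peak_times(ts: Sequence[float], values: Sequence[float]) -> List[float]:
    """Times of interior local maxima of a sampled series."""
    return [ts[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]
```

With these values the first prey peak comes before the first predator peak: abundant prey lets the predators grow, and the growing predator population then drives the prey down.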

5 Conclusion

In this paper, we discussed the advantages of differentiable programming for solving and representing differential equations compared with traditional programming approaches. We discussed the use of Julia for differentiable programming because of the capabilities of its existing libraries. We illustrated these advantages by simulating the Lotka–Volterra model with Flux, solving its two coupled equations together. Differentiable programming is a very promising field at the intersection of calculus and computer programming. More work is needed to improve differentiable programming: limitations of existing frameworks make it difficult to apply this technique to models of higher complexity. Nevertheless, we hope that the simulations demonstrated here provide key insights into the usefulness of differentiable programming.

Acknowledgements The author would like to thank Dr. Himadri Nath Saha for his help in developing the idea for this paper and for the support he showed throughout its writing.

References

1. Abadi M, Plotkin GD, A Simple Differentiable Programming Language
2. Chen RTQ, Rubanova Y, Bettencourt J, Duvenaud D, Neural Ordinary Differential Equations
3. Wang F, Zheng D, Decker J, Wu X, Essertel GM, Rompf T, Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator
4. Li T-M, Gharbi M, Adams A, Durand F, Ragan-Kelley J, Differentiable Programming for Image Processing and Deep Learning in Halide
5. Innes M, Saba E, Fischer K, Gandhi D, Rudilosso MC, Joy NM, Karmali T, Pal A, Shah V, Fashionable Modelling with Flux
6. Innes M, Flux: Elegant Machine Learning with Julia
7. Besard T, Foket C, De Sutter B, Effective Extensible Programming: Unleashing Julia on GPUs
8. Hernandez A, Amigo J, Differentiable Programming and Its Applications to Dynamical Systems
Feature Selection Strategy
for Multi-residents Behavior Analysis
in Smart Home Environment

John W. Kasubi and D. H. Manjaiah

Abstract Feature selection (FS) plays a vital role in reducing the computational complexity that irrelevant features in the data impose on models, with the intention of developing better predictive models. The process involves selecting significant features for machine learning model building: suitable features are selected, redundant features are removed and new features are created. This study focused on developing a predictive model that performs best for recognizing activities of daily living (ADLs) using the Activity Recognition with Ambient Sensing (ARAS) dataset. To this end, we used feature importance, univariate selection and a correlation matrix to prepare the ARAS dataset before modeling the data. Three algorithms, Logistic Regression (LR), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), were used to learn from and analyze the data and to assess the accuracy obtained with the selected features. The results show that SVM outperformed the other algorithms in both House A and House B. With univariate feature selection, SVM performed best using 10 features rather than 5, with an accuracy of 100% in both House A and House B; with feature importance selection, SVM performed best using 5 features rather than 10, with an accuracy of 99% in House A and 100% in House B. Feature selection improved prediction accuracy on the ARAS dataset compared with previous results, which achieved average accuracies of 61.5% in House A and 76.2% in House B.

J. W. Kasubi (B) · D. H. Manjaiah
Department of Computer Science, Mangalore University, Karnataka 574199, India
D. H. Manjaiah
e-mail: drmdhmu@gmail.com
J. W. Kasubi
Local Government Training Institute, Dodoma, Tanzania

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
N. Sharma et al. (eds.), Data Management, Analytics and Innovation,
Lecture Notes on Data Engineering and Communications Technologies 71,
https://doi.org/10.1007/978-981-16-2937-2_2

1 Introduction

Feature selection (FS) is a very important practice in model development that greatly affects a model's efficiency. It offers many benefits, such as simplifying the model, giving a better understanding of the data for easier interpretation, improving model accuracy, reducing overfitting and shortening training time. The features that contribute most to the performance of the model can be selected automatically or manually; in this study we opted for automatic techniques to prepare our data before modeling [1].
FS methods fall into three families: filter-based, wrapper-based and embedded. Filter-based techniques select features according to a performance measure, irrespective of the machine learning algorithm that will be employed later. Wrapper techniques select features by evaluating candidate feature subsets with a specific machine learning algorithm fitted to the given dataset. The embedded approach combines the benefits of filter and wrapper techniques by performing feature selection as part of model training. In this study, we used filter techniques for feature selection because of their advantages over the others: they are user friendly and give good results. Wrapper methods can give more accurate results than filter methods, but at a higher processing cost [2].
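The difference between the two families can be illustrated with scikit-learn. The sketch below is not the authors' code: it uses a synthetic dataset from `make_classification` in place of ARAS, an ANOVA F-test (`f_classif`) as the filter score, and recursive feature elimination (`RFE`) around logistic regression as the wrapper. The filter scores each feature once, while RFE refits the model after every elimination step, which is where the extra processing cost comes from.

```python
# Filter vs. wrapper feature selection on synthetic data (illustrative sketch;
# make_classification stands in for the ARAS sensor features).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# 20 features, of which only 5 carry class information
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# Filter: score every feature once with an ANOVA F-test, independently of
# any downstream model, and keep the 5 best
filt = SelectKBest(score_func=f_classif, k=5).fit(X, y)
filter_idx = filt.get_support(indices=True).tolist()

# Wrapper: fit logistic regression, drop the weakest feature, refit, and
# repeat until 5 remain (one model fit per elimination step)
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
wrapper_idx = wrap.get_support(indices=True).tolist()

print("filter keeps: ", filter_idx)
print("wrapper keeps:", wrapper_idx)
```

The two methods often agree on strongly informative features but need not select identical subsets.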
Smart homes play a vital role in human activity recognition and, as a result, help diagnose diseases at an early stage. Beyond healthcare services, smart homes use IoT technologies to detect dangerous activities at home, to control and monitor energy and water usage, and to automate whatever residents need automated; the only limit is our imagination [3, 4]. Human activity recognition (HAR) explores the different activities performed by people within a smart home equipped with sensors, and it plays a significant role in monitoring daily life for healthcare, security, electricity and water usage [5].
This study was carried out using the ARAS dataset, which covers 27 different types of activities. The ARAS dataset was collected from sensors installed on household appliances and in rooms such as the refrigerator, kitchen, sitting room, bedroom, toilet and laundry, which together generated a large amount of data (5,184,000 instances) [6].
The ARAS dataset serves research aimed at enhancing quality of life and maintaining the comfort of residents. For this purpose, a smart home must be able to capture the behavior changes of its residents in their daily activities, so that hidden knowledge and insights can be extracted with machine learning algorithms. Healthcare experts strongly agree that monitoring changes in ADLs is one of the best ways to detect potential health problems before they become uncontrollable [7]. Recognizing the activities of daily living performed by smart home residents can significantly assist healthcare, security, grid and water usage management and automation, and is important for quality of life.
Feature selection plays a vital role when working with the ARAS dataset: it shapes how feature values influence the activity recognition performance of the models, and it makes it possible to test the relationship between different activities and the activity recognition accuracy rate.
This paper contributes by performing feature selection on the ARAS dataset to prepare the data before modeling [8]. Three machine learning algorithms were then used to evaluate how well the selected features predict outcomes on the ARAS dataset, and the results show that the proposed techniques yield higher recognition rates.
This work is structured as follows: Sect. 2 briefly explains related work; Sect. 3 describes the resources and methods applied; Sect. 4 presents the experimental results and discussion; and Sect. 5 presents the conclusions and gives suggestions for future study.

2 Related Work

This section reviews previous work related to smart homes and feature selection techniques.
Alberdi et al. [9] used feature selection to detect multimodal symptoms in a smart home. Classification models were developed to recognize changes in the scores that predict symptoms, and different algorithms were used to handle the differences in levels. The results show that feature selection boosted model accuracy and that not all behavioral patterns contribute equally to the prediction of a symptom.
Chen et al. [10] studied human activity recognition in smart homes, developing a human activity model with an extreme learning machine (ELM) on the CASAS dataset. The results show that ELM performance on ADLs improved after feature selection, and that different hidden-layer configurations led to different recognition accuracies.
Fahad and Tahir [11] studied activity recognition in smart homes based on the basic activities performed by residents, so that elderly residents can carry out activities such as cooking meals or watching TV independently in their own homes. The experimental evaluation used the Kasteren and CASAS (Kyoto1 and Kyoto7) datasets. The results show that the Kyoto7 dataset reached an accuracy of 77%, while Kasteren and Kyoto1 reached 93% and 97%, respectively; model accuracy was good after the researchers performed feature selection on the respective datasets.
Hameed et al. [12] applied feature selection using a decision tree (DT) and recursive feature elimination (RFE) to remove irrelevant features from the dataset. An Enhanced Logistic Regression (ELR) classifier, LR and a multilayer perceptron (MLP) were then used for prediction, and the ELR classifier outperformed LR and MLP after feature selection.
Manoj et al. [13] proposed a hybrid feature selection method combining ant colony optimization (ACO) and an artificial neural network (ANN) to remove unnecessary and redundant features from the dataset. After feature selection, the hybrid algorithm outperformed previous results.

Liu et al. [14] proposed recognizing ADLs by applying feature selection in smart homes based on the Pearson correlation coefficient (PCC); the researchers used three machine learning algorithms to test the efficiency of the approach for ADL detection. The experimental results show that the proposed method improves ADL recognition.
Tanaka et al. [15] proposed a particle swarm optimizer to support feature selection for kernel logistic regression (KLR). The technique removes unnecessary and redundant features, and the experiments show an increase in the generalization performance of the proposed method.
Fang et al. [16] compared three machine learning techniques for feature selection-based activity recognition in smart homes and found that a neural network (NN) trained with the back-propagation (BP) algorithm performed best. They concluded that, after feature selection on their dataset, the ADL detection performance of the BP-trained NN was more robust than that of naive Bayes (NB) and a hidden Markov model (HMM).
Abudalfa and Qusa [17] evaluated semi-supervised clustering for ADL detection with different ML approaches. The use of feature selection improved performance, with the values increasing and the confidence level decreasing. The experimental results show that the presented technique provided remarkable accuracy, and performance improved significantly when more sophisticated feature selection techniques were applied.
Rodriguez-Mier et al. [18] presented feature selection for IoT using a wrapper-based method that combines RFE and gradient-boosted trees (GBTs) to select the most significant attributes automatically. The experiments show better results than before feature selection.
Oukrich [19] researched feature selection for ADL recognition in smart homes using the Cairo and Aruba datasets. Several methods were tested, including VBN, KNN, hidden Markov models, DT, SVM and CRFs. The accuracy attained was 90.05% on the Aruba dataset and 88.49% on Cairo, and the author suggested using deep learning techniques, which have proved to be more effective and robust than traditional machine learning algorithms.
Minor and Cook [20] compared the performance of three classifiers, a regression tree, linear regression and SVM, applied to the CASAS datasets to forecast the future occurrence of activities. After feature selection was run on the dataset, the regression tree produced better activity predictions than SVM and linear regression, with lower error, faster training time and the capacity to handle more complex datasets. To improve activity forecasts with less complexity, the researchers proposed adapting other classifiers and combining numerical forecasting techniques in future work.
Feature selection is the first phase of model development: it reduces model complexity by computing the value of each feature in the dataset and selecting the proper ones, in order to obtain good predictive output [21].

Table 1 List of activities in ARAS dataset


ID Activity ID Activity ID Activity
1 Others 10 Having snack 19 Laundry
2 Going out 11 Sleeping 20 Shaving
3 Preparing breakfast 12 Watching TV 21 Brushing teeth
4 Having breakfast 13 Studying 22 Talking on the phone
5 Preparing lunch 14 Having shower 23 Listening to music
6 Having lunch 15 Toileting 24 Cleaning
7 Preparing dinner 16 Napping 25 Having conversation
8 Having dinner 17 Using Internet 26 Having guest
9 Washing dishes 18 Reading book 27 Changing clothes

3 Methods

This section presents the methods used in this work to develop a predictive model for activities of daily living (ADLs) in smart homes using machine learning techniques.

3.1 ARAS Datasets

Table 1 lists the 27 activity types in the ARAS dataset, which was collected in two residential houses in Turkey in 2013. The activities performed every day in House A and House B were having shower, toileting, preparing breakfast, having breakfast, preparing dinner, having dinner, going out, sleeping, having snacks, watching TV, studying and reading books, whereas the activities not performed every day included washing dishes, napping, laundry, shaving, talking on the phone, listening to music, having conversation and having guest. The total number of activity occurrences is 2177 in House A and 1023 in House B. Each day contains 86,400 time-stamped data points (one per second). In this study, the predictions for both houses show that going out, sleeping, studying, watching TV and having breakfast were the activities most frequently performed by the residents [22].
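To make the per-second layout concrete, the following sketch parses the rows of one ARAS day file. The column layout is an assumption to be checked against the dataset's own documentation: each line is taken to hold 22 whitespace-separated integers, namely 20 binary sensor readings followed by the activity IDs (Table 1) of resident 1 and resident 2. The function name `load_aras_day` is hypothetical, and synthetic lines stand in for a real file.

```python
# Hypothetical loader for one ARAS day file. Assumed layout (verify against
# the dataset documentation): one line per second, 22 integers per line =
# 20 binary sensor readings + activity IDs of resident 1 and resident 2.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 rows expected per full day
N_SENSORS = 20


def load_aras_day(lines):
    """Return (sensor_rows, resident1_labels, resident2_labels)."""
    sensors, r1, r2 = [], [], []
    for line in lines:
        values = [int(v) for v in line.split()]
        if not values:
            continue  # skip blank lines
        if len(values) != N_SENSORS + 2:
            raise ValueError(f"expected {N_SENSORS + 2} columns, got {len(values)}")
        sensors.append(values[:N_SENSORS])
        r1.append(values[N_SENSORS])
        r2.append(values[N_SENSORS + 1])
    return sensors, r1, r2


# Three synthetic seconds instead of a real file
demo = ["0 " * 20 + "11 11", "1 " + "0 " * 19 + "11 2", "0 " * 20 + "12 2"]
sensors, r1, r2 = load_aras_day(demo)
print(len(sensors), r1, r2)  # 3 [11, 11, 12] [11, 2, 2]
```

Under this assumed layout, a complete day file would yield `SECONDS_PER_DAY` rows.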

3.2 Performance Measurements

The following four measures were used to evaluate the proposed approach:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-Measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.
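Equations (1) to (4) translate directly into code. The helper below is a minimal sketch (the name `classification_metrics` is hypothetical) that computes all four measures from confusion-matrix counts, returning 0 where a denominator is zero; the counts in the example are invented for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 as in Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Eq. (1)
    precision = tp / (tp + fp) if tp + fp else 0.0        # Eq. (2)
    recall = tp / (tp + fn) if tp + fn else 0.0           # Eq. (3)
    f1 = (2 * precision * recall / (precision + recall)   # Eq. (4)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=20, fn=10)
print(f"{acc:.3f} {prec:.3f} {rec:.3f} {f1:.3f}")  # 0.850 0.800 0.889 0.842
```

For the 27-class ARAS problem, these quantities are computed per activity and then averaged; the paper does not say which averaging was used. In scikit-learn, `precision_recall_fscore_support` exposes this choice through its `average` parameter (for example `"macro"` or `"weighted"`).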

3.3 Machine Learning Algorithms

In this work we employed three machine learning algorithms, LR, SVM and KNN, to find which one works best on the ARAS dataset, for both House A and House B, after feature selection with filter methods.
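A comparison of this kind can be sketched with scikit-learn's `cross_val_score`. The paper does not report hyperparameters or the validation protocol, so the settings below (RBF kernel for SVM, five neighbors for KNN, feature standardization, 5-fold cross-validation) are assumptions, and `make_classification` generates a synthetic stand-in for the ARAS features.

```python
# Comparing LR, SVM and KNN with 5-fold cross-validation (sketch; the data
# and all hyperparameters are assumptions, not the paper's settings).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class problem with 10 features
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=1)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # Standardize inside the pipeline so scaling is learned per training fold
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")
```

Scaling matters here because both the RBF kernel and KNN's distance metric are sensitive to feature magnitudes.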

3.4 Feature Selection Methods

Feature selection was performed in the Python programming language to reduce the complexity of the predictive model and to understand the influence of each feature on predictions over the ARAS dataset. The purpose of the FS process is to identify the features that give the best score and to remove unnecessary features that add complexity to the model and may lead to poor performance [23].
In this study, we used filter techniques for feature selection because of their advantages over the alternatives: they are easy to use and give good results, whereas wrapper methods, although they can give better results than filter methods, increase the processing cost. Specifically, we used univariate selection, feature importance and correlation matrix analysis to prepare our data before modeling [24]. Three machine learning algorithms were used to test the accuracy of the proposed FS techniques on the ARAS dataset, and the results show that the suggested technique gives better outcomes (Fig. 1).

Fig. 1 Proposed feature selection approach

4 Experimental Results and Discussions

4.1 Experimental Results

To evaluate the performance of the proposed approaches on the ARAS dataset, we used LR, SVM and KNN to build models for House A and House B and compared the results obtained for the two houses.

4.1.1 Filter Methods

• Univariate Feature Selection

Univariate feature selection examines each feature independently to evaluate the strength of its relationship with the response variable. It is generally easy to run and understand, and it computes a significance score for each feature. Tables 2 and 3 show the best features selected by univariate feature selection in House A and House B.
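As an illustration of univariate selection, the sketch below uses scikit-learn's `SelectKBest` to keep the 10 highest-scoring sensors, as in Tables 2 and 3. The paper does not name its scoring function; this sketch assumes the chi-squared test (`chi2`), which requires non-negative inputs and therefore suits binary sensor readings. The sensor matrix and the activity label are synthetic, with the label deliberately driven by two sensors so that they should rank at the top.

```python
# Univariate selection with SelectKBest + chi-squared on synthetic binary
# sensor data (the scoring function is an assumption).
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n_rows, n_sensors = 2000, 20
X = rng.integers(0, 2, size=(n_rows, n_sensors))   # binary sensor states

# Hypothetical label: 1 when the sensors in columns 3 and 7 fire together,
# with 5% of labels flipped as noise
y = ((X[:, 3] == 1) & (X[:, 7] == 1)).astype(int)
flip = rng.random(n_rows) < 0.05
y[flip] = 1 - y[flip]

selector = SelectKBest(score_func=chi2, k=10).fit(X, y)
names = [f"Sensor_ID_{i + 1:02d}" for i in range(n_sensors)]

# Rank sensors by score, highest first, and show the 10 kept by the selector
ranking = sorted(zip(selector.scores_.tolist(), names), reverse=True)
for score, name in ranking[:10]:
    print(f"{name:14s} {score:12.3f}")

X_top10 = selector.transform(X)   # reduced matrix with 10 columns
```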
• Feature Importance Selection

Feature importance is a feature selection technique that assigns a score to each input feature according to how useful it is for predicting the dependent variable. Figures 2 and 3 show the best features selected by the feature importance technique in House A and House B.
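Feature importance can be sketched with a tree ensemble. The paper does not say which estimator produced Figs. 2 and 3; the sketch assumes scikit-learn's `ExtraTreesClassifier`, whose impurity-based `feature_importances_` sum to 1, and uses synthetic binary sensor data whose label depends only on two sensors.

```python
# Tree-based feature importance on synthetic binary sensor data
# (the choice of ExtraTreesClassifier is an assumption).
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(2000, 20))              # 20 binary sensors
y = ((X[:, 3] == 1) & (X[:, 7] == 1)).astype(int)     # hypothetical label

model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = model.feature_importances_              # normalized to sum to 1

# Indices of the 5 most important sensors, highest first
top5 = np.argsort(importances)[::-1][:5]
for i in top5:
    print(f"Sensor_ID_{i + 1:02d}: {importances[i]:.3f}")
```

Impurity-based importances can favor features with many distinct values; with binary sensors this bias is less of a concern.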

Table 2 Best 10 attributes selected by univariate feature selection in House A

Column index   Attribute name   Feature score
9              Sensor_ID_11     831,382.401054
18             Sensor_ID_20     463,988.373077
19             Sensor_ID_12     315,096.408903
3              Sensor_ID_05     263,208.650265
16             Sensor_ID_18     262,665.258971
11             Sensor_ID_13     243,929.874075
2              Sensor_ID_04     197,382.065389
4              Sensor_ID_06     91,999.530473
14             Sensor_ID_16     82,968.851006
8              Sensor_ID_10     75,974.156652

Table 3 Best 10 attributes selected by univariate feature selection in House B

Column index   Attribute name   Feature score
19             Sensor_ID_12     1.919454e+06
6              Sensor_ID_8      9.256252e+05
13             Sensor_ID_15     6.811450e+05
11             Sensor_ID_13     6.332835e+05
3              Sensor_ID_05     5.924828e+05
15             Sensor_ID_17     4.995963e+05
4              Sensor_ID_06     4.919505e+05
12             Sensor_ID_14     4.712039e+05
14             Sensor_ID_16     4.603801e+05
7              Sensor_ID_09     4.388673e+05

• Correlation Matrix Feature Selection

Correlation-based feature selection measures the correlation between every pair of variables in the dataset. Figures 4 and 5 show the best features selected by correlation-based feature selection in House A and House B.
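A correlation-matrix filter can be sketched with NumPy alone. The sketch computes the Pearson correlation matrix with `np.corrcoef` (pandas `DataFrame.corr` gives the same matrix, and a heatmap such as Figs. 4 and 5 can be drawn from it), ranks each sensor by its absolute correlation with the label, and flags sensor pairs above a cutoff as redundant. The data, the sensor names and the 0.9 cutoff are all hypothetical.

```python
# Correlation-based filtering on synthetic binary data (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
n = 1000
s1 = rng.integers(0, 2, n)
s2 = s1.copy()
s2[rng.random(n) < 0.02] ^= 1        # s2 nearly duplicates s1 (2% flipped)
s3 = rng.integers(0, 2, n)
label = s1 | s3                       # hypothetical binary activity label

data = np.column_stack([s1, s2, s3, label]).astype(float)
names = ["Sensor_A", "Sensor_B", "Sensor_C", "activity"]
corr = np.corrcoef(data, rowvar=False)   # 4 x 4 Pearson correlation matrix

# Relevance: absolute correlation of each sensor with the label
relevance = {names[i]: round(abs(corr[i, -1]), 3) for i in range(3)}

# Redundancy: sensor pairs whose absolute correlation exceeds the cutoff
CUTOFF = 0.9  # hypothetical threshold
redundant = [(names[i], names[j]) for i in range(3) for j in range(i + 1, 3)
             if abs(corr[i, j]) > CUTOFF]

print(relevance)
print(redundant)   # [('Sensor_A', 'Sensor_B')]
```

Of a redundant pair, typically only the member more correlated with the label is kept.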

Fig. 2 Best 10 features selected by feature importance selection in House A

Fig. 3 Best features selected by feature importance selection in House B

4.2 Experimental Discussions

Predictions were made on the ARAS dataset for House A and House B after applying the feature selection techniques, and accuracy and the other metrics were computed. SVM outperformed the other algorithms in both houses. With SVM, univariate feature selection performed best with 10 features rather than 5, reaching an accuracy of 100% in both House A and House B, while feature importance selection performed best with 5 features rather than 10, reaching 99% in House A and 100% in House B. In short, SVM performed best on the ARAS dataset with univariate selection of 10 features and with feature importance selection of 5 features (Tables 4, 5 and 6).
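The 5-versus-10-feature comparison can be reproduced in outline by placing the selector and the classifier in one scikit-learn `Pipeline` and cross-validating it for each value of `k`. This is a sketch under assumptions (synthetic binary data, `chi2` scoring, default RBF `SVC`), not the authors' procedure.

```python
# Comparing k = 5 and k = 10 selected features with an SVM (sketch).
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(1500, 20))                 # binary sensors
y = (X[:, 0] + X[:, 5] + X[:, 9] >= 2).astype(int)      # hypothetical label

results = {}
for k in (5, 10):
    pipe = Pipeline([
        ("select", SelectKBest(score_func=chi2, k=k)),  # refit in each fold
        ("svm", SVC(kernel="rbf")),
    ])
    results[k] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k:2d}: mean accuracy {results[k]:.3f}")
```

Keeping `SelectKBest` inside the pipeline means the features are chosen from the training folds only; selecting features on the full dataset before cross-validation lets information from the test folds influence the selection and can inflate the measured accuracy.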

Fig. 4 Best features selected by correlation matrix with heatmap in House A

Table 7 shows that, after feature selection on the ARAS dataset, model performance increased compared with previous studies: the average score reached 100% in both House A and House B, compared with previous average accuracies of 61.5% in House A and 76.2% in House B.

Fig. 5 Best features selected by correlation matrix with heatmap in House B

5 Conclusion

The prediction results for feature selection in a smart home environment, using the ARAS dataset from House A and House B, showed that SVM outperformed the other algorithms tested, Logistic Regression (LR) and KNN. SVM performed best on univariate feature selection with 10 features rather than 5, reaching an accuracy of 100% in both House A and House B, while on feature importance selection it performed best with 5 features rather than 10, reaching 99% in House A and 100% in House B. Feature selection improved the prediction accuracy on the ARAS dataset compared with previous results, which achieved average accuracies of 61.5% in House A and 76.2% in House B. For future work, we suggest applying other algorithms and feature selection methods, such as wrapper-based and embedded methods, to the ARAS dataset for comparison and to improve accuracy.
Table 4 Prediction results for logistic regression (LR) model for House A and B

                                                  House A                             House B
Feature selection technique    No. of features    Acc   Precision  Recall  F1-score   Acc   Precision  Recall  F1-score
Univariate feature selection   10                 0.98  0.99       0.98    0.98       0.99  0.98       0.97    0.99
Univariate feature selection   5                  0.91  0.91       0.90    0.90       0.99  0.97       0.98    0.99
Importance feature selection   10                 0.96  0.96       0.95    0.94       0.98  0.98       0.99    0.97
Importance feature selection   5                  0.91  0.91       0.92    0.90       0.99  0.99       0.98    0.99
Correlation                    10                 0.93  0.91       0.90    0.91       0.92  0.90       0.90    0.91

Table 5 Prediction results for SVM model for House A and B

                                                  House A                             House B
Feature selection technique    No. of features    Acc   Precision  Recall  F1-score   Acc   Precision  Recall  F1-score
Univariate feature selection   10                 1.00  1.00       1.00    1.00       1.00  1.00       1.00    1.00
Univariate feature selection   5                  0.99  0.99       0.98    0.97       0.99  0.98       0.99    0.98
Importance feature selection   10                 0.98  0.97       0.98    0.97       0.99  0.96       0.98    0.99
Importance feature selection   5                  0.99  0.98       0.99    0.98       1.00  1.00       1.00    1.00
Correlation                    10                 0.99  0.95       0.97    0.96       0.97  0.95       0.96    0.96

Bold marks where the SVM algorithm outperformed the other algorithms in feature selection

Table 6 Prediction results for KNN model for House A and B

                                                  House A                             House B
Feature selection technique    No. of features    Acc   Precision  Recall  F1-score   Acc   Precision  Recall  F1-score
Univariate feature selection   10                 0.97  0.97       0.97    0.97       0.99  0.96       0.98    0.97
Univariate feature selection   5                  0.91  0.91       0.90    0.90       0.93  0.94       0.91    0.93
Importance feature selection   10                 0.96  0.96       0.95    0.94       0.91  0.92       0.90    0.90
Importance feature selection   5                  0.92  0.92       0.90    0.91       0.90  0.91       0.89    0.91
Correlation                    10                 0.96  0.96       0.93    0.95       0.90  0.89       0.87    0.90

Table 7 Comparisons of prediction results on ARAS Dataset with previous research work

Research study    Average accuracy, House A (%)    Average accuracy, House B (%)
Current study     100                              100
Previous study    61.5                             76.2

References

1. Brownlee J (2016) Machine learning mastery with Python: understand your data, create accurate
models, and work projects end-to-end. Machine Learning Mastery
2. Raschka S, Mirjalili V (2017) Python machine learning. Packt Publishing Ltd
3. Kwon M-C, Choi S (2018) Recognition of daily human activity using an artificial neural
network and smartwatch. Wirel Commun Mobile Comput 2018
4. Oukrich N, Maach A et al (2019) Human daily activity recognition using neural networks and
ontology-based activity representation. In: Proceedings of the Mediterranean symposium on
smart city applications. Springer, pp 622–633
5. Wang J, Chen Y, Hao S, Peng X, Hu L (2019) Deep learning for sensor-based activity
recognition: a survey. Pattern Recogn Lett 119:3–11
6. Igwe OM, Wang Y, Giakos GC (2018) Activity learning and recognition using margin setting
algorithm in smart homes. In: 2018 9th IEEE annual ubiquitous computing, electronics &
mobile communication conference (UEMCON). IEEE
7. Fang H, Srinivasan R, Cook DJ (2012) Feature selections for human activity recognition in
smart home environments. Int J Innov Comput Inf Control 8:3525–3535
8. Alemdar H, Ersoy C (2017) Multi-resident activity tracking and recognition in smart
environments. J Ambient Intell Humaniz Comput 8(4):513–529
9. Alberdi A et al (2018) Smart home-based prediction of multidomain symptoms related to
Alzheimer’s disease. IEEE J Biomed Health Inf 22(6):1720–1731
10. Chen S, Fang H, Liu Z (2020) Human activity recognition based on extreme learning machine
in smart home. J Phys Conf Ser 1437(1)
11. Fahad LG, Tahir FT (2020) Activity recognition in a smart home using local feature weighting
and variants of nearest-neighbors classifiers. J Ambient Intell Humaniz Comput, 1–10
12. Hameed J et al (2020) Enhanced classification with logistic regression for short term price
and load forecasting in smart homes. In: 2020 3rd international conference on computing,
mathematics and engineering technologies (iCoMET). IEEE
13. Manoj RJ, Anto Praveena MD, Vijayakumar K (2019) An ACO–ANN based feature selection
algorithm for big data. Cluster Comput 22(2):3953–3960
14. Liu Y et al (2020) Daily activity feature selection in smart homes based on Pearson correlation
coefficient. Neural Process Lett, 1–17
15. Tanaka K, Kurita T, Kawabe T (2007) Selection of import vectors via binary particle swarm
optimization and cross-validation for kernel logistic regression. In: 2007 international joint
conference on neural networks. IEEE
16. Fang H et al (2014) Human activity recognition based on feature selection in smart home using
back-propagation algorithm. ISA Trans 53(5):1629–1638
17. Abudalfa S, Qusa H (2019) Evaluation of semi-supervised clustering and feature selection for
human activity recognition. Int J Comput Digital Syst 8(6)
18. Rodriguez-Mier P, Mucientes M, Bugarín A (2019) Feature selection and evolutionary rule
learning for Big Data in smart building energy management. Cogn Comput 11(3):418–433
19. Oukrich N (2019) Daily human activity recognition in smart home based on feature selection,
neural network and load signature of appliances. PhD thesis
20. Minor B, Cook DJ (2017) Forecasting occurrences of activities. Pervasive Mobile Comput
38:77–91
21. Zainab A, Refaat SS, Bouhali O (2020) Ensemble-based spam detection in smart home IoT
devices time series data using machine learning techniques. Information 11(7):344
22. Alemdar H et al (2013) ARAS human activity datasets in multiple homes with multiple resi-
dents. In: 2013 7th international conference on pervasive computing technologies for healthcare
and workshops. IEEE
23. Tang S et al (2019) Smart home IoT anomaly detection based on ensemble model learning
from heterogeneous data. In: 2019 IEEE international conference on big data (big data). IEEE
24. Mohammadi M et al (2018) Deep learning for IoT big data and streaming analytics: a survey.
IEEE Commun Surv Tutor 20(4):2923–2960

A Comparative Study on Self-learning Techniques for Securing Digital Devices

Dev Kumar, Shruti Kumar, and Vidhi Khathuria

Abstract Technology is now integral to our daily activities: we rely on technological solutions for banking, communication, business and e-governance. All of this requires large network infrastructures, and keeping them secure remains challenging because there is no dearth of malicious attacks. Our reliance on these systems and the volume of activity they facilitate make them more vulnerable to cyber-attacks. Besides traditional firewalls and anti-malware software, tools such as an intrusion detection system (IDS) can considerably increase the security of our networks and information systems. An intrusion in a network can be detected by different techniques; among them, machine learning solutions can be used to build a robust IDS, since they can detect intrusions efficiently when historical data is available. In this paper, we conduct a comparative study of such techniques that have been used to build an intelligent IDS. The models are compared by accuracy rate, and artificial neural networks are observed to give the best accuracy, i.e. 98%, for network intrusion detection. Most of these experiments were conducted by their respective authors on the KDDCup99 dataset or the improved NSL-KDD dataset, both of which are relatively old. To build our own intelligent IDS, we aim to use deep learning algorithms together with recently developed datasets such as CICIDS2017. This comparative study should help compare and contrast the techniques available for developing a competent IDS.

1 Introduction

As more and more people gain access to the Internet every day, they are introduced to a vast resource of knowledge and information, but also to the threat of becoming victims of cyber-attacks. Security has become an important and unavoidable characteristic of the Internet and other networks. Advances in information technology (IT) have brought great convenience to our lives, and many activities that are vital to us are now simpler thanks to IT applications. Many individuals interact with e-banking, e-commerce and e-government applications on a daily basis. These interactions involve a lot of sensitive information and take place on a global scale, so it is imperative that the security measures planned for a network of such scale are of the highest quality.

D. Kumar (B) · S. Kumar · V. Khathuria
Department of Information Technology, Thadomal Shahani Engineering College, Bandra (W),
Mumbai 400050, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
N. Sharma et al. (eds.), Data Management, Analytics and Innovation,
Lecture Notes on Data Engineering and Communications Technologies 71,
https://doi.org/10.1007/978-981-16-2937-2_3

Today there are numerous methods for securing data and devices, all aiming to strengthen the security infrastructure of information systems and networks of digital devices. Many applications and services based on these methods are used in industry to ensure network security, including but not restricted to firewalls, anti-virus software, network segmentation and data loss prevention (DLP) systems. They can be regarded as the primary security layer. Firewalls and anti-malware software are commonly installed on individual devices in a network, but as enterprises grow, their networks scale up and new devices are connected. Firewalls and anti-malware software alone are then not enough to protect an entire network from malicious attacks; networks of such scale need a more sophisticated system to ensure security of the highest quality. One system that fits this purpose well is the intrusion detection system (IDS).
An IDS is an application or system that monitors and analyses traffic across networks and large systems to detect anomalous or suspicious activity. An IDS generally sits behind the primary security layer, namely firewalls, anti-malware software, DLP systems, etc. It looks for familiar, recognizable activities that pose a threat to the network, and if such an activity is detected, it immediately raises an alert.
In this paper, we first review the common, well-known classifications of IDS, and then examine how different self-learning and predictive techniques can be used to build a model for intrusion detection in a network.

2 Intrusion Detection System

An intrusion detection system (IDS) scans the network for dubious or suspicious behaviour and notifies the administrator of potential threats. It monitors network traffic for data breaches or security hazards. While an IDS tracks the network for potentially harmful activity, false alarms are possible and unavoidable, so an IDS must be configured properly to distinguish regular network traffic from anomalous or malicious behaviour. Intrusion prevention systems often track packets entering the network, check them for anomalies and send warning alerts to the administrator if any deviation is observed in the packets. IDS can be classified into several categories, which are discussed below; Fig. 1 summarizes this classification.

Fig. 1 Intrusion detection system and its types

2.1 Types of IDS Technologies

IDS technologies are distinguished fundamentally by the entity they examine and the methods they use. Broadly, they are categorized as follows:

(a) Network-Based
Network-based IDS monitor the traffic of a network segment and analyse network and application protocol activity to identify anomalous behaviour. Network intrusion detection systems (NIDS) are placed at a suitable point in a network to monitor the traffic of all its users. A NIDS tracks traffic on the entire subnet and compares it with previous traffic in order to identify previously observed, and therefore known, attacks. When suspicious activity or an attack is detected, it can be brought to the administrator's notice. For example, a NIDS can be installed on a subnet where firewalls are also set up, to detect whether an intruder is trying to penetrate a firewall.

(b) Host-Based

A host-based IDS monitors a host and the events that occur within it. Host intrusion
detection systems (HIDS) run on individual servers or computers in a network. A HIDS
monitors only the packets entering and leaving its own device and warns the user if any
malicious or anomalous behaviour is observed. It records the details of the files on
the device and compares them with existing records; if critical system files have been
tampered with, the administrator is alerted. For example, HIDS are deployed on vital
devices whose configuration must remain intact for the system to function smoothly.

30 D. Kumar et al.

2.2 Intrusion Detection Techniques

There are two main intrusion detection methodologies: signature-based and
anomaly-based detection. Most systems combine both techniques to reduce detection
errors.

(a) Signature-Based Detection

A signature is a pattern derived from knowledge of known threats. Signatures are
matched against the ongoing activity in the network; if an activity matches a
signature, it is placed in the same category and is therefore labelled as a threat. In
other words, signature-based detection compares the signatures of known threats
against the events being observed in the network.
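As a rough illustration of the matching step, the Python sketch below checks a request payload against a small table of regular-expression signatures. The signature names and patterns (`sql_injection`, `path_traversal`, `shell_command`) are hypothetical examples chosen for this sketch, not rules from any real IDS; production rule sets are much larger and usually match on header fields as well as payload content.

```python
import re

# Hypothetical signatures for illustration only; real rule sets are far larger
# and also match on header fields such as ports and protocol flags.
SIGNATURES = {
    "sql_injection": re.compile(r"union\s+select", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./"),
    "shell_command": re.compile(r";\s*(cat|wget|curl)\s", re.IGNORECASE),
}


def match_signatures(payload):
    """Return the names of all known-threat signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


if __name__ == "__main__":
    for request in ("GET /index.html", "GET /item?id=1 UNION SELECT password FROM users"):
        hits = match_signatures(request)
        print(request, "->", hits if hits else "no known signature matched")
```

Because only known patterns are listed, a novel attack that matches none of them passes unnoticed, which is the limitation anomaly-based detection addresses.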

(b) Anomaly-Based Detection

Anomaly-based intrusion detection is primarily based on statistical techniques. It
models the normal behaviour of the network and flags an activity as an attack when its
pattern deviates from that norm. Because it does not rely on known signatures, it can
also detect and raise alarms about new, previously unknown threats and anomalous
activities.
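A minimal sketch of the statistical idea follows, assuming a single numeric feature (for example, a hypothetical packets-per-second count) and a z-score rule: the profile of normal behaviour is the mean and standard deviation of a baseline, and any observation more than `threshold` standard deviations away is flagged. Real anomaly-based systems profile many features at once, and the threshold of 3.0 is an arbitrary choice for illustration.

```python
import statistics


def build_profile(baseline):
    """Profile 'normal' behaviour as the mean and population standard deviation."""
    return statistics.mean(baseline), statistics.pstdev(baseline)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose z-score against the profile exceeds the threshold."""
    mean, std = profile
    if std == 0:
        return value != mean  # any deviation from a constant baseline is unusual
    return abs(value - mean) / std > threshold


if __name__ == "__main__":
    # Hypothetical packets-per-second readings observed during normal operation.
    profile = build_profile([100, 102, 98, 101, 99])
    for reading in (101, 130):
        print(reading, "anomalous" if is_anomalous(reading, profile) else "normal")
```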
To achieve this kind of behaviour in an intrusion detection system, it makes sense to
leverage self-learning techniques such as deep learning, machine learning and other
artificial intelligence algorithms.

3 Machine Learning in IDS

It is imperative for an intrusion detection system to have a robust model capable of
identifying various kinds of anomalous activities. To develop such a model, we need to
investigate the mathematical and statistical techniques that various researchers have
proposed. Machine learning is currently used in domains ranging from health care to
finance to solve analytical and statistical problems such as classification and
clustering, often through self-learning techniques. It therefore seems capable of
providing varied and effective solutions for network intrusion detection. In this
review, we compare and contrast several mathematical and statistical models that have
been developed specifically for intrusion detection, look at the solutions proposed for
building a robust intrusion detection system, and evaluate the best practices available
to us.

3.1 Classification

Anish Halimaa et al. [1] explore techniques that can be used to develop a machine
learning-based IDS. They emphasize accuracy as a key factor in the performance of the
system and aim to propose an approach with fewer false alarms (false positives) and a
better detection rate.
Of the available machine learning techniques, they apply support vector machines (SVM)
and Naive Bayes to the NSL-KDD dataset, a refined version of the benchmark KDDCup99
knowledge discovery dataset. They design an experiment with three different approaches
to examine the efficiency of the two algorithms, measured by the accuracy rate and the
misclassification rate of each model. The three approaches are described in the next
paragraph.
The first approach applies the SVM and Naive Bayes algorithms directly to build the
intrusion detection models. To get the most out of the dataset, the authors also
incorporate feature reduction and normalization, which give rise to the other two
approaches. In the second approach, CfsSubsetEval [2] is used for feature reduction, a
technique that extracts the most relevant attributes; this yields two new models,
SVM-CfsSubsetEval and Naive Bayes-CfsSubsetEval. In the third and final approach,
normalization is applied to the dataset, resulting in the SVM-normalization and
Naive Bayes-normalization models.
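CfsSubsetEval is Weka's implementation of correlation-based feature selection (CFS), which scores a subset of k features by the merit k * mean(r_cf) / sqrt(k + k(k - 1) * mean(r_ff)), where r_cf are feature-class correlations and r_ff feature-feature correlations. A subset scores well when its features are individually predictive of the class but not redundant with each other. The sketch below computes this merit and grows a subset with a simple greedy forward search. The correlation values are made up, and Weka pairs the evaluator with a separate search method (such as best-first search), so this is a simplification rather than the exact procedure used in [1].

```python
import math
from itertools import combinations


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a feature subset; correlations are taken as magnitudes."""
    k = len(subset)
    mean_cf = sum(abs(r_cf[i]) for i in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(abs(r_ff[i][j]) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def greedy_forward_selection(r_cf, r_ff):
    """Add the best remaining feature while the subset merit keeps improving."""
    selected, best_merit = [], 0.0
    remaining = sorted(range(len(r_cf)))
    while remaining:
        candidate = max(remaining, key=lambda f: cfs_merit(selected + [f], r_cf, r_ff))
        merit = cfs_merit(selected + [candidate], r_cf, r_ff)
        if merit <= best_merit:
            break  # no remaining feature improves the subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_merit = merit
    return selected, best_merit


if __name__ == "__main__":
    # Hypothetical correlations: features 0 and 1 are predictive but highly redundant.
    r_cf = [0.8, 0.7, 0.1]
    r_ff = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
    print(greedy_forward_selection(r_cf, r_ff))
```

With these numbers, feature 1 is rejected despite its high class correlation, because it adds little beyond the redundant feature 0.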
From this experiment, the authors conclude that the SVM-based models significantly
outperform the Naive Bayes-based models, and that this holds even for the models
obtained after feature reduction and normalization. The experimental results they
report support this conclusion.

3.2 Regression and Clustering

Regression analysis models the relationship between an output variable and a set of
input features. Clustering, on the other hand, forms clusters, or groups of data points
exhibiting similar features, which can help distinguish between different groups of
data. Here, we assume that these groups correspond to benign and malicious connections
or activities in the network.
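The clustering idea can be sketched with a plain K-means loop on two-dimensional points. The points, the two fixed initial centroids and the assumption that one cluster stands for benign and the other for malicious traffic are illustrative only. In practice centroids are usually initialised randomly, and clusters must be mapped to labels afterwards, for example by the majority class of their members.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, initial_centroids, max_iterations=100):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical normalized (feature_1, feature_2) pairs for six connections.
    points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.85, 0.95), (0.95, 0.9)]
    print(kmeans(points, [points[0], points[3]]))
```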
A similar approach is observed in the paper presented by Dikshant Gupta et al. [3],
which uses two techniques, linear regression and K-means clustering, to develop a
model capable of detecting intrusions in a network. The model is developed and trained
on the NSL-KDD dataset. In their experiment, the authors first preprocess the data,
transforming the nominal features into numerical inputs suitable for both techniques,
and then normalize the dataset using mean normalization before applying the
algorithms.
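The two preprocessing steps can be sketched as follows. Integer label encoding is assumed here for the nominal features (such as the protocol type, whose values in NSL-KDD are `tcp`, `udp` and `icmp`); one-hot encoding is a common alternative, and the text does not say which encoding the authors used. Mean normalization rescales each value as (x - mean) / (max - min).

```python
def encode_nominal(values):
    """Map each category to an integer code (sorted order, for reproducibility)."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def mean_normalize(column):
    """Rescale a numeric column as (x - mean) / (max - min)."""
    mean = sum(column) / len(column)
    spread = max(column) - min(column)
    if spread == 0:
        return [0.0] * len(column)  # constant column carries no information
    return [(x - mean) / spread for x in column]


if __name__ == "__main__":
    print(encode_nominal(["tcp", "udp", "icmp", "tcp"]))
    print(mean_normalize([0, 5, 10]))
```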
Linear regression achieves an accuracy of 80.14% (with learning rate alpha = 0.005),
which is notable but not satisfactory. K-means clustering, on the other hand, achieves
a more modest accuracy of 67.5%. These results may be sufficient for experimental
purposes, but they are unlikely to satisfy industrial requirements. To achieve better
accuracy, we may look to multi-level hybrid models or other self-learning techniques
(some of which are discussed later in this review).
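To show where a parameter such as alpha enters, the sketch below fits linear regression by batch gradient descent on the mean-squared-error cost. It treats alpha as the learning rate (our reading of the alpha parameter reported for [3]) and turns the regression output into a binary intrusion label with a 0.5 threshold. The toy data, the iteration count and the thresholding rule are illustrative and are not taken from [3].

```python
def train_linear_regression(X, y, alpha=0.005, iterations=5000):
    """Batch gradient descent on the cost (1 / 2m) * sum((w.x + b - y)^2)."""
    m, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(iterations):
        predictions = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
        errors = [p - t for p, t in zip(predictions, y)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n_features)]
        grad_b = sum(errors) / m
        # alpha scales each step taken against the gradient of the cost.
        w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b


def classify(x, w, b, threshold=0.5):
    """Label a sample as an intrusion (1) when the regression output reaches the threshold."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b >= threshold)


if __name__ == "__main__":
    # Hypothetical single-feature samples: 0 = normal, 1 = intrusion.
    X, y = [[-0.5], [-0.4], [0.4], [0.5]], [0, 0, 1, 1]
    w, b = train_linear_regression(X, y)
    print([classify(x, w, b) for x in X])
```

A smaller alpha makes each step more cautious and needs more iterations to converge; too large a value can make the cost diverge.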

3.3 Deep Learning

Deep learning belongs to a broader family of machine learning techniques built
primarily on artificial neural networks (ANNs), which are inspired by the function and
structure of neurons in the brain and the central nervous system [4, 5]. Unlike
traditional machine learning, where the significant features usually have to be
indicated by hand, deep learning performs feature extraction automatically as part of
training. ANNs have also been applied to many classification problems in various
domains [6]. This is why deep learning can be more beneficial than traditional machine
learning solutions: the layered structure of deep learning methods lets them perform
this step themselves, loosely analogous to the biological behaviour of a brain.
Shenfield et al. [6] present a technique for detecting malicious network traffic using
an ANN in an IDS. The dataset used in this experiment is obtained from the online
exploit and vulnerability repository exploitdb [7]. The proposed ANN architecture, a
multi-layer perceptron (MLP) with two hidden layers, performs the classification task
at the core of the IDS. The model is evaluated in terms of its accuracy and its ROC
performance.
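For readers unfamiliar with MLPs, the forward pass of a network with two hidden layers can be sketched in a few lines. The ReLU hidden activations, the single sigmoid output and the parameter names (`W1`, `b1`, and so on) are assumptions for this sketch; the layer sizes and activations used by Shenfield et al. are not given here, and training (for example, by backpropagation) is omitted.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def mlp_forward(features, params):
    """Two hidden layers, then one sigmoid unit giving P(traffic is malicious)."""
    h1 = relu(dense(features, params["W1"], params["b1"]))  # first hidden layer
    h2 = relu(dense(h1, params["W2"], params["b2"]))        # second hidden layer
    logit = dense(h2, params["W3"], params["b3"])[0]         # single output neuron
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical hand-set weights for a 2-input, 2-unit, 1-unit network.
    params = {
        "W1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
        "W2": [[1.0, 1.0]], "b2": [0.0],
        "W3": [[1.0]], "b3": [0.0],
    }
    print(mlp_forward([1.0, 2.0], params))
```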
After the experiment, the authors report an average accuracy of 98% for their ANN
model, with an average area under the receiver operating characteristic (AUROC) curve
of 0.98. The higher the AUROC, the better the classifier is at differentiating between
malicious and benign traffic, i.e. the two classes.
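The AUROC can be computed without drawing the curve: it equals the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one, with ties counted as one half. A minimal pairwise sketch follows; the scores and labels in the usage line are made up.

```python
def auroc(scores, labels):
    """Fraction of (malicious, benign) pairs ranked correctly; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; library implementations sort the scores instead.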
Kim et al. [8] proposed a long short-term memory recurrent neural network (LSTM-RNN)
classifier developed on the KDDCup99 dataset. While an advanced self-learning algorithm
is extremely important, the dataset used to train the model is equally significant, if
not more so.
The proposed model achieves a detection rate of 98.88%, a false alarm rate of 10.04%
and an accuracy of 96.93%. While these metrics are commendable, the two-hidden-layer
ANN discussed above, trained on data from exploitdb, reported an average accuracy of
98%, which is slightly higher than that of the proposed LSTM-RNN model. It would be
interesting to see the results if a different dataset (for example, the NSL-KDD
dataset, a refined version of KDDCup99, or a more recent dataset) were used to train a
classifier based on the proposed model.
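The detection rate, false alarm rate, accuracy and misclassification rate quoted in this section can all be derived from a confusion matrix. The sketch below uses the common definitions, with attacks as the positive class (detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN)); individual papers may define these slightly differently, and the counts in the usage line are invented.

```python
def ids_metrics(tp, fp, tn, fn):
    """Common IDS metrics from confusion-matrix counts (attack = positive class)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "detection_rate": tp / (tp + fn),    # share of attacks that are detected
        "false_alarm_rate": fp / (fp + tn),  # share of normal traffic flagged as attack
        "accuracy": accuracy,
        "misclassification_rate": 1 - accuracy,
    }


if __name__ == "__main__":
    print(ids_metrics(tp=90, fp=5, tn=95, fn=10))
```

A model can combine a high detection rate with a high false alarm rate, so the two are best read together rather than relying on accuracy alone.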

3.4 Genetic Algorithms

A genetic algorithm is a procedure inspired by the process of natural selection, as
described by Charles Darwin's theory of evolution. Genetic algorithms belong to the
field of evolutionary computation and are used for optimization and search [9].
Just as, in biological evolution, the best genetic information in the pool is passed
on to the next generation, a genetic algorithm carries the best candidate solutions
(here, the best selections of features from the data) forward to each new generation,
resulting in progressively better computational solutions.
One of the first mentions of this idea can be traced back to 1957, when Fraser [10]
recommended that genetic systems be modelled on computers. Like neural networks,
genetic algorithms are inspired by a natural biological process, and they aim to solve
computational problems by mimicking processes found in nature.
Resende et al. [11] describe an adaptive anomaly-based IDS that uses genetic algorithms
to select the appropriate attributes for profiling the 'normal' behaviour of a network.
Any activity that deviates from this normal behaviour is classified as anomalous.
According to the authors, this classification process is efficient at detecting
intrusions in a network. The role of the genetic algorithm in this approach is to
extract the attributes relevant for profiling the normal behaviour of the system.
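A toy genetic algorithm for attribute selection is sketched below. Each chromosome is a bit mask over the features; each generation keeps the fitter half (truncation selection) and fills the rest of the population with one-point crossover and bit-flip mutation. The fitness function (summed relevance minus a per-feature penalty), the relevance scores and all parameter values are hypothetical. Resende et al. profile normal behaviour with their own fitness criterion, which this sketch does not reproduce.

```python
import random


def fitness(mask, relevance, penalty=0.1):
    """Hypothetical fitness: summed relevance of selected features minus a size penalty."""
    return sum(r for bit, r in zip(mask, relevance) if bit) - penalty * sum(mask)


def evolve(relevance, population_size=20, generations=50, mutation_rate=0.05, seed=0):
    """Return the fittest feature mask found; needs at least two features."""
    rng = random.Random(seed)  # seeded for reproducible runs
    n = len(relevance)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(population_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        ranked = sorted(population, key=lambda m: fitness(m, relevance), reverse=True)
        parents = ranked[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover position
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation keeps some diversity in the population.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, relevance))


if __name__ == "__main__":
    # Hypothetical relevance of six attributes for profiling normal traffic.
    relevance = [0.9, 0.05, 0.8, 0.02, 0.7, 0.01]
    best = evolve(relevance)
    print(best, round(fitness(best, relevance), 3))
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.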
In experiments conducted on the CICIDS2017 dataset [12], their approach achieved an
accuracy of 92.85% and a false positive rate of 0.69%. To boost the effectiveness of
the model in a real deployment, the authors propose evolving the initial population
over a large number of generations (more than 1000) and on a dataset containing
different combinations of attacks.

3.5 Neural Networks and Fuzzy Logic

Fuzzy logic originates from fuzzy set theory, according to which reasoning is
approximate rather than precisely deduced from classical predicate logic. This enables
fuzzy techniques to be used in anomaly and intrusion detection, because the features
extracted and examined for this problem can behave as fuzzy variables: there is often
no sharp boundary between a normal and an abnormal value.
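To make the notion of a fuzzy variable concrete, the sketch below fuzzifies a single traffic feature (a hypothetical connections-per-second rate) with triangular membership functions. The set names and breakpoints are illustrative assumptions. The point is that one crisp value can belong partly to two sets at once.

```python
def triangular(x, a, b, c):
    """Membership rises linearly from 0 at a to 1 at b, then falls back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets (a, b, c) for a connections-per-second feature.
FUZZY_SETS = {
    "low": (-50.0, 0.0, 50.0),
    "medium": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 150.0),
}


def fuzzify(rate):
    """Degree of membership of a crisp rate in each fuzzy set."""
    return {name: triangular(rate, *abc) for name, abc in FUZZY_SETS.items()}


if __name__ == "__main__":
    print(fuzzify(75.0))
```

With pure triangles, rates above 150 would drop out of "high" entirely; a real system would typically use a right-shoulder function for the top set so that very large rates stay fully "high".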
Mizdic et al. [13] propose a hybrid structure that combines neural networks with fuzzy
logic. At the core of this architecture is a self-organizing map (SOM) block
cascade-linked with fuzzy systems, developed using the KDDCup99 dataset.
To enhance the effectiveness of the model, a corrector block is also introduced in the
architecture. The SOM block is divided into two layers, and the neural networks in
those layers are cascade-linked. The corrector block consists of a fuzzy system and an
automatic corrector, and its primary role is to classify the unknown samples forwarded
from the SOM blocks. This hybrid solution achieves a total accuracy of 94.3%, which is
higher than the accuracy of previously developed models based on the same dataset. It
is also observed that the corrector and the inherent fuzzy system improve the
classification of R2L attacks in particular, among all classes of attacks in the data
(Table 1).

4 Datasets

To implement each of the techniques discussed above, datasets play a crucial role in
determining the performance of the algorithm and thereby of the system. Procuring and
applying appropriate data is an onerous yet essential task. The dataset should contain
instances of the different possible scenarios so that the system can learn from them
and act appropriately in case of an anomaly; in short, the system should be able to
label traffic based on the features provided in the dataset. Over the years, a number
of datasets have been used for detecting anomalous behaviour in a network. The datasets
considered in our study are as follows:

4.1 KDD Cup 99 Dataset

This dataset was used for the Third International Knowledge Discovery and Data Mining
Tools Competition, which was held in conjunction with KDD-99, the Fifth International
Conference on Knowledge Discovery and Data Mining [14]. It is built on the data
captured in the 1998 DARPA (Defense Advanced Research Projects Agency) intrusion
detection evaluation program, and consists of tcpdump data recorded over 7 weeks. Each
connection record is described by 41 features (grouped into basic, content and traffic
features) and is labelled either as normal traffic or as an attack. The attacks are
further classified as follows:
(a) Denial of Service Attack

In this attack, memory or computing resources are made unavailable to legitimate
users, typically by consuming resources crucial to the system, such as bandwidth
and/or memory. The motive behind this attack is to freeze or
the box is fastened at a height of some five or six feet above the
ground, or hung up (but this is not so common) like a swinging bar
on a stand made for the purpose. This last arrangement is
particularly safe, as affording no access to vermin. As the birds
multiply, the owner adds cylinder to cylinder till they form a kind of
wall. Towards sunset, he or his wife approaches the dovecote, greeted
by a friendly cooing from inside, picks up from the ground a piece of
wood cut to the right size, and closes the opening of the first bark box
with it, doing the same to all the others in turn, and then leaves them
for the night, secure that no wild cat or other marauder can reach
them.

DOVECOTE AND GRANARY

I have found out within the last few days why so few men are to be
seen in my rounds. The settlements here scarcely deserve the name
of villages—they are too straggling for that; it is only now and then
that from one hut one can catch a distant glimpse of another. The
view is also obstructed by the fields of manioc, whose branches,
though very spreading, are not easily seen through on account of the
thickly-growing, succulent green foliage. This and the bazi pea are,
now that the maize and millet have been gathered in, the only crops
left standing in the fields. Thus it may happen that one has to trust
entirely to the trodden paths leading from one hut to another, to be
sure of missing none, or to the guidance of the sounds inseparable
from every human settlement. There is no lack of such noises at
Masasi, and in fact I follow them almost every day. Walking about
the country with Nils Knudsen, I hear what sounds like a jovial
company over their morning drink—voices becoming louder and
louder, and shouting all together regardless of parliamentary rules. A
sudden turn of the path brings us face to face with a drinking-party,
and a very merry one, indeed, to judge by the humour of the guests
and the number and dimensions of the pombe pots which have been
wholly or partially emptied. The silence which follows our
appearance is like that produced by a stone thrown into a pool where
frogs are croaking. Only when we ask, “Pombe nzuri?” (“Is the beer
good?”) a chorus of hoarse throats shouts back the answer—“Nzuri
kabisa, bwana!” (“Very good indeed, sir!”)
As to this pombe—well, we Germans fail to appreciate our
privileges till we have ungratefully turned our backs on our own
country. At Mtua, our second camp out from Lindi, a huge earthen
jar of the East African brew was brought as a respectful offering to us
three Europeans. At that time I failed to appreciate the dirty-looking
drab liquid; not so our men, who finished up the six gallons or so in a
twinkling. In Masasi, again, the wife of the Nyasa chief Masekera
Matola—an extremely nice, middle-aged woman—insisted on
sending Knudsen and me a similar gigantic jar soon after our arrival.
We felt that it was out of the question to refuse or throw away the
gift, and so prepared for the ordeal with grim determination. First I
dipped one of my two tumblers into the turbid mass, and brought it
up filled with a liquid in colour not unlike our Lichtenhain beer, but
of a very different consistency. A compact mass of meal filled the
glass almost to the top, leaving about a finger’s breadth of real, clear
“Lichtenhainer.” “This will never do!” I growled, and shouted to
Kibwana for a clean handkerchief. He produced one, after a
seemingly endless search, but my attempts to use it as a filter were
fruitless—not a drop would run through. “No use, the stuff is too
closely woven. Lete sanda, Kibwana” (“Bring a piece of the shroud!”)
This order sounds startling enough, but does not denote any
exceptional callousness on my part. Sanda is the Swahili name for
the cheap, unbleached and highly-dressed calico (also called bafta)
which, as a matter of fact, is generally used by the natives to wrap a
corpse for burial. The material is consequently much in demand, and
travellers into the interior will do well to carry a bale of it with them.
When the dressing is washed out, it is little better than a network of
threads, and might fairly be expected to serve the purpose of a filter.
I found, however, that I could not strain the pombe through it—a
few scanty drops ran down and that was all. After trying my tea and
coffee-strainers, equally in vain, I gave up in despair, and drank the
stuff as it stood. I found that it had a slight taste of flour, but was
otherwise not by any means bad, and indeed quite reminiscent of my
student days at Jena—in fact, I think I could get used to it in time.
The men of Masasi seem to have got only too well used to it. I am far
from grudging the worthy elders their social glass after the hard work
of the harvest, but it is very hard that my studies should suffer from
this perpetual conviviality. It is impossible to drum up any
considerable number of men to be cross-examined on their tribal
affinities, usages and customs. Moreover, the few who can reconcile
it with their engagements and inclinations to separate themselves for
a time from their itinerant drinking-bouts are not disposed to be very
particular about the truth. Even when, the other day, I sent for a
band of these jolly topers to show me their methods of
basketmaking, the result was very unsatisfactory—they did some
plaiting in my presence, but they were quite incapable of giving in
detail the native names of their materials and implements—the
morning drink had been too copious.
It is well known that it is the custom of most, if not all, African
tribes to make a part of their supply of cereals into beer after an
abundant harvest, and consume it wholesale in this form. This, more
than anything else, has probably given rise to the opinion that the
native always wastes his substance in time of plenty, and is nearly
starved afterwards in consequence. It is true that our black friends
cannot be pronounced free from a certain degree of “divine
carelessness”—a touch, to call it no more, of Micawberism—but it
would not be fair to condemn them on the strength of a single
indication. I have already laid stress on the difficulty which the
native cultivator has of storing his seed-corn through the winter. It
would be still more difficult to preserve the much greater quantities
of foodstuffs gathered in at the harvest in a condition fit for use
through some eight or nine months. That he tries to do so is seen by
the numerous granaries surrounding every homestead of any
importance, but that he does not invariably succeed, and therefore
prefers to dispose of that part of his crops which would otherwise be
wasted in a manner combining the useful and the agreeable, is
proved by the morning and evening beer-drinks already referred to,
which, with all their loud merriment, are harmless enough. They
differ, by the bye, from the drinking in European public-houses, in
that they are held at each man’s house in turn, so that every one is
host on one occasion and guest on another—a highly satisfactory
arrangement on the whole.
My difficulties are due to other causes besides the chronically
bemused state of the men. In the first place, there are the troubles
connected with photography. In Europe the amateur is only too
thankful for bright sunshine, and even should the light be a little
more powerful than necessary, there is plenty of shade to be had
from trees and houses. In Africa we have nothing of the sort—the
trees are neither high nor shady, the bushes are not green, and the
houses are never more than twelve feet high at the ridge-pole. To this
is added the sun’s position in the sky at a height which affects one
with a sense of uncanniness, from nine in the morning till after three
in the afternoon, and an intensity of light which is best appreciated
by trying to match the skins of the natives against the colours in Von
Luschan’s scale. No medium between glittering light and deep black
shadow—how is one, under such circumstances, to produce artistic
plates full of atmosphere and feeling?
For a dark-room I have been trying to use the Masasi boma. This is
the only stone building in the whole district and has been
constructed for storing food so as to prevent the recurrence of famine
among the natives, and, still more, to make the garrison independent
of outside supplies in the event of another rising. It has only one
story, but the walls are solidly built, with mere loopholes for
windows; and the flat roof of beaten clay is very strong. In this
marvel of architecture are already stacked uncounted bags
containing millet from the new crop, and mountains of raw cotton. I
have made use of both these products, stopping all crevices with the
cotton, and taking the bags of grain to sit on, and also as a support
for my table, hitherto the essential part of a cotton-press which
stands forsaken in the compound, mourning over the shipwreck it
has made of its existence. Finally, I have closed the door with a
combination of thick straw mats made by my carriers, and some
blankets from my bed. In this way, I can develop at a pinch even in
the daytime, but, after working a short time in this apartment, the
atmosphere becomes so stifling that I am glad to escape from it to
another form of activity.
RAT TRAP

On one of my first strolls here, I came upon a neat structure which
was explained to me as “tego ya ngunda”—a trap for pigeons. This is a
system of sticks and thin strings, one of which is fastened to a strong
branch bent over into a half-circle. I have been, from my youth up,
interested in all mechanical contrivances, and am still more so in a
case like this, where we have an opportunity of gaining an insight
into the earlier evolutional stages of the human intellect. I therefore,
on my return to camp, called together all my men and as many
local natives as possible, and addressed the assembly to the effect
that the mzungu was exceedingly anxious to possess all kinds of
traps for all kinds of animals. Then followed the promise of good
prices for good and authentic specimens, and the oration wound up
with “Nendeni na tengenezeni sasa!” (“Now go away and make up
your contraptions!”).
How they hurried off that day, and how eagerly all my men have
been at work ever since! I had hitherto believed all my carriers to be
Wanyamwezi—now I find, through the commentaries which each of
them has to supply with his work, that my thirty men represent a
number of different tribes. Most of them, to be sure, are
Wanyamwezi, but along with them there are some Wasukuma and
Manyema, and even a genuine Mngoni from Runsewe, a
representative of that gallant Zulu tribe who, some decades ago,
penetrated from distant South Africa to the present German
territory, and pushed forward one of its groups—these very Runsewe
Wangoni—as far as the south-western corner of the Victoria Nyanza.
As for the askari, though numbering only thirteen, they belong to no
fewer than twelve different tribes, from those of far Darfur in the
Egyptian Sudan to the Yao in Portuguese East Africa. All these
“faithfuls” have been racking their brains to recall and practise once
more in wood and field the arts of their boyhood, and now they come
and set up, in the open, sunny space beside my palatial abode, the
results of their unwonted intellectual exertions.
The typical cultivator is not credited in literature with much skill
as a hunter and trapper; his modicum of intellect is supposed to be
entirely absorbed by the care of his fields, and none but tribes of the
stamp of the Bushmen, the Pygmies and the Australian aborigines
are assumed by our theoretic wisdom to be capable of dexterously
killing game in forest or steppe, or taking it by skilful stratagem in a
cunningly devised trap. And yet how wide of the mark is this opinion
of the schools! Among the tribes of the district I am studying, the
Makua are counted as good hunters, while at the same time they are
like the rest, in the main, typical hoe-cultivators—i.e., people who,
year after year, keep on tilling, with the primitive hoe, the ground
painfully brought under cultivation. In spite of their agricultural
habits their traps are constructed with wonderful ingenuity. The
form and action of these traps is sufficiently evident from the
accompanying sketches; but in case any reader should be entirely
without the faculty of “technical sight,” I may add for his benefit that
all these murderous implements depend on the same principle.
Those intended for quadrupeds are so arranged that the animal in
walking or running forward strikes against a fine net with his muzzle,
or a thin cord with his foot. The net or the string is thereby pressed
forward, the upper edge of the former glides downwards, but the end
of the string moves a little to one side. In either case this movement
sets free the end of a lever—a small stick which has hitherto, in a way
sufficiently clear from the sketch—kept the trap set. It slips
instantaneously round its support, and in so doing releases the
tension of the tree or bent stick acting as a spring, which in its
upward recoil draws a skilfully fixed noose tight round the neck of
the animal, which is then strangled to death. Traps of similar
construction, but still more cruel, are set for rats and the like, and,
unfortunately, equal cunning and skill are applied to the pursuit of
birds. Perhaps I shall find another opportunity of discussing this side
of native life; it certainly deserves attention, for there is scarcely any
department where the faculty of invention to be found in even the
primitive mind is so clearly shown as in this aspect of the struggle for
existence.

TRAP FOR ANTELOPES

Of psychological interest is the behaviour of the natives in face of
my own activity in this part of my task. When, we two Europeans
having finished our frugal dinner, Nils Knudsen has laid himself
down for his well-deserved siesta, and the snoring of my warriors
resounds, more rhythmically than harmoniously from the
neighbouring baraza, I sit in the blazing sun, like the shadowless
Schlemihl, only slightly protected by the larger of my two helmets,
sketching.

TRAP FOR GUINEA-FOWL


TRAP FOR LARGE GAME

The ability to make a rapid and accurate sketch of any object in a
few strokes is one whose value to the scientific explorer cannot be
overrated. Photography is certainly a wonderful invention, but in the
details of research-work carried on day by day, it is apt to fail one
oftener than might be expected, and that not merely in the darkness
of hut-interiors, but over and over again by daylight in the open air.
I am sitting sketching, then. Not a breath of air is stirring—all
nature seems asleep. My pen, too, is growing tired, when I hear a
noise immediately behind me. A hasty glance shows me that the
momentum of universal human curiosity has overcome even the
primæval force of negroid laziness. It is the whole band of my
carriers, accompanied by a few people belonging to the place. They
must have come up very softly, as they might easily do with their
bare feet on the soft, sandy soil. Presently the whole crowd is looking
over my shoulder in the greatest excitement. I do not let them
disturb me; stroke follows stroke, the work nears completion,—at last
it is finished. “Sawasawa?” (“Is it like?”) I ask eagerly, and the
answering chorus of “Ndio” (“Yes”) is shouted into my ears with an
enthusiasm which threatens to burst the tympanum. “Kizuri?” (“Is it
fine?”) “Kizuri sana kabisa” (“Very fine, indeed”), they yell back still
more loudly and enthusiastically; “Wewe fundi” (“You are a master-
craftsman”). These flattering critics are my artists who, having
practised themselves, may be supposed to know what they are
talking about; the few washenzi, unlettered barbarians, unkissed of
the Muse, have only joined in the chorus from gregarious instinct,
mere cattle that they are.
Now comes the attempt at a practical application. I rise from my
camp-stool, take up an oratorical attitude and inform my disciples in
art that, as they have now seen how I, the fundi, set about drawing a
trap, it would be advisable for them to attempt a more difficult
subject, such as this. It is dull work to keep on drawing their friends,
or trees, houses, and animals; and they are such clever fellows that a
bird-trap must surely be well within their powers. I have already
mentioned the look of embarrassed perplexity which I encountered
when beginning my studies at Lindi. Here it was even more marked
and more general. It produced a definite impression that the idea of
what we call perspective for the first time became clear to the men’s
minds. They were evidently trying to express something of the sort
by their words and gestures to each other; they followed with their
fingers the strangely foreshortened curves which in reality stood for
circles—in short, they were in presence of something new—
something unknown and unimagined, which on the one hand made
them conscious of their intellectual and artistic inferiority, and on
the other drew them like a magnet to my sketch-book. None of them
has up to the present attempted to draw one of these traps.
Travellers of former days, or in lands less satisfactorily explored
than German East Africa, found the difficulties of barter not the least
of their troubles. Stanley, not so many years ago, set out on his
explorations with hundreds of bales of various stuffs and
innumerable kinds of beads, and even thus it was not certain
whether the natives of the particular region traversed would be
suited; not to mention the way in which this primitive currency
increased the number of carriers required by every expedition. In
German East Africa, where the Colonial Administration has so often
been unjustly attacked, the white man can now travel almost as
easily as at home. His letter of credit, indeed, only holds good as far
as the coast, but if his errand is, like mine, of an official character,
every station, and even every smaller post, with any Government
funds at its disposal, has orders to give the traveller credit, on his
complying with certain simple formalities, and to provide him with
cash. The explanation is not difficult: the fact that our rupees are
current on the coast compels all the interior tribes to adopt them,
whether they like it or not. I brought with me from Lindi a couple of
large sacks with rupees, half and quarter rupees, and for immediate
needs a few cases of heller.[16] This copper coin, long obsolete in
Germany, has been coined for circulation in our colony, but the
natives have not been induced to adopt it, and reckon as before by
pice—an egg costs one pice (pesa) and that is enough—no one thinks
of working out the price in hellers. Neither is the coin popular with
the white residents, who deride its introduction and make feeble
puns on its name—one of the poorest being based on the name of the
present Director of Customs, which happens to be identical with it.
I find, however, that the natives are by no means averse to
accepting these despised coins when they get the chance. On our
tramps through the villages, Moritz with the lantern is followed by
Mambo sasa, the Mngoni, carrying on his woolly head a large jar of
bright copper coin newly minted at Berlin.
After a long, but not tedious examination of all the apartments in
the native palaces, I return to the light of day, dazzled by the tropical
sunshine. With sympathetic chuckles, my bodyguard—those of my
men who are always with me and have quickly grasped, with the
sympathetic intuition peculiar to the native, what it is that I want—
follow, dragging with them a heap of miscellaneous property. Lastly
come the master of the house and his wife, in a state of mingled
expectation and doubt. Now begins the bargaining, in its essentials
not very different from that experienced in the harbours of Naples,
Port Said, Aden and Mombasa. “Kiasi gani?” (“What is the price?”)
one asks with ostentatious nonchalance, including the whole pile in a
compendious wave of the hand. The fortunate owner of the valuables
apparently fails to understand this, so he opens his mouth wide and
says nothing. I must try him on another tack. I hold up some article
before his eyes and ask, “Nini hii?” (“What is this?”), which proves
quite effectual. My next duty is to imagine myself back again in the
lecture-hall during my first term at college, and to write down with
the utmost diligence the words, not of a learned professor, but of a
raw, unlettered mshenzi. By the time I have learnt everything I want
to know, the name, the purpose, the mode of manufacture and the
way in which the thing is used, the native is at last able and willing to
fix the retail price. Up to the present, I have met with two extremes:
one class of sellers demand whole rupees, Rupia tatu (three) or
Rupia nne (four), quite regardless of the nature of the article for sale
—the other, with equal consistency, a sumni as uniform price. This is
a quarter-rupee—in the currency of German East Africa an
exceedingly attractive-looking silver coin, a little smaller than our
half-mark piece or an English sixpence. Possibly it is its handiness,
together with the untarnished lustre of my newly-minted specimens
in particular, which accounts for this preference. One thing must be
mentioned which distinguishes these people very favourably from
the bandits of the ports already mentioned. None of them raises an
outcry on being offered the tenth or twentieth part of what he asks.
With perfect calm he either gradually abates his demands till a fair
agreement is reached, or else he says, at the first offer, “Lete” (“Hand
it over”). At this moment Moritz and my jar of coppers come to the
front of the stage. The boy has quickly lifted the vessel down from the
head of his friend Mambo sasa. With the eye of a connoisseur he
grasps the state of our finances and then pays with the dignity, if not
the rapidity, of the cashier at a metropolitan bank. The remaining
articles are bargained for in much the same way. It takes more time
than I like; but this is not to be avoided.
When the purchase of the last piece is completed, my carriers, with
the amazing deftness I have so often admired, have packed up the
spoil, in the turn of a hand, in large and compact bundles. A
searching look round for photographic subjects, another last glance
at the house-owner chuckling to himself over his newly-acquired
wealth, and then a vigorous “Kwa heri” (“Good-bye”), and lantern
and jar go their way. We had only just settled into our house here
when we received a visit from the chief’s son, Salim Matola, a very
tall and excessively slender youth of seventeen or eighteen,
magnificently clad in a European waistcoat, and very friendly. Since
then he has scarcely left my side; he knows everything, can do
everything, finds everything, and, to my delight, brings me
everything. He makes the best traps, shows me with what diabolical
ingenuity his countrymen set limed twigs, plays on all instruments
like a master, and produces fire by drilling so quickly that one is
astonished at the strength in his slight frame. In a word, he is a
treasure to the ethnographer.
One thing only seems to be unknown to my young friend, and that
is work. His father, Masekera Matola, already mentioned, has a very
spacious group of huts and extensive gardens. Whether the old
gentleman ever does any perceptible work on this property with his
own hands, I am not in a position to judge, as he is for the present
most strenuously occupied in consuming beer; but at every visit, I
have noticed the women of the family working hard to get in the last
of the crops. The young prince alone seems to be above every
plebeian employment. His hands certainly do not look horny, and his
muscles leave much to be desired. He strolls through life in his
leisurely way with glad heart and cheerful spirit.
MY CARAVAN ON THE MARCH. DRAWN BY PESA MBILI
CHAPTER VII
MY CARAVAN ON THE SOUTHWARD MARCH

Chingulungulu, beginning of August, 1906.

It is not very easy to locate my present abode on the map. Masasi and
its exact latitude and longitude have been known to me for years, but
of this strangely named place,[17] where I drove in my tent-pegs a few
days ago, I never even heard before I had entered the area of the
inland tribes.
One trait is common to all Oriental towns, their beauty at a
distance and the disillusionment in store for those who set foot
within their walls. Knudsen has done nothing but rave about
Chingulungulu ever since we reached Masasi. He declared that its
baraza was the highest achievement of East African architecture,
that it had a plentiful supply of delicious water, abundance of all
kinds of meat, and unequalled fruit and vegetables. He extolled its
population, exclusively composed, according to him, of high-bred
gentlemen and good-looking women, and its well-built, spacious
houses. Finally, its situation, he said, made it a convenient centre for
excursions in all directions over the plain. I have been here too short
a time to bring all the details of this highly coloured picture to the
test of actual fact, but this much I have already ascertained, that
neither place nor people are quite so paradisaical as the enthusiastic
Nils would have me believe.
YAO HOMESTEAD AT CHINGULUNGULU

To relate my experiences in their proper order, I must, however, go
back to our departure from Masasi which, owing to a variety of
unfortunate circumstances, took place earlier than originally
planned. To begin with, there was the changed attitude of the
inhabitants, who at first, as already stated, showed the greatest
amiability, and allowed us, in the most obliging way, to inspect their
homes and buy their household furnishings. In my later sketching
and collecting expeditions, I came everywhere upon closed doors and
apparently deserted compounds. This phenomenon, too, comes
under the heading of racial psychology. However much he may profit
by the foreigner’s visits, the African prefers to have his own hut to
himself.[18]
In the second place, we began, in the course of a prolonged
residence, to discover the drawbacks of our quarters in the rest-
house. Knudsen, who is very sensitive in this respect, insisted that it
was damp, and we soon found that the subsoil water, which indeed
reached the surface as a large spring on the hillside a little below the
house, was unpleasantly close to our floor. Even on the march up
from the coast, Knudsen had suffered from occasional attacks of
fever. These now became so frequent and severe that he was scarcely
fit for work. His faithful old servant, Ali, nursed him with the most
touching devotion, and never left his bedside night or day.
I had myself on various occasions noticed a curious irritation of
the scalp, for which I could discover no cause, in spite of repeated
examination. One day, while hastening across from the dark-room to
the rest-house, with some wet plates in my hand, I was conscious of
intense discomfort among my scanty locks, and called out to Moritz
to take off my hat and look if there was anything inside it. He obeyed,
inspected the hat carefully inside and out, and, on pursuing his
researches under the lining, turned grey in the face, and ejaculated
with evident horror, “Wadudu wabaya!”[19] The case becoming
interesting, I put my plates down and instituted a minute
investigation into Moritz’s find, which proved to consist of a number
of assorted animalcules, with a sprinkling of larger creatures
resembling ticks. This was somewhat startling. I had come to Africa
with a mind entirely at ease as regards malaria—I swear by Koch and
fear nothing. But remittent fever is another matter. In Dar es Salam I
had heard enough and to spare about this latest discovery of the
great Berlin bacteriologist, and how it is produced by an
inconspicuous tick-like insect which burrows in the soil of all sites
occupied for any length of time by natives. The mosquito-net, I was
told, is a sufficient protection against the full-grown papasi, as they
are called, but not against their hopeful progeny, which can slip
unhindered through the finest mesh. This particular kind of fever,
moreover, was said to be most especially trying—you were never
seriously ill, and yet never really well, or fit for work; and nothing,
not even quinine, would avail to keep the attacks from recurring
every few days. Small wonder if, at the sight of these wadudu
wabaya in the shape of ticks, I too turned pale at the thought of the
ignoble end possibly awaiting my enterprise before it was well begun.
I had already found out that Masasi was not precisely an abode of
all the virtues, and that an appreciable percentage of the soldiers
forming the garrison at the boma were suffering from venereal
diseases; but the incident which precipitated our departure was the
following. The akida, or local headman (a former sergeant in the
Field Force), was the owner of a small herd of cattle, and with the
good-nature which is one of the most striking traits in the African
character, earned my warmest gratitude by sending me a small jar of
milk every day. After a time we heard, and the rumour gained in
definiteness with each repetition, that the akida was a leper. I could
not refuse the milk, which continued to arrive regularly, and came in
very handy for fixing my pencil drawings.
In their totality the evils enumerated may not signify more than a
succession of pin-pricks; but even such trifling interferences with
human well-being may in the end appreciably diminish one’s enjoyment
of life. With the attractions of Chingulungulu as an additional
inducement, it was not surprising that only a day or two intervened
between the first suggestion that we should migrate southward and our
actual departure. With their usual monkeylike agility, my carriers one
evening packed a large heap of specimens in convenient loads, and as
quickly the order was given to Saleh, the corporal in command of the
askari, and Pesa mbili, the leader of the porters, “Safari to-morrow
at six!”
THE YAO CHIEF MATOLA
Next to Matola, the Yao chief of Chingulungulu, no man in the country
is oftener in men’s mouths than his illustrious colleague and fellow
tribesman, Nakaam, of Chiwata in the north-western part of the Makonde
plateau. The Europeans on the coast
are not agreed as to which of these two chiefs is the more powerful.
In the interior, however, Matola seems to be far more looked up to by
the natives than the chief of Chiwata. Nevertheless, I thought it
absolutely necessary to visit the latter and his people. My plans are
not based on any fixed line of march, but were expressly arranged so
that I should be able to take whatever route circumstances might
render most convenient.
I must confess that my stay at Masasi has turned out a
disappointment as regards the customs, habits and ideas of the
natives, though I have gained a very fair insight into the outward,
material details of their life. But here too, Nils Knudsen is ready with
consolation and encouragement. “What can you expect, Professor?
the people here are a terribly mixed lot, after all, and have lost all
their own traditions and customs. Don’t waste any more time in this
wretched hole of a Masasi, but come to Chingulungulu; you have no
idea what a fine place that is!”
We marched at daybreak on July 31. The road through the Masasi
district, as already mentioned, skirts the great chain of insular
mountains on the east, passing, at a sufficient height to afford an
extensive view to the east and south, over an escarpment formed by the
products of aerial denudation from the gneiss peaks. Did I say the
plain? it is an ocean that we see spread out before our eyes, a white,
boundless expanse, studded with islands, here one, there another, and
yonder, on the misty horizon, whole archipelagoes. This wonderful
spectacle, passing away all too quickly as the sun climbs higher—the
peaks rising like islands from the sea of the morning mist, while our
caravan trails its length along the shore—pictures for us as in a
mirror the aspect it presented in those distant ages when the blue
waves of the primæval ocean rolled where now the blue smoke of lowly
huts ascends to the heavens.
NAKAAM, A YAO CHIEF
The goal of our first day’s march was Mwiti, where, to judge from the
importance given to it on the map, I
expected a large native settlement. Not far from the Masasi Mission
station, the road to Mwiti branches off from the Coast road on the
right. I order a halt; the column opens out; I shout into the fresh
morning air “Wapagazi kwa Lindi!” (“the carriers for Lindi!”); and
the oldest and also the tallest of my porters, a Mnyamwezi of
pronounced Masai type, strides up with a heavy, swaying motion like
a camel.
INTERIOR OF A COMPOUND AT MWITI

His name, Kofia tule, was at first a puzzle to me. I knew that kofia
means a cap, but, curiously enough it never occurred to me to look
up tule (which, moreover, I assumed to be a Nyamwezi word) in the
dictionary. That it was supposed to involve a joke of some sort, I
gathered from the general laughter, whenever I asked its meaning. At
last we arrived at the fact that kofia tule means a small, flat cap—in
itself a ridiculous name for a man, but doubly so applied to this black
super-man with the incredibly vacant face.
Kofia tule, then, comes slowly forward, followed by six more
Wanyamwezi, and some local men whom I have engaged as extra
carriers. With him as their mnyampara they are to take my
collections down to the Coast, and get them stored till my return in
the cellars of the District Commissioner’s office at Lindi. The final
instructions are delivered, and then comes the order, “You here, go
to the left,—we are going to the right. March!” Our company takes
some time to get into proper marching order, but at last everything
goes smoothly. A glance northward over the plain assures us that
Kofia tule and his followers have got up the correct safari speed; and
we plunge into the uninhabited virgin pori.
There is something very monotonous and fatiguing about the
march through these open woods. It is already getting on for noon,
and I am half-asleep on my mule, when I catch sight of two black
figures, gun in hand, peeping cautiously round a clump of bushes in
front. Can they be Wangoni?
For some days past we have heard flying rumours that Shabruma,
the notorious leader of the Wangoni in the late rebellion, and the last
of our opponents remaining unsubdued, is planning an attack on
Nakaam, and therefore threatening this very neighbourhood. Just as
I look round for my gun-bearer, a dozen throats raise the joyful shout
of “Mail-carrier!” This is my first experience of the working of the
German Imperial Post in East Africa; I learnt in due course that,
though by no means remunerative to the department, it is as nearly
perfect as any human institution can be. It sounds like an
exaggeration, but it is absolutely true, to say that all mail matter,
even should it be only a single picture post-card, is delivered to the
addressee without delay, wherever he may be within the postal area.
The native runners, of course, have a very different sort of duty to
perform from the few miles daily required of our home functionaries.
With letters and papers packed in a water-tight envelope of oiled
paper and American cloth, and gun on shoulder, the messenger trots
along, full of the importance of his errand, and covers enormous
distances, sometimes, it is said, double the day’s march of an
ordinary caravan. If the road lies through a district rendered unsafe
by lions, leopards, or human enemies, two men are always sent
together. The black figures rapidly approach us, ground arms with
soldierly precision and report in proper form:—Letters from Lindi
for the Bwana mkubwa and the Bwana mdogo—the great and the
little master. As long as Mr. Ewerbeck was with us, it was not easy for
the natives to establish the correct precedence between us. Since they
ranked me as the new captain, they could not possibly call me
Bwana mdogo. Now, however, there is not the slightest difficulty,—
there are only two Europeans, and I being, not only the elder, but
also the leader of the expedition, there is nothing to complicate the
usual gradation of ranks.

You might also like