Professional Documents
Culture Documents
Full download Deep Learning in Bioinformatics: Techniques and Applications in Practice Habib Izadkhah file pdf all chapter on 2024
Full download Deep Learning in Bioinformatics: Techniques and Applications in Practice Habib Izadkhah file pdf all chapter on 2024
https://ebookmass.com/product/machine-learning-for-business-
analytics-concepts-techniques-and-applications-in-rapidminer-
galit-shmueli/
https://ebookmass.com/product/applications-of-deep-learning-in-
electromagnetics-teaching-maxwells-equations-to-machines-maokun-
li/
https://ebookmass.com/product/time-series-algorithms-recipes-
implement-machine-learning-and-deep-learning-techniques-with-
python-akshay-r-kulkarni/
https://ebookmass.com/product/risk-modeling-practical-
applications-of-artificial-intelligence-machine-learning-and-
deep-learning-terisa-roberts/
Bioinformatics: Methods and Applications Dev Bukhsh
Singh
https://ebookmass.com/product/bioinformatics-methods-and-
applications-dev-bukhsh-singh/
https://ebookmass.com/product/emerging-trends-in-applications-
and-infrastructures-for-computational-biology-bioinformatics-and-
systems-biology-systems-and-applications-1st-edition-arabnia/
https://ebookmass.com/product/computation-in-bioinformatics-s-
balamurugan/
https://ebookmass.com/product/artificial-intelligence-and-deep-
learning-in-pathology-1st-edition-stanley-cohen-md-editor/
https://ebookmass.com/product/deep-reinforcement-learning-for-
wireless-communications-and-networking-theory-applications-and-
implementation-dinh-thai-hoang/
Deep Learning in
Bioinformatics
This page intentionally left blank
Deep Learning in
Bioinformatics
Techniques and Applications in
Practice
Habib Izadkhah
Department of Computer Science
University of Tabriz
Tabriz, Iran
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing from the
publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found
at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may
be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any
injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-823822-6
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
CHAPTER 1 Why life science? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Why deep learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Contemporary life science is about data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Deep learning and bioinformatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 What will you learn? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
CHAPTER 2 A review of machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 What is machine learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Challenge with machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Overfitting and underfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.1 Mitigating overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.2 Adjusting parameters using cross-validation . . . . . . . . . . . . . . . . . . . . 15
2.4.3 Cross-validation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Types of machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Supervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.2 Unsupervised learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5.3 Reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 The math behind deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.1 Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.2 Relevant mathematical operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.3 The math behind machine learning: statistics . . . . . . . . . . . . . . . . . . . . 25
2.7 TensorFlow and Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.8 Real-world tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
CHAPTER 3 An introduction of Python ecosystem for deep learning . . . . . . . . . . . . . . . . . 31
3.1 Basic setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 SciPy (scientific Python) ecosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Scikit-learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 A quick refresher in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.1 Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.2 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.3 Data type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.4 Control flow statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4.5 Data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.6 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
vii
viii Contents
3.5 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6 Matplotlib crash course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.7 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.8 How to load dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.1 Considerations when loading CSV data . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.2 Pima Indians diabetes dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.3 Loading CSV files in NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8.4 Loading CSV files in Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.9 Dimensions of your data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.10 Correlations between features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.11 Techniques to understand each feature in the dataset . . . . . . . . . . . . . . . . . . . . 53
3.11.1 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.11.2 Box-and-whisker plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.11.3 Correlation matrix plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.12 Prepare your data for deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.12.1 Scaling features to a range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.12.2 Data normalizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.12.3 Binarize data (make binary) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.13 Feature selection for machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.13.1 Univariate selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.13.2 Recursive feature elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.13.3 Principal component analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.13.4 Feature importance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.14 Split dataset into training and testing sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.15 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
CHAPTER 4 Basic structure of neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 The neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 Layers of neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4 How a neural network is trained? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5 Delta learning rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.6 Generalized delta rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.7 Gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.7.1 Stochastic gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.7.2 Batch gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.7.3 Mini-batch gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.8 Example: delta rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.8.1 Implementation of the SGD method . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.8.2 Implementation of the batch method . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.9 Limitations of single-layer neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
CHAPTER 5 Training multilayer neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Contents ix
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Acknowledgments
This book is a product of sincere cooperation of many people. The author would like to thank all those
who contributed in the process of writing and publishing this book. Dr. Masoud Kargar, Dr. Masoud
Aghdasifam, Hamed Babaei, Mahsa Famil, Esmaeil Roohparver, Mehdi Akbari, Mahsa Hashemzadeh,
and Shabnam Farsiani have read the whole draft and made numerous suggestions, improving the pre-
sentation quality of the book; I thank them for all their effort and encouragement.
I wish to express my sincere appreciation to the team at Elsevier, particularly Chris Katsaropoulos,
Senior Acquisitions Editor, for his guidance, comprehensive explanations of the issues, prompt reply
to my e-mails, and, of course, his patience. I would like to also thank Joshua Mearns and Nirmala
Arumugam, for preparing the production process and coordinating the web page, and the production
team. Finally, I thank “the unknown reviewers,” for their great job on exposing what needed to be
restated, clarified, rewritten, and/or complemented.
Habib Izadkhah
xiii
This page intentionally left blank
Preface
Artificial Intelligence, Machine Learning, Deep Learning, and Big Data have become the latest hot
buzzwords, Deep learning and bioinformatics being two of the hottest areas of contemporary research.
Deep learning, as an emerging branch from machine learning, is a good solution for big data analytics.
Deep learning methods have been extensively applied to various fields of science and engineering,
including computer vision, speech recognition, natural language processing, social network analyzing,
and bioinformatics, where they have produced results comparable to and in some cases superior to
domain experts. A vital value of deep learning is the analysis and learning of massive amounts of data,
making it a valuable method for Big Data Analytics.
Bioinformatics research comes into an era of Big Data. With increasing data in biology, it is ex-
pected that deep learning will become increasingly important in the field and will be utilized in a vast
majority of analysis problems. Mining potential value in biological data for researchers and the health
care domain has great significance. Deep learning, which is especially formidable in handling big data,
shows outstanding performance in biological data processing.
To practice deep learning, you need to have a basic understanding of the Python ecosystem. Python
is a versatile language that offers a large number of libraries and features that are helpful for Artificial
Intelligence and Machine Learning in particular, and, of course, you do not need to learn all of these
libraries and features to work with deep learning. In this book, I first give you the necessary Python
background knowledge to study deep learning. Then, I introduce deep learning in an easy to under-
stand and use way, and also explore how deep learning can be utilized for addressing several important
problems in bioinformatics, including drug discovery, de novo molecular design, protein structure pre-
diction, gene expression regulation, protein sequence classification, and biomedical image processing.
Through real-world case studies and working examples, you’ll discover various methods and strategies
for building deep neural networks using the Keras library. The book will give you all the practical in-
formation available on the bioinformatics domain, including the best practices. I believe that this book
will provide valuable insights for a successful career and will help graduate students, researchers, ap-
plied bioinformaticians working in the industry and academia to use deep learning techniques in their
biological and bioinformatics studies as a starting point.
This book
• provides necessary Python background for practicing deep learning,
• introduces deep learning in a convenient way,
• provides the most practical information available on the domain to build efficient deep learning
models,
• presents how deep learning can be utilized for addressing several important problems in bioinfor-
matics,
• explores the legendary deep learning architectures, including convolutional and recurrent neural
networks, for bioinformatics,
• discusses deep learning challenges and suggestions.
Habib Izadkhah
xv
This page intentionally left blank
CHAPTER
deep neural network (deep learning) is presented in the subsequent chapters of the book, it is important
to know about some of the breakthroughs achieved with deep learning first.
A common application of deep learning is image recognition. Using deep learning for facial recog-
nition includes a wide range of applications from security areas and cell phone unlocking methods to
automated tagging of individuals who are present in an image. Companies now seek to use this feature
to set up the process of making purchases without the need for credit cards. For instance, have you
noticed that Facebook has developed an extraordinary feature that lets you know about the presence of
your friends in your photos? Facebook used to make you click on photos and type your friends’ names
to tag them. However, as soon as a photo is uploaded, Facebook now does the magic and tags everybody
for you. This technology is called facial recognition.
Deep learning can also be utilized to restore images or eliminate their noise. This feature of machine
learning is also employed in different security areas, identification of criminals, and quality enhance-
ment of a family photo or a medical image. Producing fake images is also another feature of deep
learning. In fact, deep learning algorithms are able to generate new images of people’s faces, objects,
and even sceneries that have never existed. These images are utilized in graphic design, video game
development, and movie production.
Leading to a plethora of applications for users, many of the similar deep learning developments are
now employed in bioinformatics and biomedicine to classify tumor cells into various categories. Given
the scarcity of medical data, fake images can be produced to generate new data.
Deep learning has also resulted in many speech recognition developments that have become perva-
sive in search engines, cell phones, computers, TV sets, and other online devices everywhere.
So far, various speech recognition technologies have been developed, such as Alexa, Cortana,
Google Assistant, and Siri, changing human interactions with devices, homes, cars, and jobs. Through
the speech recognition technology, it is possible to talk with computers and devices, which can also
understand what the speech means and can make a response. Introducing voice-controlled or digital
assistants into the speech recognition market has changed the outlook of this technology in the 21st
century.
Analyzing its user’s behavior, a recommender system suggests the most appropriate items (e.g., data,
information, and goods). Helping users find their targets faster, this system is an approach proposed to
deal with the problems caused by the growingly massive amount of information. Many companies that
have extensive websites now employ recommender systems to facilitate their processes. Given different
preferences of various users at different ages, there is no doubt that users select different products; thus,
recommender systems should yield various results accordingly. Recommender systems have significant
effects on the revenues of different companies. If employed correctly, these systems can bring about
high profitability for companies. For instance, Netflix has announced that 60% of DVDs rented by users
are provided through recommender systems, which can greatly affect user choices of films.
Recommender systems can also be employed to prescribe appropriate medicines for patients. In
fact, prescribing the right medicines for patients is among the most important processes of their treat-
ments, for which accurate decisions must be made based on patients’ current conditions, history, and
symptoms. In many cases, patients may need more than one medicine or new medicines for another
condition in addition to a previous disease. Such cases increase the chances of medical error in the
prescription of medicines and the incidence of side effects of medicine misuse.
These are only a few innovations achieved through the use of deep learning methods in bioinformat-
ics. Ranging from medical diagnosis and tumor detection to production and prescription of customized
1.3 Contemporary life science is about data 3
medicines based on a specific genome, deep learning has attracted many large pharmaceutical and
medical companies. Many deep learning ideas used in bioinformatics are inspired by the conventional
applications of deep learning.
We are living in an interesting era when there is a convergence of biological data and the extensive
scientific methods of processing that kind of data. Those who can combine data with novel methods to
learn from data patterns can achieve significant scientific breakthroughs.
Language: Spanish
DISQUISICIONES HISTÓRICO-
FILOLÓGICAS.
PUERTO-RICO.
Tip. al vap. de “La Correspondencia.”
1894.
COLON EN PUERTO-RICO
CAYETANO COLL Y TOSTE.
Colón en Puerto-Rico
DISQUISICIONES HISTÓRICO-
FILOLÓGICAS
PUERTO-RICO.
1893.
Noviembre de 1893.
Segundo viaje de Colón