As this ebook edition doesn't have fixed pagination, the page numbers below are hyperlinked for reference only, based on the printed edition of this book.
A
activation function 60
binary step function 60
leaky ReLU 62
ReLU 61
sigmoid 62
tanh 63
activator protein (AP) 175
advanced tools, model monitoring
data quality tools 222
DL monitoring tools 223
monitoring 222
system monitoring tools 222
AI, in genomics market
reference link 6
Amazon SageMaker 210
Amazon Web Services (AWS) 17
Analysis using Denoising Autoencoders for Gene Expression (ADAGE)
reference link 130
anomaly detection 119
novelty detection 120
area under the curve (AUC) 172
artificial neural networks (ANNs) 4
artificial neuron (AN) 57
association 121
autoencoder applications, for predicting gene expression
ADAGE 130
gene expression clustering, boosting 131
hierarchical organization, of yeast transcriptomic machinery 131
autoencoders
architecture 124
bottleneck 125
convolutional autoencoders 127, 128
decoder 125
deep autoencoders 127
encoder 125
gene expression 130
gene expression use case 131
image compression 125
loss function 125
properties 122
regularization 125
regularized autoencoders 128
Single-cell RNA sequencing (scRNA-seq) 126, 127
vanilla autoencoder 127
automated ML (AutoML) 227
B
backpropagation through time (BPTT) 102
balanced data 229
bases 13
Bayes algorithm 139
bidirectional RNN 103
biological neuron 57
Biopython
FASTA 20
GenBank 21
installation, verifying 12
installing 10
installing, pip used 11
Python installation, verifying 10
SeqIO object 20
SeqRecord object 19
sequences, working with 18
using, for genomic data analysis 18
black-box model interpretability 185, 186
bottleneck 125
business value, unlocking from model interpretability 187
profitability 189
trust, building 189
C
ChIP-seq experiment
TFBS predictions via 111
ChIP-sequencing (ChIP-seq) 38
classification metrics or performance statistics, model evaluation
clustering 37
CNN architecture 84
input layer 84
output layer 87
pooling layer 86
CNN for coexpression (CNNC) 93
coding sequence (CDS) 35
considerations, for algorithm implementation
memory requirements 161
model interpretability 161
training time 161
continuous integration and continuous delivery (CI/CD) 232
contractive autoencoders 129
convolutional autoencoders 127, 128
convolutional neural networks (CNNs) 70, 71, 82, 83
applications, in genomics 90
history 83
Cortex 210
cross-validation 163
D
DanQ 108
reference link 108
data leakage 228
data partitioning 161
cross-validation 163
dataset, training 162
Group K-Fold cross-validation 163
holdout dataset 162
K-Fold cross-validation 163
random partitioning 162
stratified K-Fold cross-validation 163
stratified partitioning 162
validation dataset 162
data preprocessing
data augmentation 159
data cleaning 158
data transformation 159
data processing 157
data transformation
data processing 16
quality control and cleaning 16
data wrangling
decoder 125
deep autoencoders 127
deep autoregressive models 137
DeepBind 90
DeepChrome 92
DeepInsight 91
DragoNN 80
Kipoi 80
neural network definition 56
deep learning, applying to genomics
deep learning challenges
computational resource requirements 227
expertise in DL frameworks 227
fewer biological samples 226
lack of flexible tools 226
lack of high-quality labeled data 227
lack of model interpretability 227, 228
deep learning libraries
Keras 79
PyTorch 78
TensorFlow 78
deep learning model
creating 211
deploying, as web service 211
exporting 211
monitoring 222
DeepNano 107
reference link 106
deep neural network (DNN) 58, 121, 159
activation function 60
anatomy 56
application, in genomics 76
architectures 69
backpropagation 64
bias 60
for genomics 74
forward propagation 64
gene regulatory networks (GRNs) 77
gradient descent 65
hidden layer 58
input layer 58
key concepts 59
loss function 64
protein structure predictions 77
regularization 65
regulatory genomics 77
single-cell RNA sequencing 77
transfer function 60
weights 60
reference link 108
DeepVariant 92
denoising autoencoders 129
denoising autoencoders, used for predicting gene expression from TCGA pan-cancer RNA-Seq data 131
data collection 131
data preprocessing 131
deoxyribonucleoside triphosphates (dNTPs) 13
deployment tools
Amazon SageMaker 210
Cortex 210
MLflow 210
TensorFlow Serving 210
dimensionality reduction technique (DRT) 121
dinucleotide content
calculating 25
discriminative models
versus generative models 138, 139
DNA methylation 66
DNN application
gene expression prediction 76
SNP prediction 76
DNN architectures
convolutional neural networks (CNNs) 69
feed-forward neural networks (FNNs) 69
graph neural networks (GNNs) 69
recurrent neural networks (RNNs) 69
DragoNN 80
URL 80
drifts
addressing 223
dropouts 126
E
encoder 125
Encyclopedia of DNA Elements (ENCODE) 33
engineered features
versus learned features 159
exabytes (EB) 33
Explanation Summary (ExSum) 194
exploratory data analysis (EDA) 15, 38, 228
extract, transform, and load (ETL) 15, 158
F
False Positive Rate (FPR) 172
feature variation 121
feed-forward neural networks (FNNs) 70, 138
forget gate 104
forward propagation 64
fully connected (FC) layer 70
G
GANs applications 147
crease data, analysis 148, 149
DNA generation 149
for augmenting population-scale genomics data 150
gated recurrent unit (GRU) 104, 105
GC content
calculating 24
GenBank 33
gene expression 130
Gene Expression Omnibus (GEO) 41
Generative Adversarial Networks (GANs) 138-140
components 140
discriminator, training 141, 142
generative models
versus discriminative models 138, 139
generator model 139
genome sequencing 12
Sanger sequencing, of nucleic acids 13
genome-wide association studies (GWAS) 146
genomic data analysis 14
Biopython, using 18
cloud computing, for genomics data analysis 17
data collection 16
data transformation 16
exploratory data analysis 16
GC content, calculating 24
modeling 17
nucleotide content, calculating 24, 25
steps 15
visualization and reporting 17
genomics
CNNs, applications 90
convolutional neural networks (CNNs) 89, 90
deep learning challenges 226
deep neural network (DNN), using for 74
DNN application 76
machine learning (ML), need for 5
ML, challenges 51
pre-existing models, using 232
recurrent neural network (RNN), applications 106
recurrent neural network (RNN), use cases 106
genomics datasets
working with, challenges 143, 144
Git 205
global surrogate method 192, 193
GoLan 107
gradient descent 65
graph neural networks (GNNs) 72
Group K-Fold cross-validation 163
H
hidden Markov models (HMMs) 139
Hugging Face 205
reference link 216
Hugging Face Spaces
Human Genome Project 13
hyperparameter tuning 164, 232
Bayesian optimization 165
libraries 166
model training 181
random search 165
I
image compression 125
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 83
individual conditional expectation (ICE) 191
input gate 104
J
JUND TF
binding site location, predicting 174
K
Keras 79
kernels 84
key performance indicators (KPIs) 155
K-Fold cross-validation 163
Kipoi
URL 80
k-nearest neighbors (KNN) 145
L
learned features
versus engineered features 159
learning problem 34
life sciences
machine learning for genomics 6
Local Interpretable Model-agnostic Explanations (LIME) 193
log-loss 173
long short-term memory (LSTM) 103, 105
loss function 125
M
machine learning (ML)
challenges, in genomics 51
for genomics 38
in biotechnology 6
in life sciences 6
need, for genomics 5
machine learning (ML), in genomics
data collection, to preprocessing 38
feature extraction and selection 39
model deployment 40
model evaluation 39
model interpretability 40
model monitoring 40
model training 39
train-test data splitting 39
workflow 38
machine learning software
exploring 6
machine translation 226
managed solutions 209
deployment tools 210
ONNX 210
mean absolute error (MAE) 36
mean squared error (MSE) 39, 118
median absolute deviation (MAD) 131
median absolute percent error (MAPE) 174
MLflow 210
ML libraries 32
scikit-learn 33
ML use cases 40
model training 48
model
tuning 163
model degradation
concept drift 221
data drift 220
reasons 220
model deployment 205
advantages 205
as services 209
batch prediction 208
managed solutions 209
model-as-service 209
online prediction 208
pre-steps 206
through model-in-service 209
through web services 209
types 208
model development
appropriate algorithm, selecting 161
steps 160
model evaluation 169
classification metrics or performance statistics 169
performance visualization 172
regression metrics 173
model interpretability 184
business value, unlocking from 187
methods 189
use cases 194
model interpretability methods
ExSum 194
individual conditional expectation (ICE) 191
partial dependence plot 190
permuted feature importance (PFI) 191, 192
saliency map 194
Shapley value 193
models
degrading, reasons 220
monitoring, need for 220
monitoring, with advanced tools 220
model training
data partitioning 161
multidimensional scaling (MDS) 37
multi-layer perceptrons (MLPs) 69
N
nanopore sequencing 14
National Center for Biotechnology Information (NCBI) 33
natural language processing (NLP) 66, 226
neural machine translation model (NMT) 108
definition 56
neural network architecture, creating 197
model evaluation 198
next-generation sequencing (NGS) 13, 226
evolution 14
second-generation DNA sequencing technologies 14
third-generation DNA sequencing technologies 14
non-linearity 67
normalized empirical probability distribution function (NEPDF) 93
novelty detection 120
nucleotide 12
O
output gate 104
Oxford Nanopore Technologies (ONT) 106
P
Pandas 32
parameter sweeping 164
partial dependence plot (PDP) 190
permuted feature importance (PFI) 191, 192
pip
used, for Biopython installation 11
pitfalls, for applying deep learning to genomics 228
confounding 228
data leakage 229
improper model comparisons 230
PLoS Biol
reference link 5
polymerase chain reaction (PCR) 14
positional weight matrix (PWM) 5
position frequency matrix (PFM) 29
position weight matrix (PWM) 71
precision-recall (PR) curve 173
predictive modeling 17
principal component analysis (PCA) 121, 147, 158
ProLanGo
reference link 107
Python
download link 11
Python packages
Matplotlib 32
Pandas 32
seaborn 32
Python programming language 6
PyTorch 78
model and data 79
model deployment 79
R
R² 174
receiver operating characteristic (ROC) 172, 173
Rectified Linear Unit (ReLU) 61
recurrent neural network (RNN) 96-99, 138
applications, in genomics 106
backpropagation 102
understanding, through TFBS predictions 110
use cases, in genomics 106
recurrent neural network (RNN), types 105
many-to-many 106
many-to-one 106
one-to-many 105
one-to-one 105
regression metrics 173
R² 174
RMSE 173
regularization 65
data augmentation 66
dropout 66
Elastic Net 66
Lasso 66
ridge 66
regularized autoencoders 128
contractive autoencoders 129
denoising autoencoders 129
ReLU activation function 68
research and development (R&D) 187
reset gate 105
Ribo-sequencing (Ribo-seq) 38
RNA sequencing (RNA-seq) 77
RNN architectures 102
bidirectional RNN 103
gated recurrent unit (GRU) 103, 105
long short-term memory (LSTM) 103, 105
root mean squared error (RMSE) 169, 173
S
saliency map 194
scikit-learn 7
seaborn 32
sequence record 19
sequencing by synthesis (SBS) 14
SHapley Additive ExPlanations (SHAP) 40
Shapley value 193
Single-cell RNA sequencing (scRNA-seq) 126, 127
single-molecule real-time (SMRT) sequencing 14
single nucleotide polymorphisms (SNPs) 35, 76
singular value decomposition (SVD) 37
Spaces 205
stochastic gradient descent (SGD) 164
stratified K-Fold cross-validation 163
Streamlit
Streamlit-based application
building, of CNN model 211
structural zeros 126
Structured Query Language (SQL) 158
style transfer 137
supervised learning (SL) 138, 162
types 35
working with 36
support vector machines (SVM) 92
synapse 57
synthetic data, for genomics 145
Synthetic Minority Oversampling TEchnique (SMOTE) 145
T
Telomere to Telomere (T2T) effort 76
TensorFlow 78
TensorFlow Data Validation (TFDV) 222
TensorFlow Serving 210
tensors 78
TFBS prediction problem
data collection 175
data preprocessing 176
data, processing 175
data wrangling 175
framing, in terms of DL 175
TFBS predictions, via ChIP-seq experiment 111
data collection 111
data preprocessing 112
data splitting 112
The Cancer Genome Atlas (TCGA) 33
Torch 78
Transcription Factor Binding Site (TFBS) 208, 229
Transcription Factor Binding Site (TFBS) predictions
recurrent neural network (RNN), understanding through 109, 110
Transcription factors (TF) 109, 188
transfer function 60
transfer learning (TL) 88, 121
True Positive Rate (TPR) 172
Truncated backpropagation through time (TBPTT) 102
U
Universal Approximation Theorem 56
unsupervised DL 118
anomaly detection 119
association 121
clustering 118
unsupervised learning (UL) 169
unsupervised ML methods 37
clustering 37
dimensionality reduction 37
types 37
update gate 105
use cases, model interpretability
data collection 195
feature extraction 195
neural network architecture, creating 197
target labels 196
train-test split 196
V
vanilla autoencoder 127
visualization libraries, Python 7
W
weights 60
whole-exome sequencing (WES) 38
whole-genome sequencing (WGS) 38
whole-transcriptome sequencing (WTS) 38