Index

As this ebook edition doesn't have fixed pagination, the page numbers below are hyperlinked for reference only, based on the printed edition of this book.

A

activation function 60

binary step function 60

leaky ReLU 62

ReLU 61

sigmoid 62

softmax 63, 64

tanh 63

activator protein (AP) 175

advanced tools, model monitoring

data quality tools 222

DL monitoring tools 223

monitoring 222

system monitoring tools 222

AI, in genomics market

reference link 6

Amazon SageMaker 210

Amazon Web Services (AWS) 17

Analysis using Denoising Autoencoders for Gene Expression (ADAGE)

reference link 130

anomaly detection 119

novelty detection 120

outlier detection 119, 120

area under the curve (AUC) 172

artificial neural networks (ANNs) 4

artificial neuron (AN) 57

association 121

autoencoder applications, for predicting gene expression

ADAGE 130

gene expression clustering, boosting 131

hierarchical organization, of yeast transcriptomic machinery 131

autoencoders 73, 74, 121

architecture 124

bottleneck 125

convolutional autoencoders 127, 128

decoder 125

deep autoencoders 127

encoder 125

gene expression 130

gene expression use case 131

image compression 125

loss function 125

properties 122

regularization 125

regularized autoencoders 128

Single-cell RNA sequencing (scRNA-seq) 126, 127

vanilla autoencoder 127

working 122-124

automated ML (AutoML) 227

B

backpropagation 64, 65

backpropagation through time (BPTT) 102

balanced data 229

bases 13

Bayes algorithm 139

bias 60, 68

bidirectional RNN 103

biological neuron 57

Biopython 7, 18

FASTA 20

GenBank 21

installation, verifying 12

installing 10

installing, pip used 11

Python installation, verifying 10

SeqIO object 20

seq object 18, 19

SeqRecord object 19

sequences, working with 18

using, for genomic data analysis 18

black-box model interpretability 185, 186

bottleneck 125

business value, unlocking from model interpretability 187

business decisions 187-189

profitability 189

trust, building 189

C

ChIP-seq experiment

TFBS predictions via 111

ChIP-sequencing (ChIP-seq) 38

classification metrics or performance statistics, model evaluation

accuracy 170, 171

precision 171, 172

recall 171, 172

clustering 37

CNN architecture 84

convolutional layer 84, 85

fully connected layer 86, 87

input layer 84

output layer 87

pooling layer 86

CNN for coexpression (CNNC) 93

coding sequence (CDS) 35

computer vision (CV) 137, 226

considerations, for algorithm implementation

memory requirements 161

model interpretability 161

training time 161

continuous integration and continuous development (CI/CD) 232

contractive autoencoders 129

convolutional autoencoders 127, 128

convolutional neural networks (CNNs) 70, 71, 82, 83

applications, in genomics 90

for genomics 89, 90

history 83

Cortex 210

cross-validation 163

D

DanQ 108

reference link 108

data collection 157, 158

data leakage 228

data partitioning 161, 162

cross-validation 163

dataset, training 162

Group K-Fold cross-validation 163

holdout dataset 162

K-Fold cross-validation 163

random partitioning 162

stratified K-Fold cross-validation 163

stratified partitioning 162

validation dataset 162

data preprocessing

data augmentation 159

data cleaning 158

data transformation 159

data processing 157

data transformation

data processing 16

quality control and cleaning 16

data wrangling

data preprocessing 158, 159

decoder 125

deep autoencoders 127

deep autoregressive models 137

DeepBind 90

DeepChrome 92

DeepInsight 91

deep learning (DL) 4, 51, 56

DragoNN 80

Kipoi 80

life cycle 154-156

neural network definition 56

workflow, for genomics 74, 75

deep learning, applying to genomics

best practices 230-234

deep learning challenges

computational resource requirements 227

expertise in DL frameworks 227

fewer biological samples 226

lack of flexible tools 226

lack of high-quality labeled data 227

lack of model interpretability 227, 228

deep learning libraries

Keras 79

PyTorch 78

TensorFlow 78

deep learning model

creating 211

deploying, as web service 211

exporting 211

monitoring 222

DeepNano 107

reference link 106

deep neural network (DNN) 58, 121, 159

activation function 60

anatomy 56

application, in genomics 76

architectures 69

backpropagation 64

bias 60

for genomics 74

forward propagation 64

gene regulatory networks (GRNs) 77

gradient descent 65

hidden layer 58

input layer 58

key concepts 59

loss function 64

output layer 58, 59

protein structure predictions 77

regularization 65

regulatory genomics 77

single-cell RNA sequencing 77

transfer function 60

weights 60

DeepTarget 108, 109

reference link 108

DeepVariant 92

denoising autoencoders 129

denoising autoencoders, used for predicting gene expression from TCGA pan-cancer RNA-Seq data 131

data collection 131

data preprocessing 131

model training 132-135

deoxyribonucleoside triphosphates (dNTPs) 13

deployment tools

Amazon SageMaker 210

Cortex 210

MLflow 210

TensorFlow Serving 210

dimensionality reduction technique (DRT) 121

dinucleotide content

calculating 25

discriminative models

versus generative models 138, 139

DNA methylation 66

DNN application

gene expression prediction 76

SNP prediction 76

DNN architectures

autoencoders 73, 74

convolutional neural networks (CNNs) 69

feed-forward neural networks (FNNs) 69

graph neural networks (GNNs) 69

recurrent neural networks (RNNs) 69

DragoNN 80

URL 80

drifts

addressing 223

dropouts 126

E

encoder 125

Encyclopedia of DNA Elements (ENCODE) 33

engineered features

versus learned features 159

exabytes (EB) 33

Explanation Summary (ExSum) 194

exploratory data analysis (EDA) 15, 38, 228

extract, transform, and load (ETL) 15, 158

F

False Positive Rate (FPR) 172

feature engineering 159, 233

feature variation 121

feed-forward neural networks (FNNs) 70, 138

forget gate 104

forward propagation 64

fully connected (FC) layer 70

G

GANs applications 147

crease data, analysis 148, 149

DNA generation 149

for augmenting population-scale genomics data 150

gated recurrent unit (GRU) 104, 105

GC content

calculating 24

GenBank 33

gene expression 130

Gene Expression Omnibus (GEO) 41

Generative Adversarial Networks (GANs) 138-140

components 140

discriminator, training 141, 142

generator, training 142, 143

model improvement 146, 147

working 140, 141

generative models

versus discriminative models 138, 139

generator model 139

genome sequencing 12

Sanger sequencing, of nucleic acids 13

genome-wide association studies (GWAS) 146

genomic big data 33, 34

genomic data analysis 14

Biopython, using 18

cloud computing, for genomics data analysis 17

data collection 16

data transformation 16

dinucleotide content 25-27

exploratory data analysis 16

GC content, calculating 24

ML modeling 27-29

modeling 17

motif 29, 30

nucleotide content, calculating 24, 25

steps 15

visualization and reporting 17

use case 21-24

genomics

CNNs, applications 90

convolutional neural networks (CNNs) 89, 90

deep learning challenges 226

deep neural network (DNN), using for 74

DNN application 76

machine learning (ML), need for 5

ML, challenges 51

pre-existing models, using 232

recurrent neural network (RNN), applications 106

recurrent neural network (RNN), use cases 106

genomics datasets

working with, challenges 143, 144

Git 205

global surrogate method 192, 193

GoLan 107

gradient descent 65

graph neural networks (GNNs) 72

Group K-Fold cross-validation 163

H

hidden Markov models (HMMs) 139

Hugging Face 205

reference link 216

Hugging Face Spaces

model deployment 216-219

Human Genome Project 13

hyperparameter tuning 164, 232

Bayesian optimization 165

grid search 164, 165

libraries 166

libraries, KerasTuner 166-168

model training 181

random search 165

I

image compression 125

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 83

individual conditional expectation (ICE) 191

input gate 104

J

JUND TF

binding site location, predicting 174

K

Keras 79

KerasTuner 166-168

kernels 84

key performance indicators (KPIs) 155

K-Fold cross-validation 163

Kipoi

URL 80

k-nearest neighbors (KNN) 145

L

learned features

versus engineered features 159

learning problem 34

life sciences

machine learning for genomics 6

Local Interpretable Model-agnostic Explanations (LIME) 193

logistic regression 40, 48

log-loss 173

long short-term memory (LSTM) 103, 105

loss function 125

M

machine learning (ML) 4, 5

challenges, in genomics 51

for genomics 38

in biotechnology 6

in life sciences 6

need, for genomics 5

machine learning (ML), in genomics

data collection, to preprocessing 38

feature extraction and selection 39

model deployment 40

model evaluation 39

model interpretability 40

model monitoring 40

model training 39

train-test data splitting 39

workflow 38

machine learning software

exploring 6

machine translation 226

managed solutions 209

deployment tools 210

ONNX 210

Matplotlib 12, 32

mean absolute error (MAE) 36

mean squared error (MSE) 39, 118

median absolute deviation (MAD) 131

mean absolute percentage error (MAPE) 174

MLflow 210

ML libraries 32

scikit-learn 33

ML use cases 40

data collection 41, 42

data preprocessing 42, 43

data splitting 47, 48

data transformation 45, 46

EDA 43-45

model evaluation 48-51

model training 48

model

tuning 163

model degradation

concept drift 221

data drift 220

reasons 220

model deployment 205

advantages 205

as services 209

batch prediction 208

managed solutions 209

model-as-service 209

online prediction 208

pre-steps 206

steps 206, 207

through model-in-service 209

through web services 209

types 208

model development

appropriate algorithm, selecting 161

steps 160

model evaluation 169

classification metrics or performance statistics 169

performance visualization 172

regression metrics 173

model interpretability 184

business value, unlocking from 187

methods 189

use cases 194

model interpretability methods

ExSum 194

global surrogate 192, 193

individual conditional expectation (ICE) 191

partial dependence plot 190

permuted feature importance (PFI) 191, 192

saliency map 194

Shapley value 193

models

degrading, reasons 220

monitoring, need for 220

monitoring, with advanced tools 220

model training

data partitioning 161

multidimensional scaling (MDS) 37

multi-layer perceptrons (MLPs) 69

N

nanopore sequencing 14

National Center for Biotechnology Information (NCBI) 33

natural language processing (NLP) 66, 226

neural machine translation model (NMT) 108

neural network (NN) 138, 185

definition 56

working, example 66-69

neural network architecture, creating 197

model evaluation 198

model interpretation 199, 200

model training 197, 198

next-generation sequencing (NGS) 13, 226

evolution 14

second-generation DNA sequencing technologies 14

third-generation DNA sequencing technologies 14

non-linearity 67

normalized empirical probability distribution function (NEPDF) 93

novelty detection 120

nucleotide 12

content, calculating 24, 25

O

outlier detection 119, 120

output gate 104

Oxford Nanopore Technology (ONT) 106

P

Pandas 32

parameter sweeping 164

partial dependence plot (PDP) 190

permuted feature importance (PFI) 191, 192

pip

used, for Biopython installation 11

pitfalls, for applying deep learning to genomics 228

confounding 228

data leakage 229

improper model comparisons 230

PLoS Biol

reference link 5

polymerase chain reaction (PCR) 14

position weight matrix (PWM) 5

position frequency matrix (PFM) 29

position weight matrix (PWM) 71

precision-recall (PR) curve 173

predictive modeling 17

principal component analysis (PCA) 121, 147, 158

ProLan 107, 108

ProLanGo

reference link 107

Python

download link 11

Python packages

Matplotlib 32

Pandas 32

seaborn 32

Python programming language 6

PyTorch 78

model and data 79

model deployment 79

R

R2 174

receiver operating characteristic (ROC) 172, 173

Rectified Linear Unit (ReLU) 61

recurrent neural network (RNN) 96-99, 138

applications, in genomics 106

backpropagation 102

understanding, through TFBS predictions 110

use cases, in genomics 106

working 99-101

recurrent neural network (RNN), types 105

many-to-many 106

many-to-one 106

one-to-many 105

one-to-one 105

regression metrics 173

R2 174

RMSE 173

regularization 65

data augmentation 66

dropout 66

Elastic Net 66

Lasso 66

ridge 66

regularized autoencoders 128

contractive autoencoders 129

denoising autoencoders 129

sparse autoencoders 128, 129

ReLU activation function 68

research and development (R&D) 187

reset gate 105

Ribo-sequencing (Ribo-seq) 38

RNA sequencing (RNA-seq) 77

RNN architectures 102

bidirectional RNN 103

gated recurrent unit (GRU) 103, 105

long short-term memory (LSTM) 103, 105

root mean squared error (RMSE) 169, 173

S

saliency map 194

scikit-learn 7

seaborn 32

sequence record 19

sequencing by synthesis (SBS) 14

SHapley Additive ExPlanations (SHAP) 40

Shapley value 193

Single-cell RNA sequencing (scRNA-seq) 126, 127

single-molecule real-time (SMRT) sequencing 14

single nucleotide polymorphisms (SNPs) 35, 76

singular value decomposition (SVD) 37

Spaces 205

sparse autoencoders 128, 129

stochastic gradient descent (SGD) 164

stratified K-Fold cross-validation 163

Streamlit

installing 204, 205

Streamlit-based application

building, of CNN model 211

creating 211-216

structural zeros 126

Structured Query Language (SQL) 158

style transfer 137

supervised learning (SL) 138, 162

supervised ML 34, 35

types 35

working with 36

support vector machines (SVM) 92

synapse 57

synthetic data, for genomics 145

techniques 145, 146

Synthetic Minority Oversampling TEchnique (SMOTE) 145

T

Telomere to Telomere (T2T) effort 76

TensorFlow 78

TensorFlow Data Validation (TFDV) 222

TensorFlow Serving 210

tensors 78

TFBS prediction problem

data collection 175

data preprocessing 176

data, processing 175

data wrangling 175

feature engineering 176-178

framing, in terms of DL 175

model training 179, 180

TFBS predictions, via ChIP-seq experiment 111

data collection 111

data preprocessing 112

data splitting 112

model training 112-115

The Cancer Genome Atlas (TCGA) 33

Torch 78

Transcription Factor Binding Site (TFBS) 208, 229

Transcription Factor Binding Site (TFBS) predictions

recurrent neural network (RNN), understanding through 109, 110

Transcription factors (TF) 109, 188

transfer function 60

transfer learning (TL) 88, 121

True Positive Rate (TPR) 172

Truncated backpropagation through time (TBPTT) 102

U

Universal Approximation Theorem 56

unsupervised DL 118

anomaly detection 119

association 121

clustering 118

unsupervised learning (UL) 169

unsupervised ML methods 37

clustering 37

dimensionality reduction 37

types 37

update gate 105

use cases, model interpretability

data collection 195

feature extraction 195

neural network architecture, creating 197

target labels 196

train-test split 196

V

vanilla autoencoder 127

visualization libraries, Python 7

W

weights 60

whole-exome sequencing (WES) 38

whole-genome sequencing (WGS) 38

whole-transcriptome sequencing (WTS) 38
