Dataset (medical)

The dataset used in this chapter is publicly available from the UCI Machine Learning Repository, maintained by the School of Information and Computer Science at the University of California, Irvine. You can access it at https://archive.ics.uci.edu/ml/datasets/cardiotocography.

Note that the file available at this URL is an Excel data file. You can convert it by saving it as a .csv file.
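As a rough illustration of the .csv layout that R expects, the following sketch writes a tiny data frame to a temporary file and reads it back. The object names `sample_rows`, `csv_path`, and `restored` are hypothetical and used only for this example. The three rows reuse values from the first rows of the CTG data for the LB, AC, and NSP columns. The sketch does not convert the Excel file itself.

```r
# Small stand-in data frame: first three rows of three CTG columns
sample_rows <- data.frame(
  LB  = c(120L, 132L, 133L),
  AC  = c(0, 0.00638, 0.00332),
  NSP = c(2L, 1L, 1L)
)

# Write to a temporary .csv file; row.names = FALSE avoids an extra index column
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# The raw file is a header line followed by one comma-separated line per row
print(readLines(csv_path))

# Read it back; header = TRUE treats the first line as column names
restored <- read.csv(csv_path, header = TRUE)
str(restored)
```

The same `read.csv()` call pattern applies to the converted CTG file once it has been saved in this layout.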

Once the file is in .csv format, it can be read into R as shown in the following code:

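# Load the keras package
library(keras)
# Read the CTG data; the first line of the file holds the column names
data <- read.csv('~/Desktop/data/CTG.csv', header = TRUE)
# Inspect the structure of the resulting data frame
str(data)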

OUTPUT
## 'data.frame': 2126 obs. of 22 variables:
## $ LB : int 120 132 133 134 132 134 134 122 122 122 ...
## $ AC : num 0 0.00638 0.00332 0.00256 0.00651 ...
## $ FM : num 0 0 0 0 0 0 0 0 0 0 ...
## $ UC : num 0 0.00638 0.00831 0.00768 0.00814 ...
## $ DL : num 0 0.00319 0.00332 0.00256 0 ...
## $ DS : num 0 0 0 0 0 0 0 0 0 0 ...
## $ DP : num 0 0 0 0 0 ...
## $ ASTV : int 73 17 16 16 16 26 29 83 84 86 ...
## $ MSTV : num 0.5 2.1 2.1 2.4 2.4 5.9 6.3 0.5 0.5 0.3 ...
## $ ALTV : int 43 0 0 0 0 0 0 6 5 6 ...
## $ MLTV : num 2.4 10.4 13.4 23 19.9 0 0 15.6 13.6 10.6 ...
## $ Width : int 64 130 130 117 117 150 150 68 68 68 ...
## $ Min : int 62 68 68 53 53 50 50 62 62 62 ...
## $ Max : int 126 198 198 170 170 200 200 130 130 130 ...
## $ Nmax : int 2 6 5 11 9 5 6 0 0 1 ...
## $ Nzeros : int 0 1 1 0 0 3 3 0 0 0 ...
## $ Mode : int 120 141 141 137 137 76 71 122 122 122 ...
## $ Mean : int 137 136 135 134 136 107 107 122 122 122 ...
## $ Median : int 121 140 138 137 138 107 106 123 123 123 ...
## $ Variance: int 73 12 13 13 11 170 215 3 3 1 ...
## $ Tendency: int 1 0 0 1 1 0 0 1 1 1 ...
## $ NSP : int 2 1 1 1 1 3 3 3 3 3 ...

This dataset consists of fetal cardiotocograms (CTGs), and the target variable, NSP, classifies each patient into one of three categories: normal, suspect, or pathological. The dataset has 2,126 rows. Each CTG was classified by three expert obstetricians and assigned a consensus label: normal (N), coded as 1; suspect (S), coded as 2; or pathological (P), coded as 3. The remaining 21 columns are the independent variables, and the main objective is to develop a classification model that correctly assigns each patient to one of the three categories N, S, and P.
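The split between predictors and target described above can be sketched as follows. To stay runnable without the CTG file, the snippet uses a stand-in data frame, `toy`, built from the first ten values of the LB, ASTV, MSTV, and NSP columns shown in the output. The names `toy`, `predictors`, `target_f`, and `class_counts` are hypothetical, and the text labels are attached only for readability. This is one possible way to inspect the classes, not necessarily the approach taken later in the chapter.

```r
# Stand-in for the CTG data frame: first ten rows of four columns
toy <- data.frame(
  LB   = c(120, 132, 133, 134, 132, 134, 134, 122, 122, 122),
  ASTV = c(73, 17, 16, 16, 16, 26, 29, 83, 84, 86),
  MSTV = c(0.5, 2.1, 2.1, 2.4, 2.4, 5.9, 6.3, 0.5, 0.5, 0.3),
  NSP  = c(2, 1, 1, 1, 1, 3, 3, 3, 3, 3)
)

# Separate the independent variables from the target column
predictors <- toy[, names(toy) != "NSP"]
target     <- toy$NSP

# Map the codes 1, 2, 3 to readable class labels
target_f <- factor(target, levels = c(1, 2, 3),
                   labels = c("normal", "suspect", "pathological"))

# Count how many rows fall into each class
class_counts <- table(target_f)
print(class_counts)
```

For these ten rows the counts are 4 normal, 1 suspect, and 5 pathological. Running the same `table()` call on the full `data$NSP` column would show how balanced the three classes are across all 2,126 rows.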
