For disease diagnosis, we are going to use the free proben1 dataset collection, which is available on the Web (http://www.filewatcher.com/m/proben1.tar.gz.1782734-0.html). Proben1 is a benchmark set of several datasets from different domains. We are going to use the cancer and the diabetes datasets, and we add a class, DiagnosisExample, to run the experiments of each case.
The breast cancer dataset is composed of 10 variables, of which nine are inputs and one is a binary output. The dataset has 699 records; 16 of them were found to be incomplete and were excluded, so 683 records were used to train and test the neural network.
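The filtering step above can be sketched as follows. This is a minimal illustration, assuming the raw file marks missing values with a `?`, as the original UCI source of this dataset does; the class and method names are our own, not part of proben1:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the record-filtering step: drop every row that contains a
// missing value (assumed to be encoded as "?" in the raw data file).
public class RecordFilter {

    static List<String[]> dropIncomplete(List<String[]> rows) {
        List<String[]> complete = new ArrayList<>();
        for (String[] row : rows) {
            // keep only rows with no "?" entry
            if (!Arrays.asList(row).contains("?")) {
                complete.add(row);
            }
        }
        return complete;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"5", "1", "1", "2"}); // complete
        rows.add(new String[]{"8", "?", "2", "2"}); // incomplete -> dropped
        rows.add(new String[]{"4", "1", "3", "4"}); // complete
        System.out.println(dropIncomplete(rows).size()); // prints 2
    }
}
```

Applied to the full file, this is the step that reduces the 699 raw records to the 683 complete ones.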
The following table shows a configuration of this dataset:
| Variable Name | Type | [Minimum; Maximum] Value |
|---|---|---|
| Diagnosis result | OUTPUT | [0; 1] |
| Clump Thickness | INPUT #1 | [1; 10] |
| Uniformity of Cell Size | INPUT #2 | [1; 10] |
| Uniformity of Cell Shape | INPUT #3 | [1; 10] |
| Marginal Adhesion | INPUT #4 | [1; 10] |
| Single Epithelial Cell Size | INPUT #5 | [1; 10] |
| Bare Nuclei | INPUT #6 | [1; 10] |
| Bland Chromatin | INPUT #7 | [1; 10] |
| Normal Nucleoli | INPUT #8 | [1; 10] |
| Mitoses | INPUT #9 | [1; 10] |
So, the proposed neural topology will be that of the following figure:
The dataset division was made as follows:
As in the previous cases, we ran many experiments to find the best neural network for classifying whether a cancer is benign or malignant. We conducted 12 different experiments (1,000 epochs per experiment), in which the MSE and accuracy values were analyzed. After that, the confusion matrix, sensitivity, and specificity were generated with the test dataset and analyzed. Finally, an analysis of generalization was performed. The neural networks involved in the experiments are shown in the following table:
| Experiment | Number of neurons in hidden layer | Learning rate | Activation Function |
|---|---|---|---|
| #1 | 3 | 0.1 | Hidden Layer: SIGLOG; Output Layer: LINEAR |
| #2 | 3 | 0.1 | Hidden Layer: HYPERTAN; Output Layer: LINEAR |
| #3 | 3 | 0.5 | Hidden Layer: SIGLOG; Output Layer: LINEAR |
| #4 | 3 | 0.5 | Hidden Layer: HYPERTAN; Output Layer: LINEAR |
| #5 | 3 | 0.9 | Hidden Layer: SIGLOG; Output Layer: LINEAR |
| #6 | 3 | 0.9 | Hidden Layer: HYPERTAN; Output Layer: LINEAR |
| #7 | 5 | 0.1 | Hidden Layer: SIGLOG; Output Layer: LINEAR |
| #8 | 5 | 0.1 | Hidden Layer: HYPERTAN; Output Layer: LINEAR |
| #9 | 5 | 0.5 | Hidden Layer: SIGLOG; Output Layer: LINEAR |
| #10 | 5 | 0.5 | Hidden Layer: HYPERTAN; Output Layer: LINEAR |
| #11 | 5 | 0.9 | Hidden Layer: SIGLOG; Output Layer: LINEAR |
| #12 | 5 | 0.9 | Hidden Layer: HYPERTAN; Output Layer: LINEAR |
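The 12 configurations above are simply the cross product of two hidden-layer sizes, three learning rates, and two hidden-layer activation functions. A minimal sketch of how such a grid can be enumerated is shown below; the class and method names are ours for illustration and do not come from the book's code:

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates the 12 experiment configurations from the table above:
// {3, 5} hidden neurons x {0.1, 0.5, 0.9} learning rates x {SIGLOG, HYPERTAN}.
public class ExperimentGrid {

    static List<String> configurations() {
        int[] hiddenNeurons = {3, 5};
        double[] learningRates = {0.1, 0.5, 0.9};
        String[] hiddenActivations = {"SIGLOG", "HYPERTAN"};
        List<String> configs = new ArrayList<>();
        for (int n : hiddenNeurons)
            for (double lr : learningRates)
                for (String act : hiddenActivations)
                    configs.add(n + " neurons, lr=" + lr
                            + ", hidden=" + act + ", output=LINEAR");
        return configs;
    }

    public static void main(String[] args) {
        List<String> configs = configurations();
        for (int i = 0; i < configs.size(); i++)
            System.out.println("#" + (i + 1) + ": " + configs.get(i));
    }
}
```

The enumeration order matches the table: experiment #1 is 3 neurons, learning rate 0.1, SIGLOG hidden layer, and experiment #12 is 5 neurons, learning rate 0.9, HYPERTAN hidden layer.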
After each experiment, we collected the MSE values (see the following table); experiments #4, #8, #9, #10, and #11 were equivalent, because they combine low MSE values with the same total accuracy (99.25%). From these five, we selected experiments #4 and #11, because they have the lowest MSE values:
| Experiment | Training MSE | Total accuracy |
|---|---|---|
| #1 | 0.01067 | 96.29% |
| #2 | 0.00443 | 98.50% |
| #3 | 9.99611E-4 | 97.77% |
| #4 | 9.99913E-4 | 99.25% |
| #5 | 9.99670E-4 | 96.26% |
| #6 | 9.92578E-4 | 97.03% |
| #7 | 0.01392 | 98.49% |
| #8 | 0.00367 | 99.25% |
| #9 | 9.99928E-4 | 99.25% |
| #10 | 9.99951E-4 | 99.25% |
| #11 | 9.99926E-4 | 99.25% |
| #12 | NaN | 3.44% |
Graphically, the MSE falls very quickly over time, as can be seen in the following chart of the fourth experiment. Although training was set to 1,000 epochs, the experiment stopped earlier, because the minimum overall error (0.001) was reached:
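The stop criterion described above can be sketched as a simple training loop: run for at most 1,000 epochs, but stop as soon as the overall MSE falls below 0.001. In this sketch the per-epoch error decay is simulated (each epoch reduces the MSE by 10%), standing in for the book's real training call:

```java
// Sketch of the early-stopping criterion: max epochs OR minimum overall error,
// whichever is reached first. The 10% decay per epoch is a placeholder for
// one real training epoch of the network.
public class EarlyStop {

    static int trainedEpochs(int maxEpochs, double minOverallError) {
        double mse = 1.0; // simulated initial error
        int epoch = 0;
        while (epoch < maxEpochs && mse > minOverallError) {
            mse *= 0.9;   // placeholder for one real training epoch
            epoch++;
        }
        return epoch;
    }

    public static void main(String[] args) {
        int epochs = trainedEpochs(1000, 0.001);
        // stops well before the 1,000-epoch limit
        System.out.println("stopped after " + epochs + " epochs");
    }
}
```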
The following table shows the confusion matrix, sensitivity, and specificity for both experiments. It is possible to check that the measures are the same for both:
| Experiment | Confusion Matrix | Sensitivity | Specificity |
|---|---|---|---|
| #4 | [[34.0, 1.0], [0.0, 99.0]] | 97.22% | 100.0% |
| #11 | [[34.0, 1.0], [0.0, 99.0]] | 97.22% | 100.0% |
If we had to choose between the models generated by experiments #4 and #11, we would recommend #4, because it is the simpler of the two (it has fewer neurons in the hidden layer).
An additional example to explore is the diagnosis of diabetes. This dataset has eight inputs and one output, shown in the table below. There are 768 records, all complete. However, proben1 notes that several variables contain senseless zero values, which probably indicate missing data. We handle these values as if they were real anyway, thereby introducing some errors (or noise) into the dataset:
| Variable Name | Type | [Minimum; Maximum] Value |
|---|---|---|
| Diagnosis result | OUTPUT | [0; 1] |
| Number of times pregnant | INPUT #1 | [0.0; 17] |
| Plasma glucose concentration at 2 hours in an oral glucose tolerance test | INPUT #2 | [0.0; 199] |
| Diastolic blood pressure (mm Hg) | INPUT #3 | [0.0; 122] |
| Triceps skin fold thickness (mm) | INPUT #4 | [0.0; 99] |
| 2-hour serum insulin (mu U/ml) | INPUT #5 | [0.0; 744] |
| Body mass index (weight in kg/(height in m)^2) | INPUT #6 | [0.0; 67.1] |
| Diabetes pedigree function | INPUT #7 | [0.078; 2.42] |
| Age (years) | INPUT #8 | [21; 81] |
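The amount of noise introduced by the suspicious zero values can be quantified by counting zeros per input column before training. The sketch below uses three made-up sample rows in the layout of the table above (columns in INPUT #1..#8 order); the class and method names are ours for illustration:

```java
// Counts zero values per input column, so the "zero as missing data"
// problem mentioned by proben1 can be measured before training.
public class ZeroCounter {

    static int[] countZerosPerColumn(double[][] rows, int columns) {
        int[] zeros = new int[columns];
        for (double[] row : rows)
            for (int c = 0; c < columns; c++)
                if (row[c] == 0.0) zeros[c]++;
        return zeros;
    }

    public static void main(String[] args) {
        // illustrative sample rows in INPUT #1..#8 order
        double[][] sample = {
            {6, 148, 72, 35, 0, 33.6, 0.627, 50},  // insulin == 0 (suspicious)
            {1,  85, 66, 29, 0, 26.6, 0.351, 31},  // insulin == 0
            {8, 183, 64,  0, 0, 23.3, 0.672, 32},  // skin fold and insulin == 0
        };
        int[] zeros = countZerosPerColumn(sample, 8);
        System.out.println(java.util.Arrays.toString(zeros));
        // prints [0, 0, 0, 1, 3, 0, 0, 0]
    }
}
```

A column such as serum insulin with many zeros is a likely source of the noise the text mentions.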
The dataset division was made as follows:
To discover the best neural network topology for classifying diabetes, we used the same set of network configurations and the same analysis described in the last section. However, this time we use multiple-class classification in the output layer: two neurons are used, one for the presence of diabetes and one for its absence.
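With two output neurons, the binary label must be one-hot encoded for training and decoded back from the network's outputs for evaluation. A minimal sketch, with names of our own choosing:

```java
// One-hot encoding for the two-neuron output layer described above:
// neuron 0 fires for "diabetes present", neuron 1 for "absent".
public class OneHotOutput {

    static double[] encode(int label) {        // label: 1 = present, 0 = absent
        return label == 1 ? new double[]{1.0, 0.0}
                          : new double[]{0.0, 1.0};
    }

    static int decode(double[] output) {       // pick the most active neuron
        return output[0] >= output[1] ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] net = {0.83, 0.21};           // e.g. raw network outputs
        System.out.println(decode(net));       // prints 1 (diabetes present)
        System.out.println(decode(encode(0))); // prints 0 (round trip)
    }
}
```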
So, the proposed neural architecture looks like that of the following figure:
The table below shows the MSE training value and accuracy of all twelve experiments:
| Experiment | Training MSE | Total accuracy |
|---|---|---|
| #1 | 0.00807 | 60.54% |
| #2 | 0.00590 | 71.03% |
| #3 | 9.99990E-4 | 75.49% |
| #4 | 9.98840E-4 | 74.17% |
| #5 | 0.00184 | 61.58% |
| #6 | 9.82774E-4 | 59.86% |
| #7 | 0.00706 | 63.57% |
| #8 | 0.00584 | 72.41% |
| #9 | 9.99994E-4 | 74.66% |
| #10 | 0.01047 | 72.14% |
| #11 | 0.00316 | 59.86% |
| #12 | 0.43464 | 40.13% |
The MSE falls quickly in both cases, although in experiment #9 the error rises slightly over the first epochs, as shown in the following figure:
Analyzing the confusion matrices, it can be seen that the measures are very similar:
| Experiment | Confusion Matrix | Sensitivity | Specificity |
|---|---|---|---|
| #3 | [[35.0, 12.0], [25.0, 79.0]] | 74.46% | 75.96% |
| #9 | [[34.0, 12.0], [26.0, 78.0]] | 73.91% | 75.00% |
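The sensitivity and specificity columns follow directly from the confusion matrices. Reading each matrix with row 0 as the actual positives [TP, FN] and row 1 as the actual negatives [FP, TN], a minimal sketch of the computation (class and method names are ours) is:

```java
// Computes sensitivity and specificity from a 2x2 confusion matrix laid
// out as in the tables above: row 0 = actual positives [TP, FN],
// row 1 = actual negatives [FP, TN].
public class ConfusionMetrics {

    static double sensitivity(double[][] m) {
        return m[0][0] / (m[0][0] + m[0][1]);   // TP / (TP + FN)
    }

    static double specificity(double[][] m) {
        return m[1][1] / (m[1][0] + m[1][1]);   // TN / (FP + TN)
    }

    public static void main(String[] args) {
        double[][] exp3 = {{35.0, 12.0}, {25.0, 79.0}};  // experiment #3 (diabetes)
        System.out.printf("sensitivity = %.4f%n", sensitivity(exp3)); // ~0.745
        System.out.printf("specificity = %.4f%n", specificity(exp3)); // ~0.760
    }
}
```

For experiment #3 this gives 35/47 for sensitivity and 79/104 for specificity, matching the table.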
Once again, we suggest choosing the simplest model; in the diabetes example, that is the artificial neural network generated by experiment #3.