performance. erefore, the performance of each neural network should be evaluated on the test
data, and inadequately performing networks should be excluded from the final ensemble.
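As a toy sketch, such pruning of an ensemble might look like the following; the evaluate helper and the 0.90 accuracy cutoff are hypothetical illustrations, not values drawn from the source.

def prune_ensemble(models, test_data, evaluate, threshold=0.90):
    """Keep only networks whose test accuracy meets the threshold.

    `evaluate` is assumed to return a model's accuracy on `test_data`.
    """
    scored = [(model, evaluate(model, test_data)) for model in models]
    return [model for model, accuracy in scored if accuracy >= threshold]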
Investigating the use of cross-validation for testing the robustness of deep neural networks, Pei et al. [198] proposed DeepXplore, an automated testing system for deep neural networks. Pei et al. introduced neuron coverage as a measure of the extent to which the network's logic space has been tested: neuron coverage is the fraction of neurons in the neural network that were activated during the testing process, where a neuron is considered activated if its output exceeds a pre-defined threshold. Using this measure, DeepXplore can systematically test a deep neural network for erroneous behavior and synthesize new inputs that aim to maximize neuron coverage, thereby ensuring that all parts of the network are exercised during the testing process.
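A minimal sketch of the neuron-coverage metric is shown below, assuming a PyTorch model whose activation layers are observed through forward hooks; the per-neuron scaling scheme and the 0.25 threshold are illustrative assumptions rather than DeepXplore's exact implementation.

import torch
import torch.nn as nn

def neuron_coverage(model: nn.Module, inputs: torch.Tensor,
                    threshold: float = 0.25) -> float:
    """Fraction of neurons whose scaled activation exceeds `threshold`."""
    activated, total = 0, 0
    hooks = []

    def hook(_module, _inputs, output):
        nonlocal activated, total
        flat = output.detach().flatten(start_dim=1)   # [batch, neurons]
        lo = flat.min(dim=0).values
        hi = flat.max(dim=0).values
        scaled = (flat - lo) / (hi - lo + 1e-8)       # scale to [0, 1]
        activated += int((scaled.max(dim=0).values > threshold).sum())
        total += flat.shape[1]

    # Observe the outputs of common activation layers.
    for module in model.modules():
        if isinstance(module, (nn.ReLU, nn.Sigmoid, nn.Tanh)):
            hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(inputs)
    for handle in hooks:
        handle.remove()
    return activated / max(total, 1)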
The system feeds the same inputs to multiple deep neural networks built for the same task and cross-validates their outputs to
identify erroneous behavior. For example, given an ensemble of three networks built to steer
a vehicle, if two of the networks decide to turn right while one decides to turn left, the latter
one is assumed to be behaving erroneously. This has the advantage that no manual labeling of
synthesized test inputs is required. However, the disadvantage is that erroneous behavior can only be detected if at least one network makes a different decision than the other networks in
the ensemble. If all networks show the same erroneous behavior, DeepXplore will fail to identify
it. Also, the system assumes that the decision made by the majority of the networks is correct,
which may not always be the case.
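A toy sketch of this cross-referencing oracle, using the steering example above (the label encoding is an assumption):

from collections import Counter

def check_disagreement(predictions):
    """predictions: one decision per network for the same input,
    e.g., ["right", "right", "left"].
    Returns (disagreement_found, majority_decision)."""
    counts = Counter(predictions)
    majority_decision, votes = counts.most_common(1)[0]
    # Any network deviating from the majority flags the input as suspect.
    return votes < len(predictions), majority_decision

For the steering example, check_disagreement(["right", "right", "left"]) returns (True, "right"), flagging the left-turning network as the presumed outlier.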
To test the deep neural networks for erroneous behavior, DeepXplore generates synthesized test inputs that attempt to maximize neuron coverage in the networks while producing differential behavior between them. The system therefore attempts to jointly optimize these two objectives when generating test inputs.
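The joint optimization can be roughly sketched as follows, assuming differentiable PyTorch models and a callable neuron_activation that returns the output of a currently inactive neuron. The loss weighting, the sign-based step rule, and tracking a single neuron are simplifying assumptions; the original system optimizes over many neurons and adds domain-specific constraints.

import torch

def synthesize_test_input(x, models, neuron_activation,
                          lam=1.0, steps=50, lr=0.01):
    """Perturb x so that model 0 diverges from the others while a
    chosen inactive neuron's activation is increased."""
    x = x.clone().requires_grad_(True)
    for _ in range(steps):
        outputs = [model(x) for model in models]
        consensus = torch.stack(outputs[1:]).mean(dim=0)
        divergence = (outputs[0] - consensus).abs().sum()  # differential behavior
        coverage = neuron_activation(x)                    # neuron-coverage term
        objective = divergence + lam * coverage
        objective.backward()
        with torch.no_grad():
            x += lr * x.grad.sign()  # gradient-ascent step on the input
            x.grad.zero_()
    return x.detach()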
The system was tested using five widely used public datasets: MNIST [199], ImageNet [200], Udacity Challenge [201], Contagio/VirusTotal [202, 203], and Drebin [204, 205]. DeepXplore was evaluated on three deep neural networks for each dataset, for a total of 15 networks. In the image
recognition datasets (MNIST, ImageNet, and Udacity), the synthesized test inputs were created
by applying transformations to the original images in the dataset. Three transformations were
leveraged: (1) changing lighting conditions, (2) occluding part of the image with a rectangle to
simulate an attacker blocking a part of the image, and (3) occluding the image with multiple
small rectangles to simulate the effect of dirt on the camera lens.
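The three transformations can be sketched as simple array operations; the parameter values below are illustrative assumptions, not DeepXplore's settings.

import numpy as np

def change_lighting(image: np.ndarray, factor: float = 1.4) -> np.ndarray:
    """Brighten (factor > 1) or darken (factor < 1) the whole image."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def occlude_rectangle(image: np.ndarray, x: int, y: int,
                      w: int, h: int) -> np.ndarray:
    """Black out a single rectangle, simulating a blocked camera region."""
    out = image.copy()
    out[y:y + h, x:x + w] = 0
    return out

def occlude_dirt(image: np.ndarray, spots: int = 20,
                 size: int = 4, seed: int = 0) -> np.ndarray:
    """Scatter small black squares, simulating dirt on the lens."""
    rng = np.random.default_rng(seed)
    out = image.copy()
    for _ in range(spots):
        x = int(rng.integers(0, image.shape[1] - size))
        y = int(rng.integers(0, image.shape[0] - size))
        out[y:y + size, x:x + size] = 0
    return out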
DeepXplore was shown to find thousands of examples of erroneous behavior, such as an autonomous vehicle attempting to
crash into a guard rail in the Udacity dataset. Moreover, it was shown that the error-inducing test
cases could be used as training data to improve the robustness of the networks. This was shown
to achieve 1–3% improved accuracy over adversarial and random training data augmentation
methods.
4.1.5 VISUALIZATION
Visualization can be a useful tool to improve the interpretability of neural networks for verifi-
cation and validation practitioners. Visualization techniques can be used to transform data to
forms that humans can interpret more easily. There are already visualization tools and techniques
in traditional software verification and validation which are used to create visual representations
of data to improve interpretability [174]. Visualization techniques for neural networks could, for example, be used to create graphical representations of changes in weights or internal connections in the network, or plots of error functions over the course of training, to improve understanding of the decision-making and learning processes. Such representations can give greater insight into the structure of the neural network, including its weights and biases and their changes during training [206, 207].
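For example, a simple sketch of such visualizations using matplotlib, plotting a loss curve alongside histograms of one layer's weights at a few epochs; the data structures and names are assumptions for illustration.

import matplotlib.pyplot as plt

def plot_training(losses, weight_snapshots):
    """losses: list of per-epoch loss values.
    weight_snapshots: {epoch: flattened weight array for one layer}."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
    ax1.plot(losses)
    ax1.set(xlabel="epoch", ylabel="training loss", title="Learning curve")
    # Overlaid histograms show how the layer's weights drift during training.
    for epoch, weights in weight_snapshots.items():
        ax2.hist(weights.ravel(), bins=50, alpha=0.4, label=f"epoch {epoch}")
    ax2.set(xlabel="weight value", title="Weight distribution")
    ax2.legend()
    fig.tight_layout()
    plt.show()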
Visualization tools have recently gained significant interest for interpreting what CNNs learn. One popular technique is Activation Maximization, which provides insight into which features a CNN classifier has learned to associate with different classes. Activation Maximization synthesizes an image that maximizes the output of a given neuron or class. Another use of this technique is to create an adversarial example: an image that is unrecognizable to humans but is classified with high confidence by the CNN [167].
However, Simonyan et al. [208] introduced a regularization technique into this process, which produced more recognizable images, giving insight into the kinds of features the CNN classifier was looking for in specific classes.
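A minimal activation-maximization sketch in this spirit, using gradient ascent on the input with a Simonyan-style L2 penalty; the model, class index, and hyperparameters are illustrative assumptions.

import torch

def activation_maximization(model, class_idx, shape=(1, 3, 224, 224),
                            steps=200, lr=0.1, l2_weight=1e-4):
    """Synthesize an input that maximizes one class score."""
    model.eval()
    image = torch.zeros(shape, requires_grad=True)
    optimizer = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        score = model(image)[0, class_idx]        # pre-softmax class score
        # Maximize the score; the L2 term keeps pixel values small and
        # makes the result more recognizable.
        loss = -score + l2_weight * image.norm() ** 2
        loss.backward()
        optimizer.step()
    return image.detach()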
Yosinski et al. [209] further extended this method with better regularization techniques, as well as investigating neurons at all layers rather than limiting
the study to the output neurons. This showed that neurons at different layers were learning
different features, with higher layers learning more complex and abstract features (e.g., faces,
wheels, eyes) while the lower layers were learning more basic features (e.g., edges and corners).
Therefore, this type of visualization shows great potential to improve the interpretability of CNNs, as it can provide significant insight into what the neural network has learned [209]. For in-
stance, such visualization techniques were used by Bojarski et al. [87, 210] in the NVIDIA
PilotNet project to visualize the internal state of the CNN used for steering. By studying the
activations within different layers of the trained CNN, they were able to gain a better under-
standing of what features the neural network had learned to recognize. The analysis showed that
even with only the human steering angles as training input, the CNN had learned to recognize
useful road features such as the edges of the road. The authors also investigated the activations
in the network when given an image with no road as input and found that the activations of the
two feature maps mostly contained noise, indicating that the CNN found no useful features in
the image. erefore, the CNN only learned to recognize features that were useful for its task,
such as road-related features.
Another useful visualization technique investigates the importance of different neurons in forming predictions. By analyzing the gradients flowing into the last convolutional layer of the CNN, the contribution of each neuron to the final prediction can be determined.
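A hedged sketch of this gradient-based analysis, in the style of Grad-CAM; the model and the handle to its last convolutional layer are assumptions for illustration.

import torch
import torch.nn.functional as F

def grad_cam(model, last_conv, x, class_idx):
    """Weight the last conv layer's feature maps by the gradients
    of the class score flowing into them."""
    feats, grads = {}, {}
    h1 = last_conv.register_forward_hook(
        lambda m, i, o: feats.update(a=o))
    h2 = last_conv.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    # Per-channel importance = spatial mean of the gradients.
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    return cam / (cam.max() + 1e-8)   # normalized importance map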