performance. erefore, the performance of each neural network should be evaluated on the test
data, and inadequately performing networks should be excluded from the final ensemble.
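As a toy sketch, such pruning of an ensemble might look like the following; the evaluate helper and the 0.90 accuracy cutoff are hypothetical illustrations, not values drawn from the source.

def prune_ensemble(models, test_data, evaluate, threshold=0.90):
    """Keep only networks whose test accuracy meets the threshold.

    `evaluate` is assumed to return a model's accuracy on `test_data`.
    """
    scored = [(model, evaluate(model, test_data)) for model in models]
    return [model for model, accuracy in scored if accuracy >= threshold]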
Investigating the use of cross-validation for testing the robustness of deep neural networks, Pei et al. [198] proposed DeepXplore, an automated testing system for deep neural networks. Pei et al. introduced neuron coverage as a measure of the extent to which the network's logic space has been tested: neuron coverage is the fraction of neurons in the neural network that were activated during the testing process, where a neuron is considered activated if its output exceeds a pre-defined threshold. Using this measure, DeepXplore can systematically test a deep neural network for erroneous behavior and synthesize new inputs that aim to maximize neuron coverage, thereby ensuring that all parts of the network are exercised during the testing process.
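A minimal sketch of the neuron-coverage metric is shown below, assuming a PyTorch model whose activation layers are observed through forward hooks; the per-neuron scaling scheme and the 0.25 threshold are illustrative assumptions rather than DeepXplore's exact implementation.

import torch
import torch.nn as nn

def neuron_coverage(model: nn.Module, inputs: torch.Tensor,
                    threshold: float = 0.25) -> float:
    """Fraction of neurons whose scaled activation exceeds `threshold`."""
    activated, total = 0, 0
    hooks = []

    def hook(_module, _inputs, output):
        nonlocal activated, total
        flat = output.detach().flatten(start_dim=1)   # [batch, neurons]
        lo = flat.min(dim=0).values
        hi = flat.max(dim=0).values
        scaled = (flat - lo) / (hi - lo + 1e-8)       # scale to [0, 1]
        activated += int((scaled.max(dim=0).values > threshold).sum())
        total += flat.shape[1]

    # Observe the outputs of common activation layers.
    for module in model.modules():
        if isinstance(module, (nn.ReLU, nn.Sigmoid, nn.Tanh)):
            hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(inputs)
    for handle in hooks:
        handle.remove()
    return activated / max(total, 1)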
The system feeds the same inputs to multiple deep neural networks built for the same task and cross-validates their outputs to
identify erroneous behavior. For example, given an ensemble of three networks built to steer
a vehicle, if two of the networks decide to turn right while one decides to turn left, the latter
one is assumed to be behaving erroneously. This has the advantage that no manual labeling of
synthesized test inputs is required. However, the disadvantage is that erroneous behavior can only be detected if at least one network makes a different decision than the other networks in
the ensemble. If all networks show the same erroneous behavior, DeepXplore will fail to identify
it. Also, the system assumes that the decision made by the majority of the networks is correct,
which may not always be the case.
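A toy sketch of this cross-referencing oracle, using the steering example above (the label encoding is an assumption):

from collections import Counter

def check_disagreement(predictions):
    """predictions: one decision per network for the same input,
    e.g., ["right", "right", "left"].
    Returns (disagreement_found, majority_decision)."""
    counts = Counter(predictions)
    majority_decision, votes = counts.most_common(1)[0]
    # Any network deviating from the majority flags the input as suspect.
    return votes < len(predictions), majority_decision

For the steering example, check_disagreement(["right", "right", "left"]) returns (True, "right"), flagging the left-turning network as the presumed outlier.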
To test the deep neural networks for erroneous behavior, DeepXplore generates synthesized test inputs that attempt to maximize neuron coverage in the networks while producing differential behavior between them. The system therefore attempts to jointly optimize these two objectives when generating test inputs.
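The joint optimization can be roughly sketched as follows, assuming differentiable PyTorch models and a callable neuron_activation that returns the output of a currently inactive neuron. The loss weighting, the sign-based step rule, and tracking a single neuron are simplifying assumptions; the original system optimizes over many neurons and adds domain-specific constraints.

import torch

def synthesize_test_input(x, models, neuron_activation,
                          lam=1.0, steps=50, lr=0.01):
    """Perturb x so that model 0 diverges from the others while a
    chosen inactive neuron's activation is increased."""
    x = x.clone().requires_grad_(True)
    for _ in range(steps):
        outputs = [model(x) for model in models]
        consensus = torch.stack(outputs[1:]).mean(dim=0)
        divergence = (outputs[0] - consensus).abs().sum()  # differential behavior
        coverage = neuron_activation(x)                    # neuron-coverage term
        objective = divergence + lam * coverage
        objective.backward()
        with torch.no_grad():
            x += lr * x.grad.sign()  # gradient-ascent step on the input
            x.grad.zero_()
    return x.detach()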
The system was tested using five widely used public datasets: MNIST [199], ImageNet [200], Udacity Challenge [201], Contagio/VirusTotal [202, 203], and Drebin [204, 205]. DeepXplore was evaluated on three deep neural networks for each dataset, for a total of 15 networks. In the image
recognition datasets (MNIST, ImageNet, and Udacity), the synthesized test inputs were created
by applying transformations to the original images in the dataset. Three transformations were
leveraged: (1) changing lighting conditions, (2) occluding part of the image with a rectangle to
simulate an attacker blocking a part of the image, and (3) occluding the image with multiple
small rectangles to simulate the effect of dirt on the camera lens.
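The three transformations can be sketched as simple array operations; the parameter values below are illustrative assumptions, not DeepXplore's settings.

import numpy as np

def change_lighting(image: np.ndarray, factor: float = 1.4) -> np.ndarray:
    """Brighten (factor > 1) or darken (factor < 1) the whole image."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def occlude_rectangle(image: np.ndarray, x: int, y: int,
                      w: int, h: int) -> np.ndarray:
    """Black out a single rectangle, simulating a blocked camera region."""
    out = image.copy()
    out[y:y + h, x:x + w] = 0
    return out

def occlude_dirt(image: np.ndarray, spots: int = 20,
                 size: int = 4, seed: int = 0) -> np.ndarray:
    """Scatter small black squares, simulating dirt on the lens."""
    rng = np.random.default_rng(seed)
    out = image.copy()
    for _ in range(spots):
        x = int(rng.integers(0, image.shape[1] - size))
        y = int(rng.integers(0, image.shape[0] - size))
        out[y:y + size, x:x + size] = 0
    return out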
DeepXplore was shown to find thousands of examples of erroneous behavior, such as an autonomous vehicle attempting to
crash into a guard rail in the Udacity dataset. Moreover, it was shown that the error-inducing test
cases could be used as training data to improve the robustness of the networks. This was shown
to achieve 1–3% improved accuracy over adversarial and random training data augmentation
methods.
4.1.5 VISUALIZATION
Visualization can be a useful tool to improve the interpretability of neural networks for verifi-
cation and validation practitioners. Visualization techniques can be used to transform data to
forms that humans can interpret more easily. There are already visualization tools and techniques
in traditional software verification and validation which are used to create visual representations
of data to improve interpretability [174]. Visualization techniques for neural networks could, for example, be used to create graphical representations of changes in weights or internal connections in the network, or plots of error functions over the course of training, to improve understanding of the decision-making and learning processes. Such representations can give greater insight into the structure of the neural network, including its weights and biases and their changes during training [206, 207].
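For example, a simple sketch of such visualizations using matplotlib, plotting a loss curve alongside histograms of one layer's weights at a few epochs; the data structures and names are assumptions for illustration.

import matplotlib.pyplot as plt

def plot_training(losses, weight_snapshots):
    """losses: list of per-epoch loss values.
    weight_snapshots: {epoch: flattened weight array for one layer}."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
    ax1.plot(losses)
    ax1.set(xlabel="epoch", ylabel="training loss", title="Learning curve")
    # Overlaid histograms show how the layer's weights drift during training.
    for epoch, weights in weight_snapshots.items():
        ax2.hist(weights.ravel(), bins=50, alpha=0.4, label=f"epoch {epoch}")
    ax2.set(xlabel="weight value", title="Weight distribution")
    ax2.legend()
    fig.tight_layout()
    plt.show()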
Visualization tools have recently gained significant interest for interpreting what CNNs learn. One popular technique is Activation Maximization, which provides insight into which features a CNN classifier has learned to associate with different classes. Activation Maximization synthesizes an image that maximizes the output of a given neuron or class. Another use of this technique is to create an adversarial example: an image that is unrecognizable to humans but is classified with high confidence by the CNN [167].
However, Simonyan et al. [208] introduced a regularization technique into this process, which produced more recognizable images, giving insight into the kinds of features the CNN classifier was looking for in specific classes.
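A minimal activation-maximization sketch in this spirit, using gradient ascent on the input with a Simonyan-style L2 penalty; the model, class index, and hyperparameters are illustrative assumptions.

import torch

def activation_maximization(model, class_idx, shape=(1, 3, 224, 224),
                            steps=200, lr=0.1, l2_weight=1e-4):
    """Synthesize an input that maximizes one class score."""
    model.eval()
    image = torch.zeros(shape, requires_grad=True)
    optimizer = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        score = model(image)[0, class_idx]        # pre-softmax class score
        # Maximize the score; the L2 term keeps pixel values small and
        # makes the result more recognizable.
        loss = -score + l2_weight * image.norm() ** 2
        loss.backward()
        optimizer.step()
    return image.detach()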
Yosinski et al. [209] further extended this method with better regularization techniques, as well as investigating neurons at all layers rather than limiting
the study to the output neurons. This showed that neurons at different layers were learning
different features, with higher layers learning more complex and abstract features (e.g., faces,
wheels, eyes) while the lower layers were learning more basic features (e.g., edges and corners).
Therefore, this type of visualization shows great potential to improve the interpretability of CNNs, as it can provide significant insight into what the neural network has learned [209]. For in-
stance, such visualization techniques were used by Bojarski et al. [87, 210] in the NVIDIA
PilotNet project to visualize the internal state of the CNN used for steering. By studying the
activations within different layers of the trained CNN, they were able to gain a better under-
standing of what features the neural network had learned to recognize. The analysis showed that
even with only the human steering angles as training input, the CNN had learned to recognize
useful road features such as the edges of the road. The authors also investigated the activations
in the network when given an image with no road as input and found that the activations of the
two feature maps mostly contained noise, indicating that the CNN found no useful features in
the image. erefore, the CNN only learned to recognize features that were useful for its task,
such as road-related features.
Another useful visualization technique investigates the importance of different neurons in forming predictions. By analyzing the gradients flowing into the last convolutional layer of the CNN, the contribution of each neuron to the final prediction can be determined.
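A hedged sketch of this gradient-based analysis, in the style of Grad-CAM; the model and the handle to its last convolutional layer are assumptions for illustration.

import torch
import torch.nn.functional as F

def grad_cam(model, last_conv, x, class_idx):
    """Weight the last conv layer's feature maps by the gradients
    of the class score flowing into them."""
    feats, grads = {}, {}
    h1 = last_conv.register_forward_hook(
        lambda m, i, o: feats.update(a=o))
    h2 = last_conv.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    # Per-channel importance = spatial mean of the gradients.
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    return cam / (cam.max() + 1e-8)   # normalized importance map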