different image transformations: changing brightness, changing contrast, translation, scaling,
horizontal shearing, rotation, blurring, fog effect, and rain effect. However, instead of using
majority voting over an ensemble of deep neural networks as DeepXplore does, DeepTest compares
the output on the transformed image to the labeled correct output of the original image to identify
erroneous behavior. e disadvantage of this method is that false positives can occur in situations
where the new output is in fact. For example, in an image which has been transformed by hori-
zontal shearing, a different steering angle may be required to stay on the road compared to the
original image. e DeepTest framework was applied to three top scoring deep neural networks
from the Udacity self-driving challenge, where deep learning was used to predict steering angles
given an input image from a front-facing camera mounted on a vehicle. The results showed that
the system can find thousands of erroneous behavior cases for each network. However, due to
the above-mentioned drawback, a trade-off between the number of identified erroneous cases
and the frequency of false positives needs to be made. For instance, in one test case among 6339
reported erroneous cases, 130 false positives were identified through manual investigation. Such
false positives are difficult to identify automatically and currently require manual checking.
Furthermore, similar to DeepXplore, identified erroneous test cases can be used to train the
system for improved robustness.
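
As a rough sketch of how such a metamorphic test oracle could be implemented, the snippet below applies a single image transformation and flags a prediction as potentially erroneous when the predicted steering angle deviates from the labeled angle of the original image by more than a tolerance. The model interface, the example transformation, and the tolerance value are illustrative assumptions, not DeepTest's actual implementation.

    import numpy as np

    def metamorphic_check(model, image, labeled_angle, transform, tolerance_deg=2.0):
        """Flag a potential erroneous behavior for one transformed input.

        The prediction on the transformed image is compared against the
        labeled steering angle of the original image; deviations beyond the
        tolerance are flagged.  As noted above, some flags may be false
        positives (e.g., horizontal shearing can legitimately change the
        required steering angle).
        """
        transformed = transform(image)                  # e.g., blur, fog, brightness change
        predicted = float(model.predict(transformed))   # hypothetical model interface
        deviation = abs(predicted - labeled_angle)
        return deviation > tolerance_deg, deviation

    # Illustrative transformation: a simple brightness increase.
    def increase_brightness(image, delta=40):
        return np.clip(image.astype(np.int16) + delta, 0, 255).astype(np.uint8)
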
4.2 DISCUSSION
Deep neural networks have a number of challenges which render traditional verification and val-
idation methods ineffective. The verification and validation of these systems is challenging due
to the high complexity and nonlinearity inherent to them [214]. The lack of interpretability in
these systems means that they cannot be modeled mathematically, limiting the effective-
ness of formal methods. Also, due to the large parameter space in complex tasks such as driving,
testing the system for all possible inputs is impossible, which limits the effectiveness of black-
box testing methods [215]. The validation of online learning systems poses further challenges
because such systems may continue learning and change their internal structure over time after
deployment. It is therefore difficult to ensure that the desired behavior, even if proven at design
time, does not drift to undesired behavior due to the adaptation of the system to the operation
environment. ese adaptations, while typically beneficial to system performance, could lead to
undesirable behavior due to poor design or poor inputs (e.g., erroneous inputs from faulty sen-
sors). erefore, static verification and validation methods are not enough for online learners.
is means that verification and validation for online learners require run-time methods [160].
Hence, run-time monitoring and software safety cage methods will play a critical role in ensur-
ing the safety of online learning systems. Furthermore, validating the software algorithm itself
is not enough; emphasis must also be given to validating the training data. Varshney and
Alemzadeh [216] noted that the influence of training data on a neural network system's behavior
can be as impactful as the algorithm itself. Therefore, efforts must be made to ensure that the
training data is adequate for training the system for its desired task. The training data validation
must include consideration for the volume of data, coverage of critical situations, minimization
of unknown critical situations, and representativeness of the operational environment [141].
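
As a minimal illustration of what an automated part of such a training data check could look like, the sketch below assumes each sample carries a scenario tag (e.g., "night", "rain", "pedestrian_crossing") and tests volume and coverage against a list of required scenarios. The tag names, the thresholds, and the very notion of a per-sample scenario label are assumptions made for illustration, not requirements taken from [141].

    from collections import Counter

    def check_training_data(samples, required_scenarios, min_volume, min_per_scenario):
        """Crude volume and coverage checks over a labeled training set.

        `samples` is an iterable of dicts carrying a "scenario" tag.  The
        thresholds are placeholders; in practice they would be derived from
        the safety requirements of the intended operational environment.
        Assessing representativeness would additionally require comparing
        the scenario distribution against data collected from that
        environment, which is not shown here.
        """
        counts = Counter(sample["scenario"] for sample in samples)
        return {
            "volume_ok": sum(counts.values()) >= min_volume,
            "missing_scenarios": [s for s in required_scenarios if counts[s] == 0],
            "underrepresented": {s: counts[s] for s in required_scenarios
                                 if 0 < counts[s] < min_per_scenario},
        }
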
Another challenge for autonomous vehicle applications is the adequacy of current stan-
dards such as ISO 26262 for safety validation of neural network systems. These safety standards
have helped develop industry practices to address safety in a systematic way. However, Salay et
al. [217] noted that ISO 26262 in its current form does not address machine learning methods
adequately. Salay et al. identified five aspects of machine learning that will likely impact
ISO 26262 and require changes in the standard: (i) identifying hazards, (ii) faults and failure
modes, (iii) use of training sets, (iv) level of machine learning usage, and (v) required software
techniques. First, hazard identification as currently specified by ISO 26262 is an issue, as
machine learning can create new types of hazards that do not necessarily fit the definition
of hazards given by the standard. Therefore, the definition of hazards in ISO 26262 should
be revised to also consider harm potentially caused by complex behavioral interactions between
the autonomous vehicle and humans that are not due to a system malfunction. Second, faults
and failure modes will be further affected by machine learning methods as they will introduce
machine learning-specific faults in network topology, learning algorithm, or training set, which
will need to be addressed by ISO 26262. Therefore, ISO 26262 should require the use of fault
detection tools and techniques which take into account the unique features of machine learning.
Third, the use of training sets is problematic from the perspective of ISO 26262 certification,
as it breaks an assumption made by the standard that component behavior is fully specified and
each refinement can be verified with respect to its specification. However, where training sets are
used in place of specifications, this assumption is not valid (as training sets are inherently incom-
plete). erefore, training set coverage should be considered instead of completeness. System
specification may be an issue for systems with more advanced functionality, such as perception
of the environment, as these may be inherently unspecifiable. Hence, the complete specification re-
quirement in ISO 26262 should be relaxed. Fourth, the level of machine learning usage could
be a further issue as ISO 26262 assumes the software can be defined as an architecture con-
sisting of components and their interactions in a hierarchical structure. However, this is not
always the case for machine learning systems. For example, in end-to-end systems there are no
sub-components or hierarchical structure and therefore these systems challenge the assumptions
in the standard. Moreover, ISO 26262 mandates the use of modularity principles, such as restricting
the size of components and maximizing cohesion within a component, which could be problematic
for machine learning components, whose lack of transparency prevents these principles from being
applied. Finally, the required software techniques in ISO 26262 are a further challenge
for machine learning methods as many of them assume that an imperative programming lan-
guage is being used. Salay et al. assessed the 75 software techniques required by ISO 26262 and
found that approximately 40% of these are not applicable to machine learning, while the rest are
either directly applicable or can be applied if adapted in some way. Therefore, ISO 26262 soft-
ware techniques should perhaps focus more on the intent than on specific details. Additionally,
more work is required on the software development requirements for machine learning systems
to provide requirement criteria for issues such as training, validation, and test datasets, training data
pre-processing, and management of large data sets [218].
It is clear that verification and validation efforts should be carried out throughout the
lifecycle of the system. The process should start with a clear validation plan drawn up with the
requirements of the network in mind. To help build a case for the safety of the system, approaches such as Goal
Structuring Notation (GSN) [219] could be utilized. GSN specifies a set of goals for the system
which, if fulfilled, can be used as evidence for arguing the safety case of the system. See Fig. 4.1
for an example of a general GSN format for a neural network system suggested by Kurd et
al. [179]. It should be noted that for a complete GSN, additional hazards specific to the intended
application domain should be considered and used to build additional safety goals. During the
design phase, the focus should be on inherently safe design, detailed system
specification, and feasibility analysis of the system. During the training phase, validation efforts
could include analyzing the adequacy of the training data, verifying the training process, and
evaluating generalization capabilities. After training, the complete system should be validated
[Figure 4.1 appears here. Its top-level goal G1, "Neural network is acceptably safe to perform a specified function within the safety-critical context," is argued through strategy S1, "Argument over key safety criteria," and decomposed into sub-goals G2-G5: input-output functions for the neural network have been safely mapped; observable behavior of the neural network must be predictable and repeatable; the neural network tolerates faults in its inputs; and the neural network does not create hazardous outputs. Contexts C1-C7 supply the neural network model definition, the requirement that use of the network in a safety-critical context must ensure specific requirements are met, the safety criteria that determine "acceptably safe," the note that the function may partially or completely satisfy the target function, the inclusion of known and unknown inputs, the classification of a fault as an input that lies outside the specified input set, and the definition of a hazardous output as an output outside the specified set or target function.]
Figure 4.1: An example of a goal structuring notation for safety case argumentation of a deep
neural network, where the boxes illustrate the goals, rounded boxes provide context for a goal,
and rhomboids denote strategies to fulfill the goals. Adapted from [179].
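
To make the structure of such an argument concrete, the sketch below encodes the goal decomposition of Fig. 4.1 as simple data objects, where the top-level goal is considered satisfied only when every leaf sub-goal has supporting evidence attached. The class design and the `supported` flag are illustrative conveniences, not part of GSN [219] or of the argument by Kurd et al. [179].

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Goal:
        ident: str
        statement: str
        supported: bool = False                 # set True once evidence is attached
        children: List["Goal"] = field(default_factory=list)

        def is_satisfied(self) -> bool:
            # A goal with sub-goals is satisfied only if all of them are;
            # a leaf goal needs evidence of its own.
            if self.children:
                return all(child.is_satisfied() for child in self.children)
            return self.supported

    # Top-level goal G1 of Fig. 4.1, decomposed (strategy S1: argument over
    # key safety criteria) into sub-goals G2-G5.
    g1 = Goal("G1", "Neural network is acceptably safe to perform a specified "
                    "function within the safety-critical context",
              children=[
                  Goal("G2", "Input-output functions have been safely mapped"),
                  Goal("G3", "Observable behavior is predictable and repeatable"),
                  Goal("G4", "The network tolerates faults in its inputs"),
                  Goal("G5", "The network does not create hazardous outputs"),
              ])

    print(g1.is_satisfied())   # False until evidence supports every leaf goal
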