3.2.2 NETWORK ARCHITECTURES
The current model design process in deep learning research often depends largely on rules of thumb and heuristic techniques. There are currently no efficient techniques for choosing the optimal network hyperparameters, and trial-and-error is often used instead [125, 126]. A common method for hyperparameter tuning is Grid Search [127, 128], where a range of values is chosen for each hyperparameter and training is completed for each combination of hyperparameter values. The best performing hyperparameter values are then chosen for the final design. However, this heuristic approach to network parameter tuning is time-intensive and error-prone [129, 130]. This hinders the research process, as a significant part of a research project can be spent tuning the model parameters to increase the accuracy of the model. This also ties into the first research challenge, as the need to re-train the model with slightly changed parameters increases the amount of computation needed to arrive at the final model design. There is ongoing research into automated neural architecture search methods aiming to solve this problem [131–135].
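To make the cost of this procedure concrete, the following minimal sketch in Python enumerates a small hypothetical search space. Here, train_and_evaluate is a placeholder for a complete training run that returns a validation score; the particular hyperparameters and value ranges are illustrative assumptions, not taken from the cited works. Even this small grid already requires 3 × 3 × 3 = 27 full training runs.

    from itertools import product

    # Hypothetical search space; the names and ranges are illustrative only.
    search_space = {
        "learning_rate": [1e-4, 1e-3, 1e-2],
        "batch_size": [32, 64, 128],
        "num_layers": [2, 4, 8],
    }

    def grid_search(train_and_evaluate, search_space):
        """Exhaustively train one model per hyperparameter combination and
        return the best configuration found on the validation set."""
        best_score, best_config = float("-inf"), None
        keys = list(search_space)
        for values in product(*(search_space[k] for k in keys)):
            config = dict(zip(keys, values))
            score = train_and_evaluate(**config)  # one full training run
            if score > best_score:
                best_score, best_config = score, config
        return best_config, best_score

Because every grid point triggers a complete training run, the cost grows exponentially with the number of hyperparameters, which is precisely why this approach becomes impractical for large models.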
3.2.3 GOAL SPECIFICATION
Goal specification is an issue specific to reinforcement learning. The goal of any reinforcement learning model is to learn a control policy which maximizes the rewards. Therefore, the reward function is the means by which the designer describes the desired behavior the model should learn [136, 137]. However, this can be difficult to do effectively for complex tasks. In automated driving, multiple conflicting objectives need to be balanced against one another. A poorly designed reward function can not only hinder the convergence of the model, but also cause unexpected model behavior. Therefore, care needs to be taken that the reward function adequately represents the desired behavior, and that no unexpected ways to exploit the reward function exist. A classic example of such reward hacking is a robot trained for ball paddling, where the reward function is the distance between the ball and a desired height [138]. Although the intended behavior for the robot is to hit the ball such that it reaches the desired height consistently, the robot may learn to exploit the reward function by resting the ball on the paddle and raising the paddle and ball together to the target height. Therefore, the reward function should be designed such that there are no unintended ways of obtaining high rewards [139].
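As an illustration of how such exploits arise, the minimal Python sketch below contrasts a naive reward for the ball-paddling example with one possible patched variant. The state variables, the target height, and the contact-based fix are assumptions introduced for illustration and are not taken from [138].

    TARGET_HEIGHT = 1.5  # desired ball height in metres (assumed value)

    def naive_reward(ball_height):
        # Rewards proximity of the ball to the target height. This can be
        # exploited by resting the ball on the paddle and lifting both to
        # TARGET_HEIGHT, earning near-maximal reward without any paddling.
        return -abs(ball_height - TARGET_HEIGHT)

    def patched_reward(ball_height, ball_on_paddle):
        # One possible fix (an assumption, not from the cited work): grant
        # the height-based reward only while the ball is in free flight,
        # so the resting strategy no longer accumulates reward.
        if ball_on_paddle:
            return 0.0
        return -abs(ball_height - TARGET_HEIGHT)

The naive version maximizes the stated objective while violating its intent, which is exactly the mismatch between reward and desired behavior that the designer must guard against.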
3.2.4 GENERALIZATION
Generalization is a key challenge when working in any complex operational environment. This is especially true in autonomous driving, as the operational environment is very complex and can differ drastically between scenarios. Differences in road type (urban vs. highway), weather (dry vs. snow), regional road laws, local signs and road markings, traffic, and host vehicle parameters can all have a significant effect on the required driving behavior, and the autonomous vehicle must be able to adapt to these different environments. However, due to the complexity of the operational environment, testing the system in all possible scenarios is infeasible [140, 141]. Therefore, the model needs to be able to generalize to previously unseen scenarios. There are