could successfully complete all overtaking manoeuvres. In contrast, Shalev-Shwartz et al. [104]
considered a more complex scenario in which an autonomous vehicle must operate around un-
predictable vehicles. The aim of the project was to design an agent which could pass a roundabout
safely and efficiently. The performance of the agent was evaluated based on (1) keeping a safe
distance from other vehicles at all times, (2) the time to finish the route, and (3) smoothness of
the acceleration policy. The authors utilized an RNN to accomplish this task, chosen
due to its ability to learn the function between a chosen action, the current state, and the state
at the next time step without explicitly relying on any Markovian assumptions. Moreover, by
expressing the dynamics of the system in a transparent way, prior knowledge could be
incorporated into the system more easily. The described method was shown to learn to slow down
when approaching roundabouts, to give way to aggressive drivers, and to safely continue when
merging with less aggressive drivers. In the initial testing, the next state was decomposed into
a predictable part, including velocities and locations of adjacent vehicles, and a non-predictable
part, consisting of the accelerations of the adjacent vehicles. The dynamics of the predictable
part of the next state were provided to the agent. However, in the next phase of testing, all state
parameters at the next time step were considered unpredictable and instead had to be learned
during training. The learning process was more challenging in these conditions, but still suc-
ceeded. Additionally, the authors claimed that the described method could be adapted to other
driving policies such as lane change decisions, highway exit and merge, negotiation of the right
of way in junctions, yielding to pedestrians, and complicated planning in urban scenarios.
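To make this decomposition concrete, the sketch below shows one way such a recurrent transition model could be structured, with the non-predictable accelerations learned by the network and the predictable kinematics computed analytically. The GRU choice, layer sizes, and kinematic update here are illustrative assumptions, not the implementation of [104].

```python
import torch
import torch.nn as nn

class TransitionRNN(nn.Module):
    """Hypothetical recurrent transition model in the spirit of [104]:
    only the non-predictable part of the next state (adjacent vehicles'
    accelerations) is learned; the predictable part follows from known
    kinematics."""

    def __init__(self, state_dim, action_dim, accel_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(state_dim + action_dim, hidden, batch_first=True)
        self.accel_head = nn.Linear(hidden, accel_dim)

    def forward(self, states, actions, h=None):
        # states: (batch, time, state_dim); actions: (batch, time, action_dim)
        x = torch.cat([states, actions], dim=-1)
        out, h = self.gru(x, h)
        return self.accel_head(out), h  # predicted accelerations per step

def predictable_step(pos, vel, accel, dt=0.1):
    """Known kinematics: advance positions and velocities analytically
    from the (learned) accelerations instead of learning them."""
    return pos + vel * dt, vel + accel * dt
```

Keeping the kinematic update explicit, rather than folding everything into the network, is what lets prior knowledge about the system dynamics be injected directly.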
As mentioned before, supervised learning can significantly reduce training time for a con-
trol system. For this reason, Xia et al. [105] presented a vehicle control algorithm based on Q-
learning combined with a pre-training phase using expert demonstrations. A filtered experi-
ence replay, in which previous experiences were stored while those from poorly performing
episodes were discarded, was used to improve convergence during training. The use of pre-training and filtered experi-
ence replay was shown to not only improve final performance of the control policy, but also speed
up convergence by up to 71%.
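A minimal sketch of such a filtered replay buffer appears below; the episode-return threshold and the expert-seeding step are assumed details for illustration, not the exact mechanism of [105].

```python
import random
from collections import deque

class FilteredReplayBuffer:
    """Hypothetical filtered experience replay (inspired by [105]):
    transitions from poorly performing episodes are discarded so the
    buffer retains only useful experience."""

    def __init__(self, capacity=100_000, return_threshold=0.0):
        self.buffer = deque(maxlen=capacity)
        self.return_threshold = return_threshold

    def add_episode(self, transitions, episode_return):
        # Filter: keep the episode only if it performed well enough.
        if episode_return >= self.return_threshold:
            self.buffer.extend(transitions)

    def sample(self, batch_size):
        return random.sample(list(self.buffer),
                             min(batch_size, len(self.buffer)))

# Pre-training (assumed form): seed the buffer with expert demonstrations
# before Q-learning begins, so early updates reflect expert behaviour.
# buffer.add_episode(expert_transitions, episode_return=float("inf"))
```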
Comparing two learning algorithms for lane keeping, Sallab et al. [106] investigated the effect of continuous and discretized action spaces. A DQN was chosen
for discrete action spaces, while an actor-critic algorithm was used for continuous action values.
The two networks were trained and evaluated driving around a simulated race track, where their
goal was to complete the track while staying near the center of the lane. The ability to utilize
continuous action values resulted in significantly stronger performance with the actor-critic al-
gorithm, enabling a much smoother control policy. In contrast, the DQN algorithm struggled
to stay near the center of the lane, especially on curved roads. These results demonstrated the
advantages of using continuous action values.
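The contrast is visible in how each network emits an action, as in the sketch below; the state size, layer widths, and steering discretization are placeholders rather than the setup of [106].

```python
import torch
import torch.nn as nn

STATE_DIM = 10                                         # placeholder state size
STEER_SET = torch.tensor([-0.4, -0.2, 0.0, 0.2, 0.4])  # assumed discretization

# DQN-style head: score each discrete steering angle and take the argmax,
# which quantizes control and can make the policy jerky on curves.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, len(STEER_SET)))

def discrete_action(state):
    return STEER_SET[q_net(state).argmax()]

# Actor-critic-style head: the actor outputs the steering angle directly,
# permitting arbitrarily fine, smooth corrections.
actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, 1), nn.Tanh())

def continuous_action(state):
    return 0.4 * actor(state)  # scale tanh output to the steering range
```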
Zhang et al. [107] used vision to control the autonomous vehicle with a supervised
learning algorithm, SafeDAgger, based on the Dataset Aggregation (DAgger) [108]
imitation learning technique. In DAgger, the first phase of training consists of traditional super-
vised learning where the model learns from a training set collected from an expert completing