have been made toward high-level autonomy and we can expect further advancements using
DNNs to drive vehicles in more complex scenarios in the future.
3.1.2 LONGITUDINAL CONTROL
Deep learning also offers multiple advantages for longitudinal vehicle control. The longitudinal
control of an autonomous vehicle can be described as an optimal tracking control problem for
a complex nonlinear system [89, 90], and is therefore poorly suited to control systems based
on linearized vehicle models or other simplified analytical solutions [91]. Traditional control
systems used in longitudinal ADAS applications, such as Adaptive Cruise Control, give the
autonomous vehicle poor adaptability to different scenarios. However, deep learning solutions
have shown strong performance in nonlinear control problems and can learn to perform a task
without knowledge of the system model [43, 44, 92, 93].
In early work on neural longitudinal control, Dai et al. [94] presented a fuzzy reinforce-
ment learning control algorithm. The proposed algorithm combines reinforcement learning
with fuzzy logic: the reinforcement learning component is based on Q-learning, and the fuzzy
component uses a Takagi-Sugeno-type fuzzy inference system. The Q-learning module estimates
the optimal control action in the current state, while the fuzzy inference system produces the
final control output based on the estimated action value. The reward function
for Q-learning was set up based on the distance between the host vehicle and the lead vehicle,
to encourage the vehicle to follow at a safe distance. The trained system was evaluated in a
simulated car-following scenario, where it was shown to drive the vehicle successfully without
failures after 68 trials. However, while this approach demonstrated that reinforcement learning
can be used to follow a lead vehicle successfully, a reward function based on a single objective
can lead to unexpected behavior. The reward function is the means by which the designer signals
the desired behavior to the reinforcement learning agent, and should therefore accurately
represent the control objective. Reward functions for autonomous vehicles should thus encour-
age safe, comfortable, and efficient driving strategies. To achieve this, multi-objective reward
functions should be investigated.
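As a concrete illustration of the mechanism described above, the following is a minimal sketch of a tabular Q-learning update for car following with a purely distance-based reward, in the spirit of [94]. The fuzzy inference stage is omitted, and the discretization, action set, and reward shape are illustrative assumptions rather than the original design.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from [94].
N_GAP_BINS = 20             # number of discretized inter-vehicle distance states
ACTIONS = [-1.0, 0.0, 1.0]  # brake, hold, accelerate (m/s^2)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
SAFE_GAP = 30.0             # assumed target following distance (m)
MAX_GAP = 100.0             # gaps are clipped to [0, MAX_GAP] (m)

Q = np.zeros((N_GAP_BINS, len(ACTIONS)))  # tabular action-value estimates

def gap_to_state(gap_m):
    """Map a continuous inter-vehicle gap to a discrete state index."""
    clipped = min(max(gap_m, 0.0), MAX_GAP - 1e-9)
    return int(clipped / MAX_GAP * N_GAP_BINS)

def reward(gap_m):
    """Single-objective reward: penalize deviation from the safe gap."""
    return -abs(gap_m - SAFE_GAP)

def select_action(state, rng):
    """Epsilon-greedy action selection over the discrete action set."""
    if rng.random() < EPS:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[state]))

def q_update(state, action, r, next_state):
    """One-step Q-learning update toward the temporal-difference target."""
    td_target = r + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])
```

Because the reward depends only on the gap, nothing here discourages harsh accelerations or inefficient speed profiles, which is exactly the limitation that motivates the multi-objective rewards discussed next.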
For instance, Desjardins and Chaib-Draa [23] used a multi-objective reward function
based on time headway and its time derivative. This encouraged the agent to maintain an ideal
time headway to the lead vehicle (set to 2 s in experiments), while the derivative term also
rewarded actions which moved the vehicle toward the ideal time headway and penalized actions
which moved it farther away, as in the sketch below.
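A minimal sketch of such a two-term headway reward is shown here; the structure follows the description above, but the functional form and weights are assumptions for illustration rather than those used in [23].

```python
IDEAL_HEADWAY = 2.0  # s, the ideal time headway used in the experiments

def headway_reward(headway_s, headway_rate):
    """headway_s: current time headway (s); headway_rate: its time
    derivative, positive when the gap to the lead vehicle is opening."""
    # Term 1: penalize deviation from the ideal time headway.
    deviation_term = -abs(headway_s - IDEAL_HEADWAY)
    # Term 2: reward actions moving the headway toward the ideal value,
    # penalize actions moving it away (the weight 0.5 is assumed).
    moving_toward = (headway_s < IDEAL_HEADWAY) == (headway_rate > 0)
    derivative_term = 0.5 if moving_toward else -0.5
    return deviation_term + derivative_term
```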
The proposed approach was implemented in a cooperative adaptive cruise control system using
a policy gradient algorithm. The chosen network architecture was a feedforward network, with a
single hidden layer of 20 neurons and an output layer with 3 outputs (accelerate, brake,
do nothing). Ten training runs were completed, and the best-performing network was chosen
for testing. During testing, adequate vehicle-following behavior was demonstrated, with time
headway errors of 0.039 s achieved in emergency braking scenarios. However, the downside was