have been made toward high-level autonomy and we can expect further advancements using
DNNs to drive vehicles in more complex scenarios in the future.
3.1.2 LONGITUDINAL CONTROL
Deep learning also offers multiple advantages for longitudinal vehicle control. The longitudinal control of an autonomous vehicle can be described as an optimal tracking control problem for a complex nonlinear system [89, 90], and is therefore poorly suited to control systems based on linearized vehicle models or other simplified analytical solutions [91]. Traditional control systems used in longitudinal ADAS, such as Adaptive Cruise Control, provide the autonomous vehicle with poor adaptability to different scenarios. However, deep learning solutions have shown strong performance in nonlinear control problems and can learn to perform a task without knowledge of the system model [43, 44, 92, 93].
In early work on neural longitudinal control, Dai et al. [94] presented a fuzzy reinforcement learning algorithm for longitudinal vehicle control. The proposed algorithm combines reinforcement learning with fuzzy logic, where the reinforcement learning was based on Q-learning and the fuzzy logic used a Takagi-Sugeno-type fuzzy inference system. The Q-learning module estimates the optimal control action in the current state, while the fuzzy inference system produces the final control output based on the estimated action value. The reward function for Q-learning was based on the distance between the host vehicle and the lead vehicle, to encourage the vehicle to follow at a safe distance. The trained system was evaluated in a simulated car-following scenario, and the system was shown to drive the vehicle successfully without failures after 68 trials. However, while this approach demonstrated that reinforcement learning can be used to follow vehicles in front, a reward function based on a single objective can lead to unexpected behavior. The reward function is the means by which the designer signals the desired behavior to the reinforcement learning agent, and should therefore accurately represent the control objective. Reward functions for autonomous vehicles should thus encourage safe, comfortable, and efficient driving strategies. To achieve this, multi-objective reward functions should be investigated.
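Before turning to multi-objective designs, the single-objective car-following setup described above can be made concrete with a minimal sketch of a tabular Q-learning update driven by a distance-based reward. This is an illustrative simplification only: it omits the Takagi-Sugeno fuzzy inference stage of [94], and all names, constants, and the reward form are assumptions rather than the published design.

import numpy as np

N_GAP_BINS = 20              # discretized inter-vehicle distance states
ACTIONS = [-1.0, 0.0, 1.0]   # decelerate, hold, accelerate (m/s^2), assumed action set
DESIRED_GAP = 30.0           # target following distance in metres (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

q_table = np.zeros((N_GAP_BINS, len(ACTIONS)))

def gap_to_state(gap_m, max_gap=100.0):
    """Map a continuous gap measurement to a discrete state index."""
    return int(np.clip(gap_m / max_gap * N_GAP_BINS, 0, N_GAP_BINS - 1))

def reward(gap_m):
    """Single-objective reward: penalize deviation from the desired gap."""
    return -abs(gap_m - DESIRED_GAP)

def select_action(state):
    """Epsilon-greedy action selection over the discrete acceleration set."""
    if np.random.rand() < EPS:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(q_table[state]))

def q_update(gap, action_idx, next_gap):
    """Standard Q-learning temporal-difference update for one transition."""
    s, s_next = gap_to_state(gap), gap_to_state(next_gap)
    target = reward(next_gap) + GAMMA * np.max(q_table[s_next])
    q_table[s, action_idx] += ALPHA * (target - q_table[s, action_idx])

Because the reward depends only on the gap, such an agent has no incentive to avoid abrupt accelerations, which is exactly the limitation that motivates the multi-objective rewards discussed next.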
For instance, Desjardins and Chaib-Draa [23] used a multi-objective reward function based on the time headway and its derivative. This encouraged the agent to maintain an ideal time headway to the lead vehicle (set to 2 s in the experiments), while the time headway derivative term rewarded actions which moved the vehicle toward the ideal time headway and penalized actions which moved it farther away. The proposed approach was implemented in a cooperative adaptive cruise control system using a policy gradient algorithm. The chosen network architecture was a feedforward network with a single hidden layer of 20 neurons and an output layer with three outputs (accelerate, brake, do nothing). Ten training runs were completed, and the best performing network was chosen for testing. During testing, adequate vehicle following behavior was demonstrated, with time headway errors of 0.039 s achieved in emergency braking scenarios. However, the downside was
that oscillatory velocity profiles were observed, which poses safety and passenger comfort issues. Similarly to oscillatory outputs in lateral control, this could potentially be resolved with the use of RNNs. Another potential solution would be to design a reward function with an additional term for drive smoothness. Such a reward function was used, for instance, by Huang et al. [95], who used an actor-critic algorithm for autonomous longitudinal control. Their multi-objective reward function considered the velocity tracking error and drive smoothness, which helps ensure that no sharp accelerations or decelerations are used unnecessarily and results in a control policy that is comfortable for the vehicle's occupants. However, neither adjacent vehicles nor safety was considered in this work.
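The following is a minimal sketch of such a multi-objective reward, combining a time headway term and its derivative (in the spirit of [23]) with a smoothness penalty (in the spirit of [95]). The specific weights, time step, and functional form are assumptions for illustration, not the published reward functions.

def multi_objective_reward(time_headway, prev_time_headway, accel, prev_accel,
                           ideal_headway=2.0, dt=0.1,
                           w_headway=1.0, w_derivative=0.5, w_smooth=0.2):
    """Illustrative multi-objective reward: headway error, headway improvement,
    and a jerk penalty for passenger comfort. All weights are assumed values."""
    headway_error = abs(time_headway - ideal_headway)
    prev_error = abs(prev_time_headway - ideal_headway)

    # Positive when the last action moved the vehicle toward the ideal headway.
    improvement = (prev_error - headway_error) / dt

    # Penalize abrupt changes in acceleration (jerk) to discourage oscillation.
    jerk = abs(accel - prev_accel) / dt

    return -w_headway * headway_error + w_derivative * improvement - w_smooth * jerk

Weighting the terms against each other is itself a design choice: too small a smoothness weight reintroduces oscillatory behavior, while too large a weight can make the agent sluggish in emergency braking.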
An algorithm for a personalized ACC system was presented by Chen et al. [96]. The proposed approach used Q-learning to estimate the desired vehicle velocity at each time-step, which was then achieved using a Proportional-Integral-Derivative (PID) controller. A single hidden layer feedforward network was used to estimate the Q-function. The system was evaluated in simulation, based on driving smoothness, passenger comfort, and safety. A similar approach was used by Zhao et al. [97], who used an actor-critic algorithm to learn personalized driving styles for an ACC system. The reward function considered driver habits, passenger comfort, and vehicle safety. The trained network was tested in a hardware-in-the-loop simulation and compared to PID and Linear Quadratic Regulator (LQR) controllers. The proposed algorithm was shown to outperform the traditional PID and LQR controllers in the test scenarios, demonstrating the power of reinforcement learning for longitudinal control.
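The hierarchical structure described for [96], where the learned policy selects a velocity setpoint and a conventional controller tracks it, can be sketched as follows. The PID gains, time step, and usage values are illustrative assumptions, not the parameters used in the cited work.

class PIDController:
    """Simple PID controller that tracks the velocity setpoint chosen by the
    learning component. Gains and time step are assumed, illustrative values."""

    def __init__(self, kp=0.8, ki=0.05, kd=0.1, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        """Return an acceleration command (positive = accelerate)."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical usage: at each time-step the RL policy outputs a desired velocity,
# which the PID controller converts into a low-level acceleration command.
pid = PIDController()
current_velocity = 18.0   # m/s, from the vehicle state (assumed)
desired_velocity = 20.0   # m/s, selected by the Q-learning policy (assumed)
accel_command = pid.step(desired_velocity, current_velocity)

Splitting the problem this way keeps the learned component small (it only chooses setpoints) while the well-understood PID loop handles the low-level actuation.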
A collision avoidance system using reinforcement learning was developed by Chae et al. [98]. The proposed approach used a Deep Q-Network (DQN) algorithm to choose from four different discrete deceleration values. A two-term reward function was used, which considered collision avoidance and avoiding high-risk situations. The reward function was balanced between these conflicting objectives to avoid overly conservative or reckless braking policies. A replay memory was used to improve the convergence of training, and an additional "trauma memory" of collisions was used as well. The trauma memory improved the stability of learning by ensuring that collision experiences were included in the parameter updates, since collisions were very rare during training and random sampling from the standard replay memory alone would rarely select them. The system was trained to avoid collisions with pedestrians who stepped in front of the vehicle at different distances. During evaluation, the collision avoidance was tested for different Time-To-Collision (TTC) values, with 10,000 tests for each value. For TTC values above 1.5 s, collisions were avoided with 100% success, while at the lowest TTC value of 0.9 s the collision rate was 61.29%. The system was also tested in the Euro NCAP standard testing procedure (CVFA and CVNA tests [99]) and passed both tests successfully. Therefore, the system was deemed to provide adequate collision avoidance for autonomous vehicles.
The use of reinforcement learning has been favored for longitudinal control and has yielded significant improvements in longitudinal control for autonomous vehicles. The use of one-dimensional measurements, such as intervehicular distance, as inputs means that the network