have been made toward high-level autonomy and we can expect further advancements using
DNNs to drive vehicles in more complex scenarios in the future.
3.1.2 LONGITUDINAL CONTROL
Deep learning also offers multiple advantages for longitudinal vehicle control. The longitudinal control of an autonomous vehicle can be described as an optimal tracking control problem for a complex nonlinear system [89, 90], and is therefore poorly suited to control systems based on linearized vehicle models or other simplified analytical solutions [91]. Traditional control systems used in longitudinal ADAS, such as Adaptive Cruise Control, provide the autonomous vehicle with poor adaptability to different scenarios. However, deep learning solutions have shown strong performance in nonlinear control problems and can learn to perform a task without knowledge of the system model [43, 44, 92, 93].
In early work on neural longitudinal control, Dai et al. [94] presented a fuzzy reinforcement learning algorithm for longitudinal vehicle control. The proposed algorithm combines reinforcement learning with fuzzy logic, where the reinforcement learning was based on Q-learning and the fuzzy logic used a Takagi-Sugeno-type fuzzy inference system. The Q-learning module estimates the optimal control action in the current state, while the fuzzy inference system produces the final control output based on the estimated action value. The reward function for Q-learning was based on the distance between the host vehicle and the lead vehicle, to encourage the vehicle to follow at a safe distance. The trained system was evaluated in a simulated car-following scenario, and the system was shown to drive the vehicle successfully without failures after 68 trials. However, while this approach demonstrated that reinforcement learning can be used to follow vehicles in front, a reward function based on a single objective can lead to unexpected behavior. The reward function is the means by which the designer signals the desired behavior to the reinforcement learning agent, and should therefore accurately represent the control objective. Reward functions for autonomous vehicles should thus encourage safe, comfortable, and efficient driving strategies. To achieve this, multi-objective reward functions should be investigated.
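Before turning to multi-objective designs, the single-objective car-following setup described above can be made concrete with a minimal sketch of a tabular Q-learning update driven by a distance-based reward. This is an illustrative simplification only: it omits the Takagi-Sugeno fuzzy inference stage of [94], and all names, constants, and the reward form are assumptions rather than the published design.

import numpy as np

N_GAP_BINS = 20              # discretized inter-vehicle distance states
ACTIONS = [-1.0, 0.0, 1.0]   # decelerate, hold, accelerate (m/s^2), assumed action set
DESIRED_GAP = 30.0           # target following distance in metres (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

q_table = np.zeros((N_GAP_BINS, len(ACTIONS)))

def gap_to_state(gap_m, max_gap=100.0):
    """Map a continuous gap measurement to a discrete state index."""
    return int(np.clip(gap_m / max_gap * N_GAP_BINS, 0, N_GAP_BINS - 1))

def reward(gap_m):
    """Single-objective reward: penalize deviation from the desired gap."""
    return -abs(gap_m - DESIRED_GAP)

def select_action(state):
    """Epsilon-greedy action selection over the discrete acceleration set."""
    if np.random.rand() < EPS:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(q_table[state]))

def q_update(gap, action_idx, next_gap):
    """Standard Q-learning temporal-difference update for one transition."""
    s, s_next = gap_to_state(gap), gap_to_state(next_gap)
    target = reward(next_gap) + GAMMA * np.max(q_table[s_next])
    q_table[s, action_idx] += ALPHA * (target - q_table[s, action_idx])

Because the reward depends only on the gap, such an agent has no incentive to avoid abrupt accelerations, which is exactly the limitation that motivates the multi-objective rewards discussed next.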
For instance, Desjardins and Chaib-Draa [23] used a multi-objective reward function based on the time headway and its derivative. This encouraged the agent to maintain an ideal time headway to the lead vehicle (set to 2 s in the experiments), while the time headway derivative term rewarded actions which moved the vehicle toward the ideal time headway and penalized actions which moved it farther away. The proposed approach was implemented in a cooperative adaptive cruise control system using a policy gradient algorithm. The chosen network architecture was a feedforward network with a single hidden layer of 20 neurons and an output layer with three outputs (accelerate, brake, do nothing). Ten training runs were completed, and the best performing network was chosen for testing. During testing, adequate vehicle following behavior was demonstrated, with time headway errors of 0.039 s achieved in emergency braking scenarios. However, the downside was
that oscillatory velocity profiles were observed, which poses safety and passenger comfort issues. Similarly to oscillatory outputs in lateral control, this could potentially be resolved with the use of RNNs. Another potential solution would be to design a reward function with an additional term for drive smoothness. Such a reward function was used, for instance, by Huang et al. [95], who used an actor-critic algorithm for autonomous longitudinal control. Their multi-objective reward function considered the velocity tracking error and drive smoothness, which helps ensure that no sharp accelerations or decelerations are used unnecessarily and results in a control policy that is comfortable for the vehicle's occupants. However, neither adjacent vehicles nor safety was considered in this work.
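The following is a minimal sketch of such a multi-objective reward, combining a time headway term and its derivative (in the spirit of [23]) with a smoothness penalty (in the spirit of [95]). The specific weights, time step, and functional form are assumptions for illustration, not the published reward functions.

def multi_objective_reward(time_headway, prev_time_headway, accel, prev_accel,
                           ideal_headway=2.0, dt=0.1,
                           w_headway=1.0, w_derivative=0.5, w_smooth=0.2):
    """Illustrative multi-objective reward: headway error, headway improvement,
    and a jerk penalty for passenger comfort. All weights are assumed values."""
    headway_error = abs(time_headway - ideal_headway)
    prev_error = abs(prev_time_headway - ideal_headway)

    # Positive when the last action moved the vehicle toward the ideal headway.
    improvement = (prev_error - headway_error) / dt

    # Penalize abrupt changes in acceleration (jerk) to discourage oscillation.
    jerk = abs(accel - prev_accel) / dt

    return -w_headway * headway_error + w_derivative * improvement - w_smooth * jerk

Weighting the terms against each other is itself a design choice: too small a smoothness weight reintroduces oscillatory behavior, while too large a weight can make the agent sluggish in emergency braking.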
An algorithm for a personalized ACC system was presented by Chen et al. [96]. The proposed approach used Q-learning to estimate the desired vehicle velocity at each time-step, which was then achieved using a Proportional-Integral-Derivative (PID) controller. A single hidden layer feedforward network was used to estimate the Q-function. The system was evaluated in simulation, based on driving smoothness, passenger comfort, and safety. A similar approach was used by Zhao et al. [97], who used an actor-critic algorithm to learn personalized driving styles for an ACC system. The reward function considered driver habits, passenger comfort, and vehicle safety. The trained network was tested in a hardware-in-the-loop simulation and compared to PID and Linear Quadratic Regulator (LQR) controllers. The proposed algorithm was shown to outperform the traditional PID and LQR controllers in the test scenarios, demonstrating the power of reinforcement learning for longitudinal control.
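The hierarchical structure described for [96], where the learned policy selects a velocity setpoint and a conventional controller tracks it, can be sketched as follows. The PID gains, time step, and usage values are illustrative assumptions, not the parameters used in the cited work.

class PIDController:
    """Simple PID controller that tracks the velocity setpoint chosen by the
    learning component. Gains and time step are assumed, illustrative values."""

    def __init__(self, kp=0.8, ki=0.05, kd=0.1, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        """Return an acceleration command (positive = accelerate)."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical usage: at each time-step the RL policy outputs a desired velocity,
# which the PID controller converts into a low-level acceleration command.
pid = PIDController()
current_velocity = 18.0   # m/s, from the vehicle state (assumed)
desired_velocity = 20.0   # m/s, selected by the Q-learning policy (assumed)
accel_command = pid.step(desired_velocity, current_velocity)

Splitting the problem this way keeps the learned component small (it only chooses setpoints) while the well-understood PID loop handles the low-level actuation.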
A collision avoidance system using reinforcement learning was developed by Chae et al. [98]. The proposed approach used a Deep Q-Network (DQN) algorithm to choose from four different discrete deceleration values. A two-term reward function was used, which considered collision avoidance and avoiding high-risk situations. The reward function was balanced between these conflicting objectives to avoid overly conservative or reckless braking policies. A replay memory was used to improve the convergence of training, and an additional "trauma memory" of collisions was used as well. The trauma memory improved the stability of learning by ensuring that collision experiences were included in the parameter updates, since collisions were very rare during training and random sampling from the standard replay memory alone would rarely select them. The system was trained to avoid collisions with pedestrians who stepped in front of the vehicle at different distances. During evaluation, the collision avoidance was tested for different Time-To-Collision (TTC) values, with 10,000 tests for each value. For TTC values above 1.5 s, collisions were avoided with 100% success, while at the lowest TTC value of 0.9 s the collision rate was 61.29%. The system was also tested in the Euro NCAP standard testing procedure (CVFA and CVNA tests [99]) and passed both tests successfully. Therefore, the system was deemed to provide adequate collision avoidance for autonomous vehicles.
The use of reinforcement learning has been favored for longitudinal control and has yielded significant improvements in longitudinal control for autonomous vehicles. The use of one-dimensional measurements, such as intervehicular distance, as inputs means that the network