BIBLIOGRAPHY
[53] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir
Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper
with convolutions. In Proc. of the IEEE Conference on Computer Vision and Pattern Recog-
nition, pages 1–9, 2015. DOI: 10.1109/cvpr.2015.7298594 9
[54] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network
training by reducing internal covariate shift. ArXiv Preprint ArXiv:1502.03167, 2015.
10
[55] Quentin J. M. Huys, Anthony Cruickshank, and Peggy Seriès. Reward-based learning,
model-based and model-free. In Encyclopedia of Computational Neuroscience, pages 2634–
2641, Springer, 2015. DOI: 10.1007/978-1-4614-6675-8_674 11
[56] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, MIT Press,
Cambridge, MA, 1998. DOI: 10.1109/tnn.1998.712192 11, 13, 20
[57] Marco Wiering and Martijn Van Otterlo. Reinforcement learning. Adaptation, Learning,
and Optimization, 12:51, 2012. DOI: 10.1007/978-3-642-27645-3
[58] Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath.
Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–
38, 2017. DOI: 10.1109/msp.2017.2743240 11
[59] Vijay R. Konda and John N. Tsitsiklis. On actor-critic algorithms. SIAM Journal on Control
and Optimization, 42(4):1143–1166, 2003. DOI: 10.1137/s0363012901385691 12
[60] Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. Ph.D. thesis,
King’s College, Cambridge, 1989. 12
[61] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3–
4):279–292, 1992. DOI: 10.1007/bf00992698
[62] Steven J. Bradtke, B. Erik Ydstie, and Andrew G. Barto. Adaptive linear quadratic control
using policy iteration. In Proc. of the American Control Conference, vol. 3, pages 3475–3479,
1994. DOI: 10.1109/acc.1994.735224 12
[63] Gavin A. Rummery and Mahesan Niranjan. On-Line Q-Learning Using Connectionist
Systems, vol. 37, University of Cambridge, Department of Engineering, Cambridge, UK,
1994. 12
[64] Ivo Grondman, Lucian Busoniu, Gabriel A. D. Lopes, and Robert Babuska. A survey of
actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transac-
tions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6):1291–1307,
2012. DOI: 10.1109/tsmcc.2012.2218595 12
[65] Leemon Baird. Residual algorithms: Reinforcement learning with function approxima-
tion. In Machine Learning Proceedings, pages 30–37, Elsevier, 1995. DOI: 10.1016/b978-
1-55860-377-6.50013-x
[66] Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Machine
Learning Proceedings, pages 261–268, Elsevier, 1995. DOI: 10.1016/b978-1-55860-377-
6.50040-2
[67] John N. Tsitsiklis and Benjamin Van Roy. Feature-based methods for large scale dy-
namic programming. Machine Learning, 22(1–3):59–94, 1996. DOI: 10.1007/978-0-585-
33656-5_5 12
[68] Ronald J. Williams. Reinforcement-Learning Connectionist Systems. College of Computer
Science, Northeastern University, 1987. 12
[69] R. Williams. A class of gradient-estimation algorithms for reinforcement learning in neu-
ral networks. In Proc. of the International Conference on Neural Networks, pages II–601,
1987.
[70] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist
reinforcement learning. Machine Learning, 8(3–4):229–256, 1992. DOI: 10.1007/978-1-
4615-3618-5_2 12
[71] Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. Policy
gradient methods for reinforcement learning with function approximation. In Advances in
Neural Information Processing Systems, pages 1057–1063, 2000. 12
[72] Martin Riedmiller, Jan Peters, and Stefan Schaal. Evaluation of policy gradient methods
and variants on the cart-pole benchmark. In IEEE International Symposium on Approximate
Dynamic Programming and Reinforcement Learning (ADPRL), pages 254–261, 2007.
DOI: 10.1109/adprl.2007.368196 12
[73] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lil-
licrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for
deep reinforcement learning. In International Conference on Machine Learning, pages 1928–
1937, 2016. 12
[74] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval
Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement
learning. ArXiv Preprint ArXiv:1509.02971, 2015. 12
[75] Andrew G. Barto and Michael T. Rosenstein. Supervised actor-critic reinforcement learning.
In Handbook of Learning and Approximate Dynamic Programming, 2:359, 2004. 12
[76] Vijay R. Konda and John N. Tsitsiklis. Actor-critic algorithms. In Advances in Neural In-
formation Processing Systems, pages 1008–1014, 2000. DOI: 10.1137/s0363012901385691
12
[77] Michael Nielsen. Neural Networks and Deep Learning. Determination Press, 2015. 13
[78] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, MIT Press, 2016.
13
[79] Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine
Learning. Cambridge University Press, 2019. 13
[80] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learn-
ing: Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer, New
York, 2009. DOI: 10.1007/978-0-387-21606-5 13
[81] Francois Chollet. Deep Learning with Python. Manning Publications Co., 2017. 13
[82] Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts,
Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc., 2017. 13
[83] Dean A. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. In Advances
in Neural Information Processing Systems 1, pages 305–313, 1989. 15, 17, 24
[84] D. Pomerleau. Neural network vision for robot driving. Intelligent Unmanned Ground
Vehicles, pages 1–22, 1997. DOI: 10.1007/978-1-4615-6325-9_4 15, 17
[85] Gening Yu and I. K. Sethi. Road-following with continuous learning. In Proc. of the Intelligent
Vehicles Symposium, Detroit, MI, 1995. DOI: 10.1109/ivs.1995.528317 16, 17
[86] Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, and Yann LeCun. Off-road obstacle avoid-
ance through end-to-end learning. In Advances in Neural Information Processing Systems,
pages 739–746, 2006. 16, 17
[87] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp,
Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al.
End to end learning for self-driving cars. ArXiv Preprint ArXiv:1604.07316, 2016. 16,
17, 37
[88] Hesham M. Eraqi, Mohamed N. Moustafa, and Jens Honer. End-to-end deep learn-
ing for steering autonomous vehicles considering temporal dependencies. ArXiv Preprint
ArXiv:1710.03804, 2017. 16, 17
[89] Ardalan Vahidi and Azim Eskandarian. Research advances in intelligent collision avoid-
ance and adaptive cruise control. IEEE Transactions on Intelligent Transportation Systems,
4(3):143–153, 2003. DOI: 10.1109/tits.2003.821292 18
[90] Seungwuk Moon, Ilki Moon, and Kyongsu Yi. Design, tuning, and evaluation of a full-
range adaptive cruise control system with collision avoidance. Control Engineering Practice,
17(4):442–455, 2009. DOI: 10.1016/j.conengprac.2008.09.006 18
[91] Qi Sun. Cooperative adaptive cruise control performance analysis. Ph.D. thesis, Ecole
Centrale de Lille, 2016. 18
[92] Hassan K. Khalil. Nonlinear Systems, 2nd ed., Prentice Hall, NJ, 1996. 18
[93] Dan Wang and Jie Huang. Neural network-based adaptive dynamic surface control for a
class of uncertain nonlinear systems in strict-feedback form. IEEE Transactions on Neural
Networks, 16(1):195–202, 2005. DOI: 10.1109/tnn.2004.839354 18
[94] Xiaohui Dai, Chi-Kwong Li, and Ahmad B. Rad. An approach to tune fuzzy controllers
based on reinforcement learning for autonomous vehicle control. IEEE Transactions on
Intelligent Transportation Systems, 6(3):285–293, 2005. DOI: 10.1109/fuzz.2003.1209417
18, 21
[95] Zhenhua Huang, Xin Xu, Haibo He, Jun Tan, and Zhenping Sun. Parameter-
ized batch reinforcement learning for longitudinal control of autonomous land vehicles.
IEEE Transactions on Systems, Man, and Cybernetics: Systems, pages 1–12, 2017. DOI:
10.1109/tsmc.2017.2712561 19, 21
[96] Xin Chen, Yong Zhai, Chao Lu, Jianwei Gong, and Gang Wang. A learning model
for personalized adaptive cruise control. In Intelligent Vehicles Symposium (IV), IEEE,
pages 379–384, 2017. DOI: 10.1109/ivs.2017.7995748 19, 21
[97] Dongbin Zhao, Zhongpu Xia, and Qichao Zhang. Model-free optimal con-
trol based intelligent cruise control with hardware-in-the-loop demonstration (re-
search frontier). IEEE Computational Intelligence Magazine, 12(2):56–69, 2017. DOI:
10.1109/mci.2017.2670463 19, 21
[98] Hyunmin Chae, Chang Mook Kang, ByeoungDo Kim, Jaekyum Kim, Chung Choo
Chung, and Jun Won Choi. Autonomous braking system via deep reinforcement learn-
ing. In IEEE 20th International Conference on Intelligent Transportation Systems (ITSC),
pages 1–6, 2017. DOI: 10.1109/itsc.2017.8317839 19, 21, 24
[99] Euro NCAP. European new car assessment programme: Test protocol—AEB VRU sys-
tems, 2015. 19
[100] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement
learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996. DOI:
10.1613/jair.301 20
[101] Dongbin Zhao, Zhaohui Hu, Zhongpu Xia, Cesare Alippi, Yuanheng Zhu, and Ding
Wang. Full-range adaptive cruise control based on supervised adaptive dynamic program-
ming. Neurocomputing, 125:57–67, February 2014. DOI: 10.1016/j.neucom.2012.09.034
20
[102] Bin Wang, Dongbin Zhao, Chengdong Li, and Yujie Dai. Design and implementation
of an adaptive cruise control system based on supervised actor-critic learning. 5th Inter-
national Conference on Information Science and Technology (ICIST), pages 243–248, 2015.
DOI: 10.1109/icist.2015.7288976 20
[103] Rui Zheng, Chunming Liu, and Qi Guo. A decision-making method for autonomous
vehicles based on simulation and reinforcement learning. International Conference on Ma-
chine Learning and Cybernetics, pages 362–369, 2013. DOI: 10.1109/icmlc.2013.6890495
20
[104] Shai Shalev-Shwartz, Nir Ben-Zrihem, Aviad Cohen, and Amnon Shashua. Long-term
planning by short-term prediction. ArXiv Preprint ArXiv:1602.01580, 2016. 22, 26
[105] Wei Xia, Huiyun Li, and Baopu Li. A control strategy of autonomous vehicles based on
deep reinforcement learning. In 9th International Symposium on Computational Intelligence
and Design (ISCID), vol. 2, pages 198–201, IEEE, 2016. DOI: 10.1109/iscid.2016.2054
22, 26
[106] Ahmad El Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani.
End-to-end deep reinforcement learning for lane keeping assist. ArXiv Preprint
ArXiv:1612.04340, 2016. 22, 26
[107] Jiakai Zhang and Kyunghyun Cho. Query-efficient imitation learning for end-to-end
autonomous driving. ArXiv Preprint ArXiv:1605.06450, 2016. 22, 26
[108] Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning
and structured prediction to no-regret online learning. In Proc. of the 14th International
Conference on Artificial Intelligence and Statistics, pages 627–635, 2011. 22
[109] Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evangelos
Theodorou, and Byron Boots. Agile autonomous driving using end-to-end deep
imitation learning. Proc. of Robotics: Science and Systems, Pittsburgh, PA, 2018. DOI:
10.15607/rss.2018.xiv.056 23, 26
[110] Dequan Wang, Coline Devin, Qi-Zhi Cai, Fisher Yu, and Trevor Darrell. Deep object
centric policies for autonomous driving. ArXiv Preprint ArXiv:1811.05432, 2018. 23, 26
[111] Horia Porav and Paul Newman. Imminent collision mitigation with reinforcement learn-
ing and vision. In 21st International Conference on Intelligent Transportation Systems (ITSC),
pages 958–964, IEEE, 2018. DOI: 10.1109/itsc.2018.8569222 23, 26