Degryse, Baptiste
[UCL]
Lee, John
[UCL]
Q-learning can be used to find an optimal action-selection policy for any finite Markov decision process. A Q-network is a neural network that approximates the value of an action in a vast state space. This work studies deep Q-learning, a combination of Q-learning and neural networks, and evaluates the impact of its meta-parameters. The applicative context is the game Robocode: we evaluate the impact of different state representations, rewards, actions, and neural network architectures in order to build an artificial intelligence. A hybrid architecture combining a deep feedforward neural network with two convolutional layers that process a log of past states was successful. Adding a long short-term memory layer in parallel, and replacing random memory replay with sequential training, proved unsuitable. The resulting artificial intelligence outperformed other public machine learning projects thanks to the simplicity of its actions, which enabled it to learn complex behaviour. This project can be used to gain intuition on efficient state and action representations, on the importance of a complete reward function, and on the advantages of a hybrid architecture that exploits the strengths of each network type.
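As background for the abstract above, the Q-learning rule it builds on can be sketched in a few lines. This is a minimal tabular illustration, not code from the thesis; the hyperparameter values, state names, and action names are assumptions chosen for the example.

```python
from collections import defaultdict

# Assumed hyperparameters for illustration only.
ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor

# Q[(state, action)] -> estimated return; defaults to 0.0 for unseen pairs.
Q = defaultdict(float)

def update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Toy usage: a single transition in a hypothetical two-action problem.
actions = ["left", "right"]
update("s0", "right", 1.0, "s1", actions)
```

A deep Q-network replaces the table `Q` with a neural network that maps a state to the values of all actions, which is what makes the vast state spaces of a game like Robocode tractable.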

Bibliographic reference:
Degryse, Baptiste. *Deep Q-Learning for Robocode.* Ecole polytechnique de Louvain, Université catholique de Louvain, 2017. Supervisor: Lee, John.

Permanent URL:
http://hdl.handle.net/2078.1/thesis:10589