Deep Q-Networks

Tags: Python · Completely from Scratch · Q-Learning · Deep Q-Learning · Individual

Links: DQN Code on Google Colab · Q-Learning Code on Google Colab

Things to Highlight

  • Q-Learning from Scratch
  • Multi-Layered Perceptron Deep Neural Network from Scratch
  • Used OpenAI Gym's Environments

Made as a final-year project with the overall goal of developing an in-depth understanding of how reinforcement learning works, so that I can apply the technique to future AI-related projects where standard methods are not applicable. Although the assessment had finished, some issues remained in the project, so some time had to be spent fixing them.

Q-Learning

In order to understand how reinforcement learning works as simply and easily as possible, one of the simplest algorithms, Q-Learning, was implemented. Since it is designed to solve simple sequential problems, it was largely tested with OpenAI Gym's Taxi environment, along with some initial testing on the even simpler Cliff Walking environment.

While the algorithm itself was easy to implement, understanding how it all worked was not, especially the influence that different values of each individual hyperparameter, gamma in particular, have on learning, along with the role of the temporal-difference component of the Q-function.
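For reference, the core of the algorithm is the temporal-difference update, which nudges Q(s, a) towards the bootstrapped target r + γ · max Q(s', a'). The sketch below is a minimal illustration against the Taxi environment, not the project's actual code; the hyperparameter values are placeholders, and it assumes the classic (pre-0.26) Gym step/reset API.

```python
import numpy as np
import gym

env = gym.make("Taxi-v3")

# Placeholder hyperparameter values; the project's actual settings may differ.
alpha = 0.1    # learning rate
gamma = 0.99   # discount factor: weighs future rewards against immediate ones
epsilon = 0.1  # exploration rate for the epsilon-greedy policy

Q = np.zeros((env.observation_space.n, env.action_space.n))

for episode in range(5000):
    state = env.reset()  # classic Gym API; newer versions return (obs, info)
    done = False
    while not done:
        # Epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state])

        next_state, reward, done, _ = env.step(action)

        # Temporal-difference update: move Q(s, a) towards the bootstrapped
        # target r + gamma * max_a' Q(s', a'), with no bootstrap on terminal states
        td_target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (td_target - Q[state, action])

        state = next_state
```

Gamma is what makes the agent weigh future rewards against immediate ones, which is why changing its value alters the learned behaviour so noticeably.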

See below the training and validation videos:

Training Video

Validation Video

Deep Q-Learning

To further my understanding of reinforcement learning, and to really challenge myself, a more complex algorithm, Deep Q-Networks (DQN), was implemented. Since it is designed to solve more complex sequential problems, ones with a discrete action space and a continuous state space, it was largely tested with OpenAI Gym's Lunar Lander environment.
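At its core, the Q-network is just a function approximator from states to per-action Q-values; for Lunar Lander that means mapping an 8-dimensional state to 4 Q-values. The sketch below shows what a from-scratch forward pass can look like; the layer sizes and initialisation are illustrative assumptions, not the project's actual architecture.

```python
import numpy as np

# A minimal from-scratch MLP forward pass for LunarLander: 8 state inputs,
# 4 Q-value outputs (one per discrete action). Sizes are assumptions.
def init_layer(n_in, n_out):
    # Small random weights; a real implementation might use He/Xavier init
    return np.random.randn(n_in, n_out) * 0.01, np.zeros(n_out)

W1, b1 = init_layer(8, 64)
W2, b2 = init_layer(64, 64)
W3, b3 = init_layer(64, 4)

def q_values(state):
    h1 = np.maximum(0, state @ W1 + b1)  # ReLU hidden layer
    h2 = np.maximum(0, h1 @ W2 + b2)     # ReLU hidden layer
    return h2 @ W3 + b3                  # linear output: one Q-value per action
```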

Unlike Q-Learning, the use of a multi-layered perceptron neural network, a replay buffer, and a target network within the algorithm made its implementation a lot more difficult. The additional hyperparameters, such as the replay buffer size and the target-network update rate, also required further analysis and fine-tuning to ensure issues such as catastrophic forgetting were not occurring.
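To make the roles of these two components concrete, here is a minimal sketch of a uniform replay buffer and a hard target-network update; the buffer size, batch size, and update interval are illustrative assumptions rather than the project's tuned values.

```python
import random
from collections import deque

# Replay buffer: old transitions are evicted FIFO once the buffer is full.
# Capacity is a tunable assumption, not the project's actual value.
replay_buffer = deque(maxlen=100_000)

def store(state, action, reward, next_state, done):
    replay_buffer.append((state, action, reward, next_state, done))

def sample_batch(batch_size=64):
    # Uniform sampling breaks the correlation between consecutive transitions
    return random.sample(replay_buffer, batch_size)

# Target network: a periodically frozen copy of the online network, used to
# compute the bootstrapped targets so they don't chase a moving estimate.
TARGET_UPDATE_EVERY = 1_000  # steps between hard copies (an assumed interval)

def maybe_update_target(step, online_params, target_params):
    # online_params / target_params: dicts of numpy weight arrays
    if step % TARGET_UPDATE_EVERY == 0:
        for name in online_params:
            target_params[name] = online_params[name].copy()
```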

This was a key issue: on multiple occasions I would see symptoms that looked like catastrophic forgetting, tweak the parameters, and wait hours for retraining, only to eventually find that the cause was a bug in my implementation, such as the backpropagation being set up incorrectly, the mean-squared-error loss having the target and predicted values the wrong way around, or the matrices having their rows and columns flipped.
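The swapped MSE arguments were a particularly sneaky bug because the loss value itself is symmetric in its two operands; only the gradient with respect to the prediction reveals the mistake, since swapping them flips its sign and pushes the network away from the target. A quick illustration:

```python
import numpy as np

predicted = np.array([1.0, 2.0])
target = np.array([0.5, 3.0])

# The MSE *value* is symmetric in its arguments, so the loss curve looks fine...
loss = np.mean((predicted - target) ** 2)

# ...but the gradient with respect to the prediction is not symmetric:
grad_correct = 2 * (predicted - target) / predicted.size
grad_flipped = 2 * (target - predicted) / predicted.size  # the bug: sign is inverted
```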

See below the training and validation videos:

Training Video

Validation Video