Date of Award


Document Type

Capstone Project


MSc in Data Analytics


Reinforcement learning has recently grown in popularity because it can learn from past experience and adapt quickly and effectively to new market conditions. This research focuses on reinforcement learning and its importance in trading stock options. Options come in one of two exercise styles: American or European. This research bases its analysis on American-style options, which are considered more challenging to trade than European-style options because they can be exercised at any time up to expiration. Doing so could help improve current trading techniques. In addition, this research aims to understand the role that reinforcement learning plays in trading stock options and to evaluate its effectiveness in different market environments. Reinforcement learning has the potential to identify optimal trading strategies for stock options and could assist traders in refining their own strategies.
Trading and markets have existed for millennia, going as far back as Babylon in 2000 BC, where currency and commodities were exchanged (Kirkpatrick and Dahlquist, 2010). However, markets have evolved and become far more complex than in those early trading days. Automation of trading and trading tasks has enabled organisations to act more quickly, consistently, and cost-effectively, all while reducing the risk of human error. The complexity of the markets increases the difficulty of option trading in dynamic environments. Two questions arise: Can reinforcement learning models use historical option data to develop effective option trading strategies? Can reinforcement learning assist human traders in trading options? These questions are hard to answer at a glance and require robust research and exploration to understand the behaviour of this market segment.
Additionally, this research explores the potential benefits of utilising reinforcement learning in stock option trading and how it might be used to modify existing techniques (Moody and Saffell, 2001). The reinforcement learning models explored are Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). These models use reinforcement learning algorithms that train an agent to solve tasks by trial and error. This research attempts to use these trading agents to develop algorithmic trading strategies, which are difficult for human traders. Chapter 5 gives a complete description of how they work. Ultimately, this research found that reinforcement learning can develop trading strategies that could assist human traders. These trading agents are based on machine learning models, which allow them to identify and analyse patterns in the data that human traders may miss. This research provides evidence to support these results, but more work is needed before these agents can be used as fully autonomous strategies.