Research Interests

I am a generalist rather than a specialist researcher. A specialist tends to focus on a relatively narrow domain and become an expert in it. A generalist, by contrast, covers a broader range of domains at the expense of depth. While this may keep a generalist from becoming a true expert in any one domain, it allows them to connect ideas from different disciplines and act as a bridge between specialists. As a generalist, I try to be topic- and method-agnostic, but I nonetheless have priorities in my research.

Since 2018, my advisors (Prof. Nick Street and Prof. Barrett Thomas) and I have been exploring the application of Reinforcement Learning (a machine learning technique for learning how to make an optimal sequence of decisions) to Health Analytics. More specifically, we study the dosing of warfarin, a commonly used anticoagulant.

In the first part of this work, we showed that a better dosing protocol can be achieved using Deep Q-Networks (DQN). However, that model acts as a black box, and users might not trust its recommendations. In the second phase, we focused on the maintenance dosing protocol, in which the dose is adjusted as a percentage change of the current dose. We implemented this work using a policy gradient method, Proximal Policy Optimization (PPO). We used Action Forging techniques to reduce the number of dose changes and to learn the dose-change options instead of pre-selecting them. Finally, Policy Distillation and Decision Trees helped us turn the dosing protocol into a table. The final dosing protocol performs better and is both explainable and individualized.
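The maintenance-phase ideas above (actions expressed as percentage changes of the current dose, and a learned policy turned into a table) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in this work: the action set `PERCENT_CHANGES`, the INR bin edges `EDGES`, the 35.0 weekly dose, and `toy_teacher_policy` are all hypothetical. The toy teacher stands in for a trained PPO policy, and its thresholds are illustrative, not a clinical protocol. The distillation step is a simplified tabular stand-in (a majority vote of the teacher's actions within each INR bin), not a decision-tree learner.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical discrete action set: relative changes applied to the current dose.
PERCENT_CHANGES: Tuple[float, ...] = (-0.20, -0.10, 0.0, 0.10, 0.20)


def adjust_dose(current_dose: float, percent_change: float) -> float:
    """Return the new dose after applying a percentage change to the current dose."""
    return current_dose * (1.0 + percent_change)


def toy_teacher_policy(inr: float) -> float:
    """Hypothetical stand-in for a trained policy (e.g., a PPO actor).

    Toy thresholds for illustration only; not a clinical protocol.
    """
    if inr < 1.5:
        return 0.20
    if inr < 2.0:
        return 0.10
    if inr < 3.0:
        return 0.0
    if inr < 4.0:
        return -0.10
    return -0.20


def inr_bin(inr: float, edges: List[float]) -> int:
    """Index of the first edge the INR falls below (len(edges) if above all)."""
    for i, edge in enumerate(edges):
        if inr < edge:
            return i
    return len(edges)


def distill_to_table(
    policy: Callable[[float], float],
    states: List[float],
    edges: List[float],
) -> Dict[int, float]:
    """Query the teacher on sample states and keep the majority action per bin."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for inr in states:
        votes[inr_bin(inr, edges)][policy(inr)] += 1
    return {b: counts.most_common(1)[0][0] for b, counts in sorted(votes.items())}


# Hypothetical INR bin edges and sampled states (INR from 0.5 to 5.9).
EDGES = [1.5, 2.0, 3.0, 4.0]
sample_states = [i / 10 for i in range(5, 60)]
table = distill_to_table(toy_teacher_policy, sample_states, EDGES)

# Apply the table: look up the bin for the observed INR, then adjust the dose.
new_dose = adjust_dose(35.0, table[inr_bin(3.6, EDGES)])
```

With these toy values, an INR of 3.6 falls in the 3.0 to 4.0 bin, so the table prescribes a 10% reduction and a weekly dose of 35.0 becomes 31.5. The appeal of the table form is that a clinician can read each row directly.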

Both of these works prescribe only a weekly dose. In practice, however, the physician needs to prescribe both the dose and the time of the next blood test and dose adjustment. Adding this duration to the decision expands the action space dramatically, since every dose option must be combined with every duration option. We proposed a sequential architecture in which one decision (dose or duration) is made first and the other is then made by a separate PPO model. This way, we reduce the action space and improve the training process.
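The sequential architecture can be illustrated with a toy two-stage policy: the first stage samples a dose change, and the second samples the number of days until the next blood test given that choice. With five options each, the two stages need 5 + 5 = 10 outputs instead of the 5 × 5 = 25 a joint policy would need. This is a sketch under stated assumptions: the action sets and both scoring functions (`dose_logits`, `duration_logits`) are hypothetical stand-ins for trained PPO actors, no training is shown, and the dose-first ordering is just one of the two orderings the text mentions.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical action sets (illustrative only, not taken from the original work).
DOSE_CHANGES: Tuple[float, ...] = (-0.20, -0.10, 0.0, 0.10, 0.20)
DAYS_TO_NEXT_TEST: Tuple[int, ...] = (3, 7, 14, 21, 28)

# A joint policy must score every (dose, duration) pair; the sequential
# design scores each factor separately.
JOINT_ACTIONS = len(DOSE_CHANGES) * len(DAYS_TO_NEXT_TEST)  # 25
SEQUENTIAL_OUTPUTS = len(DOSE_CHANGES) + len(DAYS_TO_NEXT_TEST)  # 10


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def dose_logits(inr: float) -> List[float]:
    """Hypothetical first-stage policy: favor increases when INR is below 2.5
    and decreases when it is above."""
    return [-(inr - 2.5) * change * 10 for change in DOSE_CHANGES]


def duration_logits(inr: float, dose_change: float) -> List[float]:
    """Hypothetical second-stage policy conditioned on the first decision:
    a large dose change or an INR far from 2.5 favors an earlier test."""
    urgency = abs(dose_change) * 10 + abs(inr - 2.5)
    return [(1.0 - urgency) * days / 7 for days in DAYS_TO_NEXT_TEST]


def act(inr: float, rng: random.Random) -> Tuple[float, int]:
    """Sequential action selection: dose first, then duration given the dose."""
    dose_change = DOSE_CHANGES[sample_index(softmax(dose_logits(inr)), rng)]
    probs = softmax(duration_logits(inr, dose_change))
    days = DAYS_TO_NEXT_TEST[sample_index(probs, rng)]
    return dose_change, days


rng = random.Random(0)
action = act(1.2, rng)  # (dose change, days until the next blood test)
```

Because the second stage receives the first decision as input, the pair is still chosen jointly in effect, while each model only has to choose among a handful of options.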

My other areas of interest include:

Current Research




For my dissertation, I developed a Reinforcement Learning package in Python called ReiL. It is not the best, the fastest, the most efficient, or even the right way of doing RL, but it is available for use under the MIT License. You can install it from PyPI (pip install ReiL) or fork the source code from the repository here:

Most of the code is fully type-annotated and documented, and I am committed to continuing to improve and support it.