Recommending safe actions by learning from sub-optimal demonstrations

Lars Böcking, Patrick Philipp
Proceedings of the 5th International Workshop on Health Recommender Systems co-located with the 14th ACM Conference on Recommender Systems 2020 (RecSys 2020)
Clinical pathways describe the treatment procedure for a patient from a medical point of view. Based on the patient's condition, a decision is made about the next actions to be carried out. Such recurring sequential process decisions could well be delegated to a reinforcement learning agent, but patient safety must always be the primary consideration when suggesting activities. The development of individual pathways is also cost- and time-intensive, so a smart agent could support and relieve physicians. In addition, not every patient reacts in the same way to a clinical intervention, so the personalization of clinical pathways deserves attention. In this paper we address the fundamental problem that reinforcement learning agents used to specify clinical pathways should provide an individually optimal proposal within the limits of safety constraints. Imitating the decisions of physicians can guarantee safety but not optimality. We therefore present an approach that ensures compliance with health-critical rules without limiting the exploration of the optimum. We evaluate our approach on an open-source Gym environment, where we show that our adaptation of behavior cloning not only adheres better to safety regulations, but also better explores the space of the optimum in terms of collected rewards.
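The core idea of constraining an agent's exploration by health-critical rules can be illustrated with a minimal sketch. The paper does not publish its implementation here, so everything below is an assumption for illustration: a discrete action space, a toy Q-value table, and a hypothetical `is_safe` predicate standing in for the clinical safety rules. Exploration proceeds epsilon-greedily, but only over the subset of actions the predicate permits:

```python
import random

def select_safe_action(q_values, is_safe, epsilon=0.1):
    """Epsilon-greedy action selection restricted to actions that
    satisfy a safety predicate (hypothetical stand-in for the
    paper's health-critical rules)."""
    safe_actions = [a for a in range(len(q_values)) if is_safe(a)]
    if not safe_actions:
        raise RuntimeError("no safe action available in this state")
    if random.random() < epsilon:
        # exploration never leaves the safe subset
        return random.choice(safe_actions)
    # greedy choice among safe actions only
    return max(safe_actions, key=lambda a: q_values[a])

# Toy example: action 2 has the highest value but violates a safety rule,
# so the agent falls back to the best safe action instead.
q = [0.2, 0.5, 0.9, 0.1]
unsafe = {2}
action = select_safe_action(q, lambda a: a not in unsafe, epsilon=0.0)
```

Filtering before selection, rather than penalizing violations afterwards, is one way to guarantee compliance with safety rules while still leaving the whole safe region open to exploration.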
Research focus
Medical Information Technology, Safety and Security
Published by
Lars Böcking