Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies

Published in Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2020

Recommended citation: Lai, J., Zou, L., & Song, J. (2020). Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies. arXiv preprint arXiv:2011.14359. https://arxiv.org/pdf/2011.14359

This is my first paper as first author. It is the theory part of my undergraduate dissertation. The paper was rejected by NeurIPS 2020 due to strict assumptions and poor writing. To be honest, I was surprised that the reviewers were not picky about the experiments. They actually indicate that our methods only apply to very short horizons.