Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
Published in Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2020
Recommended citation: Lai, J., Zou, L., & Song, J. (2020). Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies. arXiv preprint arXiv:2011.14359. https://arxiv.org/pdf/2011.14359
This is my first paper as first author. It is the theory part of my undergraduate dissertation. The paper was rejected by NeurIPS 2020 due to strict assumptions and poor writing. To be honest, I was surprised that the reviewers were not picky about the experiments. They actually indicate that our methods apply only to very short horizons.