Large state spaces and long planning horizons are two major challenges in reinforcement learning. Modern reinforcement learning algorithms use function approximation to deal with large state spaces. In the first part of the talk, I will present exponential lower bounds for value-based and policy-based learning with function approximation. I will then discuss which additional structures permit statistically efficient algorithms. In the second part of the talk, I will discuss whether the sample complexity of reinforcement learning must scale polynomially with the planning horizon, an open problem posed by Agarwal and Jiang at COLT 2018. I will present a method whose sample complexity scales only logarithmically with the planning horizon, thus resolving the open problem.
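To make the horizon-dependence claim in the second part concrete, here is a schematic contrast (a sketch only, not the talk's exact bounds; here $H$ is the planning horizon, $\epsilon$ the target accuracy, and $\mathrm{poly}(\cdot)$, $\mathrm{polylog}(\cdot)$ hide dependence on all other problem parameters):

```latex
% Typical prior analyses: sample complexity grows polynomially with H.
N_{\text{prior}} \;=\; \mathrm{poly}(H) \cdot \mathrm{poly}(1/\epsilon)

% The result presented in the talk: only logarithmic dependence on H.
N_{\text{new}} \;=\; \mathrm{polylog}(H) \cdot \mathrm{poly}(1/\epsilon)
```

The open problem asks whether the polynomial dependence on $H$ in the first line is necessary; the second line is the sense in which the talk answers it in the negative.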

Paper links: