Gerencser, Balazs
[UCL]
Hollanders, Romain
[UCL]
Delvenne, Jean-Charles
[UCL]
Jungers, Raphaël M.
[UCL]
Unique Sink Orientations (USOs) are an appealing abstraction of several major optimization problems of applied mathematics such as Linear Programming (LP), Markov Decision Processes (MDPs) or 2-player Turn Based Stochastic Games (2TBSGs). A polynomial time algorithm to find the sink of a USO would translate into a strongly polynomial time algorithm to solve the aforementioned problems—a major quest for all three cases. In the case of an acyclic USO of a cube, a situation that captures both MDPs and 2TBSGs, one can apply the well-known Policy Iteration (PI) algorithm. The study of its complexity is the object of this work. Despite its exponential worst case complexity, the principle of PI is a powerful source of inspiration for other methods. In 2012, Hansen and Zwick introduced a new combinatorial relaxation of the complexity problem for PI resulting in what we call Order-Regular (OR) matrices. They conjectured that the maximum number of rows of such matrices—an upper bound on the number of steps of PI—should follow the Fibonacci sequence. As our first contribution, we disprove the lower bound part of Hansen and Zwick's conjecture. Then, for our second contribution, we (exponentially) improve the Ω(1.4142n) lower bound on the number of steps of PI from Schurr and Szabó in the case of OR matrices and obtain an Ω(1.4269n) bound.
Bibliographic reference |
Gerencser, Balazs ; Hollanders, Romain ; Delvenne, Jean-Charles ; Jungers, Raphaël M.. A complexity analysis of Policy Iteration through combinatorial matrices arising from Unique Sink Orientations. In: Journal of Discrete Algorithms, Vol. 44, p. 21-38 (2017) |
Permanent URL |
http://hdl.handle.net/2078.1/192379 |