The Size of MDP Factored Policies

Paolo Liberatore

Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI 2002)

A policy for a Markov Decision Process (MDP) specifies the next action to execute, given the current state and (possibly) the history of actions executed so far. Factorization is used when the number of states is exponentially large: both the MDP and the policy can then be represented in compact form, for example as circuits. We prove that there are MDPs whose optimal policies require exponential space even in factored form.
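
To illustrate the flat-versus-factored gap the abstract refers to, here is a minimal Python sketch (ours, not from the paper): states range over n boolean variables, so an explicit policy table needs 2^n entries, while a factored policy is a small circuit over the state bits. The names and the example circuit below are illustrative assumptions, not constructs from the paper.

from itertools import product

N = 4  # number of boolean state variables; the state space has 2**N states

def factored_policy(state):
    """Policy as a tiny circuit over the state bits: O(N) gates,
    independent of the 2**N size of the state space."""
    # Hypothetical example circuit: act iff an odd number of the
    # first three variables are set.
    return state[0] ^ state[1] ^ state[2]

# The flat (explicit) representation enumerates every state.
flat_policy = {s: factored_policy(s) for s in product([0, 1], repeat=N)}

print(len(flat_policy))               # 16 entries: grows as 2**N
print(factored_policy((1, 0, 1, 0)))  # the circuit evaluates in O(N) time

The paper's result is that this compression is not always possible: for some MDPs, no factored (circuit) representation of an optimal policy stays polynomially small.
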


@inproceedings{libe-02-a,
  author    = {Liberatore, Paolo},
  title     = {The Size of MDP Factored Policies},
  booktitle = {Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI 2002)},
  pages     = {267--272},
  year      = {2002},
}