2000

A learning algorithm for Markov decision processes with adaptive state aggregation

Baras, J.S. and Borkar, V.S.

Proceedings of the IEEE Conference on Decision and Control, pp. 3351-3356, Sydney, Australia, December 2000.

Abstract

We propose a simulation-based algorithm for learning good policies for a Markov decision process with unknown transition law, with aggregated states. The state aggregation itself can be adapted on a slower time scale by an auxiliary learning algorithm. Rigorous justifications are provided for both algorithms.

John S. Baras
