A learning algorithm for Markov decision processes with adaptive state aggregation
Baras, J.S. and Borkar, V.S.
Proceedings of the IEEE Conference on Decision and Control, pp. 3351-3356, Sydney, Australia, December 2000.
We propose a simulation-based algorithm for learning good policies for a Markov decision process with an unknown transition law and aggregated states. The state aggregation itself can be adapted on a slower time scale by an auxiliary learning algorithm. Rigorous justifications are provided for both algorithms.
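
To make the idea of learning over aggregated states concrete, here is a minimal sketch. It is not the algorithm from the paper: it swaps in ordinary tabular Q-learning on a fixed aggregation map, and it omits the slower-time-scale adaptation of the aggregation entirely. The six-state chain, the reward, the `aggregate` map, and every name and constant below are assumptions invented for illustration.

```python
import random

# Hypothetical toy problem: a 6-state chain with 2 actions (0 = left, 1 = right).
N_STATES, N_ACTIONS, GAMMA = 6, 2, 0.9


def aggregate(state):
    """Assumed fixed aggregation map: states {0,1}, {2,3}, {4,5} share a cluster."""
    return state // 2


def step(state, action, rng):
    """Simulate one transition; the learner never sees these probabilities."""
    direction = 1 if action == 1 else -1
    if rng.random() < 0.1:  # with probability 0.1 the move is reversed
        direction = -direction
    next_state = min(max(state + direction, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning_aggregated(n_steps=60000, seed=0):
    """Q-learning with one table entry per (cluster, action) pair."""
    rng = random.Random(seed)
    n_clusters = aggregate(N_STATES - 1) + 1
    q = [[0.0] * N_ACTIONS for _ in range(n_clusters)]
    visits = [[0] * N_ACTIONS for _ in range(n_clusters)]
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(N_ACTIONS)  # uniform exploration
        next_state, reward = step(state, action, rng)
        c, c_next = aggregate(state), aggregate(next_state)
        visits[c][action] += 1
        alpha = visits[c][action] ** -0.7  # decreasing step size
        target = reward + GAMMA * max(q[c_next])
        q[c][action] += alpha * (target - q[c][action])
        state = next_state
    return q


q = q_learning_aggregated()
# Greedy action for each cluster under the learned values.
greedy = [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]
```

The learner only ever indexes its table by cluster, so all states in a cluster share one value estimate and one greedy action. A two-time-scale variant in the spirit of the abstract would additionally update the aggregation map with a much smaller step size; that part is not sketched here.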