SIAM Journal on Control and Optimization, Vol.47, No.5, 2303-2347, 2008
APPROXIMATE FIXED POINT ITERATION WITH AN APPLICATION TO INFINITE HORIZON MARKOV DECISION PROCESSES
Many approximate iterative algorithms can be represented in the form $V_n = T V_{n-1} + U_n$, $n \ge 1$, where $V_{n-1}$, $U_n$, $n \ge 1$, are elements of a seminormed linear space $(\mathcal{V}, \|\cdot\|)$ and $T$ is a contractive operator. The objective is usually to calculate a fixed point $V^*$ of $T$. Here, the quantities $U_n$ may be interpreted as the errors obtained when the operator $T$ can be evaluated only approximately. As is well known, in the absence of such an approximation error, the algorithm converges to a fixed point at a geometric rate under general conditions. In this article the convergence properties of such algorithms in the presence of an approximation error $U_n$ are studied. It is shown that the error $\|V_n - V^*\|$ is dominated by $\|U_n\|$, so that convergence of $\|U_n\|$ to zero implies convergence of $\|V_n - V^*\|$ to zero, at a rate determined by $\|U_n\|$. The results extend naturally to a relative error of the form $\|U_n\|/\|V_{n-1}\|$, as well as to $J$-stage contractions. The utility of this general theory is then demonstrated by an extended application to the problem of model-based approximate and adaptive control of Markov decision processes. The theory is shown to permit a sharpening of known convergence rates under more general conditions. Additionally, bounds on regret for adaptive controls with forced exploration are calculated in terms of a stagewise exploration rate. This permits the determination of an optimal choice of exploration rate within the class of certainty-equivalence adaptive control policies.
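To make the iteration concrete, the following is a minimal numerical sketch (not taken from the paper) of approximate value iteration $V_n = T V_{n-1} + U_n$ on a small randomly generated Markov decision process, where $T$ is taken to be the Bellman optimality operator and the perturbation $U_n$ decays like $1/n$. The MDP size, discount factor, and noise schedule are illustrative assumptions; only the form of the iteration follows the abstract. The script prints $\|U_n\|$ alongside $\|V_n - V^*\|$ in the sup norm, so the error tracking described above can be observed empirically.

```python
# Minimal sketch (assumptions, not the paper's setup): approximate value
# iteration V_n = T V_{n-1} + U_n on a small random MDP, with a perturbation
# U_n whose sup norm decays to zero.
import numpy as np

rng = np.random.default_rng(0)
S, A, beta = 5, 3, 0.9                        # states, actions, discount factor

# Random transition kernel P[a, s, s'] (rows normalized) and rewards r[a, s].
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((A, S))

def T(V):
    """Bellman optimality operator, a beta-contraction in the sup norm."""
    return np.max(r + beta * P @ V, axis=0)

# Reference fixed point V* via exact value iteration (U_n = 0).
V_star = np.zeros(S)
for _ in range(2000):
    V_star = T(V_star)

# Approximate iteration: V_n = T V_{n-1} + U_n with ||U_n|| -> 0.
V = np.zeros(S)
for n in range(1, 51):
    U_n = rng.uniform(-1.0, 1.0, size=S) / n  # assumed error, decaying like 1/n
    V = T(V) + U_n
    if n % 10 == 0:
        print(f"n={n:3d}  ||U_n||={np.abs(U_n).max():.4f}  "
              f"||V_n - V*||={np.abs(V - V_star).max():.4f}")
```

In this toy run the residual $\|V_n - V^*\|$ shrinks at a rate governed by $\|U_n\|$ (up to a factor on the order of $1/(1-\beta)$), which is the qualitative behavior the abstract describes.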