IEEE Transactions on Automatic Control, Vol.48, No.3, 493-497, 2003
The optimal search for a Markovian target when the search path is constrained: The infinite-horizon case
A target moves among a finite number of cells according to a discrete-time homogeneous Markov chain. The searcher is subject to constraints on the search path, i.e., the cells available for search in the current epoch are a function of the cell searched in the previous epoch. The aim is to identify a search policy that maximizes the infinite-horizon total expected reward. Under the assumption that the target's transition matrix is ergodic, we show the following structural results: 1) the optimal search policy is stationary; 2) there exist ε-optimal stationary policies that may be constructed by the standard value iteration algorithm in finite time. These results are obtained by showing that the dynamic programming operator associated with the search problem is an m-stage contraction mapping on a suitably defined space. Upper bounds on m and on the contraction coefficient α are given in terms of the transition matrix and other quantities pertaining to the search problem. These bounds on m and α may in turn be used to bound the suboptimality of the search policies constructed.
Keywords: Markovian target; optimal search; partially observed Markov decision process; stochastic shortest path problem
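The abstract's second result rests on a standard fact about contraction mappings: if the dynamic programming operator T is an α-contraction in the sup norm, then the value-iteration error satisfies ||V* − V_{k+1}|| ≤ α/(1 − α) · ||V_{k+1} − V_k||, which yields a finite-time stopping rule for constructing an ε-optimal policy. The sketch below illustrates this rule on a small hypothetical finite MDP with a discounted (one-stage α-contraction) Bellman operator; the paper's actual operator acts on a belief space and contracts only over m stages, so this is an illustration of the stopping principle, not the paper's algorithm.

```python
import numpy as np

def value_iteration(P, R, alpha, eps):
    """Value iteration with a contraction-based stopping rule.

    P[a]: transition matrix for action a (shape n x n)
    R[a]: one-stage reward vector for action a (shape n)
    alpha: contraction coefficient of the Bellman operator (discount factor)
    eps:  target accuracy of the returned value function
    """
    n = P.shape[1]
    V = np.zeros(n)
    # If ||V_{k+1} - V_k|| <= eps*(1-alpha)/alpha, then ||V* - V_{k+1}|| <= eps.
    threshold = eps * (1 - alpha) / alpha
    while True:
        Q = R + alpha * P @ V          # Q[a, s]: value of action a in state s
        V_new = Q.max(axis=0)          # Bellman backup
        if np.max(np.abs(V_new - V)) <= threshold:
            # eps-optimal value function and the corresponding greedy policy
            return V_new, Q.argmax(axis=0)
        V = V_new

# Hypothetical 2-action, 3-cell example (data invented for illustration).
P = np.array([[[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
V, policy = value_iteration(P, R, alpha=0.9, eps=1e-3)
```

The same stopping rule extends to an m-stage contraction by applying it to the composed operator T^m, which is how the bounds on m and α in the abstract translate into a finite-time construction of ε-optimal policies.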