IEEE Transactions on Automatic Control, Vol.61, No.10, 3001-3015, 2016
State Classification of Time-Nonhomogeneous Markov Chains and Average Reward Optimization of Multi-Chains
In a discrete time nonhomogeneous Markov chain (TNHMC), the states spaces, transition probabilities, and reward functions at different times may be different. In this paper, with the confluencity previously introduced, we show that the states of a TNHMC can be classified into the branching states and a number of classes of confluent states (versus the transient and recurrent states in the time homogeneous case). The optimization of average reward in TNHMC's consisting of a single confluent class (uni-chain) have been addressed in a previous paper by the author. In this paper, we show that with confluencity and the state classification and under some bound conditions, we can obtain the necessary and sufficient conditions for optimal policies of the average reward of TNHMCs consisting of multiple confluent classes (multi-chains). Just like in the uni-chain TNHMC case, the sufficient condition does not need to hold in any "zero frequently visited" time sequence. This "under-selectivity" makes the problem not amenable to dynamic programming. A direct comparison based approach is used to prove the results. The results enhance our understanding of state classification and performance optimization with the notion of confluencity.
Keywords:Branching states;confluencity;confluent states;decomposition-separation theorem;direct comparison;optimality equations;performance potentials;synchronicity