Look-ahead control of conveyor-serviced production station by using potential-based online policy iteration

Hao T; Tamio A

International Journal of Control, Vol.82, No.10, 1917-1928, 2009

DOI: 10.1080/00207170902823006

We consider look-ahead control of a conveyor-serviced production station (CSPS) modelled as a semi-Markov decision process (SMDP), with the goal of finding an optimal control policy under either the average-cost or the discounted-cost criterion. Policy iteration (PI), combined with the concept of performance potential, provides a unified optimisation framework for both criteria. The exact solution scheme, however, poses a major difficulty: it requires not only full knowledge of the model parameters but also considerable work to obtain and process the necessary system and performance matrices. To overcome this difficulty, we propose a potential-based online PI algorithm in this article. During implementation, the potentials and state-action values are learned online from the historical operating data of a practical CSPS system, through an effective exploration scheme. Finally, we illustrate the successful application of this learning-based technique to CSPS systems with an example.

Keywords: conveyor-serviced production station (CSPS); semi-Markov decision process (SMDP); look-ahead control; performance potential; policy iteration (PI); exploration