Econometrica: Sep, 1988, Volume 56, Issue 5
Controlling a Stochastic Process with Unknown Parameters
David Easley, Nicholas M. Kiefer
The problem of controlling a stochastic process with unknown parameters over an infinite horizon with discounting is considered. The possibility of sacrificing current period expected reward for information leading to possible increases in future reward is examined. Agents express beliefs about unknown parameters in terms of distributions. Under general conditions the sequence of beliefs converges to a limit distribution. The limit distribution may or may not be concentrated at the true parameter value. In some cases complete learning is optimal; in others the optimal strategy does not imply complete learning. The paper concludes with examination of some special cases including high and low discount rates, discrete parameter and action spaces (the n-armed bandit with correlated arms), and a class of examples in which incomplete learning is optimal.
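The possibility that beliefs converge to a limit that is not concentrated at the true parameter can be illustrated with a minimal sketch. It is not the paper's model: it assumes a two-armed problem with one safe arm of known reward and one risky Bernoulli arm whose success probability is one of two values, and it assumes a myopic agent (one who ignores future rewards entirely) rather than the discounted optimum the paper studies. All names and parameter values (`update_belief`, `simulate_myopic`, `q_high`, `q_low`, `safe_reward`) are hypothetical and chosen only for illustration.

```python
import random


def update_belief(p_high, reward, q_high, q_low):
    """Bayes' rule for the probability that the risky arm is the 'high' type.

    reward is 1 (success) or 0 (failure); q_high and q_low are the assumed
    success probabilities under the two candidate parameter values.
    """
    like_high = q_high if reward == 1 else 1.0 - q_high
    like_low = q_low if reward == 1 else 1.0 - q_low
    numer = p_high * like_high
    return numer / (numer + (1.0 - p_high) * like_low)


def simulate_myopic(p0, true_q, q_high=0.8, q_low=0.2,
                    safe_reward=0.5, periods=200, seed=0):
    """Track the belief path of a myopic agent (illustrative assumption).

    The agent plays the risky arm only when its current expected reward
    exceeds the safe reward. The safe arm carries no information, so once
    the agent switches to it the belief stops changing.
    """
    rng = random.Random(seed)
    p = p0
    beliefs = [p]
    for _ in range(periods):
        expected_risky = p * q_high + (1.0 - p) * q_low
        if expected_risky > safe_reward:
            reward = 1 if rng.random() < true_q else 0
            p = update_belief(p, reward, q_high, q_low)
        # Otherwise the safe arm is played and the belief is left unchanged.
        beliefs.append(p)
    return beliefs


if __name__ == "__main__":
    # Pessimistic prior: the risky arm is never tried, so nothing is learned
    # even though the true success probability is the high one.
    stuck = simulate_myopic(p0=0.3, true_q=0.8)
    print("pessimistic prior, final belief:", stuck[-1])

    # Optimistic prior: the agent experiments until the belief falls to the
    # switching threshold, if it ever does.
    path = simulate_myopic(p0=0.6, true_q=0.8)
    print("optimistic prior, final belief:", path[-1])
```

With a prior of 0.3 the expected risky reward is 0.38, below the safe reward of 0.5, so the agent never experiments and its belief stays at 0.3 forever. Under these assumed numbers that is a limit belief that is not concentrated at the true parameter value. A forward-looking agent values information and would experiment more readily, which is the trade-off the abstract describes.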