Econometrica: Sep, 1992, Volume 60, Issue 5
Jeffrey S. Banks, Rangarajan K. Sundaram
This paper studies the class of denumerable-armed (i.e. finite- or countably infinite-armed) bandit problems with independent arms and geometric discounting over an infinite horizon, in which each arm generates rewards according to one of a finite number of distributions, or "types." The number of types in the support of an arm, as also the types themselves, are allowed to vary across the arms. We derive certain continuity and curvature properties of the dynamic allocation (or Gittins) index of Gittins and Jones (1974), and provide necessary and sufficient conditions under which the Gittins-Jones result identifying all optimal strategies for finite-armed bandits may be extended to infinite-armed bandits. We then establish our central result: at each point in time, the arm selected by an optimal strategy will, with strictly positive probability, remain an optimal selection forever. More specifically, for every such arm, there exists (at least) one type of that arm such that, when conditioned on that type being the arm's "true" type, the arm will survive forever and continuously with nonzero probability. When the reward distributions of an arm satisfy the monotone likelihood ratio property (MLRP), the survival prospects of an arm improve when conditioned on types generating higher expected rewards; however, we show how this need not be the case in the absence of MLRP. Implications of these results are derived for the theories of job search and matching, as well as other applications of the bandit paradigm.