Gittins, John C. (1989), Multi-armed bandit allocation indices, Wiley-Interscience Series in Systems and Optimization, Chichester: John Wiley & Sons, Ltd., ISBN 978-0-471-92059-5.
Berry, Don; Fristedt, Bert (1985), Bandit problems: Sequential allocation of experiments, Monographs on Statistics and Applied Probability, London: Chapman & Hall, ISBN 978-0-412-24810-8.
Whittle, Peter (1979), “Discussion of Dr Gittins' paper”, Journal of the Royal Statistical Society, Series B, 41 (2): 148-177, doi:10.1111/j.2517-6161.1979.tb01069.x.
Dayanik, S.; Powell, W.; Yamazaki, K. (2008), “Index policies for discounted bandit problems with availability constraints”, Advances in Applied Probability, 40 (2): 377-400, doi:10.1239/aap/1214950209.
Powell, Warren B. (2007), “Chapter 10”, Approximate Dynamic Programming: Solving the Curses of Dimensionality, New York: John Wiley and Sons, ISBN 978-0-470-17155-4.
Robbins, Herbert (1952), “Some aspects of the sequential design of experiments”, Bulletin of the American Mathematical Society, 58 (5): 527-535, doi:10.1090/S0002-9904-1952-09620-8.
MABWiser, open-source Python implementation of bandit strategies that supports context-free, parametric, and non-parametric contextual policies, with built-in parallelization and simulation capability.
PyMaBandits, open-source implementation of bandit strategies in Python and Matlab.
Contextual, open-source R package facilitating the simulation and evaluation of both context-free and contextual multi-armed bandit policies.