Thompson sampling
Type of heuristic technique / From Wikipedia, the free encyclopedia
Thompson sampling,[1][2][3] named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it chooses the action that maximizes the expected reward with respect to a belief drawn at random from the current posterior distribution.
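As an illustration of the rule above, the following is a minimal sketch for one common special case: a bandit with Bernoulli (0/1) rewards and an independent uniform Beta(1, 1) prior on each arm's success probability. The function name `thompson_sampling`, its parameters, and the simulated environment `true_probs` are assumptions made for this example and do not come from the article.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pulls per arm.

    true_probs is a hypothetical simulated environment: the hidden
    success probability of each arm, unknown to the learner.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its posterior,
        # Beta(1 + successes, 1 + failures) under a Beta(1, 1) prior.
        beliefs = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=beliefs.__getitem__)

        # Observe a simulated Bernoulli reward and update the posterior counts.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]
```

Calling `thompson_sampling([0.1, 0.9], 2000)` returns the number of times each arm was played. Because arms with uncertain posteriors occasionally produce high samples, they still get explored, while play concentrates on the arm that appears best as evidence accumulates.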