34 3. PLANNING IN GVGAI
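In this selection rule, $Q(s,a)$ is the exploitation term, and the term weighted by the constant $C$ is the exploration term. The exploration term depends on how many times action $a$ has been selected from state $s$, $N(s,a)$, and on how many times $s$ itself has been visited, $N(s)$. The value of $C$ balances the two terms: with $C = 0$, selection is purely greedy. When the rewards $Q(s,a)$ are normalized to the range $[0, 1]$, a commonly used value is $C = \sqrt{2}$. The best value of $C$ can vary from game to game.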
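The balance between the two terms can be made concrete with a small sketch. It assumes the selection rule is the standard UCB1 form, $Q(s,a) + C\sqrt{\ln N(s) / N(s,a)}$, and that $N(s)$ equals the sum of the per-action counts; the function name `ucb1_select`, the dictionaries, and the numbers are hypothetical and chosen only for illustration.

```python
import math


def ucb1_select(q, n_sa, c=math.sqrt(2)):
    """Return the action maximizing Q(s,a) + C * sqrt(ln N(s) / N(s,a)).

    q    -- dict mapping action -> average reward Q(s,a), assumed in [0, 1]
    n_sa -- dict mapping action -> visit count N(s,a)
    """
    # Assumption: N(s) is the sum of the per-action visit counts.
    n_s = sum(n_sa.values())

    def score(a):
        if n_sa[a] == 0:
            return math.inf  # untried actions are selected first
        return q[a] + c * math.sqrt(math.log(n_s) / n_sa[a])

    return max(q, key=score)


# Hypothetical statistics: "left" looks better but has been tried far more often.
q = {"left": 0.60, "right": 0.50}
n = {"left": 90, "right": 10}

greedy = ucb1_select(q, n, c=0.0)   # exploitation only
balanced = ucb1_select(q, n)        # default C = sqrt(2)
```

With $C = 0$ the call picks the action with the highest average reward; with $C = \sqrt{2}$ the rarely tried action receives a large enough bonus to be selected instead.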
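The tree selection phase continues until it reaches a node with actions that have not yet been tried. In the expansion phase, a new node is added to the tree, and a simulation is run from it, choosing actions with a default policy. Finally, backpropagation updates $N(s)$, $N(s,a)$ and $Q(s,a)$ for every node visited in the iteration. Once the search ends, the algorithm recommends an action $a$: either the one with the highest visit count $N(s,a)$, or $\arg\max_{a \in A(s)} Q(s,a)$.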
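The four phases can be sketched on a toy problem. This is a minimal illustration, not the GVGAI implementation: the toy domain (`ACTIONS`, `HORIZON`, `step`, `reward`), the `Node` class, and all function names are hypothetical, the default policy is assumed to be uniformly random, and selection is assumed to use UCB1.

```python
import math
import random

ACTIONS = (0, 1)   # hypothetical toy action set A(s)
HORIZON = 4        # hypothetical episode length


def step(state, action):
    return state + (action,)


def is_terminal(state):
    return len(state) >= HORIZON


def reward(state):
    # Toy reward in [0, 1]: fraction of 1-actions taken.
    return sum(state) / HORIZON


class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}                     # action -> Node
        self.n = 0                             # N(s)
        self.n_sa = {a: 0 for a in ACTIONS}    # N(s,a)
        self.q = {a: 0.0 for a in ACTIONS}     # Q(s,a), running mean


def tree_policy(node, c, path):
    """Tree selection and expansion: descend by UCB1, add one new node."""
    while not is_terminal(node.state):
        untried = [a for a in ACTIONS if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(step(node.state, a))
            path.append((node, a))
            return node.children[a]
        a = max(ACTIONS, key=lambda a: node.q[a]
                + c * math.sqrt(math.log(node.n) / node.n_sa[a]))
        path.append((node, a))
        node = node.children[a]
    return node


def default_policy(state):
    """Simulation: random actions until the episode ends."""
    while not is_terminal(state):
        state = step(state, random.choice(ACTIONS))
    return reward(state)


def backpropagate(path, r):
    """Update N(s), N(s,a) and Q(s,a) along the visited path."""
    for node, a in path:
        node.n += 1
        node.n_sa[a] += 1
        node.q[a] += (r - node.q[a]) / node.n_sa[a]


def mcts(iterations=2000, c=math.sqrt(2), by_visits=True):
    root = Node(())
    for _ in range(iterations):
        path = []
        leaf = tree_policy(root, c, path)
        backpropagate(path, default_policy(leaf.state))
    # Recommendation: most visited action, or highest average reward.
    table = root.n_sa if by_visits else root.q
    return max(ACTIONS, key=lambda a: table[a])


random.seed(0)
best = mcts()
```

In this toy domain action 1 always yields more reward, so both recommendation policies are expected to agree on it; in real games the two can differ, which is why the choice between them matters.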
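MCTS is an anytime algorithm: it can be stopped at any moment and still return the best action found so far. The sampleMCTS controller is a vanilla MCTS implementation that uses $C = \sqrt{2}$ and a depth limit of 10 for the simulation step.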
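The anytime property and a depth-limited simulation can be sketched as follows. This is an illustration under stated assumptions, not the sampleMCTS code: the value 10 is read here as the depth limit of the simulation, the helper names `rollout` and `run_anytime` are hypothetical, and terminal-state handling is omitted for brevity.

```python
import random
import time

ROLLOUT_DEPTH = 10   # assumed meaning of the value 10 quoted in the text


def rollout(state, step, actions, evaluate, depth=ROLLOUT_DEPTH):
    """Apply `depth` random actions, then score the reached state."""
    for _ in range(depth):
        state = step(state, random.choice(actions))
    return evaluate(state)


def run_anytime(iterate, budget_s):
    """Call `iterate()` until the time budget runs out; return the count."""
    deadline = time.perf_counter() + budget_s
    done = 0
    while time.perf_counter() < deadline:
        iterate()
        done += 1
    return done


# Hypothetical toy domain: a random walk on the integers, scored by |s|.
samples = []
n = run_anytime(
    lambda: samples.append(
        rollout(0, lambda s, a: s + a, (-1, 1), lambda s: abs(s))),
    budget_s=0.01,
)
```

Because the loop checks the clock before every iteration, whatever statistics have been gathered are usable as soon as the budget expires, which is the sense in which the search is anytime.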