32 3. PLANNING IN GVGAI
              
             
              
              
          
              
            
              
               
               
              
             
   40            
              
           
              
     
             
         
              
  
3.2 MONTE CARLO TREE SEARCH
            
             
              
              
                
     9 9         
             
      
              
           
             
           
                
             
               
                 
 $Q(s, a)$        a   s   
a      s $N(s, a)$      s    $N(s)$
              
     
             Tree selection
Expansion Simulation  Backpropagation         
[Figure: MCTS steps. Labels: Selection, Expansion, Simulation, Backpropagation; Tree Policy, Default Policy.]
            
           
    
                
       selection         
                
 multi-armed bandit         
           
      r        
              
               
             
              
        Upper Confidence Bound  
              
           
$$a^* = \arg\max_{a \in A(s)} \left\{ Q(s, a) + C \sqrt{\frac{\ln N(s)}{N(s, a)}} \right\}$$
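As a minimal sketch of the UCB1 selection rule, the snippet below scores each action by Q(s, a) + C * sqrt(ln N(s) / N(s, a)) and returns the maximiser. The function name `ucb1_select`, the dictionary layout, and the handling of untried actions (picked at random before any scoring) are assumptions made for illustration, not something the text prescribes.

```python
import math
import random


def ucb1_select(q, n_sa, n_s, c=math.sqrt(2), rng=random):
    """Pick an action with UCB1.

    q    -- dict: action -> average reward Q(s, a), assumed to lie in [0, 1]
    n_sa -- dict: action -> visit count N(s, a)
    n_s  -- visit count N(s) of the state
    c    -- exploration constant C (C = 0 gives purely greedy selection)
    """
    # Assumption: actions never tried are selected first, so N(s, a) > 0 below.
    untried = [a for a, n in n_sa.items() if n == 0]
    if untried:
        return rng.choice(untried)

    def score(a):
        exploitation = q[a]
        exploration = c * math.sqrt(math.log(n_s) / n_sa[a])
        return exploitation + exploration

    return max(n_sa, key=score)
```

With C = 0 the rule reduces to picking the action with the highest Q(s, a); larger values of C shift the choice toward actions with few visits.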
       a         
 $Q(s, a)$   exploitation          C  
exploration            a   
     s           $N(s, a)$
 $N(s)$     C       
  $C = 0$          
     $Q(s, a)$      $[0, 1]$    
C    $\sqrt{2}$     C      
 tree selection           
                
 expansion    simulation        
     default policy       
             
              
backpropagation    $N(s)$ $N(s, a)$  $Q(s, a)$      
                
               
             
              
               a 
  $N(s, a)$        $\arg\max_{a \in A(s)} Q(s, a)$
                
 
      anytime          
               
             
             
              
             
         
          
  sampleMCTS  vanilla MCTS           
             
            
 C   $\sqrt{2}$       10      
       simulation           
            
                 
    
Algorithm 3.1 Monte Carlo Tree Search
Input: current state s_0
Output: action to execute

procedure MCTS(s_0)
    create root node v_0 with state s_0
    while within computational budget do
        v_l ← TreePolicy(v_0)
        Δ ← DefaultPolicy(s(v_l))        ▷ s(v_l): state in node v_l
        Backpropagate(v_l, Δ)
    end while
    return a(best child of v_0)

procedure TreePolicy(v)
    while v is nonterminal do
        if v is not fully expanded then
            return Expand(v)
        else
            v ← UCB1(v)
            s ← f(s(v), a(v))
    end while
    return v

procedure Expand(v)
    choose a ∈ untried actions from A(s(v))        ▷ A(s(v)): available actions from state s(v)
    add a new child v' to v
    with s(v') = f(s(v), a)        ▷ f(s(v), a): state reached from s(v) after applying a
    return v'

procedure DefaultPolicy(s)
    while s is nonterminal do
        choose a ∈ A(s) uniformly at random
        s ← f(s, a)
    end while
    return reward for state s

procedure Backpropagate(v, Δ)
    while v is not null do
        N(s(v)) ← N(s(v)) + 1
        N(s(v), a(v)) ← N(s(v), a(v)) + 1        ▷ a(v): last action applied from s(v)
        Q(s(v), a(v)) ← Q(s(v), a(v)) + Δ
        v ← parent of v
    end while
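The procedures of Algorithm 3.1 can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the GVGAI sampleMCTS controller: the `Node` class, the function and parameter names, the deterministic forward model `f`, storing Q as a reward sum divided by visits, recommending the most visited root action, and the toy corridor game at the end are all hypothetical choices. The defaults C = sqrt(2) and a rollout depth of 10 follow the values mentioned in the text.

```python
import math
import random


class Node:
    """Tree node v: holds s(v), the incoming action a(v), and statistics."""

    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action
        self.children = []
        self.visits = 0      # visit count of this node
        self.total = 0.0     # sum of rewards; Q is total / visits


def mcts(s0, actions, f, terminal, reward, budget=1000,
         c=math.sqrt(2), depth=10, rng=None):
    """Return the recommended action from s0 (assumes s0 is not terminal)."""
    rng = rng or random.Random()

    def untried(v):
        tried = {ch.action for ch in v.children}
        return [a for a in actions(v.state) if a not in tried]

    def ucb1(v):
        return max(v.children, key=lambda ch: ch.total / ch.visits
                   + c * math.sqrt(math.log(v.visits) / ch.visits))

    def tree_policy(v):
        while not terminal(v.state):
            options = untried(v)
            if options:                       # expansion: add one new child
                a = rng.choice(options)
                child = Node(f(v.state, a), parent=v, action=a)
                v.children.append(child)
                return child
            v = ucb1(v)                       # selection: descend with UCB1
        return v

    def default_policy(s):
        # Simulation: uniformly random actions, cut off after `depth` steps.
        for _ in range(depth):
            if terminal(s):
                break
            s = f(s, rng.choice(actions(s)))
        return reward(s)

    def backup(v, delta):
        # Backpropagation: update statistics from the leaf up to the root.
        while v is not None:
            v.visits += 1
            v.total += delta
            v = v.parent

    root = Node(s0)
    for _ in range(budget):
        leaf = tree_policy(root)
        backup(leaf, default_policy(leaf.state))
    # Assumption: recommend the most visited action at the root.
    return max(root.children, key=lambda ch: ch.visits).action


# Hypothetical toy game: a corridor with a pit at -3 and a goal at +3.
def actions(s):
    return [-1, 1]


def f(s, a):
    return s + a


def terminal(s):
    return abs(s) >= 3


def reward(s):
    return (s + 3) / 6    # normalised to [0, 1]


best = mcts(0, actions, f, terminal, reward, rng=random.Random(0))
```

In this toy game the search is expected to favour moving toward the goal. Because the loop can be stopped after any iteration and still return an action, `budget` could equally be replaced by a time limit.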