How to do it...

  1. Create an action space for the mission:
Sample:
MalmoActionSpaceDiscrete actionSpace =
new MalmoActionSpaceDiscrete("movenorth 1", "movesouth 1", "movewest 1", "moveeast 1");
actionSpace.setRandomSeed(rndSeed);
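The four strings are Malmo discrete-movement commands, each mapped to one action index in the space. The seed just needs to be a fixed value so that random action selection is reproducible; for example (an illustrative choice, not mandated by the recipe):
int rndSeed = 123; // any fixed seed keeps exploration reproducible across runs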

  2. Create an observation space for the mission:
Sample:
MalmoObservationSpace observationSpace = new MalmoObservationSpacePixels(xSize, ySize);
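Here xSize and ySize are the pixel dimensions of the frames produced by the mission's video producer, so they must match the width and height declared in the mission XML. Illustrative values, assuming a 320x240 video stream:
int xSize = 320; // video frame width from the mission XML
int ySize = 240; // video frame height from the mission XML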
  3. Create a Malmo consistency policy:
Sample:
MalmoDescretePositionPolicy obsPolicy = new MalmoDescretePositionPolicy();
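Note that MalmoDescretePositionPolicy is the actual class name in the RL4J Malmo module (the misspelling is in the library itself). This policy checks that each incoming observation is consistent with the agent's latest discrete grid position before it is handed to the learner.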
  4. Create an MDP (short for Markov Decision Process) wrapper around the Malmo Java client:
Sample:
MalmoEnv mdp = new MalmoEnv("cliff_walking_rl4j.xml", actionSpace, observationSpace, obsPolicy);
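Here cliff_walking_rl4j.xml is the mission definition describing the cliff-walking world. Before this environment can start, a Minecraft client with the Malmo mod must already be running, and the Malmo mission schemas need to be locatable (typically by pointing the MALMO_XSD_PATH environment variable at Malmo's Schemas directory).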
  5. Create a DQN using DQNFactoryStdConv:
Sample:
public static DQNFactoryStdConv.Configuration MALMO_NET = new DQNFactoryStdConv.Configuration(
learningRate,
l2RegParam,
updater,
listeners
);
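A filled-in sketch with illustrative values; passing null for the updater and the listeners falls back to the factory's defaults:
public static DQNFactoryStdConv.Configuration MALMO_NET =
    new DQNFactoryStdConv.Configuration(
        0.01, // learning rate
        0.00, // l2 regularization
        null, // updater (null selects the factory default)
        null  // training listeners (none)
);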
  6. Use HistoryProcessor to scale the pixel image input:
Sample:
public static HistoryProcessor.Configuration MALMO_HPROC = new HistoryProcessor.Configuration(
numOfFrames,
rescaledWidth,
rescaledHeight,
croppingWidth,
croppingHeight,
offsetX,
offsetY,
numFramesSkip
);
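For example, to pass single full-size 320x240 frames straight through (illustrative values; the crop covers the whole frame and no frames are skipped):
public static HistoryProcessor.Configuration MALMO_HPROC =
    new HistoryProcessor.Configuration(
        1,   // number of frames stacked per observation
        320, // rescaled width
        240, // rescaled height
        320, // cropping width (full frame)
        240, // cropping height (full frame)
        0,   // x offset of the crop
        0,   // y offset of the crop
        1    // pick every frame (no skipping)
);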
  7. Create a Q-learning configuration by specifying hyperparameters:
Sample:
public static QLearning.QLConfiguration MALMO_QL = new QLearning.QLConfiguration(
rndSeed,
maxEpochStep,
maxStep,
expRepMaxSize,
batchSize,
targetDqnUpdateFreq,
updateStart,
rewardFactor,
gamma,
errorClamp,
minEpsilon,
epsilonNbStep,
doubleDQN
);
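One plausible set of hyperparameters for this mission (illustrative starting points, not tuned values):
public static QLearning.QLConfiguration MALMO_QL =
    new QLearning.QLConfiguration(
        123,    // random seed
        200,    // max steps per epoch
        100000, // max total training steps
        50000,  // max size of the experience replay buffer
        32,     // mini-batch size
        500,    // target network update frequency (hard update)
        10,     // no-op warm-up steps before learning starts
        0.01,   // reward scaling factor
        0.99,   // gamma (discount factor)
        1.0,    // TD-error clamp
        0.1f,   // minimum epsilon for the epsilon-greedy policy
        10000,  // steps over which epsilon is annealed
        true    // enable double DQN
);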
  8. Create the DQN model using QLearningDiscreteConv by passing the MDP wrapper and DataManager to its constructor:
Sample:
QLearningDiscreteConv<MalmoBox> dql =
new QLearningDiscreteConv<MalmoBox>(mdp, MALMO_NET, MALMO_HPROC, MALMO_QL, manager);
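The manager argument is an RL4J DataManager, which records training statistics and model checkpoints. It can be created beforehand like so (the flag asks it to persist data to disk; note the constructor throws IOException):
import org.deeplearning4j.rl4j.util.DataManager;

DataManager manager = new DataManager(true); // true = save training data and models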
  9. Train the DQN:
Sample:
dql.train();
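After training finishes, the learned policy can be saved for reuse and the environment closed; a typical follow-up (the file name is illustrative):
DQNPolicy<MalmoBox> policy = dql.getPolicy(); // extract the trained policy
policy.save("cliffwalk_pixel.policy");        // serialize it to disk for later evaluation
mdp.close();                                  // release the connection to the Malmo client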