How it works...

A workspace is a memory management model that enables memory to be reused for cyclic workloads without relying on the JVM garbage collector. The memory content of an INDArray is invalidated once per workspace loop. Workspaces can be used for both training and inference.

In step 1, we start with workspace benchmarking. The detach() method detaches the specified INDArray from the workspace and returns a copy. So, how do we enable workspace modes for our training instance? Well, if you're using a recent DL4J version (1.0.0-alpha onwards), then this feature is enabled by default. We target version 1.0.0-beta3 in this book.
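The following is a minimal sketch of detach() in action. The workspace ID (MY_WORKSPACE) and the array contents are illustrative assumptions, not values from the recipe:

try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace("MY_WORKSPACE")) {
    INDArray scoped = Nd4j.rand(3, 3);     // backed by workspace memory; invalidated every loop
    INDArray detached = scoped.detach();   // copied out, safe to use after the workspace closes
}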

In step 2, we removed workspaces from memory, as shown here:

Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();

This will destroy workspaces for the current running thread only, so we can release workspace memory by running this piece of code from within the thread in question.
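As a hypothetical illustration (the worker thread and its workload are assumptions), releasing workspace memory from inside the thread that owns it would look like this:

Thread worker = new Thread(() -> {
    // ... training or inference work that allocated workspaces ...
    Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();
});
worker.start();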

DL4J also lets you implement your own workspace manager for layers. For example, activation results from one layer during training can be placed in one workspace, and the results of inference can be placed in another. This is possible using DL4J's LayerWorkspaceMgr, as mentioned in step 3. Make sure that the returned array (myArray in step 3) is defined as ArrayType.ACTIVATIONS:

LayerWorkspaceMgr.create(ArrayType.ACTIVATIONS, myArray);
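As a rough sketch of a custom workspace manager that maps activations to a dedicated workspace, you could use the LayerWorkspaceMgr builder. The workspace name and configuration below are assumptions, and builder signatures can vary between DL4J versions:

WorkspaceConfiguration wsConfig = WorkspaceConfiguration.builder()
        .policyLearning(LearningPolicy.FIRST_LOOP)
        .build();

LayerWorkspaceMgr mgr = LayerWorkspaceMgr.builder()
        .defaultNoWorkspace()
        .with(ArrayType.ACTIVATIONS, "WS_LAYER_ACTIVATIONS", wsConfig)
        .build();

// moves myArray into the activations workspace managed above
INDArray leveraged = mgr.leverageTo(ArrayType.ACTIVATIONS, myArray);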

It is fine to use different workspace modes for training and inference. It is recommended that you use SEPARATE mode for training and SINGLE mode for inference, because inference involves only a forward pass and no backpropagation. For training instances with high resource/memory consumption, SEPARATE mode is the better fit because it consumes less memory. Note that SEPARATE is the default workspace mode in DL4J.
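For instance, setting the two modes explicitly on the network configuration could look like the following sketch; the single output layer is a placeholder, not the recipe's actual architecture:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .trainingWorkspaceMode(WorkspaceMode.SEPARATE)   // backpropagation: two cycling workspaces
        .inferenceWorkspaceMode(WorkspaceMode.SINGLE)    // forward pass only: one workspace
        .list()
        .layer(new OutputLayer.Builder().nIn(10).nOut(2).build())
        .build();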

In step 4, two attributes are used while creating PerformanceListener: reportScore and frequency. reportScore is a Boolean flag, and frequency is the number of iterations between reports. If reportScore is true, then the listener reports the score (just as ScoreIterationListener does) along with the time spent on each iteration.
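Assuming model is an existing MultiLayerNetwork, attaching the listener is a one-liner; the frequency of 100 iterations below is an arbitrary choice:

model.setListeners(new PerformanceListener(100, true)); // frequency = 100, reportScore = true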

In step 7, we used ParallelWrapper or ParallelInference for multi-GPU devices. Once we have created a neural network model, we can build a parallel wrapper around it, specifying the device count, a training mode, and the number of workers.

We need to make sure that our training instance is cost-effective: there is no point in paying for multiple GPUs and then utilizing only one of them during training. Ideally, we want to utilize all of the GPU hardware to speed up training/inference and get better results. ParallelWrapper and ParallelInference serve this purpose.

The following are some of the configurations supported by ParallelWrapper and ParallelInference (a combined configuration sketch follows the list):

  • prefetchBuffer(deviceCount): This parallel wrapper method specifies the dataset prefetch option. We pass the number of devices here.
  • trainingMode(mode): This parallel wrapper method specifies the distributed training method. SHARED_GRADIENTS refers to the gradient-sharing method for distributed training.
  • workers(Nd4j.getAffinityManager().getNumberOfDevices()): This parallel wrapper method specifies the number of workers. We set the number of workers to the number of available devices.
  • inferenceMode(mode): This parallel inference method specifies the distributed inference method. BATCHED mode is an optimization: if a large number of requests come in, they are processed in batches; if only a small number of requests come in, they are processed as usual without batching. As you might have guessed, this is the perfect option if you're in production.
  • batchLimit(batchSize): This parallel inference method specifies the batch size limit and is only applicable if you use BATCHED mode in inferenceMode().
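Here is the combined sketch referred to previously. It is a rough illustration that assumes model, trainIterator, and features already exist; the batch limit of 32 is arbitrary:

ParallelWrapper wrapper = new ParallelWrapper.Builder<>(model)
        .prefetchBuffer(Nd4j.getAffinityManager().getNumberOfDevices())
        .trainingMode(ParallelWrapper.TrainingMode.SHARED_GRADIENTS)
        .workers(Nd4j.getAffinityManager().getNumberOfDevices())
        .build();
wrapper.fit(trainIterator);      // training is distributed across all available devices

ParallelInference inference = new ParallelInference.Builder(model)
        .inferenceMode(InferenceMode.BATCHED)
        .batchLimit(32)
        .workers(Nd4j.getAffinityManager().getNumberOfDevices())
        .build();
INDArray output = inference.output(features); // requests are batched under heavy load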