There's more...

Memory configurations are often applied to the master and worker nodes separately, so configuring memory on the worker nodes alone may not produce the desired results. The right approach varies depending on the cluster resource manager in use, so it is important to refer to the respective documentation for your specific cluster resource manager. Also, note that the default memory settings in cluster resource managers are too low for libraries such as ND4J/DL4J that rely heavily on off-heap memory space. spark-submit can load configurations in two different ways. One way is to pass them on the command line, as we discussed previously, while the other is to specify them in the spark-defaults.conf file, like so:

spark.master spark://5.6.7.8:7077
spark.executor.memory 4g

spark-submit can accept any Spark property using the --conf flag, which is what we used to specify off-heap memory space in this recipe. You can read more about Spark configuration here: http://spark.apache.org/docs/latest/configuration.html.
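For instance, a typical submission that sets the executor's on-heap memory and passes the ND4J/JavaCPP off-heap limits through the executor JVM options might look like the following. This is a minimal sketch: the application class and JAR names are placeholders, and the memory values are only illustrative:

spark-submit --class com.example.TrainingApp \
    --master spark://5.6.7.8:7077 \
    --executor-memory 4g \
    --conf "spark.executor.extraJavaOptions=-Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=12G" \
    training-app.jar

There are a few more points worth keeping in mind: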

  • The dataset should justify the memory allocation for the driver/executor. For 10 MB of data, there is no need to assign a large amount of memory to the executor/driver; 2 GB to 4 GB would be enough in that case. Allocating excess memory makes no difference, and it can actually degrade performance.
  • The driver is the process where the main Spark job runs. Executors are processes on the worker nodes that run the individual tasks allotted to them. If the application runs in local mode, driver memory is not necessarily required. Driver memory becomes relevant when the application runs in cluster mode: the Spark job does not run on the local machine it was submitted from; instead, the Spark driver component launches inside the cluster.
  • Kryo is a fast and efficient serialization framework for Java. Kryo can also perform automatic deep/shallow copying of objects in order to attain high speed, a low memory footprint, and an easy-to-use API. The DL4J API can make use of Kryo serialization to optimize performance a bit further. However, note that since INDArrays consume off-heap memory space, Kryo may not yield much of a performance gain. Check the respective logs to ensure that your Kryo configuration is correct when using it with the SparkDl4jMultiLayer or SparkComputationGraph classes (see the Kryo sketch after this list).
  • Just like in regular training, we need to add the proper ND4J backend for DL4J Spark to function. For newer versions of YARN, some additional configuration may be required. Refer to the YARN documentation for more details: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html. Also, note that older YARN versions (2.7.x or earlier) do not support GPUs natively. For those versions, we need to use node labels to ensure that jobs run on GPU-only machines (see the node-label sketch after this list).
  • If you perform training on Spark, you need to be aware of data locality in order to optimize throughput. Data locality ensures that the data and the code that operates on it in a Spark job stay together rather than apart: Spark ships the serialized code from place to place instead of shipping chunks of data, which speeds up performance without introducing further issues, since the code is significantly smaller than the data it operates on. Spark provides a configuration property named spark.locality.wait to specify the timeout before moving data to a free executor. If you set it to zero, data is moved immediately to a free executor rather than waiting for a specific executor to become free. If the freely available executor is distant from the executor where the current task is running, moving the data there is extra effort; waiting briefly for a nearby executor to become free can therefore still reduce the overall computation time (a tuning example follows this list). You can read more about data locality on Spark here: https://spark.apache.org/docs/latest/tuning.html#data-locality.
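As a reference for the Kryo point above, the following is a minimal sketch of enabling Kryo serialization with the ND4J registrator. It assumes that the nd4j-kryo dependency is on the classpath, and the registrator's package name can differ slightly between ND4J versions; the class name used here is just a placeholder:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class KryoSparkSetup {
    public static void main(String[] args) {
        // Route Spark serialization through Kryo and register the ND4J types
        // so that INDArrays are serialized correctly
        SparkConf conf = new SparkConf()
                .setAppName("dl4j-spark-kryo")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Pass sc to SparkDl4jMultiLayer/SparkComputationGraph as usual;
        // watch the startup logs for Kryo-related warnings
    }
}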
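For the node-label point, a hedged sketch: assuming the cluster administrator has already created a YARN node label (here called gpu, a hypothetical label name) and assigned it to the GPU machines, the standard Spark-on-YARN properties can pin the application master and executors to those nodes:

spark-submit --master yarn \
    --conf spark.yarn.am.nodeLabelExpression=gpu \
    --conf spark.yarn.executor.nodeLabelExpression=gpu \
    ...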
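Finally, the locality timeout can be tuned in spark-defaults.conf just like the memory settings shown earlier; 3s is Spark's default value and is shown here only for illustration:

spark.locality.wait 3s

Setting it to 0 schedules tasks on any free executor immediately, at the cost of potentially shipping data to distant nodes.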