Technical requirements

The code for this chapter is located at https://github.com/PacktPublishing/Java-Deep-Learning-Cookbook/tree/master/12_Benchmarking_and_Neural_Network_Optimization/sourceCode/cookbookapp/src/main/java.

After cloning our GitHub repository, navigate to the Java-Deep-Learning-Cookbook/12_Benchmarking_and_Neural_Network_Optimization/sourceCode directory. Then import the cookbookapp project as a Maven project by importing pom.xml.

This chapter's examples are based on a customer churn dataset (https://github.com/PacktPublishing/Java-Deep-Learning-Cookbook/tree/master/03_Building_Deep_Neural_Networks_for_Binary_classification/sourceCode/cookbookapp/src/main/resources). This dataset is included in the project directory. 

Although we are explaining DL4J/ND4J-specific benchmarks in this chapter, it is recommended that you follow general benchmarking guidelines. The following are some important generic guidelines that apply to any neural network benchmark:

  • Perform warm-up iterations before the actual benchmark task: Warm-up iterations refer to a set of iterations performed on the benchmark task before commencing the actual ETL operation or network training. Warm-up iterations are important because the first few iterations will execute slowly; this can add to the total duration of the benchmark task and lead us to wrong/inconsistent conclusions. The slow execution of the first few iterations may be due to the compilation time taken by the JVM, the lazy-loading approach of the DL4J/ND4J libraries, or their learning phase, that is, the time taken to learn the memory requirements for execution (see the timing sketch after this list).
  • Perform benchmark tasks multiple times: To make sure that benchmark results are reliable, we need to run benchmark tasks multiple times. The host system may have multiple apps/processes running in parallel apart from the benchmark instance, so runtime performance will vary over time. Running the benchmark multiple times lets us account for this variation.
  • Understand where you set the benchmarks and why: We need to assess whether we are benchmarking the right thing. If we target operation a, we have to make sure that only operation a is being timed. We also have to make sure that we are using the right libraries for the situation; the latest versions of libraries are always preferred. It is also important to assess the DL4J/ND4J configurations used in our code. The default configurations may suffice in regular scenarios, but manual configuration may be required for optimal performance. The following are some of the default configuration options for reference:
    • Memory configurations (heap space setup).
    • Garbage collection and workspace configuration (changing the frequency at which the garbage collector is called).
    • cuDNN support (utilizing a CUDA-powered GPU machine for better performance).
    • DL4J cache mode (bringing in cache memory for the training instance). This is a DL4J-specific change.

We discussed cuDNN in Chapter 1, Introduction to Deep Learning in Java, where we talked about running DL4J in GPU environments. These configuration options will be discussed further in upcoming recipes; a minimal sketch follows.
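
The following is a minimal sketch of how a couple of these defaults can be overridden in code, assuming a recent DL4J/ND4J version. The class name and the specific values (GC window, heap sizes, CUDA version) are illustrative assumptions, not recommendations:

    import org.deeplearning4j.nn.conf.CacheMode;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.nd4j.linalg.factory.Nd4j;

    public class BenchmarkConfigSketch {
        public static void main(String[] args) {
            // Heap and off-heap memory are set via JVM arguments, for example:
            //   -Xms4G -Xmx8G
            //   -Dorg.bytedeco.javacpp.maxbytes=12G
            //   -Dorg.bytedeco.javacpp.maxphysicalbytes=16G

            // Garbage collection: invoke the ND4J garbage collector at most
            // once every 10 seconds instead of on every cycle.
            Nd4j.getMemoryManager().setAutoGcWindow(10000);
            // Or disable periodic GC entirely and rely on workspaces:
            // Nd4j.getMemoryManager().togglePeriodicGc(false);

            // DL4J cache mode: cache data in device (GPU) memory during training.
            NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
                    .cacheMode(CacheMode.DEVICE);
            // ... add layers and build the network as usual.

            // cuDNN support is enabled by adding the deeplearning4j-cuda
            // dependency matching your CUDA version to pom.xml.
        }
    }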

  • Run the benchmark on a range of sizes: It is important to run the benchmark on multiple different input sizes/shapes to get a complete picture of its performance; the cost of mathematical computations such as matrix multiplication varies with the dimensions involved (see the sketch after this list).
  • Understand the hardware: A training instance with a small minibatch size will perform better on a CPU than on a GPU system, because a small batch cannot keep the GPU busy. With a large minibatch size, the observation is exactly the opposite: the training instance can then utilize GPU resources fully. In the same way, a larger layer size can better utilize GPU resources. Writing network configurations without understanding the underlying hardware will not allow us to exploit its full capabilities.
  • Reproduce the benchmarks and understand their limits: In order to troubleshoot performance bottlenecks against a set benchmark, we always need to be able to reproduce them; it helps to assess the circumstances under which poor performance occurs. On top of that, we also need to understand the limitations of particular benchmarks: a benchmark set on a specific layer won't tell you anything about the performance of other layers.
  • Avoid common benchmark mistakes:
    • Consider using the latest version of DL4J/ND4J. To apply the latest performance improvements, try snapshot versions.
    • Pay attention to the types of native libraries used (such as cuDNN).
    • Run enough iterations, with a reasonable minibatch size, to yield consistent results.
    • Do not compare results across hardware without accounting for the differences.
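
To tie together the warm-up, repetition, and input-size guidelines above, the following is a minimal ND4J timing sketch. The class name, matrix sizes, and iteration counts are illustrative assumptions:

    import java.util.Arrays;

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class MatmulBenchmarkSketch {
        public static void main(String[] args) {
            int warmupIterations = 20; // discarded: JVM/ND4J warm-up
            int benchmarkRuns = 10;    // repeated runs smooth out system noise

            // Run the benchmark over a range of input sizes.
            for (int size : new int[]{128, 512, 2048}) {
                INDArray a = Nd4j.rand(size, size);
                INDArray b = Nd4j.rand(size, size);

                // Warm-up iterations: executed but not timed.
                for (int i = 0; i < warmupIterations; i++) {
                    a.mmul(b);
                }

                // Timed runs: only the operation under test is measured.
                long[] timings = new long[benchmarkRuns];
                for (int i = 0; i < benchmarkRuns; i++) {
                    long start = System.nanoTime();
                    a.mmul(b);
                    timings[i] = System.nanoTime() - start;
                }
                Arrays.sort(timings);
                System.out.printf("size=%d, median=%.2f ms%n",
                        size, timings[benchmarkRuns / 2] / 1e6);
            }
        }
    }

Reporting the median rather than the mean makes the result less sensitive to one-off pauses (for example, garbage collection) during a run.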

In order to benefit from the latest fixes for performance issues, you need to have the latest version of DL4J/ND4J in your local Maven repository. If you want to run the source against the latest fixes and the new version hasn't been released yet, you can make use of snapshot versions. To find out more about working with snapshot versions, go to https://deeplearning4j.org/docs/latest/deeplearning4j-config-snapshots.
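
As a rough sketch of what that involves, using snapshots typically means adding the Sonatype snapshots repository to pom.xml and pointing the DL4J/ND4J version at a -SNAPSHOT build. The version string below is an illustrative assumption; check the page linked above for the current details:

    <repositories>
        <repository>
            <id>snapshots-repo</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
            <releases><enabled>false</enabled></releases>
            <snapshots><enabled>true</enabled></snapshots>
        </repository>
    </repositories>

    <properties>
        <!-- Illustrative snapshot version; see the snapshots page for the current one -->
        <dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
    </properties>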
