Increasing performance

Inference time depends on the number of floating-point operations (FLOPs) needed to run a model and on how many such operations per second (FLOPS) the hardware can sustain. The operation count is determined by the number of model parameters and by how often each parameter is used. These operations are mostly matrix arithmetic, such as additions, products, and divisions. For example, a convolution layer has only a few parameters representing the kernel, but takes longer to compute, because the kernel must be applied at every position of the input matrix. A fully connected layer, in contrast, has a very large number of parameters, but each one is used only once per input, so the layer runs quickly.
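The contrast between the two layer types can be checked with a little arithmetic. The following is a minimal sketch, not taken from the book: the helper names `conv2d_cost` and `dense_cost` and the layer sizes are illustrative assumptions, and cost is counted in multiply-accumulate operations (MACs), ignoring bias additions and activations.

```python
def conv2d_cost(in_h, in_w, in_ch, out_ch, k, stride=1, padding=0):
    """Return (parameters, MACs) for a square k x k convolution (hypothetical helper)."""
    out_h = (in_h + 2 * padding - k) // stride + 1
    out_w = (in_w + 2 * padding - k) // stride + 1
    # One k x k x in_ch kernel per output channel, plus one bias each.
    params = k * k * in_ch * out_ch + out_ch
    # Every kernel is applied at every output position.
    macs = out_h * out_w * k * k * in_ch * out_ch
    return params, macs


def dense_cost(in_features, out_features):
    """Return (parameters, MACs) for a fully connected layer (hypothetical helper)."""
    # One weight per input/output pair, plus one bias per output.
    params = in_features * out_features + out_features
    # Each weight is used exactly once per input vector.
    macs = in_features * out_features
    return params, macs


# Illustrative sizes (assumed): a 3x3 convolution on a 224x224 RGB image
# versus a 4096-to-4096 fully connected layer.
conv_params, conv_macs = conv2d_cost(224, 224, 3, 64, 3, padding=1)
dense_params, dense_macs = dense_cost(4096, 4096)

print(f"conv : {conv_params:>12,} params, {conv_macs:>12,} MACs")
print(f"dense: {dense_params:>12,} params, {dense_macs:>12,} MACs")
```

With these assumed sizes, the convolution has far fewer parameters than the fully connected layer but needs several times more multiply-accumulate operations, because each kernel weight is reused at every spatial position.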

The weights of a model are usually stored as high-precision floating-point values (typically 32-bit, sometimes 64-bit double precision), and arithmetic on such numbers is more expensive than the same arithmetic on quantized, lower-precision values. In the next section, we will illustrate how quantizing the weights affects the model's performance.
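To make the idea of quantized values concrete, here is a small sketch of one common scheme, linear (min-max) quantization to 8-bit integers. It is an assumption for illustration only and may differ from the method used in the next section; the function names `quantize` and `dequantize` and the sample weights are hypothetical.

```python
def quantize(weights, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1] with a linear scale (illustrative)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Step size between adjacent integer levels; fall back to 1.0 if all weights are equal.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    quantized = [max(qmin, min(qmax, round((w - w_min) / scale))) for w in weights]
    return quantized, scale, w_min


def dequantize(quantized, scale, w_min):
    """Recover approximate float values from the integer codes."""
    return [w_min + scale * q for q in quantized]


# Sample weights (assumed values, not from any real model).
weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
codes, scale, offset = quantize(weights)
restored = dequantize(codes, scale, offset)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(f"scale={scale:.6f}, max error={max_error:.6f}")
```

Each weight now fits in one byte instead of four or eight, at the cost of a rounding error of at most half a quantization step per weight.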
