TensorFlow Serving is a project by Google for deploying trained models to production environments. It offers the following advantages:
- Low-latency inference
- Concurrent request handling, providing good throughput
- Model version management, so models can be swapped without downtime in production
These advantages make TensorFlow Serving a great tool for deployment to the cloud. Models are served over gRPC, a remote procedure call system from Google. Since most production environments run on Ubuntu, the easiest way to install TensorFlow Serving is by using apt-get, as follows:
sudo apt-get install tensorflow-model-server
It can also be compiled from source in other environments. Due to the prevalence of Docker for deployment, it is often easier to build TensorFlow Serving as an Ubuntu-based Docker image. For installation-related guidance, please visit https://www.tensorflow.org/serving/setup.
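As a sketch of the Docker route, the following shows the versioned directory layout that TensorFlow Serving expects, along with a run of the official tensorflow/serving image. The model name my_model and the path under /tmp are placeholders for illustration:

```shell
# TensorFlow Serving expects each model under a numeric version
# subdirectory; it serves the highest version it finds.
# Create a placeholder layout for a hypothetical model called my_model:
mkdir -p /tmp/serving_models/my_model/1
# The saved_model.pb file and variables/ directory produced by the
# export step would go inside the version directory above.

# The official tensorflow/serving image can then serve that directory
# (8500 is the default gRPC port, 8501 the default REST port):
# docker run -p 8500:8500 -p 8501:8501 \
#   --mount type=bind,source=/tmp/serving_models/my_model,target=/models/my_model \
#   -e MODEL_NAME=my_model -t tensorflow/serving
```

Mounting the model directory into the container keeps the image generic: the same tensorflow/serving image can serve any SavedModel without rebuilding.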
The following figure shows the architectural diagram of TensorFlow Serving:
Once the model has been trained and validated, it can be pushed to the model repository. TensorFlow Serving will start serving the model based on its version number. A client can then query the server using the TensorFlow Serving client. To use the client from Python, install the TensorFlow Serving API, as follows:
sudo pip3 install tensorflow-serving-api
The preceding command installs the client component of TensorFlow Serving, which can then be used to make inference calls to the server.