Variables in TensorFlow

In the context of TensorFlow, variables are a special type of tensor object that allows us to store and update the parameters of our models in a TensorFlow session during training. The following sections explain how we can define variables in a graph, initialize those variables in a session, organize variables via the so-called variable scope, and reuse existing variables.

Defining variables

TensorFlow variables store the parameters of a model that can be updated during training, for example, the weights in the input, hidden, and output layers of a neural network. When we define a variable, we need to initialize it with a tensor of values. Feel free to read more about TensorFlow variables at https://www.tensorflow.org/programmers_guide/variables.

TensorFlow provides two ways for defining variables:

  • tf.Variable(<initial-value>, name="variable-name")
  • tf.get_variable(name, ...)

The first option, tf.Variable, is a class that creates an object for a new variable and adds it to the graph. Note that tf.Variable does not let us specify the shape and dtype explicitly; they are inferred from the initial values.

The second option, tf.get_variable, can be used to reuse an existing variable with a given name (if the name exists in the graph) or create a new one if the name does not exist. In this case, the name becomes critical, which is probably why it has to be placed as the first argument to this function. Furthermore, tf.get_variable provides an explicit way to set shape and dtype; these parameters are only required when creating a new variable, not when reusing an existing one.
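
As a brief illustration, the following sketch defines one variable with each approach; the variable names and shapes here are arbitrary placeholders chosen just for demonstration:

>>> import tensorflow as tf
>>>
>>> g = tf.Graph()
>>> with g.as_default():
...     ## tf.Variable: shape and dtype are inferred
...     ## from the initial value
...     v1 = tf.Variable(tf.zeros(shape=(2, 3)), name='v1')
...     ## tf.get_variable: the name comes first, and shape/dtype
...     ## are given explicitly when creating a new variable
...     v2 = tf.get_variable(name='v2', shape=(2, 3),
...                          dtype=tf.float32)
...     print(v1)
...     print(v2)
<tf.Variable 'v1:0' shape=(2, 3) dtype=float32_ref>
<tf.Variable 'v2:0' shape=(2, 3) dtype=float32_ref>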

The advantage of tf.get_variable over tf.Variable is twofold: tf.get_variable allows us to reuse existing variables, and it already uses the popular Xavier/Glorot initialization scheme by default.

Besides the initializer, the get_variable function provides other parameters to control the tensor, such as adding a regularizer for the variable. If you are interested in learning more about these parameters, feel free to read the documentation of tf.get_variable at https://www.tensorflow.org/api_docs/python/tf/get_variable.
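
For instance, a minimal sketch of attaching an L2 regularizer to a variable could look as follows; here we assume the tf.contrib.layers.l2_regularizer function from TensorFlow 1.x, and the variable name and shape are placeholders for illustration:

>>> import tensorflow as tf
>>>
>>> g = tf.Graph()
>>> with g.as_default():
...     ## attach an L2 penalty with strength 0.01 to this variable
...     w_reg = tf.get_variable(
...         name='regularized_weights',
...         shape=(100, 10),
...         regularizer=tf.contrib.layers.l2_regularizer(scale=0.01))
...     ## the resulting penalty terms are collected in the graph and
...     ## can be added to the training loss later
...     reg_losses = tf.get_collection(
...         tf.GraphKeys.REGULARIZATION_LOSSES)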

Note

Xavier (or Glorot) initialization

In the early development of deep learning, it was observed that random uniform or random normal weight initialization could often result in a poor performance of the model during training.

In 2010, Xavier Glorot and Yoshua Bengio investigated the effect of initialization and proposed a novel, more robust initialization scheme to facilitate the training of deep networks.

The general idea behind Xavier initialization is to roughly balance the variance of the gradients across different layers. Otherwise, one layer may get too much attention during training while the other layer lags behind.

According to the research paper by Glorot and Bengio, if we want to initialize the weights from uniform distribution, we should choose the interval of this uniform distribution as follows:

$$W \sim \mathrm{Uniform}\left(-\sqrt{\frac{6}{n_{in} + n_{out}}},\ \sqrt{\frac{6}{n_{in} + n_{out}}}\right)$$

Here, n_in is the number of input neurons that are multiplied with the weights, and n_out is the number of output neurons that feed into the next layer. For initializing the weights from a Gaussian (normal) distribution, the authors recommend choosing the standard deviation of this Gaussian to be sqrt(2 / (n_in + n_out)).

TensorFlow supports Xavier initialization with both uniform and normal weight distributions. The documentation provides detailed information about using Xavier initialization with TensorFlow: https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer.
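
For example, a minimal sketch of using this initializer together with tf.get_variable might look as follows; the variable name and shape are placeholders for illustration:

>>> import tensorflow as tf
>>>
>>> g = tf.Graph()
>>> with g.as_default():
...     ## uniform=True samples from the uniform interval shown above;
...     ## uniform=False uses the corresponding truncated-normal variant
...     xavier_init = tf.contrib.layers.xavier_initializer(uniform=True)
...     w_xavier = tf.get_variable(name='xavier_weights',
...                                shape=(100, 50),
...                                initializer=xavier_init)
...     print(w_xavier)
<tf.Variable 'xavier_weights:0' shape=(100, 50) dtype=float32_ref>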

For more information about Glorot and Bengio's initialization scheme, including the mathematical derivation and proof, read their original paper (Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot and Yoshua Bengio, 2010), which is freely available at http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf.

In either initialization technique, it's important to note that the initial values are not set until we launch the graph in tf.Session and explicitly run the initializer operator in that session. In fact, the required memory for a graph is not allocated until we initialize the variables in a TensorFlow session.

Here is an example of creating a variable object where the initial values are created from a NumPy array. The data type (dtype) of this tensor is tf.int64, which is automatically inferred from its NumPy array input:

>>> import tensorflow as tf
>>> import numpy as np
>>>
>>> g1 = tf.Graph()
>>>
>>> with g1.as_default():
...     w = tf.Variable(np.array([[1, 2, 3, 4],
...                               [5, 6, 7, 8]]), name='w')
...     print(w)
<tf.Variable 'w:0' shape=(2, 4) dtype=int64_ref>

Initializing variables

Here, it is critical to understand that tensors defined as variables are not allocated in memory and contain no values until they are initialized. Therefore, before executing any node in the computation graph, we must initialize the variables that are within the path to the node that we want to execute.

This initialization process refers to allocating memory for the associated tensors and assigning their initial values. TensorFlow provides a function named tf.global_variables_initializer that returns an operator for initializing all the variables that exist in a computation graph. Then, executing this operator will initialize the variables as follows:

>>> with tf.Session(graph=g1) as sess:
...     sess.run(tf.global_variables_initializer())
...     print(sess.run(w))

[[1 2 3 4]
 [5 6 7 8]]

We can also store this operator in an object such as init_op = tf.global_variables_initializer() and execute this operator later using sess.run(init_op) or init_op.run(). However, we need to make sure that this operator is created after we define all the variables.

For example, in the following code, we define the variable w1, then we define the operator init_op, followed by the variable w2:

>>> import tensorflow as tf
>>>
>>> g2 = tf.Graph()
>>>
>>> with g2.as_default():
...     w1 = tf.Variable(1, name='w1')
...     init_op = tf.global_variables_initializer()
...     w2 = tf.Variable(2, name='w2')

Now, let's evaluate w1 as follows:

>>> with tf.Session(graph=g2) as sess:
...     sess.run(init_op)
...     print('w1:', sess.run(w1))
w1: 1

This works fine. Now, let's try evaluating w2:

>>> with tf.Session(graph=g2) as sess:
...     sess.run(init_op)
...     print('w2:', sess.run(w2))
FailedPreconditionError
Attempting to use uninitialized value w2
    [[Node: _retval_w2_0_0 = _Retval[T=DT_INT32, index=0, _device="/job:localhost/replica:0/task:0/cpu:0"](w2)]]

As shown in the code example, executing the graph raises an error because w2 was not initialized via sess.run(init_op), and therefore, couldn't be evaluated. The operator init_op was defined prior to adding w2 to the graph; thus, executing init_op will not initialize w2.
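
One way to resolve this, sketched below, is to run the initializer of w2 explicitly via its initializer attribute; alternatively, we could simply create a new tf.global_variables_initializer() operator after all variables have been defined:

>>> with tf.Session(graph=g2) as sess:
...     sess.run(init_op)
...     ## w2 was defined after init_op, so we initialize it separately
...     sess.run(w2.initializer)
...     print('w2:', sess.run(w2))
w2: 2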

Variable scope

In this subsection, we're going to discuss scoping, which is an important concept in TensorFlow, and especially useful if we are constructing large neural network graphs.

With variable scopes, we can organize the variables into separate subparts. When we create a variable scope, the names of operations and tensors that are created within that scope are prefixed with that scope's name, and those scopes can further be nested. For example, if we have two subnetworks, where each subnetwork has several layers, we can define two scopes named 'net_A' and 'net_B', respectively. Then, each layer will be defined within one of these scopes.

Let's see how the variable names will turn out in the following code example:

>>> import tensorflow as tf
>>>
>>> g = tf.Graph()
>>>
>>> with g.as_default():
...     with tf.variable_scope('net_A'):
...         with tf.variable_scope('layer-1'):
...             w1 = tf.Variable(tf.random_normal(
...                 shape=(10,4)), name='weights')
...         with tf.variable_scope('layer-2'):
...             w2 = tf.Variable(tf.random_normal(
...                 shape=(20,10)), name='weights')
...     with tf.variable_scope('net_B'):
...         with tf.variable_scope('layer-1'):
...             w3 = tf.Variable(tf.random_normal(
...                 shape=(10,4)), name='weights')
... 
...     print(w1)
...     print(w2)
...     print(w3)


<tf.Variable 'net_A/layer-1/weights:0' shape=(10, 4) dtype=float32_ref>
<tf.Variable 'net_A/layer-2/weights:0' shape=(20, 10) dtype=float32_ref>
<tf.Variable 'net_B/layer-1/weights:0' shape=(10, 4) dtype=float32_ref>

Notice that the variable names are now prefixed with their nested scopes, separated by the forward slash (/) symbol.

Note

For more information about variable scoping, read the documentation at https://www.tensorflow.org/programmers_guide/variable_scope and https://www.tensorflow.org/api_docs/python/tf/variable_scope.

Reusing variables

Let's imagine that we're developing a somewhat complex neural network model that has a classifier whose input data comes from more than one source. For example, we'll assume that we have data coming from source A and data coming from source B. In this example, we will design our graph in such a way that it uses the data from only one source as the input tensor to build the network. Then, we can feed the data from the other source to the same classifier.

In the following example, we assume that the data from source A is fed through a placeholder, while the data from source B is the output of a generator network. We will build the generator by calling the build_generator function within the generator scope; then, we will add a classifier by calling build_classifier within the classifier scope:

>>> import tensorflow as tf
>>> 
>>> ###########################
... ##   Helper functions    ##
... ###########################
>>> 
>>> def build_classifier(data, labels, n_classes=2):
...     data_shape = data.get_shape().as_list()
...     weights = tf.get_variable(name='weights',
...                               shape=(data_shape[1],
...                                      n_classes),
...                               dtype=tf.float32)
...     bias = tf.get_variable(name='bias',
...                            initializer=tf.zeros(
...                                      shape=n_classes))
...     logits = tf.add(tf.matmul(data, weights),
...                     bias,
...                     name='logits')
...     return logits, tf.nn.softmax(logits)
>>>
>>>
>>> def build_generator(data, n_hidden):
...     data_shape = data.get_shape().as_list()
...     w1 = tf.Variable(
...         tf.random_normal(shape=(data_shape[1],
...                                 n_hidden)),
...         name='w1')
...     b1 = tf.Variable(tf.zeros(shape=n_hidden),
...                      name='b1')
...     hidden = tf.add(tf.matmul(data, w1), b1,
...                     name='hidden_pre-activation')
...     hidden = tf.nn.relu(hidden, 'hidden_activation')
...         
...     w2 = tf.Variable(
...         tf.random_normal(shape=(n_hidden,
...                                 data_shape[1])),
...         name='w2')
...     b2 = tf.Variable(tf.zeros(shape=data_shape[1]),
...                      name='b2')
...     output = tf.add(tf.matmul(hidden, w2), b2,
...                     name = 'output')
...     return output, tf.nn.sigmoid(output)
>>>
>>> ###########################
... ##  Build the graph      ##
... ###########################
>>>
>>> batch_size=64
>>> g = tf.Graph()
>>>
>>> with g.as_default():
...     tf_X = tf.placeholder(shape=(batch_size, 100),
...                           dtype=tf.float32,
...                           name='tf_X')
...
...     ## build the generator
...     with tf.variable_scope('generator'):
...         gen_out1 = build_generator(data=tf_X,
...                                    n_hidden=50)
...     
...     ## build the classifier
...     with tf.variable_scope('classifier') as scope:
...         ## classifier for the original data:
...         cls_out1 = build_classifier(data=tf_X,
...                                     labels=tf.ones(
...                                         shape=batch_size))
...         
...         ## reuse the classifier for generated data
...         scope.reuse_variables()
...         cls_out2 = build_classifier(data=gen_out1[1],
...                                     labels=tf.zeros(
...                                         shape=batch_size))

Notice that we have called the build_classifier function two times. The first call builds the network; then, we call scope.reuse_variables() and call that function again. As a result, the second call does not create new variables; instead, it reuses the same variables. Alternatively, we could reuse the variables by specifying the reuse=True parameter, as follows:

>>> g = tf.Graph()
>>>
>>> with g.as_default():
...     tf_X = tf.placeholder(shape=(batch_size, 100),
...                           dtype=tf.float32,
...                           name='tf_X')
...     ## build the generator
...     with tf.variable_scope('generator'):
...         gen_out1 = build_generator(data=tf_X,
...                                    n_hidden=50)
...     
...     ## build the classifier
...     with tf.variable_scope('classifier'):
...         ## classifier for the original data:
...         cls_out1 = build_classifier(data=tf_X,
...                                     labels=tf.ones(
...                                         shape=batch_size))
...         
...     with tf.variable_scope('classifier', reuse=True):
...         ## reuse the classifier for generated data
...         cls_out2 = build_classifier(data=gen_out1[1],
...                                     labels=tf.zeros(
...                                         shape=batch_size))
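
As a quick sanity check that the second call did not create a second set of classifier parameters, we can list the trainable variables in the graph; a short sketch (reusing the graph g from the preceding example) might look as follows:

>>> with g.as_default():
...     ## only one set of classifier weights/biases should appear here
...     for v in tf.trainable_variables():
...         print(v.name)
generator/w1:0
generator/b1:0
generator/w2:0
generator/b2:0
classifier/weights:0
classifier/bias:0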

Note

While we have discussed how to define computational graphs and variables in TensorFlow, a detailed discussion of how gradients are computed in a computational graph is beyond the scope of this book; instead, we rely on TensorFlow's convenient optimizer classes, which perform backpropagation automatically for us. If you are interested in learning more about the computation of gradients in computational graphs and the different ways to compute them in TensorFlow, please refer to the PyData talk by Sebastian Raschka at https://github.com/rasbt/pydata-annarbor2017-dl-tutorial.
