In the context of TensorFlow, variables are a special type of tensor object that allows us to store and update the parameters of our models in a TensorFlow session during training. The following sections explain how we can define variables in a graph, initialize those variables in a session, organize variables via the so-called variable scope, and reuse existing variables.
TensorFlow variables store the parameters of a model that can be updated during training, for example, the weights in the input, hidden, and output layers of a neural network. When we define a variable, we need to initialize it with a tensor of values. Feel free to read more about TensorFlow variables at https://www.tensorflow.org/programmers_guide/variables.
TensorFlow provides two ways for defining variables:
tf.Variable(<initial-value>, name="variable-name")
tf.get_variable(name, ...)
The first one, tf.Variable, is a class that creates an object for a new variable and adds it to the graph. Note that tf.Variable does not take an explicit shape argument; the shape and, by default, the dtype are set to be the same as those of the initial values.
The second option, tf.get_variable, can be used to reuse an existing variable with a given name (if the name exists in the graph) or create a new one if the name does not exist. In this case, the name becomes critical; that's probably why it has to be placed as the first argument to this function. Furthermore, tf.get_variable provides an explicit way to set shape and dtype; these parameters are only required when creating a new variable, not when reusing existing ones.
The advantage of tf.get_variable over tf.Variable is twofold: tf.get_variable allows us to reuse existing variables, and it already uses the popular Xavier/Glorot initialization scheme by default.
Besides the initializer, the get_variable function provides other parameters to control the tensor, such as adding a regularizer for the variable. If you are interested in learning more about these parameters, feel free to read the documentation of tf.get_variable at https://www.tensorflow.org/api_docs/python/tf/get_variable.
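As a minimal sketch of the two approaches described above, the following snippet defines one variable with tf.Variable and two with tf.get_variable. It assumes that the tensorflow.compat.v1 module is available (TensorFlow 1.14 or later), so that the graph-based API used in this chapter also runs under TensorFlow 2.x; the variable names w_a, w_b, and w_c are arbitrary and chosen only for illustration.

```python
# Assumption: tensorflow.compat.v1 is available (TensorFlow 1.14+ or 2.x).
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

g = tf.Graph()
with g.as_default():
    # tf.Variable: shape and dtype are taken from the initial value.
    w_a = tf.Variable(tf.zeros(shape=(2, 3)), name='w_a')

    # tf.get_variable: the name comes first; shape and dtype are explicit.
    w_b = tf.get_variable('w_b', shape=(2, 3), dtype=tf.float32)

    # An initializer can also be passed explicitly instead of
    # relying on the default.
    w_c = tf.get_variable(
        'w_c', shape=(2, 3),
        initializer=tf.glorot_uniform_initializer(seed=1))

for var in (w_a, w_b, w_c):
    print(var.name, var.get_shape().as_list(), var.dtype.base_dtype)
```

Nothing is evaluated here; the snippet only adds the three variables to the graph and prints their names, shapes, and data types.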
Xavier (or Glorot) initialization
In the early development of deep learning, it was observed that random uniform or random normal weight initialization could often result in poor performance of the model during training.
In 2010, Xavier Glorot and Yoshua Bengio investigated the effect of initialization and proposed a novel, more robust initialization scheme to facilitate the training of deep networks.
The general idea behind Xavier initialization is to roughly balance the variance of the gradients across different layers. Otherwise, one layer may get too much attention during training while the other layer lags behind.
According to the research paper by Glorot and Bengio, if we want to initialize the weights from a uniform distribution, we should choose the interval of this uniform distribution as follows:
$$W \sim \text{Uniform}\left(-\frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}},\ \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}\right)$$
Here, $n_{in}$ is the number of input neurons that are multiplied with the weights, and $n_{out}$ is the number of output neurons that feed into the next layer. For initializing the weights from a Gaussian (normal) distribution, the authors recommend choosing the standard deviation of this Gaussian to be $\sigma = \frac{\sqrt{2}}{\sqrt{n_{in} + n_{out}}}$.
TensorFlow supports Xavier initialization in both uniform and normal distributions of weights. The documentation provides detailed information about using Xavier initialization with TensorFlow: https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer.
For more information about Glorot and Bengio's initialization scheme, including the mathematical derivation and proof, read their original paper (Understanding the difficulty of deep feedforward neural networks, Xavier Glorot and Yoshua Bengio, 2010), which is freely available at http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf.
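To make the Glorot bounds concrete, here is a small sketch in plain Python that computes the uniform interval limit sqrt(6)/sqrt(n_in + n_out) and the Gaussian standard deviation sqrt(2)/sqrt(n_in + n_out) for a layer. It does not use TensorFlow; the helper names glorot_uniform_limit and glorot_normal_stddev and the layer sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def glorot_uniform_limit(n_in, n_out):
    """Half-width of the uniform interval [-limit, limit]."""
    return math.sqrt(6.0) / math.sqrt(n_in + n_out)


def glorot_normal_stddev(n_in, n_out):
    """Standard deviation for the Gaussian variant."""
    return math.sqrt(2.0) / math.sqrt(n_in + n_out)


# Hypothetical layer with 100 inputs and 50 outputs.
n_in, n_out = 100, 50
limit = glorot_uniform_limit(n_in, n_out)
stddev = glorot_normal_stddev(n_in, n_out)

# Draw an n_in x n_out weight matrix from Uniform(-limit, limit).
rng = random.Random(0)
weights = [[rng.uniform(-limit, limit) for _ in range(n_out)]
           for _ in range(n_in)]

print('uniform limit: %.4f' % limit)
print('normal stddev: %.4f' % stddev)
```

Both choices give the weights the same variance, 2 / (n_in + n_out), because a uniform distribution on [-a, a] has variance a^2 / 3.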
In either initialization technique, it's important to note that the initial values are not set until we launch the graph in a tf.Session and explicitly run the initializer operator in that session. In fact, the required memory for a graph is not allocated until we initialize the variables in a TensorFlow session.
Here is an example of creating a variable object where the initial values are created from a NumPy array. The dtype of this tensor is tf.int64, which is automatically inferred from its NumPy array input:
>>> import tensorflow as tf
>>> import numpy as np
>>>
>>> g1 = tf.Graph()
>>>
>>> with g1.as_default():
...     w = tf.Variable(np.array([[1, 2, 3, 4],
...                               [5, 6, 7, 8]]), name='w')
...     print(w)
<tf.Variable 'w:0' shape=(2, 4) dtype=int64_ref>
Here, it is critical to understand that tensors defined as variables are not allocated in memory and contain no values until they are initialized. Therefore, before executing any node in the computation graph, we must initialize the variables that are within the path to the node that we want to execute.
This initialization process refers to allocating memory for the associated tensors and assigning their initial values. TensorFlow provides a function named tf.global_variables_initializer that returns an operator for initializing all the variables that exist in a computation graph. Then, executing this operator will initialize the variables as follows:
>>> with tf.Session(graph=g1) as sess:
...     sess.run(tf.global_variables_initializer())
...     print(sess.run(w))
[[1 2 3 4]
 [5 6 7 8]]
We can also store this operator in an object such as init_op = tf.global_variables_initializer() and execute this operator later using sess.run(init_op) or init_op.run(). However, we need to make sure that this operator is created after we define all the variables.
For example, in the following code, we define the variable w1, then we define the operator init_op, followed by the variable w2:
>>> import tensorflow as tf
>>>
>>> g2 = tf.Graph()
>>>
>>> with g2.as_default():
...     w1 = tf.Variable(1, name='w1')
...     init_op = tf.global_variables_initializer()
...     w2 = tf.Variable(2, name='w2')
Now, let's evaluate w1 as follows:
>>> with tf.Session(graph=g2) as sess:
...     sess.run(init_op)
...     print('w1:', sess.run(w1))
w1: 1
This works fine. Now, let's try evaluating w2:
>>> with tf.Session(graph=g2) as sess:
...     sess.run(init_op)
...     print('w2:', sess.run(w2))
FailedPreconditionError
Attempting to use uninitialized value w2
[[Node: _retval_w2_0_0 = _Retval[T=DT_INT32, index=0, _device="/job:localhost/replica:0/task:0/cpu:0"](w2)]]
As shown in the code example, executing the graph raises an error because w2 was not initialized via sess.run(init_op) and therefore couldn't be evaluated. The operator init_op was defined prior to adding w2 to the graph; thus, executing init_op will not initialize w2.
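One way to avoid this error is to create the initializer operator only after all variables have been defined. The following sketch does this in a fresh graph (named g3 here purely for illustration). It assumes that the tensorflow.compat.v1 module is available, so that the graph and session API used in this chapter also runs under TensorFlow 2.x.

```python
# Assumption: tensorflow.compat.v1 is available (TensorFlow 1.14+ or 2.x).
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

g3 = tf.Graph()
with g3.as_default():
    w1 = tf.Variable(1, name='w1')
    w2 = tf.Variable(2, name='w2')
    # Created last, so the operator covers both w1 and w2.
    init_op = tf.global_variables_initializer()

with tf.Session(graph=g3) as sess:
    sess.run(init_op)
    w1_val, w2_val = sess.run([w1, w2])

print('w1:', w1_val)
print('w2:', w2_val)
```

Because init_op is created after w2 was added to the graph, running it initializes both variables, and both can be evaluated in the same session.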
In this subsection, we're going to discuss scoping, which is an important concept in TensorFlow, and especially useful if we are constructing large neural network graphs.
With variable scopes, we can organize the variables into separate subparts. When we create a variable scope, the names of operations and tensors that are created within that scope are prefixed with the scope name, and scopes can be nested further. For example, if we have two subnetworks, where each subnetwork has several layers, we can define two scopes named 'net_A' and 'net_B', respectively. Then, each layer will be defined within one of these scopes.
Let's see how the variable names will turn out in the following code example:
>>> import tensorflow as tf
>>>
>>> g = tf.Graph()
>>>
>>> with g.as_default():
...     with tf.variable_scope('net_A'):
...         with tf.variable_scope('layer-1'):
...             w1 = tf.Variable(tf.random_normal(
...                 shape=(10,4)), name='weights')
...         with tf.variable_scope('layer-2'):
...             w2 = tf.Variable(tf.random_normal(
...                 shape=(20,10)), name='weights')
...     with tf.variable_scope('net_B'):
...         with tf.variable_scope('layer-1'):
...             w3 = tf.Variable(tf.random_normal(
...                 shape=(10,4)), name='weights')
...
...     print(w1)
...     print(w2)
...     print(w3)
<tf.Variable 'net_A/layer-1/weights:0' shape=(10, 4) dtype=float32_ref>
<tf.Variable 'net_A/layer-2/weights:0' shape=(20, 10) dtype=float32_ref>
<tf.Variable 'net_B/layer-1/weights:0' shape=(10, 4) dtype=float32_ref>
Notice that the variable names are now prefixed with their nested scopes, separated by the forward slash (/) symbol.
For more information about variable scoping, read the documentation at https://www.tensorflow.org/programmers_guide/variable_scope and https://www.tensorflow.org/api_docs/python/tf/variable_scope.
Let's imagine that we're developing a somewhat complex neural network model that has a classifier whose input data comes from more than one source. For example, we'll assume that we have data coming from source A and data coming from source B. In this example, we will design our graph in such a way that it uses the data from only one source as the input tensor to build the network. Then, we can feed the data from the other source to the same classifier.
In the following example, we assume that data from source A is fed through a placeholder, and source B is the output of a generator network. We will build the generator network by calling the build_generator function within the generator scope, then we will add a classifier by calling build_classifier within the classifier scope:
>>> import tensorflow as tf
>>>
>>> ###########################
... ##   Helper functions    ##
... ###########################
>>>
>>> def build_classifier(data, labels, n_classes=2):
...     data_shape = data.get_shape().as_list()
...     weights = tf.get_variable(name='weights',
...                               shape=(data_shape[1],
...                                      n_classes),
...                               dtype=tf.float32)
...     bias = tf.get_variable(name='bias',
...                            initializer=tf.zeros(
...                                shape=n_classes))
...     logits = tf.add(tf.matmul(data, weights),
...                     bias,
...                     name='logits')
...     return logits, tf.nn.softmax(logits)
>>>
>>>
>>> def build_generator(data, n_hidden):
...     data_shape = data.get_shape().as_list()
...     w1 = tf.Variable(
...         tf.random_normal(shape=(data_shape[1],
...                                 n_hidden)),
...         name='w1')
...     b1 = tf.Variable(tf.zeros(shape=n_hidden),
...                      name='b1')
...     hidden = tf.add(tf.matmul(data, w1), b1,
...                     name='hidden_pre-activation')
...     hidden = tf.nn.relu(hidden, 'hidden_activation')
...
...     w2 = tf.Variable(
...         tf.random_normal(shape=(n_hidden,
...                                 data_shape[1])),
...         name='w2')
...     b2 = tf.Variable(tf.zeros(shape=data_shape[1]),
...                      name='b2')
...     output = tf.add(tf.matmul(hidden, w2), b2,
...                     name='output')
...     return output, tf.nn.sigmoid(output)
>>>
>>> ###########################
... ##   Build the graph     ##
... ###########################
>>>
>>> batch_size = 64
>>> g = tf.Graph()
>>>
>>> with g.as_default():
...     tf_X = tf.placeholder(shape=(batch_size, 100),
...                           dtype=tf.float32,
...                           name='tf_X')
...
...     ## build the generator
...     with tf.variable_scope('generator'):
...         gen_out1 = build_generator(data=tf_X,
...                                    n_hidden=50)
...
...     ## build the classifier
...     with tf.variable_scope('classifier') as scope:
...         ## classifier for the original data:
...         cls_out1 = build_classifier(data=tf_X,
...                                     labels=tf.ones(
...                                         shape=batch_size))
...
...         ## reuse the classifier for generated data
...         scope.reuse_variables()
...         cls_out2 = build_classifier(data=gen_out1[1],
...                                     labels=tf.zeros(
...                                         shape=batch_size))
Notice that we have called the build_classifier function two times. The first call causes the building of the network. Then, we call scope.reuse_variables() and call that function again. As a result, the second call does not create new variables; instead, it reuses the same variables. Alternatively, we could reuse the variables by specifying the reuse=True parameter, as follows:
>>> g = tf.Graph()
>>>
>>> with g.as_default():
...     tf_X = tf.placeholder(shape=(batch_size, 100),
...                           dtype=tf.float32,
...                           name='tf_X')
...     ## build the generator
...     with tf.variable_scope('generator'):
...         gen_out1 = build_generator(data=tf_X,
...                                    n_hidden=50)
...
...     ## build the classifier
...     with tf.variable_scope('classifier'):
...         ## classifier for the original data:
...         cls_out1 = build_classifier(data=tf_X,
...                                     labels=tf.ones(
...                                         shape=batch_size))
...
...     with tf.variable_scope('classifier', reuse=True):
...         ## reuse the classifier for generated data
...         cls_out2 = build_classifier(data=gen_out1[1],
...                                     labels=tf.zeros(
...                                         shape=batch_size))
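To check that reuse really shares variables rather than creating new ones, we can inspect the variable names in a stripped-down version of the classifier scope. The following is only a sketch: it assumes that the tensorflow.compat.v1 module is available, and the names w_first, w_again, and duplicate_rejected are hypothetical and introduced purely for illustration.

```python
# Assumption: tensorflow.compat.v1 is available (TensorFlow 1.14+ or 2.x).
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

g = tf.Graph()
with g.as_default():
    # First use of the scope creates 'classifier/weights'.
    with tf.variable_scope('classifier'):
        w_first = tf.get_variable('weights', shape=(100, 2),
                                  dtype=tf.float32)

    # Re-entering the scope with reuse=True returns the existing variable.
    with tf.variable_scope('classifier', reuse=True):
        w_again = tf.get_variable('weights')

    # Without reuse, asking for the same name again is rejected.
    duplicate_rejected = False
    try:
        with tf.variable_scope('classifier'):
            tf.get_variable('weights', shape=(100, 2))
    except ValueError:
        duplicate_rejected = True

    var_names = [v.name for v in tf.global_variables()]

print(w_first.name, w_again.name)
print(var_names)
print('duplicate rejected:', duplicate_rejected)
```

Only one variable named classifier/weights should exist in the graph afterwards, which is the behavior the two build_classifier calls rely on.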
While we have discussed how to define computational graphs and variables in TensorFlow, a detailed discussion of how we can compute gradients in a computational graph is beyond the scope of this book, where we use TensorFlow's convenient optimizer classes that perform backpropagation automatically for us. If you are interested in learning more about the computation of gradients in computational graphs and the different ways to compute them in TensorFlow, please refer to the PyData talk by Sebastian Raschka at https://github.com/rasbt/pydata-annarbor2017-dl-tutorial.