There's more...

Initializing word vectors with embeddings from a pretrained unsupervised model is a well-known way to improve performance; recall that in this recipe we used the pretrained Google News vectors for exactly that purpose. When a CNN is applied to text rather than images, it operates on one-dimensional vectors that represent the text. The steps are the same as those discussed in Chapter 4, Building Convolutional Neural Networks, such as convolution and max pooling over feature maps; the only difference is that the inputs are word vectors instead of image pixels. CNN architectures have since shown strong results on NLP tasks. The paper at https://www.aclweb.org/anthology/D14-1181 offers further insight on this.

The network architecture of a computation graph is a directed acyclic graph in which each vertex is a graph vertex. A graph vertex can be a layer or a vertex that defines arbitrary forward/backward pass functionality. Computation graphs can have an arbitrary number of inputs and outputs. We needed to stack multiple convolution layers side by side, which is not possible with a normal CNN architecture.

ComputationGraph has a configuration option known as convolutionMode. convolutionMode determines how the convolution operation is performed for convolutional and subsampling layers (for a given input size), and how network settings such as stride, padding, and kernel size are interpreted under that mode. We set convolutionMode because we want to stack the results of all three convolution layers into one and generate the prediction from the merged output.
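The stacking described above can be sketched with DL4J's graph builder. This is a minimal, hedged sketch, not the recipe's exact code: the layer and vertex names (cnn3, cnn4, cnn5, merge) and the sizes (vectorSize, cnnLayerFeatureMaps) are illustrative placeholders, and the configuration is deliberately incomplete (no output layer, updater, or input type shown).

```java
// Sketch only: three parallel convolution layers over word vectors, merged
// into a single activation via MergeVertex. Assumes DL4J on the classpath:
// org.deeplearning4j.nn.conf.*, org.deeplearning4j.nn.conf.layers.ConvolutionLayer,
// org.deeplearning4j.nn.conf.graph.MergeVertex.
int vectorSize = 300;              // dimensionality of the word vectors (illustrative)
int cnnLayerFeatureMaps = 100;     // feature maps per convolution layer (illustrative)

ComputationGraphConfiguration config = new NeuralNetConfiguration.Builder()
        .convolutionMode(ConvolutionMode.Same)   // the convolution mode discussed above
        .graphBuilder()
        .addInputs("input")
        // Three convolution layers with different kernel heights, all fed
        // from the same input:
        .addLayer("cnn3", new ConvolutionLayer.Builder()
                .kernelSize(3, vectorSize).stride(1, vectorSize)
                .nIn(1).nOut(cnnLayerFeatureMaps).build(), "input")
        .addLayer("cnn4", new ConvolutionLayer.Builder()
                .kernelSize(4, vectorSize).stride(1, vectorSize)
                .nIn(1).nOut(cnnLayerFeatureMaps).build(), "input")
        .addLayer("cnn5", new ConvolutionLayer.Builder()
                .kernelSize(5, vectorSize).stride(1, vectorSize)
                .nIn(1).nOut(cnnLayerFeatureMaps).build(), "input")
        // Stack the three activations along the depth dimension:
        .addVertex("merge", new MergeVertex(), "cnn3", "cnn4", "cnn5")
        // ... pooling and output layers would follow here ...
        .build();
```

Because the three layers are vertices in a graph rather than entries in a single linear stack, they can all consume the same input and feed the same MergeVertex, which is exactly what a plain sequential network cannot express.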

The output sizes for convolutional and subsampling layers are calculated in each dimension as follows:

outputSize = (inputSize - kernelSize + 2*padding) / stride + 1
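The formula can be checked with a few lines of plain Java. The class and method names here are illustrative helpers, not part of the DL4J API; DL4J performs an equivalent check internally.

```java
// Computes the output size of a convolutional or subsampling layer along one
// dimension, mirroring the formula above. If the division does not produce an
// integer, we fail, just as DL4J raises an exception in that situation.
public class ConvOutputSize {
    static int outputSize(int inputSize, int kernelSize, int padding, int stride) {
        int numerator = inputSize - kernelSize + 2 * padding;
        if (numerator % stride != 0) {
            throw new IllegalArgumentException(
                    "outputSize is not an integer for the given configuration");
        }
        return numerator / stride + 1;
    }

    public static void main(String[] args) {
        // e.g. a 256-token input, kernel size 4, no padding, stride 1:
        System.out.println(outputSize(256, 4, 0, 1)); // prints 253
    }
}
```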

If outputSize is not an integer, an exception will be thrown during network initialization or the forward pass. We have already discussed MergeVertex, which combines the activations of two or more layers; we used MergeVertex to perform the same operation on our convolution layers. How the merge behaves depends on the type of its inputs. For example, if we merge two convolution layers with a sample size (batchSize) of 100 and depths of depth1 and depth2 respectively, the merge stacks the results along the depth dimension, so that the following applies:

depth = depth1 + depth2
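The shape arithmetic can be illustrated with a small helper. The class and method below are illustrative only, not DL4J code; shapes are written as [batchSize, depth, height, width], the layout MergeVertex sees for convolutional activations.

```java
// Illustrates how merging two convolutional activations stacks them along
// the depth (channel) dimension: every dimension except depth must match,
// and the merged depth is the sum of the input depths.
public class MergeDepth {
    static int[] mergedShape(int[] a, int[] b) {
        if (a[0] != b[0] || a[2] != b[2] || a[3] != b[3]) {
            throw new IllegalArgumentException(
                    "all dimensions except depth must match");
        }
        return new int[]{a[0], a[1] + b[1], a[2], a[3]};
    }

    public static void main(String[] args) {
        // batchSize 100, depths 64 and 32, matching spatial dimensions:
        int[] merged = mergedShape(new int[]{100, 64, 20, 1},
                                   new int[]{100, 32, 20, 1});
        System.out.println(java.util.Arrays.toString(merged)); // [100, 96, 20, 1]
    }
}
```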
