Padding and truncation

The code for making all the sequences of integers equal in length is as follows:

pad_seq <- pad_sequences(seq, maxlen = 5)
pad_seq
     [,1] [,2] [,3] [,4] [,5]
[1,]    7    8    1    9    6
[2,]    4    1    1    8    1
[3,]    0    0    0    2    1
[4,]    9    2    7    3    1
[5,]    3    1    2    3    5

From the preceding code, we can see the following:

  • We have used pad_sequences so that all of the sequences of integers are equal in length.
  • When we specify the maximum length of all the sequences (using maxlen) to be 5, this will truncate sequences that are longer than 5 and add zeros to sequences that are shorter than 5.
  • Note that the default setting for both padding and truncating here is "pre". This means that when a sequence is longer than 5, truncation will affect integers at the beginning of the sequence. We can observe this for the first sequence in the preceding output, where 4, 5, 6, and 2 have been removed.
  • Similarly, for the third sequence, which has a length of two, three zeros have been added to the beginning of the sequence. 
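The behavior described in these points can be reproduced in a few lines of base R. The following is a minimal sketch only: pad_to_length is a hypothetical helper written for illustration, not the way keras implements pad_sequences, and the two input vectors are assumed values chosen to match the first and third sequences described above.

```r
# Hypothetical helper (not part of keras): pads or truncates one integer vector
# so that it has exactly maxlen elements.
pad_to_length <- function(x, maxlen, padding = "pre", truncating = "pre", value = 0L) {
  n <- length(x)
  if (n > maxlen) {
    # "pre" drops items from the start (keeps the tail); "post" keeps the head.
    x <- if (truncating == "pre") tail(x, maxlen) else head(x, maxlen)
  } else if (n < maxlen) {
    fill <- rep(value, maxlen - n)
    # "pre" puts the fill values in front; "post" appends them.
    x <- if (padding == "pre") c(fill, x) else c(x, fill)
  }
  as.integer(x)
}

# Assumed inputs: a sequence of length 9 and a sequence of length 2.
long_seq  <- c(4L, 5L, 6L, 2L, 7L, 8L, 1L, 9L, 6L)
short_seq <- c(2L, 1L)

pre_long  <- pad_to_length(long_seq, maxlen = 5)   # 7 8 1 9 6
pre_short <- pad_to_length(short_seq, maxlen = 5)  # 0 0 0 2 1
print(pre_long)
print(pre_short)
```

With the default "pre" settings, the long sequence loses its first four integers and the short sequence gains three leading zeros, matching rows 1 and 3 of the output.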

There may be situations where you prefer to add zeros to the end of the sequences of integers rather than to the beginning. The code to achieve this is as follows:

pad_seq <- pad_sequences(seq, maxlen = 5, padding = 'post')
pad_seq
     [,1] [,2] [,3] [,4] [,5]
[1,]    7    8    1    9    6
[2,]    4    1    1    8    1
[3,]    2    1    0    0    0
[4,]    9    2    7    3    1
[5,]    3    1    2    3    5

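In the preceding code, we have specified the padding as post. The impact of this type of padding can be seen in the output, where zeros have been added to the end of sequence 3, which has a length of less than 5. The first sequence is unchanged because truncation is controlled by a separate argument, truncating, which still has its default value of "pre".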
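To see how padding and truncation at the end of a sequence differ, the following base R sketch applies both to a small list of sequences. It is an illustration under stated assumptions: pad_list is a hypothetical helper, not the keras implementation, and the toy list contains assumed values for a sequence longer than 5 and one shorter than 5.

```r
# Hypothetical helper (not part of keras): pads or truncates every vector in a
# list to maxlen and stacks the results into a matrix, one row per sequence.
pad_list <- function(seqs, maxlen, padding = "pre", truncating = "pre") {
  rows <- lapply(seqs, function(x) {
    if (length(x) > maxlen) {
      # "pre" drops items from the start; "post" drops items from the end.
      x <- if (truncating == "pre") tail(x, maxlen) else head(x, maxlen)
    }
    fill <- rep(0L, maxlen - length(x))
    if (padding == "pre") c(fill, x) else c(x, fill)
  })
  do.call(rbind, rows)
}

# Assumed inputs: one sequence of length 9 and one of length 2.
toy <- list(c(4L, 5L, 6L, 2L, 7L, 8L, 1L, 9L, 6L), c(2L, 1L))

# Zeros go to the end, but long sequences still lose their first integers.
post_pad <- pad_list(toy, maxlen = 5, padding = "post")

# Zeros go to the end, and long sequences lose their last integers instead.
post_both <- pad_list(toy, maxlen = 5, padding = "post", truncating = "post")

print(post_pad)
print(post_both)
```

In keras itself, the equivalent of the second call is to pass truncating = 'post' to pad_sequences alongside padding = 'post'.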
