The following code is used to convert text into sequences of integers. The output is also provided:
seq <- texts_to_sequences(token, tweets)
seq
[[1]]
[1] 4 5 6 2 7 8 1 9 6
[[2]]
[1] 2 4 1 1 8 1
[[3]]
[1] 2 1
[[4]]
[1] 2 9 2 7 3 1
[[5]]
[1] 3 1 2 3 5
From the preceding code, we can see the following:
- We have used texts_to_sequences to convert tweets into sequences of integers.
- Since we set the number of most frequent words for the tokenizer to 10, and only word indices below that limit are kept, the integers within each sequence have a maximum value of 9.
- For each tweet, the sequence contains fewer integers than the tweet has words, because only the most frequent words are kept.
- The sequences of integers have different lengths, ranging from 2 to 9.
- For the purpose of developing a classification model, all of the sequences need to be the same length. This is achieved by performing padding or truncation.
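
To make the padding and truncation step concrete, here is a minimal base R sketch that brings the five sequences above to a common length. The helper pad_to_length and the choice of maxlen = 5 are assumptions made for illustration only; they are not part of the original code. The helper pads with zeros at the front and, for longer sequences, keeps only the last maxlen integers, which mirrors the default behavior of the keras function pad_sequences that would normally be used for this step.

```r
# Sequences copied from the output shown above
seq <- list(
  c(4, 5, 6, 2, 7, 8, 1, 9, 6),
  c(2, 4, 1, 1, 8, 1),
  c(2, 1),
  c(2, 9, 2, 7, 3, 1),
  c(3, 1, 2, 3, 5)
)

# Hypothetical helper (not from keras): returns a vector of exactly maxlen integers
pad_to_length <- function(x, maxlen, value = 0) {
  n <- length(x)
  if (n >= maxlen) {
    # Truncation: drop integers from the front, keep the last maxlen
    x[(n - maxlen + 1):n]
  } else {
    # Padding: add zeros at the front until the length is maxlen
    c(rep(value, maxlen - n), x)
  }
}

maxlen <- 5  # assumed target length, chosen only for this example

# One row per tweet, one column per position in the sequence
padded <- t(sapply(seq, pad_to_length, maxlen = maxlen))
padded
```

With maxlen set to 5, the third tweet (2 1) is padded to 0 0 0 2 1, while the first tweet is truncated to its last five integers, 7 8 1 9 6. Every row of padded now has the same length, which is the form a classification model expects.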