Converting text into sequences of integers

The following code converts the tweets into sequences of integers. The output is shown directly after the call:

seq <- texts_to_sequences(token, tweets)
seq
[[1]]
[1] 4 5 6 2 7 8 1 9 6

[[2]]
[1] 2 4 1 1 8 1

[[3]]
[1] 2 1

[[4]]
[1] 2 9 2 7 3 1

[[5]]
[1] 3 1 2 3 5

From the preceding code, we can see the following:

  • We have used texts_to_sequences to convert tweets into sequences of integers.
  • Since we set the number of most frequent words to 10 when creating the tokens, the integers within each sequence have a maximum value of 9.
  • For each tweet, the sequence contains fewer integers than the tweet has words, because only the most frequent words are kept.
  • The sequences of integers have different lengths, ranging from 2 to 9.
  • For the purpose of developing a classification model, all of the sequences need to be the same length. This is achieved by performing padding or truncation.
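
The padding and truncation mentioned in the last point can be illustrated with a minimal base R sketch. It reuses the five sequences printed above; the helper pad_or_truncate and the target length of 5 are assumptions made only for this illustration and are not part of the keras package. The helper adds zeros at the front of short sequences and drops integers from the front of long ones.

```r
# Sequences copied from the output above
seq <- list(
  c(4, 5, 6, 2, 7, 8, 1, 9, 6),
  c(2, 4, 1, 1, 8, 1),
  c(2, 1),
  c(2, 9, 2, 7, 3, 1),
  c(3, 1, 2, 3, 5)
)

# Length of each sequence and the largest integer used
seq_lengths <- lengths(seq)
largest <- max(unlist(seq))

# Hypothetical helper (not a keras function): force a sequence to length maxlen
pad_or_truncate <- function(x, maxlen, value = 0) {
  n <- length(x)
  if (n >= maxlen) {
    x[(n - maxlen + 1):n]          # truncation: keep the last maxlen integers
  } else {
    c(rep(value, maxlen - n), x)   # padding: add zeros at the front
  }
}

# One row per tweet, every row has the same length
padded <- t(sapply(seq, pad_or_truncate, maxlen = 5))
padded
```

Each row of padded now has exactly five integers, so the tweets can be stacked into a single matrix. In practice, the keras function pad_sequences is the usual way to do this step; the sketch only imitates padding and truncating at the front to show the idea.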