Padding and truncation

Converting text into sequences of integers

The following code converts the tweets into sequences of integers. Its output is shown below the code:

# Convert each tweet into a sequence of word indices using the fitted tokenizer
seq <- texts_to_sequences(token, tweets)
seq
[[1]]
[1] 4 5 6 2 7 8 1 9 6

[[2]]
[1] 2 4 1 1 8 1

[[3]]
[1] 2 1

[[4]]
[1] 2 9 2 7 3 1

[[5]]
[1] 3 1 2 3 5

From the preceding code, we can see the following:

We have used texts_to_sequences to convert tweets into sequences of integers.
Since we set the number of most frequent words for the tokenizer to 10, the integers within each sequence have a maximum value of 9. Keras keeps only words whose index is below this limit, and index 0 is never assigned to a word, so only indices 1 to 9 can appear.
For each tweet, the sequence contains fewer integers than the tweet has words, because words outside the most frequent set are dropped.
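To make these observations concrete, here is a minimal base R sketch of what a tokenizer with a vocabulary cap does. It is not the Keras implementation: the helper `toy_texts_to_sequences()` and its `num_words` argument are hypothetical, the text cleaning is simplified (lowercasing and replacing punctuation with spaces), and ties in word frequency are broken by first appearance.

```r
# Hypothetical helper: a simplified, base R imitation of a tokenizer that is
# fitted on `texts` and then keeps only the most frequent words.
toy_texts_to_sequences <- function(texts, num_words) {
  # Lowercase, replace punctuation with spaces, and split on whitespace
  words <- lapply(tolower(texts), function(t) {
    w <- strsplit(gsub("[[:punct:]]", " ", t), "[[:space:]]+")[[1]]
    w[w != ""]
  })
  # Count each word across all texts, keeping first-appearance order
  all_words <- unlist(words)
  counts <- table(factor(all_words, levels = unique(all_words)))
  # Rank by decreasing frequency; order() is stable, so ties keep first appearance
  vocab <- names(counts)[order(-as.integer(counts))]
  # Index 1 is the most frequent word; index 0 is never assigned
  word_index <- setNames(seq_along(vocab), vocab)
  # Keep only indices below num_words, i.e. the num_words - 1 most frequent words
  lapply(words, function(w) {
    idx <- unname(word_index[w])
    idx[idx < num_words]
  })
}

texts <- c("the cat sat on the mat", "the dog sat", "a cat and a dog")
toy_texts_to_sequences(texts, num_words = 4)
# [[1]] 1 2 3 1   (6 words, 4 kept)
# [[2]] 1 3       ("dog" has index 4 and is dropped)
# [[3]] 2         (only "cat" survives)
```

With `num_words = 4`, only indices 1 to 3 survive, which mirrors why a limit of 10 caps the integers at 9 and shortens tweets that contain rarer words.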

The sequences of integers have different lengths, ranging from 2 to 9.
To develop a classification model, all of the sequences need to be the same length. This is achieved by padding the shorter sequences or truncating the longer ones.
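Padding and truncation can be sketched in base R as follows. The helper `toy_pad_sequences()` is hypothetical; in the keras R package this job is done by `pad_sequences()`, whose `padding` and `truncating` arguments both default to `"pre"`, and the sketch copies those defaults. The target length `maxlen = 5` is an arbitrary choice for illustration, not a value taken from the book.

```r
# Hypothetical helper: a base R imitation of padding/truncation. The "pre"
# defaults mirror those of pad_sequences() in the keras R package.
toy_pad_sequences <- function(sequences, maxlen, padding = "pre",
                              truncating = "pre", value = 0) {
  rows <- lapply(sequences, function(s) {
    n <- length(s)
    # Truncate: drop values from the start ("pre") or the end ("post")
    if (n > maxlen) {
      s <- if (truncating == "pre") s[(n - maxlen + 1):n] else s[seq_len(maxlen)]
    }
    # Pad: add `value` at the start ("pre") or the end ("post")
    fill <- rep(value, maxlen - length(s))
    if (padding == "pre") c(fill, s) else c(s, fill)
  })
  # One row per sequence, one column per position
  do.call(rbind, rows)
}

# The five sequences printed earlier in this section
seqs <- list(c(4, 5, 6, 2, 7, 8, 1, 9, 6), c(2, 4, 1, 1, 8, 1), c(2, 1),
             c(2, 9, 2, 7, 3, 1), c(3, 1, 2, 3, 5))
lengths(seqs)                      # 9 6 2 6 5
toy_pad_sequences(seqs, maxlen = 5)
```

With `maxlen = 5` and the `"pre"` defaults, the first row becomes 7 8 1 9 6 (its first four integers are truncated) and the third becomes 0 0 0 2 1 (three zeros of padding), so every tweet ends up as a row of exactly five integers.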