Preparing the data for model building

Preparing the data for model building involves the following steps:

Tokenization
Converting text into integers
Padding and truncation
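
As a rough illustration of what these three steps do, the following sketch runs them by hand in base R. It is a stand-in for a library tokenizer, not necessarily the approach used for the final model. The toy sentences, the `pad_or_truncate` helper, and the choice to pad and truncate at the start of each sequence are assumptions made for this example only.

```r
# Toy sentences (hypothetical, for illustration only)
texts <- c("the cat sat on the mat", "the dog sat")

# Step 1: tokenization. Lowercase the text and split it on whitespace.
tokens <- strsplit(tolower(texts), "\\s+")

# Step 2: converting text into integers. Rank words by frequency so that
# the most frequent word gets index 1; 0 is left free for padding.
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
word_index <- setNames(seq_along(freq), names(freq))
sequences <- lapply(tokens, function(tok) unname(word_index[tok]))

# Step 3: padding and truncation. Force every sequence to the same length:
# long sequences keep their last `maxlen` integers, short ones get leading zeros.
pad_or_truncate <- function(x, maxlen) {
  if (length(x) >= maxlen) {
    tail(x, maxlen)
  } else {
    c(rep(0L, maxlen - length(x)), x)
  }
}

# One row per sentence, `maxlen` columns
padded <- t(sapply(sequences, pad_or_truncate, maxlen = 5))
print(padded)
```

After the last step, every sentence is represented by an integer vector of the same length, which is the rectangular shape a network expects as input.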

To illustrate these steps, we will use a very small text dataset of five tweets about the announcement of the Apple iPhone X in September 2017. Once the steps are clear, we will switch to the larger IMDb dataset to build a deep network classification model. The five tweets, which we will store in t1 to t5, are as follows:

t1 <- "I'm not a huge $AAPL fan but $160 stock closes down $0.60 for the day on huge volume isn't really bearish"
t2 <- "$AAPL $BAC not sure what more dissapointing: the new iphones or the presentation for the new iphones?"
t3 <- "IMO, $AAPL animated emojis will be the death of $SNAP."
t4 <- "$AAPL get on board. It's going to 175. I think wall st will have issues as aapl pushes 1 trillion dollar valuation but 175 is in the cards"
t5 <- "In the AR vs. VR battle, $AAPL just put its chips behind AR in a big way."

The preceding tweets contain a mix of lowercase and uppercase text, punctuation, numbers, and special characters.
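
One way to confirm this mix is to test the text against a few character classes. The sketch below copies two of the tweets verbatim so that it runs on its own. The `checks` vector and its pattern names are assumptions chosen for this example.

```r
# Two of the five tweets, copied verbatim
tweets <- c(
  "I'm not a huge $AAPL fan but $160 stock closes down $0.60 for the day on huge volume isn't really bearish",
  "IMO, $AAPL animated emojis will be the death of $SNAP."
)

# Regular expressions for the kinds of characters we want to detect
checks <- c(
  uppercase   = "[[:upper:]]",
  lowercase   = "[[:lower:]]",
  digits      = "[[:digit:]]",
  punctuation = "[[:punct:]]",
  dollar_sign = "\\$"
)

# Logical matrix: one row per tweet, one column per check
found <- sapply(checks, function(pattern) grepl(pattern, tweets))
print(found)
```

Both tweets contain uppercase letters, lowercase letters, punctuation, and the `$` symbol, while only the first contains digits. This variety is what the tokenization step has to normalize.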
