About the data

The data we use to build the spam-detection model comes from the SMS Spam Collection at http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/, which contains 747 spam messages and 4,827 non-spam messages.
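The two counts above imply that the classes are far from balanced, which is worth keeping in mind when evaluating a model later. As a quick sketch of the arithmetic (the variable names are illustrative only, not part of the dataset):

```python
# Message counts as stated in the text above.
spam_count = 747
ham_count = 4827

total = spam_count + ham_count
spam_fraction = spam_count / total

# Report the share of spam in the whole collection.
print(f"total messages: {total}")
print(f"spam share: {spam_fraction:.1%}")
```

Roughly one message in seven or eight is spam, so a model that always predicts non-spam would already look accurate on this data while detecting nothing.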

These messages come from different sources, and each one is labeled as either spam or non-spam (written as ham in the file). If you open the downloaded file in Notepad or any other text editor, you will see the following format:

ham   What you doing?how are you?
ham Ok lar... Joking wif u oni...
ham dun say so early hor... U c already then say...
ham MY NO. IN LUTON 0125698789 RING ME IF UR AROUND! H*
ham Siva is in hostel aha:-.
ham Cos i was out shopping with darren jus now n i called him 2 ask wat present he wan lor. Then he started guessing who i was wif n he finally guessed darren lor.
spam FreeMsg: Txt: CALL to No: 86888 & claim your reward of 3 hours talk time to use from your phone now! ubscribe6GBP/ mnth inc 3hrs 16 stop?txtStop
spam Sunshine Quiz! Win a super Sony DVD recorder if you can name the capital of Australia? Text MQUIZ to 82277. B
spam URGENT! Your Mobile No 07808726822 was awarded a L2,000 Bonus Caller Prize on 02/09/03! This is our 2nd attempt to contact YOU! Call 0871-872-9758 BOX95QU

In the preceding sample, every line starts with the category name (ham or spam), followed by the actual message.
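The line format described above can be parsed with a few lines of code. The following is a minimal sketch in Python, not the approach used later in this book. It assumes that the label and the message are separated by one or more whitespace characters (spaces or a tab), and the helper names `parse_line` and `load_messages` are hypothetical:

```python
from collections import Counter

VALID_LABELS = ("ham", "spam")

def parse_line(line):
    """Split one line into (label, message) at the first run of whitespace."""
    parts = line.strip().split(None, 1)
    if len(parts) != 2 or parts[0] not in VALID_LABELS:
        raise ValueError(f"unexpected line format: {line!r}")
    return parts[0], parts[1]

def load_messages(lines):
    """Parse every non-empty line into a (label, message) pair."""
    return [parse_line(line) for line in lines if line.strip()]

# Three lines copied from the sample shown above.
sample = [
    "ham   What you doing?how are you?",
    "ham Ok lar... Joking wif u oni...",
    "spam Sunshine Quiz! Win a super Sony DVD recorder if you can name "
    "the capital of Australia? Text MQUIZ to 82277. B",
]

records = load_messages(sample)
counts = Counter(label for label, _ in records)
print(counts["ham"], counts["spam"])
```

To process the full file, the same `load_messages` function could be given an open file object instead of the `sample` list, since iterating over a file yields one line at a time.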
