Pros and cons of the BoW approach

Now that we have an understanding of both the theory and the implementation of the BoW approach, let's examine its pros and cons. On the plus side, the BoW approach is very simple to understand and implement, so it is easy to customize for any text dataset. On the minus side, BoW does not retain the order of words, particularly when only unigrams are considered. This problem is generally overcome by retaining n-grams in the DTM. However, this comes at a cost: n-grams enlarge the vocabulary and therefore the DTM, so more memory and compute are needed to process the text and build a classifier. Another severe drawback of the approach is that it does not respect the semantics of words. For example, the words "car" and "automobile" are often used in the same context, yet a model built on BoW treats the sentences "buy used cars" and "purchase old automobiles" as very different sentences. Although the two sentences mean essentially the same thing, a BoW model does not treat them as similar because they share no words. It is possible to take the semantics of words into account using an approach called word embedding. This is something we will explore in our next section.
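Both drawbacks can be illustrated with a small sketch. It is written in Python using only the standard library and is not the implementation used earlier in this chapter. The helper names `bow_vector` and `cosine_similarity` and the "dog bites man" sentences are assumptions introduced purely for illustration, and tokenization is simplified to lowercasing and splitting on whitespace.

```python
from collections import Counter
from math import sqrt


def bow_vector(text, n=1):
    """Return n-gram counts for a sentence (hypothetical helper, whitespace tokenizer)."""
    tokens = text.lower().split()
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(ngrams)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Semantics are ignored: the two sentences share no terms, so similarity is zero.
sim_synonyms = cosine_similarity(
    bow_vector("buy used cars"),
    bow_vector("purchase old automobiles"),
)
print(sim_synonyms)

# Word order is ignored: with unigrams, both sentences yield identical counts.
same_unigrams = bow_vector("dog bites man") == bow_vector("man bites dog")
print(same_unigrams)

# Bigrams recover some word order, at the price of a different, larger vocabulary.
same_bigrams = bow_vector("dog bites man", n=2) == bow_vector("man bites dog", n=2)
print(same_bigrams)
```

In this sketch the synonym pair has no terms in common, so its similarity is zero, while the two "dog bites man" variants are indistinguishable until bigrams are used. Across a real corpus, the number of distinct bigrams is typically much larger than the number of distinct unigrams, which is where the extra processing cost of n-grams comes from.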
