Inverse Document Frequency (IDF)

This measures how important a term is across the corpus. When computing TF, all terms are treated as equally important. However, stop words such as "the" and "is" occur very often yet carry little meaning as far as NLP is concerned. There is therefore a need to weigh down the importance of common terms and weigh up the importance of rare ones. IDF does this, and is calculated as follows:

IDF(t) = log_e(Total number of documents/Number of documents with term t in it)
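
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name inverse_document_frequency, the toy corpus, and the choice to raise an error for a term that appears in no document are all assumptions made for the example. Each document is assumed to be already tokenized into a list of words.

```python
import math

def inverse_document_frequency(term, documents):
    """Return log_e(total documents / documents containing the term).

    `documents` is assumed to be a list of token lists.
    """
    total_docs = len(documents)
    docs_with_term = sum(1 for doc in documents if term in doc)
    if docs_with_term == 0:
        # Assumption for this sketch: an unseen term is an error,
        # since the formula would otherwise divide by zero.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(total_docs / docs_with_term)

# Hypothetical three-document corpus, split on whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

idf_the = inverse_document_frequency("the", corpus)    # in 3 of 3 documents
idf_cat = inverse_document_frequency("cat", corpus)    # in 2 of 3 documents
idf_bird = inverse_document_frequency("bird", corpus)  # in 1 of 3 documents

print(idf_the, idf_cat, idf_bird)
```

Here "the" appears in every document, so its IDF is log_e(3/3) = 0, while the rarer "bird" gets the largest value, log_e(3/1). This matches the intent described above: common terms are weighted down and rare terms are weighted up.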
