Let's call a page a document (it could just as well be a chapter) and the whole book a corpus. Within a document, we can simply count how many times a term occurs. The more often a term occurs, the more important it is to the document. But is that really the case? If we rank terms by raw counts alone, articles (a/an/the) will end up at the top of the list every time. Let's write term frequency as TF(t, d), where t is a term and d is a document.
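
To make the counting idea concrete, here is a minimal sketch in Python that treats TF(t, d) as a raw count. The function names `term_frequency` and `top_terms`, the regex tokenizer, and the sample `page` text are all assumptions made for illustration; real tokenization is usually more careful than this.

```python
import re
from collections import Counter


def tokenize(document: str) -> list[str]:
    # Assumption: lowercase the text and treat runs of letters/apostrophes as terms.
    return re.findall(r"[a-z']+", document.lower())


def term_frequency(term: str, document: str) -> int:
    # TF(t, d) as a raw count: how many times `term` occurs in `document`.
    return Counter(tokenize(document))[term.lower()]


def top_terms(document: str, k: int = 3) -> list[tuple[str, int]]:
    # The k most frequent terms in the document, with their counts.
    return Counter(tokenize(document)).most_common(k)


# Hypothetical one-page "document" used only for this example.
page = "The cat sat on the mat. The dog chased the cat around the mat."

print(term_frequency("the", page))  # 5
print(term_frequency("cat", page))  # 2
print(top_terms(page, k=1))         # [('the', 5)]
```

Even on this tiny page the article "the" outranks "cat" and "mat", which is exactly the problem with using raw counts alone as a measure of importance.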