Text categorization

Given a set of text documents and a set of predefined categories, the objective of text categorization is to assign each document to a category. The output can be a hard assignment or a soft assignment, depending on the problem. A hard assignment gives each document exactly one category; a soft assignment gives each document a probability distribution over all categories.
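
The difference between the two kinds of output can be sketched in a few lines of Python. This is only an illustration: the raw scores, the category names, and the helper functions soft_assignment and hard_assignment are hypothetical, and softmax is just one common way to turn scores into a probability distribution.

```python
import math

def soft_assignment(scores):
    """Turn raw per-category scores into a probability distribution (softmax)."""
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {cat: math.exp(s - max_score) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}

def hard_assignment(probs):
    """Pick the single most probable category."""
    return max(probs, key=probs.get)

# Hypothetical raw scores that some classifier produced for one email.
scores = {"spam": 2.0, "legitimate": 0.5}

probs = soft_assignment(scores)   # soft: one probability per category
label = hard_assignment(probs)    # hard: exactly one category

print(probs)
print(label)
```

A soft output keeps the classifier's uncertainty, which is useful when a downstream step needs a threshold; a hard output is what is needed when each document must end up in exactly one category.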

There is a wide range of applications of text categorization in industry. The following are a few examples:

  • Spam filtering: Given an email, classify it as spam or legitimate email.
  • Sentiment classification: Given a review text (movie review, product review), identify the user polarity, that is, whether it's a positive, negative, or neutral review.
  • Problem ticket assignment: Typically, in any industry, whenever a user faces an issue with an IT application or a software/hardware product, the first step is to create a problem ticket. These tickets are text documents that describe the problem the user is facing. The next logical step is for someone to read the description and assign the ticket to the team with the right expertise to solve the issue. Given historical tickets and the resolution teams they were assigned to, it's possible to build a text classifier that assigns new problem tickets automatically.
  • Auto resolution of problem tickets: In some cases the resolution to a problem is also predefined; that is, the expert team knows which steps to follow to solve the issue. In such cases, if a text classifier can classify the tickets with good accuracy, then once the ticket category is predicted, an automated script can be run to resolve the issue directly. This is one of the goals of future Artificial Intelligence for IT Operations (AIOps).
  • Targeted marketing: Marketers can monitor users on social media and classify them as promoters or detractors, based on what they are saying about the product online.
  • Genre classification: Automatic text genre classification is very important for classification and retrieval purposes. Even when a set of documents belongs to the same class because the documents share a common topic, they often serve different purposes and so fall into different genre classes. If the genre of every document in a search database can be detected, information retrieval results could be presented to the user in a better way, depending on the user's preference.
  • Fraud detection in claims: Analyzing insurance claim text documents and detecting whether the claim is fraudulent.
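
As a rough illustration of the problem ticket assignment example above, the following is a minimal sketch of a text classifier trained on historical tickets. It assumes a multinomial naive Bayes model with add-one smoothing, which is one possible choice and not necessarily the one a production system would use. The tickets, the team names, and the functions tokenize, train, and predict are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def train(tickets):
    """Count words per team from (description, team) pairs."""
    team_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, team in tickets:
        team_counts[team] += 1
        for word in tokenize(text):
            word_counts[team][word] += 1
            vocab.add(word)
    return team_counts, word_counts, vocab

def predict(model, text):
    """Return the team with the highest naive Bayes log score."""
    team_counts, word_counts, vocab = model
    total_tickets = sum(team_counts.values())
    best_team, best_score = None, -math.inf
    for team, count in team_counts.items():
        score = math.log(count / total_tickets)  # log prior
        denom = sum(word_counts[team].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # add-one (Laplace) smoothing avoids zero probabilities
                score += math.log((word_counts[team][word] + 1) / denom)
        if score > best_score:
            best_team, best_score = team, score
    return best_team

# Hypothetical historical tickets with the team that resolved each one.
history = [
    ("cannot connect to vpn from home", "network"),
    ("wifi keeps dropping in the office", "network"),
    ("password reset link has expired", "accounts"),
    ("account locked after failed login", "accounts"),
]

model = train(history)
print(predict(model, "vpn connection keeps dropping"))
print(predict(model, "login failed account locked"))
```

With only four training tickets this is a toy; a real deployment would need far more historical data and an accuracy check on tickets held out from training before routing or auto resolution is trusted.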