Summary

This chapter covered two classification methods that use so-called "greedy" algorithms to partition the data according to feature values. Decision trees use a divide and conquer strategy to create flowchart-like structures, while rule learners separate and conquer data to identify logical if-else rules. Both methods produce models that can be interpreted without a statistical background.

One popular and highly configurable decision tree algorithm is C5.0. We used the C5.0 algorithm to create a tree to predict whether a loan applicant will default. Using options for boosting and cost-sensitive errors, we were able to improve our accuracy and avoid risky loans that could cost the bank more money.

We also used two rule learners, 1R and RIPPER, to develop rules for identifying poisonous mushrooms. The 1R algorithm used a single feature to achieve 99 percent accuracy in identifying potentially fatal mushroom samples. By contrast, the set of eight rules generated by the more sophisticated RIPPER algorithm correctly identified the edibility of every mushroom.
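To make the idea behind 1R concrete, the following is a minimal sketch in Python rather than the tooling used in the chapter. For each feature it predicts the majority class for every feature value, then keeps the single feature whose rule makes the fewest training errors. The function name `one_r`, the column names, and the six-row toy dataset are hypothetical and chosen only for illustration; they are not the mushroom data.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (best_feature, rule, errors) using the 1R idea.

    rows   : list of dicts mapping column name -> categorical value
    target : name of the class column
    """
    features = [name for name in rows[0] if name != target]
    best = None
    for feature in features:
        # Count the class labels seen for each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        # Each feature value predicts its most common class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors: rows whose class differs from the rule's prediction.
        errors = sum(1 for row in rows if rule[row[feature]] != row[target])
        # Keep the feature with the fewest errors (first one wins a tie).
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Hypothetical toy data, invented for illustration only.
toy = [
    {"odor": "foul", "cap": "brown", "type": "poisonous"},
    {"odor": "foul", "cap": "white", "type": "poisonous"},
    {"odor": "none", "cap": "brown", "type": "edible"},
    {"odor": "none", "cap": "white", "type": "edible"},
    {"odor": "almond", "cap": "brown", "type": "edible"},
    {"odor": "none", "cap": "brown", "type": "poisonous"},
]

feature, rule, errors = one_r(toy, "type")
print(feature, rule, errors)
```

On this toy data the sketch selects `odor`, because its one-feature rule misclassifies fewer rows than the rule built on `cap`. A learner such as RIPPER goes further by building several multi-condition rules, which this sketch does not attempt.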

This chapter merely scratched the surface of how trees and rules can be used. The next chapter, Chapter 6, Forecasting Numeric Data – Regression Methods, describes techniques known as regression trees and model trees, which use decision trees for numeric prediction rather than classification. In Chapter 8, Finding Patterns – Market Basket Analysis Using Association Rules, we will see how association rules—a relative of classification rules—can be used to identify groups of items in transactional data. In Chapter 11, Improving Model Performance, we will discover how the performance of decision trees can be improved by grouping them together in a model known as a random forest.
