Summary

This chapter covered two classification methods that partition the data according to the values of the features. Decision trees use a divide-and-conquer strategy to build flowchart-like structures, while rule learners use a separate-and-conquer strategy to identify logical if-else rules. Both methods produce models that can be interpreted without a statistical background.

One popular and highly configurable decision tree algorithm is C5.0. We used the C5.0 algorithm to create a tree that predicts whether a loan applicant will default. Using its options for boosting and cost-sensitive errors, we were able to improve accuracy and avoid the risky loans that are more costly to the bank.

We also used two rule learners, 1R and RIPPER, to develop rules for identifying poisonous mushrooms. The 1R algorithm used a single feature to achieve 99 percent accuracy in identifying potentially fatal mushroom samples. By comparison, the set of nine rules generated by the more sophisticated RIPPER algorithm correctly identified the edibility of every mushroom.
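As a reminder of why a single feature can go so far, the idea behind 1R can be sketched in a few lines. This is a minimal illustration written in Python, not the implementation used in the chapter: the function name `one_r`, the dictionary-based row format, and the six toy mushroom records are all assumptions invented for this example. For each candidate feature, the sketch maps every feature value to its most common class, counts the training errors that rule would make, and keeps the feature with the fewest errors.

```python
from collections import Counter, defaultdict


def one_r(rows, features, target):
    """Return (feature, rule, errors) for the single best feature.

    Hypothetical helper for illustration only. `rule` maps each value of
    the chosen feature to the majority class seen with that value.
    """
    best = None
    for feature in features:
        # Count how often each class occurs with each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1

        # One rule per value: predict the majority class for that value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}

        # Training errors are the rows that disagree with their value's rule.
        errors = sum(sum(c.values()) - c[rule[value]]
                     for value, c in counts.items())

        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Invented toy data, not the mushroom dataset from the chapter.
toy = [
    {"odor": "foul", "cap": "brown", "type": "poisonous"},
    {"odor": "foul", "cap": "white", "type": "poisonous"},
    {"odor": "none", "cap": "brown", "type": "edible"},
    {"odor": "none", "cap": "white", "type": "edible"},
    {"odor": "almond", "cap": "brown", "type": "edible"},
    {"odor": "none", "cap": "white", "type": "poisonous"},
]

feature, rule, errors = one_r(toy, ["odor", "cap"], "type")
print(feature, errors)
print(rule)
```

On this toy data the sketch selects `odor`, which misclassifies one of the six rows, whereas a rule based on `cap` would misclassify two. Ties between features are broken in favor of whichever is listed first, which is a simplification chosen for this sketch.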

This chapter merely scratched the surface of how trees and rules can be used. Chapter 6, Forecasting Numeric Data – Regression Methods, describes techniques known as regression trees and model trees, which use decision trees for numeric prediction. In Chapter 11, Improving Model Performance, we will discover how the performance of decision trees can be improved by grouping them together in a model known as a random forest. And in Chapter 8, Finding Patterns – Market Basket Analysis Using Association Rules, we will see how association rules—a relative of classification rules—can be used to identify groups of items in transactional data.
