Improvements to the M5 model

The standard M5 model tree is currently regarded as the state-of-the-art decision-tree model for complex regression tasks. This is mainly because of the accurate results it yields and its ability to handle high-dimensional tasks with hundreds of attributes.

To improve on the standard M5 algorithm, M5Flex has recently been introduced as perhaps the most viable option. M5Flex augments a standard M5 model tree with domain knowledge. In other words, it lets someone who is familiar with the data population review and choose the split attributes and split values for the important nodes of the model tree. The assumption is that, since the expert may "know best," the resulting model will be more accurate, consistent, and appropriate for practical applications than one built by relying exclusively on standard M5. One drawback of M5Flex is that a domain expert may not always be available.
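The idea of letting an expert override a split can be illustrated with a minimal Python sketch. It is not the M5Flex implementation: the function names (`sdr`, `best_split`, `choose_split`) and the `expert_split` parameter are hypothetical, and the automatic fallback assumes the standard deviation reduction (SDR) criterion that M5 is usually described as using.

```python
from statistics import pstdev


def sdr(targets, left, right):
    """Standard deviation reduction obtained by splitting targets into left/right."""
    n = len(targets)
    return (pstdev(targets)
            - (len(left) / n) * pstdev(left)
            - (len(right) / n) * pstdev(right))


def best_split(rows, targets):
    """Greedy choice: the (attribute, threshold) pair with the largest SDR."""
    best = None
    for attr in range(len(rows[0])):
        # Dropping the largest value keeps both sides of the split non-empty.
        for threshold in sorted({r[attr] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[attr] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[attr] > threshold]
            score = sdr(targets, left, right)
            if best is None or score > best[0]:
                best = (score, attr, threshold)
    return best  # None when no attribute has more than one distinct value


def choose_split(rows, targets, expert_split=None):
    """Hypothetical hook: use the expert's (attribute, threshold) if supplied,
    otherwise fall back to the greedy SDR choice."""
    if expert_split is not None:
        return expert_split
    found = best_split(rows, targets)
    return None if found is None else (found[1], found[2])


# Illustrative data: two attributes per row, one numeric target per row.
rows = [(1, 5), (2, 9), (3, 7), (4, 6)]
targets = [1.0, 1.1, 5.0, 5.2]

automatic = choose_split(rows, targets)                    # SDR decides
overridden = choose_split(rows, targets, expert_split=(1, 6))  # expert decides
```

When no expert input is given, the node behaves like an ordinary greedy split; the override only replaces the decision at nodes the expert chooses to control.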

Another attempt at improving M5 is M5opt. M5opt is a semi-non-greedy algorithm: rather than solving the globally complex optimization problem as "one entity," it splits the generation of the tree layers into two distinct steps, each using a different type of algorithm depending on the layer of the tree:

  1. Global optimization: Generate the upper layers of the tree (starting from the first layer) by using a global (multi-extremum) optimization algorithm, or another better-than-greedy algorithm.
  2. Greedy searching: Generate the rest of the tree (its lower layers) by using a faster greedy algorithm, such as the standard M5.
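The two steps above can be sketched in Python under some loudly stated assumptions. This is not the M5opt implementation: exhaustive look-ahead stands in for the global optimization algorithm, leaves predict the mean instead of fitting linear models, and all names (`build`, `grow`, `global_depth`, `max_depth`) are hypothetical.

```python
def sse(targets):
    """Sum of squared errors around the mean (error of a mean-predicting leaf)."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)


def candidate_splits(rows):
    """Yield every (attribute, threshold) that leaves both sides non-empty."""
    for attr in range(len(rows[0])):
        for threshold in sorted({r[attr] for r in rows})[:-1]:
            yield attr, threshold


def split_data(rows, targets, attr, threshold):
    left = [(r, t) for r, t in zip(rows, targets) if r[attr] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[attr] > threshold]
    return left, right


def grow(rows, targets, attr, threshold, depth, max_depth, global_depth):
    """Apply one split, build both subtrees, and return (node, total error)."""
    children, error = [], 0.0
    for side in split_data(rows, targets, attr, threshold):
        node, err = build([r for r, _ in side], [t for _, t in side],
                          depth + 1, max_depth, global_depth)
        children.append(node)
        error += err
    node = {"attr": attr, "threshold": threshold,
            "left": children[0], "right": children[1]}
    return node, error


def build(rows, targets, depth=0, max_depth=2, global_depth=1):
    """Layers above global_depth are chosen by look-ahead (stand-in for global
    optimization); the remaining layers are chosen greedily."""
    leaf = ({"mean": sum(targets) / len(targets)}, sse(targets))
    if depth >= max_depth or len(set(targets)) == 1:
        return leaf
    best = None
    for attr, threshold in candidate_splits(rows):
        if depth < global_depth:
            # Step 1: score a split by the error of the whole subtree below it.
            score = grow(rows, targets, attr, threshold,
                         depth, max_depth, global_depth)[1]
        else:
            # Step 2: score a split by the immediate error of its two children.
            left, right = split_data(rows, targets, attr, threshold)
            score = sse([t for _, t in left]) + sse([t for _, t in right])
        if best is None or score < best[0]:
            best = (score, attr, threshold)
    if best is None:
        return leaf
    # The chosen subtree is rebuilt here for simplicity rather than cached.
    return grow(rows, targets, best[1], best[2], depth, max_depth, global_depth)


# Illustrative XOR-like data: the target depends on attributes 0 and 1 jointly,
# while attribute 2 is a distractor that looks best when judged in isolation.
rows = [(0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 1, 0),
        (1, 0, 1), (1, 0, 0), (1, 1, 0), (1, 1, 0)]
targets = [0.0, 0.0, 10.0, 10.0, 10.0, 10.0, 0.0, 0.0]

greedy_tree, greedy_error = build(rows, targets, global_depth=0)
hybrid_tree, hybrid_error = build(rows, targets, global_depth=1)
```

On this toy data the purely greedy tree splits on the distractor at the root and keeps a residual error, whereas applying look-ahead to the first layer alone recovers an error-free tree. Setting `global_depth=0` reduces the sketch to an ordinary greedy builder.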

Additionally, the layer up to which global optimization is applied could differ between branches. However, it is reasonable to fix it at a single value for all branches; this value then provides a flexible trade-off between speed and optimization. Although M5opt has been shown to yield models more accurate than those created using standard M5, its computational cost is higher, because a non-greedy search must evaluate many more candidate trees than a greedy one.

To address this, one can control the cost by determining which tree level is "most appropriate," that is, which level yields the most accuracy for the least cost, and then performing the more exhaustive, non-greedy search only down to that level of the tree.
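A rough count shows why the chosen level matters so much. The sketch below is a back-of-the-envelope estimate, not a measurement of M5opt: it assumes a full binary tree and the same hypothetical number of candidate splits at every node, and it counts the size of the joint search space, which a real global optimizer would sample rather than enumerate.

```python
def internal_nodes(layers):
    """Number of split nodes in a full binary tree with the given number of split layers."""
    return 2 ** layers - 1


def greedy_evaluations(candidates_per_node, layers):
    """Greedy search scores each node's candidates once, independently."""
    return candidates_per_node * internal_nodes(layers)


def joint_search_space(candidates_per_node, layers):
    """Joint search considers every combination of splits across all nodes."""
    return candidates_per_node ** internal_nodes(layers)


# Assumed figure for illustration only: 20 candidate splits per node.
for layers in (1, 2, 3):
    print(layers, greedy_evaluations(20, layers), joint_search_space(20, layers))
# prints:
# 1 20 20
# 2 60 8000
# 3 140 1280000000
```

Under these assumptions the greedy cost grows linearly with the number of nodes while the joint search space grows exponentially, which is why restricting the non-greedy search to the top few layers keeps the cost manageable.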

Further attempts to optimize standard M5 have combined the M5opt and M5Flex approaches.

Finally, artificial neural networks (ANNs), discussed in Chapter 5, Neural Networks, have been offered as an alternative to standard M5, but only in scenarios where the model is presumed to be less complex. For complex models, M5 almost always outperforms ANNs.
