Abstract |
: |
The process of computation of classification trees can be characterized as involving three basic choices: the type of splits considered in the growing process, the criterion to be optimized on the successively constructed partitions, and the way to get right-sized trees. Most implementations are ordinary binary trees, i.e. trees whose successive cuts are made by hyper-planes perpendicular to the axes, while most of the litterature concerns the various possible criteria and pruning methods. L. Devroye, L. Györfy and G. Lugosi (1996) define and consider the remarkable theoretical properties of a binary tree classifier whose prominent feature is the particular type of splits used in its construction: at a given node, partitioning is made by hyper-rectangles rather than hyper-planes. We propose a simple algorithm for the optimization problem involved. Then we compare the performance of two different implementations of our algorithm with two leading algorithms for ordinary binary trees, namely CART and C4.5 as implemented in the Splus "tree" procedure and in SAS 's Enterprise Miner respectively. For this purpose, data sets which traditionally enhance the weaknesses of classication trees are used. |