Decision Tree Classification
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of that test, and each leaf node holds a class label. The topmost node in the tree is the root node.
A classic example is a decision tree for the concept buys_computer, which indicates whether a customer at a company is likely to buy a computer or not. Each internal node of that tree represents a test on an attribute, and each leaf node represents a class (buys_computer = yes or buys_computer = no).
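Since the tree itself is not reproduced here, the sketch below encodes the classic buys_computer tree from the data-mining literature as nested tests in Python. The attributes age, student, and credit_rating and their values are illustrative assumptions, not something defined in this text.

def buys_computer(age, student, credit_rating):
    # Root node: test on the (assumed) attribute age
    if age == "youth":
        # Internal node: test on student
        return "yes" if student == "yes" else "no"
    elif age == "middle_aged":
        # Leaf node: class label
        return "yes"
    else:  # senior
        # Internal node: test on credit_rating
        return "yes" if credit_rating == "fair" else "no"

Each if-test corresponds to an internal node, each branch to a test outcome, and each return to a leaf node holding a class label.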
The benefits of decision tree classification are as follows:
- It does not require any domain knowledge.
- It is easy to comprehend.
- The learning and classification steps of a decision tree are simple and fast, as the short sketch below illustrates.
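As a rough illustration of how little code the learning and classification steps take, here is a minimal sketch using scikit-learn's DecisionTreeClassifier on its bundled iris data; the library and dataset are assumptions made for this example only.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Learning step: induce a tree from the training tuples
clf = DecisionTreeClassifier()
clf.fit(X, y)

# Classification step: each prediction follows the tree from root to a leaf
print(clf.predict(X[:5]))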
A basic algorithm for learning decision trees is given below.
During tree construction, attribute selection measures are used to select the attribute that best partitions the tuples into distinct classes.
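Information gain is one widely used attribute selection measure. The sketch below computes it for discrete-valued attributes, assuming for illustration that each tuple is a dict mapping attribute names to values.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of the class-label distribution
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(tuples, labels, attribute):
    # Expected reduction in entropy from partitioning the tuples on one attribute
    total = len(labels)
    partitions = {}
    for row, label in zip(tuples, labels):
        partitions.setdefault(row[attribute], []).append(label)
    expected = sum(len(part) / total * entropy(part)
                   for part in partitions.values())
    return entropy(labels) - expected

The attribute with the highest information gain is chosen as the splitting attribute.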
When decision trees are built, many of the branches may reflect noise or outliers in the training data.
Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.
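One concrete pruning technique is cost-complexity pruning, which scikit-learn exposes through the ccp_alpha parameter of DecisionTreeClassifier. The sketch below is only an illustration, and the value 0.02 is an arbitrary assumption rather than a recommended setting.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unpruned tree: its branches may fit noise or outliers in the training data
unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pruned tree: a larger ccp_alpha removes more low-value branches
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

# Compare accuracy on unseen data
print(unpruned.score(X_test, y_test), pruned.score(X_test, y_test))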
Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition D.
Input:
- Data partition D, a set of training tuples and their associated class labels;
- attribute_list, the set of candidate attributes;
- Attribute_selection_method, a procedure to determine the splitting criterion that “best” partitions the data tuples into individual classes. This criterion consists of a splitting attribute and, possibly, either a split point or a splitting subset.
Output: A decision tree.
Method:
create a node N;
if tuples in D are all of the same class, C, then
    return N as a leaf node labeled with the class C;
if attribute_list is empty then
    return N as a leaf node labeled with the majority class in D; // majority voting
apply Attribute_selection_method(D, attribute_list) to find the “best” splitting_criterion;
label node N with splitting_criterion;
if splitting_attribute is discrete-valued and multiway splits are allowed then // not restricted to binary trees
    attribute_list ← attribute_list − splitting_attribute; // remove splitting_attribute
for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
    let Dj be the set of data tuples in D satisfying outcome j; // a partition
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
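To make the pseudocode concrete, here is a minimal runnable sketch in Python. It assumes, as above, that tuples are dicts of discrete attribute values, and it reuses the information_gain function sketched earlier as the attribute selection method; treat it as an illustration of the algorithm, not a definitive implementation.

from collections import Counter

def majority_class(labels):
    # Majority voting over the class labels in a partition
    return Counter(labels).most_common(1)[0][0]

def generate_decision_tree(D, labels, attribute_list):
    # Tuples in D are all of the same class: return a leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # attribute_list is empty: return a majority-class leaf
    if not attribute_list:
        return majority_class(labels)
    # Attribute selection method: pick the attribute with the highest gain
    best = max(attribute_list, key=lambda a: information_gain(D, labels, a))
    remaining = [a for a in attribute_list if a != best]  # discrete, multiway split
    node = {best: {}}
    # Partition the tuples and grow a subtree for each outcome of the split
    # (outcomes are enumerated from D itself, so no partition here is empty;
    # the pseudocode's empty-Dj case arises when outcomes come from a fixed
    # attribute domain instead)
    for value in set(row[best] for row in D):
        rows = [row for row in D if row[best] == value]
        lbls = [lbl for row, lbl in zip(D, labels) if row[best] == value]
        node[best][value] = generate_decision_tree(rows, lbls, remaining)
    return node

For example:

tree = generate_decision_tree(
    [{"age": "youth", "student": "no"}, {"age": "youth", "student": "yes"}],
    ["no", "yes"],
    ["age", "student"])
print(tree)  # e.g. {'student': {'no': 'no', 'yes': 'yes'}}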