Skip to main content

Decision Tree for classification with example and why or why not we use them.

 

What is a Decision Tree

A Decision Tree is a type of supervising algorithm which indicates that in the training set you know the input variable ( features ) and their corresponding target. Decision Tree can operate as both classifiers or regressors. In this method, we split the dataset into two or more uniform sets based on the most significant splitter/differentiator in the dataset features.

Image for post

Example:-

Let’s say we have a sample of 50 students with three variables Gender (Boy/ Girl), Class( X/ XI) and Height (5 to 6 ft). 20 out of these 50 play cricket in rest time. Suppose you want to find on unknown dataset which contains all the features(Gender, class, height) that he/she will play or not in rest time.

This is where decision tree supports, it will separate the students based on all values of three variable and identify the variable, which creates the best uniform sets of students like below:

Image for post

Terms you should know

  1. Root Node: It describes the complete dataset and this additional gets split into two or more uniform sets.
  2. Splitting: It is a process of dividing a node into two or more sub-nodes.
  3. Decision Node: When a sub-node splits into further sub-nodes, then it is called the decision node.
  4. Leaf/ Terminal Node: Nodes do not split is called Leaf or Terminal node.

Why we should use

  1. Clear to Understand quickly.
  2. Beneficial in Data exploration.
  3. Less data cleaning required.
  4. Data type is not a constraint.

Why not

  1. It may lead you to an overfitting problem.
  2. Not a useful approach for continuous variables.

Feel free to share your knowledge, suggestions, and opinions in the comments section below.

Comments

Popular posts from this blog

NEW TREND OF DATA SCIENCE: REINFORCEMENT LEARNING

Reinforcement Learning (RL) is a machine learning method that empowers a specialist to learn in an intuitive environment by performing trial and error utilizing observations from its very own activities and encounters. In spite of the fact that both direct and reinforcement learning use mapping among input and output, not at all like supervised learning where input gave to the specialist is basically the right set of activities for playing out a task, reinforcement learning utilizes prizes and discipline as signs for positive and negative conduct. When compared with unsupervised learning, reinforcement learning is distinctive as far as objectives are taken into consideration. While the objective in unsupervised learning is to discover synonymities and contrasts between data points, in reinforcement learning the objective is to locate a reasonable activity model that would boost the aggregate total reward of the specialist. Reinforcement learning will be a huge thing in Data science in ...