Random Forest and how it works

Random Forest

Random Forest is a machine learning algorithm based on decision trees. It relies on the ensemble method, which is very common these days: a decision is made collectively by many decision trees rather than by a single one. In other words, the prediction does not come from one decision tree but from the combined (majority) vote of many decision trees.
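As a quick, hedged illustration of this idea, here is a minimal sketch using scikit-learn's RandomForestClassifier. The dataset and parameter values are assumptions for demonstration only; n_estimators is the number of trees whose votes are combined.

```python
# Minimal sketch (assumed example): training a random forest with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators is the number of decision trees whose votes are combined
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy of the combined (majority) vote
```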

Why should we use it

There are four reasons to use the Random Forest algorithm. First, it can be used for both classification and regression tasks. Second, overfitting is a critical problem that can degrade results, but with enough trees in the forest the Random Forest classifier is far less likely to overfit the model. Third, the Random Forest classifier can handle missing values. Finally, the Random Forest classifier can be modeled for categorical values.
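To illustrate the first point, scikit-learn exposes both a classifier and a regressor variant of the algorithm. The synthetic data below is only an assumed example to show the two APIs side by side.

```python
# Assumed illustration: the same algorithm serves classification and regression.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Classification: predict a discrete label
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_class)

# Regression: predict a continuous value
y_reg = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_reg)

print(clf.predict(X[:3]), reg.predict(X[:3]))
```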

How does the Random Forest algorithm work

There are two stages in the Random Forest algorithm: one is random forest creation, and the other is making a prediction with the random forest classifier built in the first stage. The whole process is described below and is easy to follow.

In the first stage, we build the random forest (a minimal code sketch follows this list):

  1. Randomly select “K” features from the total “m” features, where K << m
  2. Among the “K” features, calculate the node “d” using the best split point
  3. Split the node into daughter nodes using the best split
  4. Repeat steps 1 to 3 until “l” number of nodes has been reached
  5. Build the forest by repeating steps 1 to 4 “n” times to create “n” trees
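Below is a minimal sketch of this stage. It is an assumed illustration, not the original code: each tree is grown on a bootstrap sample of the rows, and max_features restricts every split to a random subset of “K” features, which approximates steps 1 to 5 above.

```python
# Assumed sketch of the forest-creation stage using scikit-learn decision trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=10, k_features="sqrt", random_state=0):
    """Grow n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(random_state)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample of the training rows
        idx = rng.integers(0, len(X), size=len(X))
        # max_features makes each split consider only a random subset of features,
        # and the tree finds the best split point among them (steps 1-3)
        tree = DecisionTreeClassifier(max_features=k_features,
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest
```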

In the next stage, with the random forest created, we make predictions. The random forest prediction pseudocode is shown below, followed by a small code sketch:

  1. Take the test features, use the rules of each randomly created decision tree to predict the outcome, and store the predicted outcome (target)
  2. Calculate the votes for each predicted target
  3. Take the highest-voted predicted target as the final prediction of the random forest algorithm
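A matching sketch for the prediction stage, again an assumed illustration that reuses the build_forest function above: each tree votes on every sample, and the most frequent label wins.

```python
# Assumed sketch of the prediction stage: collect each tree's vote, take the majority.
import numpy as np

def forest_predict(forest, X):
    # One row of predictions per tree (step 1); assumes integer class labels
    votes = np.array([tree.predict(X) for tree in forest])
    # Count the votes per sample and keep the most frequent label (steps 2-3)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```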

The process is easy to understand, yet it is quite effective in practice.

Where should we use it

Random Forest can be applied in banking, medicine, the stock market, and e-commerce:

  • In banking, the Random Forest algorithm is used to identify loyal customers.
  • In medicine, the Random Forest algorithm can be used to identify the correct combination of components in a medicine.
  • In the stock market, the Random Forest algorithm can be used to identify a stock’s behavior and the expected loss or profit.
  • In e-commerce, the Random Forest algorithm can be used to predict whether a customer will like the recommended products, based on the experience of similar customers.

Feel free to share your knowledge, suggestions, and opinions in the comments section below.
