
Fuzzy sets & Fuzzy C-Means Clustering Algorithm

Most of the classification algorithms we have studied so far are based on hard (crisp) computation: the algorithm predicts that point j belongs to exactly one set i (where i can be any of the available sets). There is another category, called fuzzy-based computation, in which a point is not forced to belong to only one of the n sets. Instead, a membership function says how strongly the point belongs to each set, so the same point can belong to several sets at once. In a nutshell, the output is a number between 0 and 1 for each set that indicates the degree to which the point belongs to that set. It is often read like a probability, although strictly speaking a membership degree is not a probability.

If you can feel the difference between an ordinary (crisp) set and a fuzzy set, you are on the right track. In short, with crisp sets each element belongs to exactly one class, whereas with fuzzy sets an element can belong partially to several sets at the same time. Reasoning built on such partial memberships is called fuzzy logic.

For example, take the height of a person, which you can classify into two classes, TALL or SHORT. Now think about it in terms of:

  1. Hard (crisp) computation: if the input is greater than some threshold, the output is TALL; otherwise it is SHORT.
  2. Fuzzy computation: if the input is far from the threshold, we are very sure about the output. But if the input is very close to the threshold, you cannot practically call it TALL or SHORT. This is where fuzzy logic comes into the picture: instead of a single label, it tells you the degree to which the input belongs to each class. The membership function looks like this:

[Figure: membership function for the TALL/SHORT example]
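To make the difference concrete, here is a small Python sketch of the TALL/SHORT example. The threshold of 170 cm and the linear transition zone between 160 cm and 180 cm are made-up values chosen only for illustration (the post gives no numbers), and a linear ramp is just one of many possible membership-function shapes.

```python
# Hypothetical numbers for illustration: a crisp threshold at 170 cm and a
# fuzzy transition zone from 160 cm to 180 cm (not taken from the post).
THRESHOLD = 170.0
LOW, HIGH = 160.0, 180.0


def crisp_label(height_cm: float) -> str:
    """Hard (crisp) decision: every height gets exactly one label."""
    return "TALL" if height_cm > THRESHOLD else "SHORT"


def tall_membership(height_cm: float) -> float:
    """Degree in [0, 1] to which a height belongs to the fuzzy set TALL.

    Linear ramp: 0 at or below LOW, 1 at or above HIGH, linear in between.
    """
    if height_cm <= LOW:
        return 0.0
    if height_cm >= HIGH:
        return 1.0
    return (height_cm - LOW) / (HIGH - LOW)


def short_membership(height_cm: float) -> float:
    """SHORT is modelled as the complement of TALL, so the two degrees sum to 1."""
    return 1.0 - tall_membership(height_cm)


for h in (150.0, 169.0, 171.0, 190.0):
    print(h, crisp_label(h), round(tall_membership(h), 2), round(short_membership(h), 2))
```

Heights of 169 cm and 171 cm get opposite crisp labels even though they are only 2 cm apart, whereas their TALL memberships (0.45 and 0.55) show that both are borderline cases.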

By now, the idea of a fuzzy set should be clear.

Before getting into the FCM algorithm, let us understand one more concept, the fuzzy c-partition, which partitions a set S into c classes. A fuzzy c-partition is described by a membership matrix µ of size n × c, where n is the number of points in S and c is the number of classes. The entry µ(i,j) is the membership value of point i in class j, a number between 0 and 1. For µ to be a valid fuzzy c-partition, it has to follow some rules.

The rules are:

  1. Every row of the matrix sums to one, so each point spreads a total membership of 1 across the c classes.
  2. Every column of the matrix sums to more than zero and less than n, the total number of points, so no class is empty and no class holds every point with full membership.
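These rules, together with the requirement that every entry lies in [0, 1], can be checked mechanically. The sketch below does that for a matrix stored as a list of rows. The function name `is_fuzzy_c_partition` and the tolerance `tol` (needed because floating-point sums are rarely exactly 1) are illustrative choices, not something from the original post.

```python
from typing import Sequence


def is_fuzzy_c_partition(u: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Check whether an n x c membership matrix u is a fuzzy c-partition.

    u[i][j] is the membership of point i in class j.
    """
    n = len(u)
    if n == 0:
        return False
    c = len(u[0])
    if c == 0 or any(len(row) != c for row in u):
        return False
    # Every entry must be a membership degree in [0, 1].
    if any(not (0.0 <= v <= 1.0) for row in u for v in row):
        return False
    # Rule 1: each row (one point's memberships) sums to 1.
    if any(abs(sum(row) - 1.0) > tol for row in u):
        return False
    # Rule 2: each column sum lies strictly between 0 and n.
    for j in range(c):
        column_sum = sum(u[i][j] for i in range(n))
        if not (0.0 < column_sum < n):
            return False
    return True


print(is_fuzzy_c_partition([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]))  # True
print(is_fuzzy_c_partition([[1.0, 0.0], [1.0, 0.0]]))  # False: column sums are 2 (= n) and 0
```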

Now that the groundwork is in place, we can move on to the fuzzy c-means algorithm itself. An interesting fact: fuzzy c-means (FCM) clustering was developed by J.C. Dunn in 1973 and improved by J.C. Bezdek in 1981.

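The algorithm is similar to k-means, but it works differently: instead of assigning each point to exactly one cluster, it keeps a membership degree for every point in every cluster and refines those degrees iteratively.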
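The post does not spell out the update formulas, so for reference here is a sketch of the usual (Bezdek-style) formulation, using µ(i,j) as defined above written as µ_ij. The data points x_i, the centroids c_j and the "fuzzifier" m > 1 (m = 2 is a common default) are notation introduced for this sketch. FCM minimizes the first expression by alternating the second (centroid update) and third (membership update):

```latex
J_m(\mu, C) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{\,m} \, \lVert x_i - c_j \rVert^2, \qquad m > 1

c_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{\,m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{\,m}}

\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}}
```

The membership update automatically makes each row of µ sum to one. As m approaches 1 the memberships become crisp and the method behaves like k-means; larger values of m make the memberships more uniform.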

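The steps are:

  1. Choose a number of clusters c.
  2. Randomly assign each data point a membership coefficient for every cluster (the coefficients of each point sum to one).
  3. Repeat until the algorithm has converged, that is, until the coefficients change by less than a small threshold between two iterations:
  • Compute the centroid of each cluster as the membership-weighted mean of all the points.
  • For each data point, recompute its membership coefficients from its distances to the centroids.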
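The steps above can be sketched in plain Python. This is a minimal sketch, not the original author's code: the fuzzifier `m = 2`, the convergence threshold `eps`, the iteration cap `max_iter` and the seeded random initialisation are all assumptions, and the update rules follow the standard formulation (membership-weighted centroids, distance-ratio memberships).

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
Matrix = List[List[float]]


def fuzzy_c_means(
    points: Sequence[Point],
    c: int,
    m: float = 2.0,
    eps: float = 1e-5,
    max_iter: int = 300,
    seed: int = 0,
) -> Tuple[Matrix, Matrix]:
    """Minimal fuzzy c-means. Returns (centroids, u), where u[i][j] is the
    membership of point i in cluster j and every row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Step 2: random memberships, each row normalised to sum to 1.
    u: Matrix = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centroids: Matrix = []
    for _ in range(max_iter):
        # Step 3a: each centroid is the mean of all points weighted by u^m.
        centroids = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centroids.append(
                [sum(weights[i] * points[i][a] for i in range(n)) / wsum for a in range(dim)]
            )

        # Step 3b: recompute memberships from distances to the new centroids.
        new_u: Matrix = []
        for i in range(n):
            dists = [math.dist(points[i], cj) for cj in centroids]
            if 0.0 in dists:
                # The point sits exactly on a centroid: the usual formula would
                # divide by zero, so give it full membership there instead.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                p = 2.0 / (m - 1.0)
                new_u.append(
                    [1.0 / sum((dists[j] / dk) ** p for dk in dists) for j in range(c)]
                )

        # Converged when no membership changed by more than eps.
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < eps:
            break
    return centroids, u


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (5.0, 5.0)]
    cents, memberships = fuzzy_c_means(data, c=2)
    for x, row in zip(data, memberships):
        print(x, [round(v, 3) for v in row])
```

In the toy data, the point (5.0, 5.0) lies between the two groups, so it should end up with a noticeable membership in both clusters rather than a single label.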

So, unlike k-means, which forces every point into exactly one cluster, fuzzy c-means assigns each data point to multiple clusters with different degrees of membership. This is more informative than a single hard label, especially for points that lie between clusters.

Why should we use fuzzy c-means instead of k-means?


The main reason is that it works much like k-means, but it is better at handling points that could reasonably be assigned to more than one cluster: rather than forcing a choice, it reports how strongly each such point belongs to each cluster.
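A small sketch can illustrate this point about borderline data. It assumes two already-fitted centroids at (0, 0) and (10, 0) (hypothetical values) and compares a k-means-style nearest-centroid assignment with the fuzzy c-means membership formula, using the common fuzzifier m = 2 (also an assumption).

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def hard_assign(x: Point, centroids: Sequence[Point]) -> int:
    """k-means style: index of the single nearest centroid."""
    dists = [math.dist(x, c) for c in centroids]
    return dists.index(min(dists))


def fcm_memberships(x: Point, centroids: Sequence[Point], m: float = 2.0) -> List[float]:
    """Fuzzy c-means style: membership of x in every cluster (sums to 1)."""
    dists = [math.dist(x, c) for c in centroids]
    if 0.0 in dists:
        # x coincides with a centroid: full membership there, avoiding division by zero.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


centroids = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical, already-fitted centroids
for x in [(1.0, 0.0), (4.9, 0.0), (5.1, 0.0)]:
    print(x, hard_assign(x, centroids), [round(u, 2) for u in fcm_memberships(x, centroids)])
```

The points at x = 4.9 and x = 5.1 receive different hard labels from the nearest-centroid rule, while their fuzzy memberships in the first cluster (about 0.52 and 0.48) make it clear that both sit on the boundary.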


I hope you learned something from this. If anything is unclear, please feel free to ask questions.

Thanks.
