
How to be a HERO in Machine Learning/Data Science Competitions

These days, one of the best ways to master machine learning models is to participate in the competitions hosted on various platforms. So how can somebody who is new to ML go from zero to hero? This article lays out a guideline.


The idea is not too hard; it just requires patience and some hard work. I will use as an example a competition I recently finished in the top 10. A competition generally gives you a problem in which the meaning of some features is hidden, because the hosts want you to explore the data and come up with the features that explain the target value. By exploring, I mean a few things (a short Python sketch follows the list):

  1. Look at the data and get a sense of it.
  2. Find the correlation of each feature with the target value.
  3. Try new features made up of existing features.
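
For example, here is a minimal exploration sketch in Python with pandas. The file name train.csv, the column target, and the derived feature are only placeholders for illustration, not the competition's actual data:

import pandas as pd

# Load the competition data (file and column names are placeholders).
train = pd.read_csv("train.csv")

# 1. Look at the data and get a sense of it.
print(train.head())
print(train.describe())

# 2. Correlation of every numeric feature with the target value.
corr_with_target = train.corr(numeric_only=True)["target"].sort_values(ascending=False)
print(corr_with_target)

# 3. Try a new feature made up of existing ones (hypothetical columns).
train["rooms_per_area"] = train["rooms"] / train["area"]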

Exploration also requires some cleaning of the data, because in general the host adds noise to the data, which makes it harder to achieve good accuracy. By cleaning, I mean (a sketch follows the list):

  1. Dealing with NaN values.
  2. Finding and removing outliers from the training data.
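
A minimal cleaning sketch in the same spirit, again with placeholder file and column names:

import pandas as pd

train = pd.read_csv("train.csv")  # placeholder file name

# 1. Deal with NaN values: fill numeric columns with their median.
numeric_cols = train.select_dtypes(include="number").columns
train[numeric_cols] = train[numeric_cols].fillna(train[numeric_cols].median())

# 2. Find and remove outliers from training, e.g. rows more than
#    3 standard deviations away from the mean of one column.
col = "area"  # hypothetical column
mean, std = train[col].mean(), train[col].std()
train = train[(train[col] - mean).abs() <= 3 * std]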

One of the most essential steps is selecting the features for training. To select features, one should look at their correlation with the target and know how to generate new features that are highly impactful. These new features can be the mean of several existing features, the sum of some features, and so on. There are many ways to look at this; one is sketched below.
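
One possible way to do this in Python, assuming a numeric target column and hypothetical feature names:

import pandas as pd

train = pd.read_csv("train.csv")  # placeholder file name

# Keep features whose absolute correlation with the target passes a threshold.
corr = train.corr(numeric_only=True)["target"].abs()
selected = corr[corr > 0.1].index.drop("target")

# Generate new, potentially impactful features: the mean of a group of
# related columns, or the sum of two columns (names are hypothetical).
train["sensor_mean"] = train[["sensor_1", "sensor_2", "sensor_3"]].mean(axis=1)
train["total_area"] = train["area"] + train["garden_area"]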

Last but not least is selecting the model. Selecting the model does not mean just picking one and training it; the most important part is training it with good values for its hyperparameters. In my experience, even a small tweak of the parameters can noticeably improve accuracy. A short tuning sketch follows.
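
As one example of what tuning can look like, here is a simple grid search over a random forest with scikit-learn; the file name, target column, and grid values are assumptions for illustration, not a recipe:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

train = pd.read_csv("train.csv")           # placeholder file name
X = train.drop(columns=["target"])         # placeholder target column
y = train["target"]

# A small grid of hyperparameters; even modest tweaks like these
# can change the score noticeably.
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)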

For beginners:
1- Start learning the concepts of math and statistics.
2- Learn a programming tool for data science, either R or Python.
3- Learn machine learning algorithms (part 1).

And here are the resources for the previous list:
Descriptive Statistics from Udacity: https://www.udacity.com/course/intro-to-descriptive-statistics--ud827
A good statistics book (Optional): http://onlinestatbook.com/2/index.html
Introduction to Probability course from edX: https://www.edx.org/course/introduction-probability-science-mitx-6-041x-2
Introduction to Probability book: https://www.stat.berkeley.edu/~aldous/134/grinstead.pdf
Intro to Inferential Statistics course from Udacity: https://www.udacity.com/course/intro-to-inferential-statistics--ud201

For professionals:
1- Learn the concepts of deep learning and build at least one project.
2- Learn data visualization.


Here is a glimpse of the result that I achieved after following these steps.

[Image: leaderboard result after following these steps]

I hope these steps help you. Give this a clap if it helped you in some way.

 
