The digital world has changed a lot, and Machine Learning has become one of the most prominent fields in technology. When a company like IBM invests $2 billion in an AI center, you can imagine where the future is heading.
What’s Machine Learning?
Machine Learning (ML) is a branch of Artificial Intelligence (AI). An ML system learns to perform a task from training data and improves with experience, without being explicitly programmed for that task.
The simplest way to understand this is to think of a young kid learning to put on shoes. At first the kid will sometimes put them on wrong, but over time he learns to do it right.
That’s what machine learning is: a computer algorithm that you train on some data and then use to make predictions on data it has not seen. In general, the more relevant, good-quality data you train it on, the better the results you will get.
The Algorithm of ML
In 2017, Yufeng Guo, a developer at Google Cloud Platform, described the 7 steps of Machine Learning. These 7 steps are still widely referred to today, so let’s take a closer look at them.
The 7 steps are:
1- Data Collection
The first step is to collect the data you need to train your program on. A huge amount of data is available today, but you don’t need to collect data just for the sake of collecting it. Collect only the data related to the project you’re working on.
How much data is needed to train a good model?
The short answer is “it depends”. There’s no golden rule, because the amount depends on the type of machine learning problem you’re working on.
Some useful sources of data are Kaggle, DrivenData, and the UCI Machine Learning Repository.
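As a rough illustration of collecting only what the project needs, here is a minimal sketch that reads a CSV file and keeps just the relevant columns. The inline text stands in for a downloaded file, and the column names (hours_studied, passed_exam, student_name) and the helper load_rows are made up for this example.

```python
import csv
import io

# Stand-in for a downloaded dataset file (hypothetical columns and values).
csv_text = """student_name,hours_studied,passed_exam
Ali,1.5,0
Sara,3.0,1
Omar,4.5,1
"""

def load_rows(handle, columns):
    """Read CSV rows and keep only the columns the project needs."""
    reader = csv.DictReader(handle)
    return [{name: row[name] for name in columns} for row in reader]

# Keep the two columns relevant to the prediction task, drop the rest.
rows = load_rows(io.StringIO(csv_text), ["hours_studied", "passed_exam"])
print(rows)
```

With a real file you would pass open("your_file.csv", newline="") instead of the io.StringIO object. Note that the csv module returns every value as a string, so numbers still have to be converted during data preparation.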
2- Data Preparation
You will usually need to prepare the collected data before training your program on it. Preparation includes removing duplicates, correcting errors, handling missing values, and so on.
It’s also a good idea to randomize the order of your data, especially if you gathered it from more than one source. This removes the effect of the particular order in which the data was collected.
The last thing to do in this step is to split your data into a part used to train your model and a part held back for testing and evaluation. You choose the ratio between them (an 80/20 split is a common starting point) and can adjust it through testing.
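The preparation steps described above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: each record is a tuple, a missing value is represented as None, and the function name prepare, the 80/20 ratio, and the sample records are all invented for the example.

```python
import random

def prepare(rows, test_ratio=0.2, seed=0):
    """Deduplicate, drop incomplete rows, shuffle, and split into train/test."""
    # Remove exact duplicates while keeping the first occurrence of each row.
    unique_rows = list(dict.fromkeys(rows))
    # Drop rows that contain a missing value (represented here as None).
    complete = [row for row in unique_rows if None not in row]
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(complete)
    # Hold back the last test_ratio share of the rows for evaluation.
    n_test = max(1, round(len(complete) * test_ratio))
    return complete[:-n_test], complete[-n_test:]

# Hypothetical (hours_studied, passed_exam) records:
# one duplicate and one row with a missing value.
raw = [
    (1.0, 0), (2.0, 0), (2.0, 0), (3.0, 0), (None, 1), (4.0, 1),
    (5.0, 1), (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1), (1.5, 0),
]
train_rows, test_rows = prepare(raw)
print(len(train_rows), len(test_rows))
```

Dropping incomplete rows is the simplest way to handle missing values. Depending on the project, filling them in with a sensible default may be a better choice.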
3- Choosing a model
There are many ML models that you can train on your data. Every model comes with its own assumptions and requirements that must be met to get the most out of it. Choosing the wrong model may keep you from the results you want, so choose wisely.
Scikit-Learn provides a cheat-sheet map to help you find the right model for your problem.
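To give a feel for how such a map works, here is a very simplified sketch of its first few questions, written from my reading of the Scikit-Learn flowchart. The function name suggest_task_family and its parameters are hypothetical, and the real map goes on to recommend specific estimators within each family.

```python
def suggest_task_family(n_samples, predicting, has_labels=False):
    """Return a broad model family based on a few top-level questions.

    predicting is "category", "quantity", or anything else for
    exploratory work. This is a simplification for illustration only.
    """
    if n_samples < 50:
        return "get more data"
    if predicting == "category":
        # Known labels mean supervised learning; otherwise group similar items.
        return "classification" if has_labels else "clustering"
    if predicting == "quantity":
        return "regression"
    return "dimensionality reduction"

print(suggest_task_family(1000, "category", has_labels=True))
print(suggest_task_family(1000, "quantity"))
```

Treat the answer as a starting point for experiments rather than a final decision.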
4- Train The Model
Most of the fun comes in this step, when you train your model on the data you prepared earlier. Training usually takes a lot of experimentation to get right. The goal of training is to make the model’s predictions correct as often as possible.
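To make training less abstract, here is a minimal sketch of one common approach: fitting a straight line y = w*x + b with gradient descent. It is not the only way to train a model. The function name train_linear, the toy data, and the chosen learning rate and number of steps are assumptions made for the example.

```python
def train_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training example with the current w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Move both parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1, so training should recover w=2 and b=1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

Each pass through the loop nudges w and b so that the predictions get a little closer to the known answers, which is the "learning" part of machine learning.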
5- Evaluate the model
After training your model, you need to measure its performance. Here you pick the key performance indicators (KPIs) that matter for your project, for example accuracy (the ratio of correct predictions) or runtime.
Then you run the model on the test portion of the data you set aside earlier to see how it performs on data it was not trained on. You can also try a variety of splits of your data to check that the results hold up.
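As an example of one such indicator, the sketch below computes accuracy, the share of test examples the model gets right. The model here is a hypothetical stand-in (predict_pass with a fixed threshold), and the test data is invented for the illustration.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

def predict_pass(hours_studied, threshold=3.0):
    """Hypothetical trained model: predict a pass (1) from hours studied."""
    return 1 if hours_studied >= threshold else 0

# Held-out test examples that were not used for training.
test_hours = [1.0, 2.5, 3.0, 4.5, 5.0]
test_labels = [0, 1, 1, 1, 1]

predictions = [predict_pass(hours) for hours in test_hours]
test_accuracy = accuracy(test_labels, predictions)
print(predictions, test_accuracy)
```

Accuracy is easy to read, but it can be misleading when one class is much more common than the other, so it is worth choosing the indicator to fit the problem.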
6- Parameter Tuning
This step is also known as hyperparameter tuning. Here we tune the settings that control how the model is trained, rather than the values the model learns by itself, in order to improve performance. Examples include the learning rate and the number of training steps.
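One simple way to tune such settings is a grid search: try every combination from a short list of candidates and keep the one with the lowest error. The sketch below does this for the learning rate and the number of training steps of a small gradient-descent model. The function names, candidate values, and toy data are assumptions, and the separate validation set is a common practice I am adding here so that the test data stays untouched.

```python
from itertools import product

def train(xs, ys, learning_rate, steps):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mean_squared_error(xs, ys, w, b):
    """Average squared difference between predictions and true values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data from y = 2x + 1, split into training and validation parts.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
val_x, val_y = [4.0, 5.0], [9.0, 11.0]

best_config, best_error = None, float("inf")
# Try every combination of candidate learning rates and step counts.
for learning_rate, steps in product([0.001, 0.01, 0.1], [10, 100, 1000]):
    w, b = train(train_x, train_y, learning_rate, steps)
    error = mean_squared_error(val_x, val_y, w, b)
    if error < best_error:
        best_config, best_error = (learning_rate, steps), error

print(best_config, best_error)
```

Grid search becomes slow as the number of settings grows, which is why random search and other strategies are often used for larger models.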
7- Make Predictions
This is the step we have been waiting for. The model is now ready, so the final step is to give it new data and let it make predictions. This is where you finally see your program do the task you set out for it.
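To close the loop, here is a small sketch of making predictions on new inputs. It uses a k-nearest-neighbors classifier as a stand-in model, chosen only because it fits in a few lines. The function name knn_predict and the study-hours data are invented for the example.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Sort the training examples by their distance to the new input.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: abs(pair[0] - query),
    )
    # Count the labels of the k nearest examples and return the most common.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: hours studied and the exam outcome.
hours = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
outcomes = ["fail", "fail", "fail", "pass", "pass", "pass"]

# New students the model has never seen.
print(knn_predict(hours, outcomes, 2.5))
print(knn_predict(hours, outcomes, 6.5))
```

The same pattern applies to any trained model: pass in inputs it has not seen and read off its predictions.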