I read the Microsoft blog entitled ‘How to evaluate model performance in Azure Machine Learning‘. It’s a nice piece of work, and it got me thinking. I didn’t see that the blog post contained anything about neural network evaluation, so this topic is covered here.
How do you evaluate the performance of a Neural Network? This blog focuses on Neural Networks in AzureML, in order to help you to understand what they mean.
What are Neural Networks?
Would you like to know how to make predictions from a dataset? Alternatively, would you like to find exceptions, or outliers, that you need to watch out for? Neural networks are used in business to answer the business questions. They are used to make predictions from a dataset, or to find unusual patterns. They are best used for regression or classification business problems.
What are the different types of Neural Networks?
I’m going to credit the Asimov Institute with this amazing diagram:
In AzureML, we can review the output from a neural network experiment that we created previously. We can see the results by clicking on the Evaluation Model task, and clicking on the Visualise option.
Once we click on Visualise, we can see a number of charts, which are described here:
- Receiver Operating Curve
- Precision / Recall
- Lift visualization
The Receiver Operating Curve
Here is an example:
In our example, we can see that the curve well up into the left hand corner for the ROC curve. When we look on the precision and recall curve, we can see that precision and recall are high figures, and this leads to a high F1 score. This means that the model is effective in terms of how precisely it classifies the data, and that it covers a good proportion of the cases that it should have classified correctly.
Precision and Recall
Precision and recall are very useful for assessing models in terms of business questions. They offer more detail and insights into the model’s performance. Here is an example:
Precision can be described as the fraction of times that the model classifies the number of cases correctly. It can be considered as a measure of confirmation, and it indicates how often the model is correct. Recall is a measure of utility, which means that it identifies how much that the model finds of all that there is to find within the search space. Both scores combine to make the F1 score. The F1 score combines Precision and Recall. If either precision and recall are small, then the F1 score value will be small.
A Lift Chart visually represents the improvement that a model provides when compared against a random guess.This is called a lift score. With a lift chart, you can compare the accuracy of predictions for the models that have the same predictable attribute.
In my next blog, I’ll talk a little about how we can make the Neural Network perform better.
To summarise, we have examined various key metrics in evaluating a neural network in AzureML. These scores also apply to other technologies, such as R.
These criteria can help us to evaluate our models, which, in turn, can help us to fundamentally evaluate our business questions. Understanding the numbers helps to drive the business forward, and visualizing these numbers helps to convey the message of the numbers.