Underfitting or Overfitting? Diagnosing Neural Networks

Neural networks, inspired by the way biological neurons process information, are used extensively in Artificial Intelligence. However, obtaining a model that gives high accuracy can be a challenge.
There can be two reasons for high error on the test set: overfitting and underfitting. But what are they, and how do you know which one you are dealing with?
Before we dive into overfitting and underfitting, let us have a look at a few relevant terms that we will use.
Training set: It is the set of all the instances from which the model learns.
Test set: It is the set of instances which have not been seen by the model during the learning phase.
Model: It is the function obtained after training.
Training error: It is the error of the model on the dataset that is used to train it.
Test error: It is the error of the trained model on the test set.
Generalization error: It is the error of the trained model on the entire space of possible data. It cannot be measured directly in practice, so we use the test set as a sample of this space.
Learning curve: It is a graphical representation of model performance over time; in simpler words, a plot of error against experience or training epochs. It is widely used as a diagnostic tool to determine what isn't working well in a model.
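To make these terms concrete, here is a minimal sketch, assuming TensorFlow/Keras and matplotlib with a synthetic dataset (the data, layer sizes, and epoch count below are purely illustrative, not a prescribed setup):

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Synthetic binary-classification data, split into a training set and a test set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("int32")
X_train, y_train, X_test, y_test = X[:800], y[:800], X[800:], y[800:]

# The "model" is the function obtained after training on the training set.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hold out part of the training data as a validation set so that error on
# unseen data can be tracked after every epoch.
history = model.fit(X_train, y_train, validation_split=0.2, epochs=100, verbose=0)

# Training error vs. test error (here error = 1 - accuracy).
_, train_acc = model.evaluate(X_train, y_train, verbose=0)
_, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"training error: {1 - train_acc:.3f}, test error: {1 - test_acc:.3f}")

# Learning curve: training vs. validation loss against epochs.
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("error (loss)")
plt.legend()
plt.show()
```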
When do we call it Underfitting?
Underfitting happens when a model performs poorly on both the training data and the test data. An underfit model is too simple to fit even the training set.
To see if a model is underfitting, plot the learning curve. It might show a flat line of relatively high error, indicating that the model could not learn the training dataset at all.
To address this problem, one or more of the following can be tried.
- The number of relevant features can be increased.
- The complexity of the network can be changed by increasing the number of neurons/layers in the network.
An underfit model can also be identified if the error is still decreasing at the end of the plot, showing that the model could have learned further but the training process was stopped prematurely.
To address this problem, the training time/number of epochs can be increased.
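As a rough illustration of increasing the network's capacity and training for more epochs, sketched with the same hypothetical Keras setup as above (the layer sizes and epoch counts are arbitrary):

```python
# An intentionally tiny model that is likely to underfit: its learning curve
# tends to stay flat at a relatively high error.
small = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Remedy 1: increase the complexity of the network by adding neurons/layers.
bigger = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
bigger.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Remedy 2: if the curve was still decreasing when training stopped,
# train for more epochs and let it flatten out.
history = bigger.fit(X_train, y_train, validation_split=0.2, epochs=300, verbose=0)
```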
A small neural network is computationally cheaper since it has fewer parameters, so one might be inclined to choose a simpler architecture. However, that is also what makes it more prone to underfitting.
When do we call it Overfitting?
Overfitting happens when a model performs well on training data but not on test data. An overfit model has low training error but high test error, i.e. poor generalization. This is because the model memorizes the training data instead of learning from it.
To understand this better, let's take the example of a student who has to solve some mathematics problems. Instead of learning how to solve the problems, the student crams or memorizes the practice questions. This makes the practice questions easy to solve, but performance on the unseen questions asked in the test will be poor. Mapping this to our scenario, the practice questions are the instances of the training set, while the unseen questions are the instances of the test set.
To see if a model is overfitting, plot the learning curve. If the training error continues to decrease as epochs increase, while the test error decreases to a point and then starts to increase again, the model is overfit.
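Reading this signature off the curve can also be done numerically; here is a small sketch building on the learning-curve snippet above:

```python
val_loss = np.array(history.history["val_loss"])

# The validation (test) loss bottoms out at some epoch and then climbs back up
# while the training loss keeps falling: the classic sign of overfitting.
best_epoch = int(np.argmin(val_loss))
if best_epoch < len(val_loss) - 1 and val_loss[-1] > val_loss[best_epoch]:
    print(f"validation loss was lowest at epoch {best_epoch}; "
          "training past that point looks like overfitting")
```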
To solve this issue, one or more of the following can be tried.
- The complexity of the network can be decreased. Rather than removing neurons/layers and changing the whole architecture, regularization can be used to achieve this.
- L1 or L2 regularization can be introduced. L1 regularization tends to drive weights to exactly zero, while L2 shrinks them toward zero without making them exactly zero (see the sketch after this list).
- A dropout layer can be added. With dropout, the contribution of randomly chosen neurons is ignored, i.e. not carried forward during training, thus reducing the effective complexity of the network.
- The training time/number of epochs can be decreased, for example by stopping training around the point where the test error starts to rise.
- The number of instances in training data can be increased.
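Here is a sketch that combines several of these remedies in the same hypothetical Keras setup as before (the regularization strength, dropout rate, and early-stopping patience are illustrative values, not tuned recommendations):

```python
regularized = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    # L2 regularization penalizes large weights, shrinking them toward zero.
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
    # Dropout randomly ignores a fraction of neuron outputs during training.
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
regularized.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Decreasing the number of epochs can be automated with early stopping:
# training halts once the validation loss stops improving, and the weights
# from the best epoch are restored.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                              restore_best_weights=True)
regularized.fit(X_train, y_train, validation_split=0.2, epochs=300,
                callbacks=[early_stop], verbose=0)
```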
Large neural networks have more parameters, which is what makes them more prone to overfitting. It also makes them computationally expensive compared to small networks.
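A quick way to see this difference in size, assuming the small and bigger models sketched in the underfitting section above:

```python
# Parameter counts for the two models sketched earlier.
print("small model parameters: ", small.count_params())
print("bigger model parameters:", bigger.count_params())
```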
Good Fit Model:
A good fit model is the sweet spot between an underfit model and an overfit model. It performs well on both the training data and the test data.
However, finding a good fit can be difficult when one doesn't know where to start. Learning curves give you a direction, rather than leaving you to try different options that might even worsen the performance of the network. Hopefully this article will help you the next time you try to diagnose a neural network.