Why the buzz around Overfitting and Underfitting?

Sambit Chakraborty
8 min read · Feb 15, 2022

Understanding bias and variance, and how their interplay affects the goodness of fit.

In search of the optimal model.

Fit of a model and Goodness of Fit?

Fit can be defined as how closely your model has approximated the target function. In simple words, how well your model understands the trend of the data and the relationship between your features and the target variable.

Goodness of fit is a measure of how close your approximated target function is to the actual target function. If the gap is small, the model is said to be a good fit, and vice versa.

target function: y = 3x + 5

predicted function (f1): y = 3.3x + 5.9 -> Good fit

predicted function (f2): y = x² -> Poor fit
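To make that "gap" concrete, here is a minimal Python sketch (my own illustration, with an arbitrary sample range) that scores both predicted functions against the target using mean squared error:

```python
# Minimal sketch: measure goodness of fit as mean squared error (MSE)
# between each predicted function and the target y = 3x + 5.
import numpy as np

x = np.linspace(-5, 5, 100)        # arbitrary sample points (assumption)
y_true = 3 * x + 5                 # target function

y_f1 = 3.3 * x + 5.9               # predicted function f1
y_f2 = x ** 2                      # predicted function f2

mse_f1 = np.mean((y_true - y_f1) ** 2)
mse_f2 = np.mean((y_true - y_f2) ** 2)

print(f"MSE of f1: {mse_f1:.2f}")  # small gap  -> good fit
print(f"MSE of f2: {mse_f2:.2f}")  # large gap  -> poor fit
```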

Whenever you start learning about data science or machine learning, overfitting and underfitting are two words that pop up in every article you read and every video you watch, and the only thing that pops up in your mind is "Wasn't learning about types of algorithms and models enough?" and "Do I really need to look into the fit of the model?"

The answer may not be soothing to your eyes, but yes! You really need to learn and understand what these two ever-recurring words mean in a machine learning context.

Let’s start with a real-life example which might help you dive into the topic straightaway.

In a class there are three students: Bill, Elon and Jeff. The teacher tells them about an upcoming maths practice test meant to show where they stand before the final exams. He also gives them a heads-up that all the questions in the practice test will be picked from the exercises at the end of each chapter.

Bill decides to mug up all the solutions to the exercise questions blindly, without understanding the concepts. Elon, being a nervous guy, decides to learn the concepts thoroughly and then practice the exercise questions. Jeff, being the chill guy he is, chooses not to study at all.

They appear for the practice test, and when the results are disclosed, Bill comes out on top with 95%, followed by Elon at 80%, with Jeff stuck at 30%. The three follow the same preparation routines for the final exams: Bill still mugging up all the answers, Elon understanding everything that comes his way though unable to cover each and every topic, and Jeff still chilling.

When the final exam results come out, everyone is shocked: Elon improves and reaches the 85% mark, Jeff drops to 20%, and Bill turns out to be the real bummer, his score plummeting to just 50%.

Now back to business. If we consider the practice test as the training data and the final exams as the testing data, let's define Overfitting and Underfitting.

Overfitting:

When a model learns the training data too closely, fitting it so well that it misses the real sense of the data and hence ends up performing poorly on the test data.

In our case: Bill is a perfect example of an overfitting model. He is so keen on mugging up the answers that he performs excellently in the practice test, but since he ignores the concepts, he fails to do well in the final exams. (A short code sketch after the Underfitting section shows both failure modes in action.)

[Image illustrating overfitting. Source: ML | Underfitting and Overfitting — GeeksforGeeks]

Underfitting:

When a model isn’t capable enough to learn the relationships and trends in the training data thereby failing to generalize on testing data.

In our case: Jeff is a perfect example of an underfitting model. He is so chilled out and negligent of his studies that he learns nothing, and hence is doomed in both tests.

[Image illustrating underfitting. Source: ML | Underfitting and Overfitting — GeeksforGeeks]
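To see both failure modes in actual code, here is a hedged sketch (all numbers and degrees are illustrative choices of mine, not a standard recipe). It fits polynomials of three different degrees to noisy samples of a linear trend: degree 0 underfits, degree 15 overfits, and degree 1 generalizes well:

```python
# Sketch: underfitting vs. overfitting on noisy samples of y = 3x + 5.
# Degree 0 is too simple (underfit), degree 15 memorizes the noise
# (overfit), degree 1 matches the true trend. Settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = np.linspace(-3, 3, n)
    y = 3 * x + 5 + rng.normal(0, 2, size=n)   # linear trend + noise
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (0, 1, 15):
    # NumPy may warn about poor conditioning at degree 15; that is
    # itself a symptom of an overly flexible model.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:7.2f}, "
          f"test MSE {test_mse:7.2f}")
```

Degree 15 gets a near-zero training error yet a much larger test error (Bill), while degree 0 does badly on both (Jeff).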

But to understand this better in a machine learning context, we need the help of two more terms: Bias and Variance.

Any machine learning model is said to display two sets of errors:

  1. Irreducible errors: the errors that remain in a machine learning prediction no matter how well you tune the model, due to the presence of noise or unknown features.
  2. Reducible errors: the errors that you can reduce through tuning to improve the performance of the model. Bias and Variance are the two types of reducible errors.
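In fact, for squared-error loss this split can be made precise: the expected test error decomposes as Bias² + Variance + Irreducible error. That decomposition is why the rest of this article keeps circling back to these two quantities.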

Bias

During training, a model is allowed to go through the data and capture the relationships within its features. Bias can be defined as the inability of a model to understand or capture the real trend of the training data. In simple words, bias is the error on the training data. The higher the bias, the further your model drifts away from the data.

  • High Bias: When the model is unable to understand the data at all, it is said to have high bias.
Not understanding what the data is trying to convey.

In our case: Jeff is a case of high bias, as he learns nothing and hence gets poor marks in the practice test.

  • Low Bias: When the model starts capturing the trend of the data, it is said to have low bias.
Capturing the real sense of the data.

In our case: Bill and Elon are cases of low bias, as they are able to learn most of the exercise questions and hence do well in the practice test.

Variance

During training, if a model starts following the trend so closely that it begins learning from the noise and fluctuations in the data, errors show up on the test data. This is called variance. In simple words, variance is the error on the test data: the higher the variance, the larger the difference between predictions on the train data and the test data.

  • High Variance: When the model learns the noise in the data in such depth that it eventually fails to work well on the testing data.
Trying to touch every point in the training data.

In our case: Bill is a case of high variance, as his final exam score was far poorer than his practice test score, because all he did was mug up answers.

  • Low Variance: When the model is able to see past the noise and stick to the real trend of the data, it is said to have low variance.
Not trying to touch every point in the training data.

In our case: Jeff is a case of low variance, as he learns nothing and hence there is not much difference between his final exam and practice test scores. Similarly, Elon also has low variance, because he learns the concepts well and hence performs well, and similarly, in both exams.
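For the curious, here is a rough empirical sketch (again with illustrative settings of my own) that estimates bias and variance directly, by refitting each model on many freshly sampled training sets: the deviation of the average prediction from the truth approximates bias, and the spread of predictions across refits approximates variance:

```python
# Sketch: estimate bias and variance empirically by refitting each
# model on many resampled training sets. True function: y = 3x + 5
# plus Gaussian noise. All settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
x_grid = np.linspace(-2.5, 2.5, 50)       # points where we evaluate fits
y_true = 3 * x_grid + 5

def fit_once(degree):
    """Draw a fresh noisy training set and return predictions on x_grid."""
    x = rng.uniform(-3, 3, 20)
    y = 3 * x + 5 + rng.normal(0, 2, size=20)
    return np.polyval(np.polyfit(x, y, deg=degree), x_grid)

for degree in (0, 1, 15):
    preds = np.array([fit_once(degree) for _ in range(200)])
    bias_sq = np.mean((preds.mean(axis=0) - y_true) ** 2)   # systematic miss
    variance = np.mean(preds.var(axis=0))                   # fit-to-fit spread
    print(f"degree {degree:2d}: bias^2 {bias_sq:.3f}, variance {variance:.3f}")
```

Degree 0 shows high bias and low variance (Jeff), degree 15 shows low bias and high variance (Bill), and degree 1 keeps both low (Elon).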

Bias-Variance Trade-off

You will come across the term bias-variance trade-off multiple times while learning about goodness of fit, and the reason is that bias and variance together say a lot about how well a model fits the data.

Low Bias and High Variance:

Low bias suggests that the model has performed very well on the training data, while high variance suggests that its test performance was extremely poor compared to its training performance. This is a clear-cut case of OVERFITTING.

High Bias and Low Variance:

High bias suggests that the model has failed to perform even on the training data, which means it has captured no knowledge of the data; it is therefore expected to perform just as poorly on the test data, hence the low variance. This leads to UNDERFITTING.

So the big question that is going to bug your mind is

“Aren’t we ever going to get the best fit model?”

Oh yes! We are. We need to play a trick here and let go of our winning emotions. What I mean is that we strike a deal between these two incompatible quantities: we settle for keeping both the bias and the variance reasonably low. That's how we get a decent fit on the training data that also gives us a good result on the test data (a short code sketch below shows one way to do this).

In our case: Elon corresponds to a best-fit model because he learns the concepts in depth. Though he doesn't excel in the practice test like Bill, he improves in the final exams and gets a very decent result.

Yayyy! So let's congratulate Elon, guys. He is our winner, it seems!

Winner Winner Chicken Dinner!
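As promised above, here is a small sketch of what "keeping both low" looks like in practice: sweep the model complexity (polynomial degree, in this illustrative setup of mine) and pick the one with the lowest error on held-out validation data:

```python
# Sketch: pick the polynomial degree with the lowest validation MSE,
# i.e. the sweet spot of the bias-variance trade-off. Illustrative data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 60)
y = 3 * x + 5 + rng.normal(0, 2, size=60)
x_train, y_train = x[:40], y[:40]          # simple hold-out split
x_val, y_val = x[40:], y[40:]

val_mse = {}
for degree in range(16):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    val_mse[degree] = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

best = min(val_mse, key=val_mse.get)       # lowest validation error wins
print(f"best degree: {best} (val MSE {val_mse[best]:.2f})")
```

On data with a truly linear trend, degree 1 should win: complex enough to capture the trend (low bias), simple enough to ignore the noise (low variance).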

How to deal with overfitting and underfitting is a bigger discussion, which we will take up in the next article.

Conclusion

  • Bias: error on the training data. The model is too simple to understand the trend.
  • Variance: error on the testing data. The model is so complex that it even learns the noise.
  • High bias and low variance result in UNDERFITTING.
  • Low bias and high variance result in OVERFITTING.
  • The bias-variance trade-off is the compromise that keeps both bias and variance low, resulting in the BEST FIT.
Survival of the fittest!

I hope this article helps you to some extent in understanding overfitting and underfitting through a layman's eyes. If you feel it deserves a clap, do consider giving one. This is my first ever story here, so I would really appreciate suggestions on how I can document my learnings even better. Thank you!
