
I'm looking at the mean squared error loss function with gradient descent in machine learning. I'm building a single-neuron network (perceptron) with a linear output. For example:

Input × Weight + Bias → linear activation → output.

Let's say the output is 40 while I expect 20. That means gradient descent has to adjust the weight and bias so that the output moves from 40 towards 20.

What I don't understand about mean squared error + gradient descent is: why is this number 40 displayed as a point on a parabola?

Does this parabola represent all possible outcomes? Why isn't it just a line? How do I know where on the parabola the point "40" is?

[image of a parabola]

  • Are you using a specific loss function here? Please use edit to explain which one. I expect it is MSE, and if so that should cover everything needed to answer your question. (Commented Apr 18, 2021 at 9:38)
  • Every minimum point can be approximated by a parabola. – Kostya (Apr 18, 2021 at 10:18)
  • @Kostya: Yes, but you probably would not draw a parabola for e.g. $\mathcal{L}(\hat{y}, y) = |\hat{y} - y|$. (Commented Apr 18, 2021 at 12:02)
  • I suspect the illustration isn't meant to be taken literally; instead, I suspect the author is intending to illustrate that gradient descent attempts to solve the minimization problem by moving downward toward what is (hopefully) a global minimum on some complex surface. (Commented Apr 19, 2021 at 14:07)
  • I added "mean squared error" to the question for clarity. The question is still why all the possible "wrong" loss values (say, 40, 80, 14...) would happen to be points on a parabola. That connection is not explained in most tutorials. – Kokodoko (Apr 19, 2021 at 21:39)

1 Answer


Mean squared error (MSE) is a quadratic function: the further your network's output is from the optimum, the larger the MSE gets, and it grows quadratically. Take $o_{expected}=20$ and $o_{net}=40$ as an example. Your MSE is then 400, because $MSE = (o_{expected}-o_{net})^2 = (20-40)^2 = 400$.

Just imagine $y = x^2$, with $x$ being the output of your network. If you want to shift the parabola so its minimum is at $20$, the formula becomes $y = (20-x)^2$. For every new case you train the net on, you get a different parabola with different parameters.
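To make this concrete, here is a minimal sketch (plain Python/NumPy, my own illustration rather than code from the question) that tabulates the squared error for a few hypothetical network outputs. It shows that the loss traces the parabola $y = (20-x)^2$, and that an output of 40 sits at the point $(40, 400)$ on it:

```python
# Minimal sketch (assumes NumPy is available): tabulate the squared error
# for a range of hypothetical network outputs. The target 20 and the
# current output 40 come from the example in the question.
import numpy as np

expected = 20.0                        # the value we want the neuron to output
outputs = np.linspace(0.0, 40.0, 9)    # hypothetical outputs: 0, 5, 10, ..., 40
losses = (expected - outputs) ** 2     # squared error for each output

for o, loss in zip(outputs, losses):
    note = "  <- current output from the example" if o == 40.0 else ""
    print(f"output = {o:5.1f}   loss = {loss:6.1f}{note}")

# The printed values trace the parabola y = (20 - x)^2: the loss is 0 at
# output = 20 and grows quadratically on either side, so an output of 40
# sits at the point (40, 400) on that curve.
```

So the parabola in the tutorials is just this picture: every possible output corresponds to one point on the curve, and gradient descent moves that point downhill towards the minimum at the target value.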

