
I'm a bit confused about the activation function in the output layer of a neural network trained for regression. In most tutorials, the output layer uses "sigmoid" to bring the results back to a nice number between 0 and 1.

But in this beginner example on the TensorFlow website, the output layer has no activation function at all? Is this allowed? Wouldn't the result be a crazy number that's all over the place? Or maybe TensorFlow has a hidden default activation?

This code is from the example where you predict miles per gallon based on the horsepower of a car.

// create the sequential model (as in the TensorFlow.js tutorial)
const model = tf.sequential();

// input layer
model.add(tf.layers.dense({inputShape: [1], units: 1}));

// hidden layer
model.add(tf.layers.dense({units: 50, activation: 'sigmoid'}));

// output layer - no activation needed ???
model.add(tf.layers.dense({units: 1}));

1 Answer


In regression, the goal is to approximate a function $f: \mathcal{I} \rightarrow \mathbb{R}$, where $\mathcal{I}$ is the input space, so $f(x) \in \mathbb{R}$ for any input $x \in \mathcal{I}$. In other words, in regression you want to learn a function whose outputs can be any real number, not necessarily just a number in the range $[0, 1]$.

You use the sigmoid as the activation function of the output layer of a neural network when you want to interpret the output as a probability. This is typically done together with the binary cross-entropy loss function, i.e. when you are solving a binary classification problem (the output is one of two classes/labels).
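For contrast, here is a minimal sketch (not taken from the tutorial; the layer sizes and variable names are made up) of a binary classifier, where a sigmoid output is paired with the binary cross-entropy loss:

// hypothetical binary classifier: 10 input features, one output probability
const clf = tf.sequential();
clf.add(tf.layers.dense({inputShape: [10], units: 16, activation: 'relu'}));
clf.add(tf.layers.dense({units: 1, activation: 'sigmoid'})); // output in (0, 1)
clf.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});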

By default, tf.layers.dense in TensorFlow.js (like tf.keras.layers.Dense in Python) applies no activation function, so the output of your neural network is just an affine combination (a weighted sum plus a bias) of the outputs of the previous layer. This is fine for a regression problem, where the target can be any real number.
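To make that concrete, here is a minimal regression sketch (assuming the same TensorFlow.js API as in the question; variable names are illustrative). Omitting the activation is equivalent to specifying 'linear', and such a model is typically compiled with a mean-squared-error loss:

// minimal regression sketch: the output layer has a linear (identity) activation
const reg = tf.sequential();
reg.add(tf.layers.dense({inputShape: [1], units: 50, activation: 'sigmoid'}));
reg.add(tf.layers.dense({units: 1}));                          // no activation...
// reg.add(tf.layers.dense({units: 1, activation: 'linear'})); // ...is equivalent to 'linear'
reg.compile({optimizer: tf.train.adam(), loss: 'meanSquaredError'});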

  • Thanks for the explanation. But if the weights in a neural network can be any number, won't the output also vary wildly? Miles per gallon should be a value of around 20 to 100, for example, but if a random weight is 2,000,000, won't you get a crazy output?
    – Kokodoko
    Commented Mar 11, 2021 at 11:03
  • @Kokodoko Yes, this can happen, but the weights should converge to reasonable values once you optimize the objective function. Moreover, you can also limit the weights and activations, e.g. by using activation functions that squash the inputs to the neurons into certain ranges, which is your case (in that example, the sigmoid in the hidden layer maps its inputs to the range (0, 1)). You can also regularize the weights to keep them small (see the sketch after this comment thread).
    – nbro
    Commented Mar 11, 2021 at 11:09
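
A minimal sketch of the weight regularization mentioned above (the L2 strength of 0.01 is an illustrative value, not from the tutorial):

// L2 regularization penalizes large weights in the output layer,
// nudging them towards small values during training
model.add(tf.layers.dense({
  units: 1,
  kernelRegularizer: tf.regularizers.l2({l2: 0.01})
}));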

