
I'm a bit confused about the activation function in the output layer of a neural network trained for regression. In most tutorials, the output layer uses "sigmoid" to bring the results back to a nice number between 0 and 1.

But in this beginner example on the TensorFlow website, the output layer has no activation function at all? Is this allowed? Wouldn't the result be a crazy number that's all over the place? Or maybe TensorFlow has a hidden default activation?

This code is from the example where you predict miles per gallon based on the horsepower of a car.

// create the sequential model (as in the TensorFlow.js tutorial)
const model = tf.sequential();

// input layer
model.add(tf.layers.dense({inputShape: [1], units: 1}));

// hidden layer
model.add(tf.layers.dense({units: 50, activation: 'sigmoid'}));

// output layer - no activation needed ???
model.add(tf.layers.dense({units: 1}));

1 Answer


In regression, the goal is to approximate a function $f: \mathcal{I} \rightarrow \mathbb{R}$, where $\mathcal{I}$ is the input space, so $f(x) \in \mathbb{R}$ for any input $x \in \mathcal{I}$. In other words, in regression you want to learn a function whose outputs can be any real number, not necessarily just a number in the range $[0, 1]$.

You use the sigmoid as the activation function of the output layer of a neural network when you want to interpret the output as a probability. This is typically done together with the binary cross-entropy loss function, i.e. when you are solving a binary classification problem (the output is one of two classes/labels).
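For contrast, here is a minimal sketch (not taken from the tutorial; the layer sizes and variable names are made up) of a binary classifier, where a sigmoid output is paired with the binary cross-entropy loss:

// hypothetical binary classifier: 10 input features, one output probability
const clf = tf.sequential();
clf.add(tf.layers.dense({inputShape: [10], units: 16, activation: 'relu'}));
clf.add(tf.layers.dense({units: 1, activation: 'sigmoid'})); // output in (0, 1)
clf.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});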

By default, tf.layers.dense in TensorFlow.js (like tf.keras.layers.Dense in Python) applies no activation function, so the output of your neural network is just an affine combination (a weighted sum plus a bias) of the outputs of the previous layer. This is fine for a regression problem, where the target can be any real number.
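To make that concrete, here is a minimal regression sketch (assuming the same TensorFlow.js API as in the question; variable names are illustrative). Omitting the activation is equivalent to specifying 'linear', and such a model is typically compiled with a mean-squared-error loss:

// minimal regression sketch: the output layer has a linear (identity) activation
const reg = tf.sequential();
reg.add(tf.layers.dense({inputShape: [1], units: 50, activation: 'sigmoid'}));
reg.add(tf.layers.dense({units: 1}));                          // no activation...
// reg.add(tf.layers.dense({units: 1, activation: 'linear'})); // ...is equivalent to 'linear'
reg.compile({optimizer: tf.train.adam(), loss: 'meanSquaredError'});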

  • Thanks for the explanation. But if the weights in a neural network can be any number, won't the output also vary wildly? Miles per gallon should be a value of around 20 to 100, for example, but if a random weight is 2,000,000, won't you get a crazy output?
    – Kokodoko
    Commented Mar 11, 2021 at 11:03
  • @Kokodoko Yes, this can happen, but the weights should converge to reasonable values once you optimize the objective function. Moreover, you can also limit the weights and activations, e.g. by using activation functions that squash the inputs to the neurons into certain ranges, which is your case (in that example, the sigmoid in the hidden layer maps its inputs to the range (0, 1)). You can also regularize the weights to keep them small (see the sketch after this comment thread).
    – nbro
    Commented Mar 11, 2021 at 11:09
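
A minimal sketch of the weight regularization mentioned above (the L2 strength of 0.01 is an illustrative value, not from the tutorial):

// L2 regularization penalizes large weights in the output layer,
// nudging them towards small values during training
model.add(tf.layers.dense({
  units: 1,
  kernelRegularizer: tf.regularizers.l2({l2: 0.01})
}));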

