
All Questions

0 votes
0 answers
37 views

Why do we use the ReLU activation function?

I am reading about activation functions in feedforward neural networks and read a really old paper: https://web.njit.edu/~usman/courses/cs677_spring21/hornik-nn-1991.pdf. They prove that by using arbitrary ...
timmy1691
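For reference (a standard definition, not quoted from the question above), the ReLU activation in the title is

$$ \operatorname{ReLU}(x) = \max(0, x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} $$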
1 vote
0 answers
18 views

What is a proper activation function to use with a simulated-annealing trainer for a neural network?

I'm developing a GPU-accelerated, simulated-annealing-based neural network trainer library. Currently it's stuck on how to converge on "array sorting by neural network 3:10:20:10:3 topology". ...
huseyin tugrul buyukisik
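A minimal CPU-side sketch of the idea (plain NumPy, with a made-up toy task echoing the 3:10:20:10:3 topology; this is not the asker's GPU library): propose a random weight perturbation, accept it with the Metropolis rule, and cool the temperature.

    # Sketch: simulated-annealing training of a tiny MLP (NumPy only, toy task).
    import numpy as np

    rng = np.random.default_rng(0)

    def init(sizes):
        # One (W, b) pair per layer.
        return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
                for m, n in zip(sizes[:-1], sizes[1:])]

    def forward(x, weights):
        h = x
        for i, (W, b) in enumerate(weights):
            h = h @ W + b
            if i < len(weights) - 1:   # tanh on hidden layers, linear output
                h = np.tanh(h)
        return h

    def loss(weights, X, Y):
        return np.mean((forward(X, weights) - Y) ** 2)

    # Toy task: sort 3-element vectors, matching the 3:10:20:10:3 topology.
    X = rng.uniform(-1.0, 1.0, (256, 3))
    Y = np.sort(X, axis=1)

    weights = init([3, 10, 20, 10, 3])
    current = loss(weights, X, Y)
    T = 1.0
    for step in range(20000):
        candidate = [(W + rng.normal(0.0, 0.02, W.shape),
                      b + rng.normal(0.0, 0.02, b.shape)) for W, b in weights]
        c = loss(candidate, X, Y)
        # Metropolis acceptance: always take improvements, sometimes take worse moves.
        if c < current or rng.random() < np.exp((current - c) / max(T, 1e-8)):
            weights, current = candidate, c
        T *= 0.9995                     # geometric cooling schedule
    print("final MSE:", current)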
0 votes
0 answers
129 views

Alternative to ELU and Leaky ReLU?

I was talking with a friend about different activation functions (we are still new to ML). One thing that I didn't like about ELU was the vanishing gradient, and about Leaky ReLU that it's not ...
Nasa • 1
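For reference, the two activations being compared (with $\alpha$ a small positive constant):

$$ \text{LeakyReLU}_\alpha(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} \qquad \text{ELU}_\alpha(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} $$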
0 votes
1 answer
787 views

What's the advantage of He Initialization over Xavier Initialization?

For weight initialization, I read that He initialization, unlike Xavier initialization, doesn't assume a linear activation of the neurons; in this context, what does a linear activation mean?
Carpediem
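For context (the standard normal-distribution variants, not quoted from the question): both schemes set the initial weight variance from the layer's fan-in/fan-out, and differ because Xavier's derivation treats the activation as roughly linear around zero while He's accounts for ReLU zeroing half of its inputs:

$$ \text{Xavier/Glorot: } \operatorname{Var}(W) = \frac{2}{n_{\text{in}} + n_{\text{out}}}, \qquad \text{He: } \operatorname{Var}(W) = \frac{2}{n_{\text{in}}} $$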
0 votes
1 answer
40 views

Is it possible to tell whether one activation function is better than another based on their graphs?

I am attempting to formulate my own activation function. However, I'm new to neural networks and not yet ready to test it, but I would like to know whether I have already landed on a better activation function ...
jwho • 3
2 votes
1 answer
59 views

Question about the non-linearity of activation functions

I have a basic question about activation functions. It is said that they are added to the network to introduce non-linearity. However, the neural network itself is non-linear, isn't it? If we see any ...
Sandeep Bhutani
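The point at issue: without a non-linear activation between layers, a stack of affine layers collapses into a single affine map, for example with two layers

$$ W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2), $$

so the network as a whole can only represent linear (affine) functions of its input, no matter how many layers it has.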
0 votes
1 answer
303 views

Training deep neural networks with a ReLU output layer for verification

Most algorithms for the verification of deep neural networks require ReLU activation functions in each layer (e.g. Reluplex). I have a binary classification task with classes 0 and 1. The main problem I ...
alext90
3 votes
0 answers
159 views

Intuitively, why do Non-monotonic Activations Work?

The swish/SiLU activation is very popular, and many would argue it has dethroned ReLU. However, it is non-monotonic, which seems to go against popular intuition (at least on this site: example 1, ...
Jason • 53
0 votes
1 answer
18 views

Activation Functions in Haykin's Neural Networks: A Comprehensive Foundation

In Haykin's Neural Networks: A Comprehensive Foundation, the piecewise-linear function is one of the described activation functions. It is described with a formula, and a corresponding plot is shown. I don't really ...
DerWolferl
0 votes
1 answer
354 views

Why does using tanh worsen accuracy so much?

I was testing how different hyperparameters would change the output of my multilayer perceptron for a regression problem ...
SGfrmthe33
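A general remark (not a diagnosis of the asker's particular setup): tanh saturates, and its gradient

$$ \tanh'(x) = 1 - \tanh^2(x) \in (0, 1] $$

shrinks toward zero once $|x|$ is more than a few units, whereas ReLU passes a gradient of exactly 1 for all positive inputs.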
3 votes
4 answers
1k views

Neural Network not Deep

I have found this image (link). I would like to know which NNs are not deep neural networks. The first three? Also, what kind of activation functions do they use?
Inuraghe • 481
2 votes
1 answer
555 views

How to prove Softmax Numerical Stability?

I was playing around with the softmax function and experimenting with its numerical stability. If we increase the exponents in the numerator and denominator by the same value, the output of ...
Nicoinlas
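The invariance mentioned in the excerpt follows from multiplying the numerator and denominator by the same factor $e^{-c}$:

$$ \operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{-c}\,e^{x_i}}{e^{-c}\sum_j e^{x_j}} = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}} \quad \text{for any constant } c. $$

Choosing $c = \max_j x_j$ keeps every exponent non-positive, so the exponentials cannot overflow.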
0 votes
0 answers
28 views

What happens if you don't include any activation function on hidden classification layers?

What happens if we don't apply an activation function to the classification hidden layers and apply it only to the final output layer (Sigmoid, Softmax)? I'm asking this because I have trained a CNN ...
Valderas
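A quick numerical sketch of the consequence (made-up layer sizes, plain NumPy rather than the asker's CNN): with no activation between them, consecutive dense layers behave exactly like one linear layer, so only the final Sigmoid/Softmax adds any non-linearity.

    # Sketch with made-up shapes: two stacked layers with no activation
    # in between are exactly equivalent to a single linear layer.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))                    # batch of 5 inputs, 8 features
    W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
    W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)

    two_layers = (x @ W1 + b1) @ W2 + b2           # "hidden" layer without activation
    one_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)     # collapsed equivalent

    print(np.allclose(two_layers, one_layer))      # True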
0 votes
2 answers
210 views

Activation and loss function not chosen correctly when using a neural network

I have three classes in my text dataset. These are my classes: 0 = Cat, 1 = Not Both, 2 = Dog. Then I use this code: ...
grace • 13
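For labels coded as integers 0/1/2 as above, the usual pairing is a 3-unit softmax output with sparse categorical cross-entropy. A minimal sketch, assuming TensorFlow/Keras and text already vectorized into fixed-length feature vectors (the 1000-dimensional input and 64-unit hidden layer are made up for illustration):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1000,)),                  # e.g. bag-of-words / TF-IDF features
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"), # one unit per class: 0, 1, 2
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",         # integer labels, no one-hot needed
        metrics=["accuracy"],
    )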
2 votes
0 answers
87 views

Derive backpropagation for PReLU

I want to derive the backpropagation functions for the Parametric ReLU (PReLU) activation function, which is defined as follows: $$ h_a(x) = \max(ax, x) $$ I want to derive $ \frac{\partial L}{\partial ...
Casper • 21
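A sketch of the standard result, assuming $0 < a < 1$ so that $\max(ax, x)$ picks $x$ on the positive side and $ax$ on the negative side:

$$ \frac{\partial h_a}{\partial x} = \begin{cases} 1, & x > 0 \\ a, & x \le 0 \end{cases} \qquad \frac{\partial h_a}{\partial a} = \begin{cases} 0, & x > 0 \\ x, & x \le 0 \end{cases} $$

so by the chain rule the parameter gradient sums over the units whose input was negative:

$$ \frac{\partial L}{\partial a} = \sum_i \frac{\partial L}{\partial h_a(x_i)}\; x_i\, \mathbb{1}[x_i \le 0]. $$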
