All Questions

1 vote
1 answer
45 views

In a Computational Graph, how to calculate the total upstream gradient of a node with multiple upstreams?

Given a Computation Graph with a node (like the one below), I understand that I can use the upstream gradient dL/dz to calculate all of my downstream gradients. But what if there are multiple ...
Ibrahim • 111
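A quick reference note (my own sketch of the standard multivariable chain rule, not the accepted answer): when a node $z$ feeds several consumers $u_1,\dots,u_k$, the gradients arriving from each of them are summed before being propagated further downstream:

$$\frac{\partial L}{\partial z}=\sum_{i=1}^{k}\frac{\partial L}{\partial u_i}\,\frac{\partial u_i}{\partial z}$$
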
1 vote
2 answers
226 views

Gradient Descent: Is the magnitude in Gradient Vectors arbitrary?

I am only just getting familiar with gradient descent through learning logistic regression. I understand the directional component in the gradient vectors is correct information derived from the slope ...
MrHunda • 11
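A short worked note (my own, hedged summary of the usual view): the magnitude is not arbitrary; it is the norm of the vector of partial derivatives, and in plain gradient descent it sets the step size together with the learning rate $\eta$:

$$\|\nabla L(w)\|=\sqrt{\sum_i\left(\frac{\partial L}{\partial w_i}\right)^2},\qquad \|\Delta w\|=\eta\,\|\nabla L(w)\|$$
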
0 votes
0 answers
45 views

Gradients of lower layers of a CNN when gradient of an upper layer is 0?

Say we have a convolutional neural network with an input layer, 3 convolutional layers and an output layer. Say the gradients with respect to the weights and biases of the third convolutional layer ...
VJ123 • 147
2 votes
1 answer
407 views

Gradients of lower layers of NN when gradient of an upper layer is 0?

Say we have a neural network with an input layer, a hidden layer and an output layer. Say the gradients with respect to the weights and biases of the output layer are all 0. Then, by backpropagation ...
VJ123 • 147
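A hedged note on why the two questions above are subtle: in the standard backpropagation recursion, the gradients of lower layers are driven by the upper layer's error signal $\delta^{l+1}$ and its weights, not by the upper layer's weight gradients, so $\nabla_{W^{l+1}}C=0$ does not by itself force the lower-layer gradients to zero:

$$\delta^{l}=\left((W^{l+1})^{\top}\delta^{l+1}\right)\odot\sigma'(z^{l}),\qquad \nabla_{W^{l}}C=\delta^{l}\,(a^{l-1})^{\top}$$
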
0 votes
0 answers
47 views

Backpropagation and Gradient Descent: Questions on math behind it

I watched this video which goes over backpropagation calculus and read the Wikipedia page on it. This is my understanding of the equations for the algorithm. I have questions regarding the equations ...
notaorb • 101
1 vote
1 answer
55 views

Doubt about the gradient and the vanishing gradient problem in backpropagation

As per my knowledge, in backpropagation the loss function or its gradient is used to update the weights. In backpropagation the weights become small w.r.t. the gradients, and this leads to the vanishing gradient problem. ...
tovijayak
2 votes
2 answers
6k views

What exactly is Gradient norm?

I found that there is no common resource or well-defined definition for "Gradient norm"; most search results are ML experts providing answers that involve gradient norm, or papers ...
StudentV
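A minimal sketch of what is most commonly meant (an assumption on my part, since usage varies across papers): the global L2 norm of all parameter gradients, shown here in PyTorch.

```python
import torch

def global_grad_norm(parameters):
    # L2 norm over all parameter gradients: the scalar that training
    # logs and papers typically report as the "gradient norm".
    total = 0.0
    for p in parameters:
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# usage, after loss.backward():  global_grad_norm(model.parameters())
```
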
0 votes
1 answer
168 views

Affine layer - gradient shape

In course cs231n, I need to implement backward pass computation for an affine (linear) layer: ...
Ben Lahav
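The standard shape bookkeeping for that backward pass, as a minimal NumPy sketch (assuming x is (N, D), w is (D, M), b is (M,) and dout is (N, M); the variable names are mine, not the cs231n starter code's):

```python
import numpy as np

def affine_backward(dout, x, w):
    # forward pass was out = x @ w + b, so each gradient
    # must come back with the shape of its argument
    dx = dout @ w.T         # (N, M) @ (M, D) -> (N, D)
    dw = x.T @ dout         # (D, N) @ (N, M) -> (D, M)
    db = dout.sum(axis=0)   # (M,)
    return dx, dw, db
```
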
0 votes
1 answer
117 views

GAN Generator Backpropagation Gradient Shape Doesn't Match

In the TensorFlow example (https://www.tensorflow.org/tutorials/generative/dcgan#the_discriminator) the discriminator has a single output neuron (assume batch_size=1). Then over in the training loop ...
rkuang25
0 votes
0 answers
94 views

Why is backpropagation done in every epoch when the loss is always a scalar?

I understand that the backpropagation algorithm calculates the derivative of the loss with respect to all the parameters in the neural network. My question is: this derivative is constant, right, because the ...
Jeet • 101
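A one-line worked example of why that derivative is not constant (my own illustration): the gradient is a function of the current weights, which change after every update, so it has to be recomputed.

$$L(w)=(wx-y)^{2}\quad\Rightarrow\quad\frac{dL}{dw}=2x\,(wx-y),$$

which must be re-evaluated at each new value of $w$.
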
0 votes
0 answers
58 views

Question about input value in Gradient descent

I am currently going through Udacity's online course "Intro to Deep Learning with PyTorch". In one of the videos covering the gradient descent algorithm, they show the formula for how the ...
Leo • 1
2 votes
0 answers
104 views

Can I find the input that maximises the output of a Neural Network?

So I trained a 2 layer Neural Network for a regression problem that takes $D$ features $(x_1,...,x_D)$ and outputs a real value $y$. With the model already trained (weights optimised, fixed), can I ...
puradrogasincortar
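Yes: a common approach is gradient ascent on the input with the weights frozen (often called activation maximisation). A minimal PyTorch sketch, assuming `model` is the trained `nn.Module` and `d` is the number of input features; both names are placeholders:

```python
import torch

def maximise_output(model, d, steps=200, lr=0.1):
    model.eval()                                 # weights stay fixed
    x = torch.randn(1, d, requires_grad=True)    # optimise the input instead
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-model(x).sum()).backward()             # maximise y by minimising -y
        opt.step()
    return x.detach()
```

Note that for an unbounded output the ascent may diverge, so in practice the input is usually constrained (e.g. clamped to a valid range).
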
0 votes
0 answers
559 views

Proof that averaging weights is equal to averaging gradients (FedSGD vs FedAvg)

The first paper of Federated Learning "Communication-Efficient Learning of Deep Networks from Decentralized Data" presents FedSGD and FedAvg. In Federated Learning the learning task is ...
CasellaJr • 229
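The equivalence holds exactly for a single local SGD step (a sketch of the standard argument, using the paper's weighting by client sample counts $n_k$ with $n=\sum_k n_k$): averaging the locally updated weights is the same as applying the weighted average of the gradients, because the weights sum to one.

$$\sum_{k}\frac{n_k}{n}\left(w_t-\eta\,g_k\right)=w_t-\eta\sum_{k}\frac{n_k}{n}\,g_k$$

With more than one local step per round the two schemes are no longer identical, which is precisely what separates FedAvg from FedSGD.
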
0 votes
0 answers
145 views

calculating derivative of bias in backpropagation

Looking at the algorithm in wikipedia, we can implement backpropagation by calculating: $$\delta^{L}=\left(f^{L}\right)'\cdot\nabla_{a^{L}}C$$ (where I treat $\left(f^{L}\right)'$ as an $n\times n$ ...
Ariel Yael
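For the bias specifically (the standard result, added here as a hedged complement to the question's notation): since $z^{l}=W^{l}a^{l-1}+b^{l}$, the Jacobian of $z^{l}$ with respect to $b^{l}$ is the identity, so the bias gradient is just the layer's error signal:

$$\nabla_{b^{l}}C=\delta^{l}$$
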
2 votes
1 answer
751 views

How does gradient descent avoid local minimums?

In Neural Networks and Deep Learning, the gradient descent algorithm is described as going in the opposite direction of the gradient. Link to place in book. What prevents this strategy from landing in ...
Foobar • 125
