Questions tagged [gradient-descent]
Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
1,472 questions
0 votes · 1 answer · 29 views
PyTorch: using a loss that doesn't return a gradient
I'm trying to develop a model that improves the quality of given audio. For this task I use DAC for the latent space, and I run a transformer model to change the values of the latent space to improve ...
0 votes · 0 answers · 32 views
Vanishing gradients when training an LSTM with PyTorch
I was training a simple LSTM neural network with PyTorch to predict stock prices, and it is confusing to me that my network won't fit: the loss is exploding and the R² is negative. As the training ...
0 votes · 0 answers · 21 views
Why does actor_gradients evaluate to [None, None, None, None]?
I'm trying to train an RL agent with a DDPG policy to solve the Pendulum problem. An issue occurs when the policy attempts to train the parameters with optimizers.Adam.apply_gradients. This is because ...
1 vote · 0 answers · 29 views
Calculating variance of gradient of barren plateau problem in quantum variational circuit
In the paper Cost function dependent barren plateaus in shallow parametrized quantum circuits, the authors exhibit a warm-up example on page 2 to show the barren plateau phenomenon. In this example, the ...
2 votes · 1 answer · 31 views
Torch.unique() alternatives that do not break gradient flow?
In a PyTorch gradient descent algorithm, the function
def TShentropy(wf):
    unique_elements, counts = wf.unique(return_counts=True)
    entrsum = 0
    for x in counts:
        p = x/len_a #...
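Counting exact duplicates is inherently non-differentiable (the counts are integers), so one common workaround is to replace the histogram with a kernel-density estimate of the entropy. A minimal sketch, not the asker's code: `soft_entropy` and the bandwidth `sigma` are hypothetical names, and the Gaussian kernel is one choice among many.

```python
import torch

def soft_entropy(wf, sigma=0.1):
    # Kernel-density (resubstitution) entropy estimate: every op here is
    # differentiable, unlike the integer counts from wf.unique().
    diffs = wf.unsqueeze(0) - wf.unsqueeze(1)        # (N, N) pairwise differences
    kernel = torch.exp(-diffs ** 2 / (2 * sigma ** 2))
    density = kernel.mean(dim=1)                     # kernel density at each sample
    return -torch.log(density).mean()

wf = torch.tensor([0.0, 0.0, 1.0, 1.0], requires_grad=True)
h = soft_entropy(wf)
h.backward()
# wf.grad is now populated, so this loss can drive a gradient descent loop
```

As `sigma` shrinks, the estimate approaches the sharp duplicate-counting entropy, at the cost of increasingly flat (vanishing) gradients between clusters.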
0 votes · 0 answers · 26 views
How do I code Gradient Descent over a discrete Probability Function in Pytorch?
I am trying to code a gradient descent algorithm to minimize the Shannon entropy of a convolution between a 1D array X and a smaller 1D array A, where the parameters to optimize for are the entries of ...
1 vote · 2 answers · 128 views
Minimizing Euclidean Norm with Gradient Descent
I'm trying to find a solution for a system of linear equations using the gradient descent method on ∥Ax-b∥^2 in Python.
The linear equations are:
x - 2y + 3z = - 1
3x + 2y - 5z = 3
2x - 5y + 2z = 0
The ...
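For reference, the objective ∥Ax−b∥² has gradient 2Aᵀ(Ax−b), so plain gradient descent on this particular system can be sketched as below. The step size 0.01 is an assumption; it converges here because it is below 1/λmax(AᵀA), which Gershgorin's theorem bounds by 1/72 for this matrix.

```python
import numpy as np

A = np.array([[1., -2.,  3.],
              [3.,  2., -5.],
              [2., -5.,  2.]])
b = np.array([-1., 3., 0.])

x = np.zeros(3)
lr = 0.01                          # safe: below 1 / lambda_max(A.T @ A)
for _ in range(20000):
    grad = 2 * A.T @ (A @ x - b)   # gradient of ||Ax - b||^2
    x -= lr * grad

print(x)   # ≈ [ 0.2609, -0.0870, -0.4783], the solution of Ax = b
```

Since A is square and nonsingular here, the minimizer of ∥Ax−b∥² is exactly the solution of Ax = b, which is why the result matches `np.linalg.solve(A, b)`.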
2 votes · 1 answer · 36 views
Cost Function Increases, Then Stops Growing
I understand the zig-zag nature of the cost function when applying gradient descent, but what bothers me is that the cost started out at a low 300 only to increase to 1600 in the end.
The cost ...
0 votes · 0 answers · 30 views
Is the given code for gradient descent updating the parameters sequentially or simultaneously?
I'm new to machine learning and I have been learning the gradient descent algorithm. I believe this code uses simultaneous update, even though it looks like sequential update. Since the values of partial ...
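The distinction is easiest to see in a toy sketch (the cost J and its partials below are made up for illustration): a simultaneous update evaluates every partial derivative at the old parameter values before any parameter changes, while a sequential update lets later gradients see already-updated parameters.

```python
# Toy cost J(w, b) = w**2 + w*b, with partials dJ/dw = 2*w + b and dJ/db = w
def dJ_dw(w, b):
    return 2 * w + b

def dJ_db(w, b):
    return w

def step_simultaneous(w, b, alpha):
    # Both partials are evaluated at the OLD (w, b) before either changes.
    tmp_w = w - alpha * dJ_dw(w, b)
    tmp_b = b - alpha * dJ_db(w, b)
    return tmp_w, tmp_b

def step_sequential(w, b, alpha):
    # b's partial sees the already-updated w: this is no longer vanilla
    # gradient descent (it is closer to coordinate descent).
    w = w - alpha * dJ_dw(w, b)
    b = b - alpha * dJ_db(w, b)
    return w, b

print(step_simultaneous(1.0, 1.0, 0.1))  # both parts use the old w
print(step_sequential(1.0, 1.0, 0.1))    # second part uses the new w
```

Starting from (w, b) = (1, 1) with α = 0.1, the two rules already disagree on b after one step, which is the quickest way to tell which variant a given piece of code implements.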
0 votes · 0 answers · 17 views
Computing the loss with gradient accumulation
Suppose we have data of shape [b, s, dim]. I recently noticed that CrossEntropyLoss (1) computes the average over all tokens (b * s) in a batch instead of (2) computing it per sentence and then computing the ...
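The two reductions agree only when every sentence contributes the same number of tokens; otherwise token-averaging weights long sentences more heavily. A small numeric sketch with made-up per-token losses:

```python
import numpy as np

# Hypothetical per-token losses for two sentences of different lengths
s1 = np.array([1.0, 5.0])               # 2 tokens, mean 3.0
s2 = np.array([2.0, 2.0, 2.0, 2.0])     # 4 tokens, mean 2.0

token_avg = np.concatenate([s1, s2]).mean()   # (1) every token weighted equally
sentence_avg = (s1.mean() + s2.mean()) / 2    # (2) every sentence weighted equally

print(token_avg)     # 2.333... -- the longer sentence dominates
print(sentence_avg)  # 2.5
```

The same mismatch appears with gradient accumulation: averaging each micro-batch's loss and then averaging those losses reproduces (1) only when all micro-batches contain the same number of tokens.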
0 votes · 0 answers · 14 views
Differentiating a tensor after using the where function
I implemented the following function
def t_asy(self, data, beta: float):
    power = 1 + (beta * torch.linspace(0, 1, data.shape[-1], device=data.device)) * data.sqrt()
    ...
0 votes · 0 answers · 26 views
Gradient Descent Logistic Regression and Covariate Scaling
I'm trying to understand logistic regression and gradient descent. How hard can it be, right? Well, I used the example from this website
mydata <- read.csv("https://stats.idre.ucla.edu/stat/...
0 votes · 0 answers · 12 views
Training RL model with TF over all the output vector
I'm training a deep RL model with TensorFlow, but my model doesn't have a single correct action. The output of the network is a vector [x1, x2], and both are actions that need to be optimized.
def ...
0 votes · 0 answers · 26 views
Why is the matrix transposed when calculating the gradient in a multiple linear regression?
I am taking an online machine learning course and when talking about multivariable linear regression they used the following function to calculate the gradient:
def gradient(X, Y, w):
    return 2 * np....
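The transpose comes from the chain rule: for J(w) = ∥Xw − Y∥², the gradient is ∇J = 2Xᵀ(Xw − Y). The residual Xw − Y has one entry per sample, and Xᵀ maps it back into parameter space, one entry per weight. A sketch of that formula (the random data is illustrative only), checked against finite differences:

```python
import numpy as np

def gradient(X, Y, w):
    # residual X @ w - Y has one entry per SAMPLE (shape (n,));
    # X.T maps it back to one entry per WEIGHT (shape (d,))
    return 2 * X.T @ (X @ w - Y)

# Finite-difference check on random data
rng = np.random.default_rng(0)
X, Y, w = rng.normal(size=(5, 3)), rng.normal(size=5), rng.normal(size=3)

J = lambda v: np.sum((X @ v - Y) ** 2)
eps = 1e-6
fd = np.array([(J(w + eps * np.eye(3)[i]) - J(w - eps * np.eye(3)[i])) / (2 * eps)
               for i in range(3)])
assert np.allclose(gradient(X, Y, w), fd, atol=1e-4)
```

Without the transpose, `X @ (X @ w - Y)` would not even have the right shape: it mixes an (n, d) matrix with an (n,) vector instead of producing a (d,) gradient.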
1 vote · 0 answers · 31 views
MNIST Image Classification Gradient Descent Neural Network not working
I have two files. PreProcess.java:
/*
* 4/28/24
* Final
*/
package Final;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io....