
All Questions

0 votes
0 answers
9 views

How to handle sequences with CrossEntropyLoss

First of all, I am new to the whole thing, so sorry if this is super dumb. I'm currently training a Transformer model for a sequence classification task using CrossEntropyLoss. My input tensor has the ...
Tobias • 101
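The excerpt is cut off before the tensor shapes, but in PyTorch, nn.CrossEntropyLoss handles per-token sequence targets as long as the class dimension comes second; a minimal sketch under that assumption (the batch size, sequence length, class count, and use of ignore_index are illustrative, not taken from the question):

```python
import torch
import torch.nn as nn

# Assumed shapes: 8 sequences of 20 tokens, 5 classes.
batch, seq_len, num_classes = 8, 20, 5

logits = torch.randn(batch, seq_len, num_classes)           # typical Transformer output: (N, L, C)
targets = torch.randint(0, num_classes, (batch, seq_len))   # one class index per token: (N, L)

# CrossEntropyLoss expects the class dimension second, i.e. logits of shape (N, C, L).
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)            # ignore_index can mask padding tokens
loss = loss_fn(logits.permute(0, 2, 1), targets)

# Equivalent alternative: flatten every token into one big batch, (N*L, C) vs (N*L,).
loss_flat = loss_fn(logits.reshape(-1, num_classes), targets.reshape(-1))
print(loss.item(), loss_flat.item())                        # same value
```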
1 vote
1 answer
50 views

Does using a different optimizer change the loss landscape?

I plotted the landscape using this code, and I noticed that the landscape shape has changed a lot. My understanding is that the optimizer does not change the loss landscape, but now I'm confused whether it's just ...
user836026
0 votes
0 answers
14 views

How to combine a classification dataset with a pair-wise comparison dataset

Let's say I'm trying to train a neural network that predicts a single output value in [0.0, 1.0] that correlates with photo realism, which I can use either in a classification setting or for ranking. I have ...
ahbutfore • 201
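One common pattern for this setup is a single scoring network trained with two losses: binary cross-entropy on the labelled photos and a margin ranking loss on the pairwise comparisons. A hedged PyTorch sketch (the feature size, architecture, margin, and loss weight are all assumptions for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical scorer: 128-d image features in, one realism logit out.
scorer = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()             # classification signal (real/fake labels)
rank = nn.MarginRankingLoss(margin=0.1)  # pairwise signal ("A looks more realistic than B")

def combined_loss(x_cls, y_cls, x_better, x_worse, rank_weight=1.0):
    cls_loss = bce(scorer(x_cls).squeeze(-1), y_cls)         # labelled examples
    s_better = scorer(x_better).squeeze(-1)                  # preferred image of each pair
    s_worse = scorer(x_worse).squeeze(-1)
    rank_loss = rank(s_better, s_worse, torch.ones_like(s_better))
    return cls_loss + rank_weight * rank_loss

# Dummy batch just to show the call; apply sigmoid at inference to get a [0, 1] score.
loss = combined_loss(torch.randn(16, 128), torch.randint(0, 2, (16,)).float(),
                     torch.randn(16, 128), torch.randn(16, 128))
loss.backward()
```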
0 votes
1 answer
37 views

My custom neural network is converging but the Keras model is not

In most cases it is probably the other way round, but... I have implemented a basic MLP neural network structure with backpropagation. My data is just a shifted quadratic function with 100 samples. I ...
tymsoncyferki
0 votes
0 answers
138 views

Custom Loss Function Returns Graph Execution Error: Can not squeeze dim[0], expected a dimension of 1, got 32

I have built a loss function which adds time- and frequency-weighted averages and variances to the MSE: ...
Harry Chittenden
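The excerpt stops before the code, so the exact cause can't be confirmed, but this error often appears when a custom Keras loss returns a tensor whose batch dimension Keras then tries to squeeze. A hedged sketch of the expected shape contract, with placeholder weighting terms rather than the asker's actual formula:

```python
import tensorflow as tf

def mse_plus_moment_terms(y_true, y_pred):
    # Assumed shapes: (batch, timesteps). Reduce over every axis except the batch axis
    # so the loss returns one value per sample, shape (batch,), which Keras averages itself.
    mse = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)
    mean_term = tf.square(tf.reduce_mean(y_true, axis=-1) - tf.reduce_mean(y_pred, axis=-1))
    var_term = tf.square(tf.math.reduce_variance(y_true, axis=-1)
                         - tf.math.reduce_variance(y_pred, axis=-1))
    return mse + mean_term + var_term    # placeholder weights of 1.0 on each term

# model.compile(optimizer="adam", loss=mse_plus_moment_terms)
```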
0 votes
0 answers
170 views

Training loss is much higher than validation loss

I am trying to train a neural network with 2 hidden layers to perform a multi-class classification of 3 different classes. There is a huge imbalance between the classes, with the distribution being around ...
joseph wong
2 votes
1 answer
150 views

What is the benefit of the exponential function inside softmax?

I know that softmax is: $$\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$ This is an $\mathbb{R}^n \to \mathbb{R}^n$ function, and the elements of the output add up to 1. I understand that the ...
Victor2748
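In short, the exponential makes every term strictly positive (so the normalized outputs form a probability distribution) and makes the result invariant to shifting all logits by a constant; a quick numerical check of both properties (a sketch, not taken from the question):

```python
import numpy as np

def softmax(x):
    # Subtracting the max changes nothing mathematically (the shift cancels in the ratio)
    # but prevents overflow in exp() for large logits.
    z = np.exp(x - np.max(x))
    return z / z.sum()

x = np.array([2.0, 1.0, -1.0])
print(softmax(x))                    # strictly positive entries that sum to 1
print(softmax(x + 100.0))            # shift invariance: identical output
print(np.isclose(softmax(x).sum(), 1.0))
```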
0 votes
0 answers
51 views

Train neural network to predict multiple distributions

I aim to train a neural network to predict 2 distributions (10 quantiles, i.e. deciles) at 5 time points. So my y is of shape: ...
A_Murphy
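The excerpt is truncated before the shape, but a standard way to make a network output quantiles is the pinball (quantile) loss applied per predicted quantile. A hedged PyTorch sketch with made-up shapes (batch of 32, 2 distributions, 5 time points, 10 assumed quantile levels):

```python
import torch

# 10 assumed quantile levels, evenly spaced; adjust to the deciles you actually want.
quantile_levels = torch.linspace(0.05, 0.95, 10)

def pinball_loss(pred, target, q=quantile_levels):
    # pred: (..., 10) predicted quantiles, target: (...) observed values.
    diff = target.unsqueeze(-1) - pred                      # broadcast over the quantile axis
    return torch.maximum(q * diff, (q - 1) * diff).mean()   # classic pinball/quantile loss

pred = torch.randn(32, 2, 5, 10, requires_grad=True)        # (batch, distribution, time, quantile)
target = torch.randn(32, 2, 5)
loss = pinball_loss(pred, target)
loss.backward()
print(loss.item())
```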
0 votes
0 answers
48 views

The cost function gets stuck at 120 epochs

I wrote a neural network in C++ to recognize handwritten digits using the MNIST dataset, without any pre-existing neural network libraries. My network has 784 input neurons (the pixels of the image), 100 ...
kripi • 1
0 votes
0 answers
94 views

Why is backpropagation done in every epoch when the loss is always a scalar?

I understand that the backpropagation algorithm calculates the derivative of the loss with respect to all the parameters in the neural network. My question is: this derivative is constant, right? Because the ...
Jeet • 101
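The derivative is not constant: the loss is a function of the weights, so the gradient has to be re-evaluated at the current weights, which change after every update. A tiny made-up example showing the gradient changing from step to step:

```python
import torch

# One weight, one data point: loss(w) = (w*x - y)^2, so dloss/dw = 2*x*(w*x - y).
x, y = torch.tensor(2.0), torch.tensor(3.0)
w = torch.tensor(0.0, requires_grad=True)

for step in range(3):
    loss = (w * x - y) ** 2                  # the loss value is a scalar ...
    loss.backward()                          # ... but it is still a function of w
    print(f"w={w.item():.3f}  loss={loss.item():.3f}  grad={w.grad.item():.3f}")
    with torch.no_grad():
        w -= 0.1 * w.grad                    # the update moves w,
        w.grad.zero_()                       # so the next gradient is taken at a new point
```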
1 vote
2 answers
3k views

Training and validation loss are almost the same (perfect fit?)

I am developing an ANN from scratch which classifies MNIST digits. These are the curves I get using only one hidden layer composed of 100 neurons activated by ...
tail • 127
0 votes
1 answer
23 views

Binary crossentropy loss

When we have a binary classification problem, we use a sigmoid activation function in the output layer plus a binary cross-entropy loss. We also need to one-hot encode the target variable. This is a binary ...
John adams
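With a single sigmoid unit and binary cross-entropy the target is just a 0/1 vector, so no one-hot encoding is needed; one-hot targets only come in if you switch to two softmax outputs with categorical cross-entropy. A minimal Keras sketch (layer sizes and the dummy data are arbitrary):

```python
import numpy as np
from tensorflow import keras

# Dummy binary data: plain 0/1 labels, no one-hot encoding.
X = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256,)).astype("float32")    # shape (N,), values 0 or 1

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),               # one unit for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

# The one-hot alternative would use Dense(2, activation="softmax") with
# loss="categorical_crossentropy" and y one-hot encoded to shape (N, 2).
```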
0 votes
1 answer
73 views

How do I know that my weight optimizer has found the best weights?

I am new to deep learning and my understanding of how optimizers work might be slightly off. Also, sorry for the third-grader quality of the images. For example, if we have a simple task, our loss-to-weight ...
Neriko • 3
1 vote
3 answers
156 views

How to learn steep functions using a neural network?

I am trying to use a neural network to learn the below function. In total, I have 25 features and 19 outputs. The above image shows the distribution of two features with respect to one of the outputs....
newbie • 61
0 votes
1 answer
303 views

Training deep neural networks with a ReLU output layer for verification

Most algorithms for the verification of deep neural networks require ReLU activation functions in each layer (e.g. Reluplex). I have a binary classification task with classes 0 and 1. The main problem I ...
alext90
