Questions tagged [train]

Training (or estimation) of statistical models or machine learning algorithms.

0 votes
1 answer
20 views

Using a model to evaluate over or under-priced rental prices for the same apartments used in training

If I have a machine learning model which predicts the rental prices of apartments, can I use the model once complete to analyse the prediction for the same apartments I used to train the model so I ...
AWGIS's user avatar
  • 83
0 votes
0 answers
9 views

Validation accuracy dip and recovery when restarting training

I was fine-tuning this large language model with stochastic gradient descent, and mid-epoch I stopped training and saved the model weights. Later, I reloaded the weights and restarted the ...
clam's user avatar
  • 348
0 votes
1 answer
49 views

Is Gaussian Process Regression more suitable for limited amounts of training data than other methods?

In the field of machine learning for molecular properties, one sometimes has to deal with small amounts of (experimental) training data. Some people have advised me to use Gaussian Process ...
C_Swann22's user avatar
  • 103
3 votes
1 answer
4k views

Understanding the advantages of BF16 vs. FP16 in mixed precision training

Brain float (BF16) and 16-bit floating point (FP16) both require 2 bytes of memory, but in contrast to FP16, BF16 can represent a much larger numerical range, so under-/overflows won't ...
Green绿色's user avatar
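To make the range claim in the excerpt above concrete, here is a minimal sketch that derives the largest finite value and the machine epsilon of each format from its bit layout. It assumes the usual layouts (FP16: 5 exponent bits and 10 mantissa bits; BF16: 8 exponent bits and 7 mantissa bits) and IEEE-style conventions. The helper names `max_finite` and `machine_epsilon` are made up for illustration and do not come from any library.

```python
def max_finite(exponent_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-style binary floating point format."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent field is reserved for inf/NaN, so the largest
    # usable unbiased exponent equals the bias.
    max_exponent = bias
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** max_exponent


def machine_epsilon(mantissa_bits: int) -> float:
    """Gap between 1.0 and the next representable value."""
    return 2.0 ** -mantissa_bits


# Assumed bit layouts (sign bit omitted): (exponent_bits, mantissa_bits)
FP16 = (5, 10)
BF16 = (8, 7)

fp16_max = max_finite(*FP16)
bf16_max = max_finite(*BF16)
fp16_eps = machine_epsilon(FP16[1])
bf16_eps = machine_epsilon(BF16[1])

print(f"FP16: max = {fp16_max:.6g}, eps = {fp16_eps:.6g}")
print(f"BF16: max = {bf16_max:.6g}, eps = {bf16_eps:.6g}")
```

The same arithmetic shows the trade-off: the extra exponent bits that give BF16 its range are taken from the mantissa, so its spacing around 1.0 is coarser than FP16's.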
0 votes
1 answer
33 views

Make Predictions with an RNN Using a Multi-dimensional Training Set

I have a 2D matrix TD of training data that is a collection of N non-linear signals that are functions of time (hence the ...
Jonathan Frutschy's user avatar
0 votes
1 answer
36 views

Is there a (lower) limit/minimum for learning rate values?

I'm building a model for traffic prediction with ConvLSTM and A3T-GCN cells. Since the input data is highly complex and the model is relatively big, I can only load ...
olenscki's user avatar
  • 101
1 vote
0 answers
14 views

Determining Optimal Data Period / Time Span for Model Training

I'm seeking advice on determining the ideal time span for optimizing a weather forecast strategy using historical data without overfitting/underfitting our model. In pursuit of optimal performance and ...
RezAm's user avatar
  • 111
2 votes
0 answers
48 views

How was the word2vec model trained?

Let's take the CBOW (continuous bag of words) model as the example. Suppose there are $c$ context words, each of which is a one-hot encoded vector. So the total number of elements of input ...
J. Doe's user avatar
  • 66
0 votes
0 answers
24 views

Trained network always predicts zero [duplicate]

I have an encoder model and I'm training it with a dataset of signals with size (500,1). The data set is normalized and then used to train the model but the problem is that after the model is trained, ...
rrSep's user avatar
  • 1
6 votes
1 answer
91 views

Does training time increase more if I add a layer at the beginning of a neural network or at the end?

Let's consider a fixed NN architecture, dataset and hardware. We add a layer, either at the beginning or at the end of the NN. In which case will the training time increase more? Intuitively, I ...
DeltaIV's user avatar
  • 18.3k
0 votes
0 answers
8 views

Deep NN with positive partial derivative

Let's assume we are given a FFNN of type $$F: \mathbb{R}^n \times \mathbb{R} \rightarrow \mathbb{R}, \quad (x_1,...,x_{n+1}) \mapsto y$$ We assume the generic architecture (of depth $H$) $$a^{l+1}=\...
NicAG's user avatar
  • 181
1 vote
0 answers
45 views

Do common implementations of mini-batch gradient descent violate the i.i.d assumption needed for unbiased estimation?

When we perform mini-batch GD, we estimate the true gradient: $$\nabla L = \frac{1}{N} \sum_i \nabla L_i$$ with: $$\nabla_B L = \frac{1}{B} \sum_{i \in B} \nabla L_i$$ where $B$ is the batch size. ...
ado sar's user avatar
  • 477
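The two formulas in the excerpt above can be checked on a toy case. The sketch below assumes scalar per-example gradients with made-up values, and assumes a single batch drawn uniformly without replacement. It uses `batch_size` for the batch size and `batch` for the set of sampled examples, since the excerpt uses $B$ for both. It enumerates every possible batch and compares the average batch estimate with the full gradient. It illustrates the single-batch case only and does not settle the question about dependence between successive batches within an epoch.

```python
from fractions import Fraction
from itertools import combinations

# Made-up per-example gradients (scalars), kept as exact fractions.
per_example_grads = [Fraction(g) for g in (3, -1, 4, 1, -5, 9)]
n_examples = len(per_example_grads)
batch_size = 3

# Full gradient: mean of the per-example gradients.
full_grad = sum(per_example_grads) / n_examples

# Mini-batch estimate for every possible batch of the given size,
# i.e. every outcome of uniform sampling without replacement.
batch_estimates = [
    sum(batch) / batch_size
    for batch in combinations(per_example_grads, batch_size)
]

# Each batch is equally likely, so the expectation is the plain average.
expected_batch_grad = sum(batch_estimates) / len(batch_estimates)

print("full gradient:          ", full_grad)
print("expected batch gradient:", expected_batch_grad)
print("number of batches:      ", len(batch_estimates))
```

Exact fractions are used instead of floats so that the equality check is not blurred by rounding.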
1 vote
1 answer
118 views

Classification Threshold Optimization after GridSearchCV

In my machine learning problem, I am using a CNN to classify images. Since my dataset is imbalanced, I want to tune the classification probability threshold so I can find the optimal balance ...
Throwaway123's user avatar
1 vote
1 answer
61 views

Model complexity and number of examples

Is there a measure of model complexity? For a given value of this measure, how many examples do we need to train a network so that it gets the model right and generalizes? In essence, what is the relation between ...
Justaperson's user avatar
1 vote
0 answers
15 views

XGBoost Training Logloss dropping but Validation staying steady [duplicate]

I'm currently tuning my model's hyperparameters and returning the model with the lowest error. Before I start the hyperparameter tuning process, I ensure my validation and test data is ...
 paddockson's user avatar