Questions tagged [validation]

The process of assessing whether the results of an analysis are likely to hold outside of the original research setting. DO NOT use this tag for discussing the 'validity' of a measurement or instrument (e.g., whether it measures what it purports to measure); use the [validity] tag instead.

0 votes
0 answers
12 views

In X-learner uplift modeling, predictions from the 1st-stage models help train the 2nd-stage models. What data splits should these predictions be on?

In uplift modeling with an X-learner metalearner (Künzel et al. 2019), predictions from the two first-stage models are used in training the two second-stage models. Question: What datasets/splits ...
naive_bayesian's user avatar
0 votes
1 answer
20 views

Using a model to evaluate over or under-priced rental prices for the same apartments used in training

If I have a machine learning model which predicts the rental prices of apartments, can I use the model once complete to analyse the prediction for the same apartments I used to train the model so I ...
AWGIS's user avatar
  • 83
1 vote
1 answer
37 views

How should I split my dataset if I am applying oversampling?

Related: How can I apply multiple sampling techniques to a single dataset? Suppose I have a dataset called my_dataset.dat with a length of 1079134 rows. This ...
user366312's user avatar
  • 2,201
5 votes
1 answer
37 views

Validating binary prediction model

Suppose we have a model that predicts for binary event $e$ ($0$ or $1$) with a single output $p$ (the expected probability $e$ occurs). If we are able to compare $p$ with the true value of $e$ ($0$ or ...
shrizzy's user avatar
  • 151
5 votes
2 answers
290 views

Is domain knowledge external validation in clustering?

I have cluster results with good values on, e.g., Silhouette Width. The cluster sizes are 4998, 1, 1, which isn't good, since I know my customers don't have that particular partition (it's more balanced). I ...
ExchangedVisual111's user avatar
2 votes
1 answer
32 views

Are there strategies for measuring accuracy of Euclidean distance-based similarity without ground truthing?

I have subjects with about 200 features each. These feature vectors are stored in a vector database, where similarity searching with Euclidean distance is used to find subjects that are similar to a ...
T_d's user avatar
  • 23
0 votes
0 answers
19 views

How were the asymmetric recovery ranges in Table A5 of Appendix F from AOAC determined?

I am trying to understand how the recovery ranges in Table A5 of Appendix F from AOAC (https://www.aoac.org/wp-content/uploads/2019/08/app_f.pdf) were determined. I did not understand how the ...
Éderson D'Martin Costa's user avatar
0 votes
0 answers
11 views

What to do with features causing data drift?

I have labeled training data and unlabeled test data. I want to tune and validate a classifier on the training data in such a way that it performs well on the test data. By conducting some ...
Yann's user avatar
  • 43
0 votes
0 answers
9 views

Validation accuracy dip and recovery when restarting training

I was fine-tuning this large language model with stochastic gradient descent, and mid-epoch I stopped training and saved the model weights. Then, at a later time, I reloaded the weights and restarted the ...
clam's user avatar
  • 348
0 votes
1 answer
45 views

Are the p-values obtained on the same sample using synthetic AA tests (Monte Carlo) independent values?

Let's say we have the following procedure. We take a fixed sample of size n and perform the procedure 1000 times: we divide (split) it equally into 2 groups; we calculate the p-value using the F function (...
Романов Андрей's user avatar
4 votes
2 answers
113 views

How to split and sample "Panel Data" when training a Logistic Regression to predict future outcomes

Introduction: I have panel data where customer behavior is observed over time. For each customer at a given reference date, I have a lookback window of 12 months for generating features, and a look ...
Esben Eickhardt's user avatar
1 vote
0 answers
27 views

Evaluate hierarchical clustering with partial ground truth

I am performing hierarchical clustering, and I need to decide which agglomeration method to use. While I don't have a ground truth, I know that some datapoints should be closer together: for example, ...
Alexlok's user avatar
  • 145
1 vote
1 answer
21 views

Validation Accuracy higher if trained on training set than in validation set

So, I'm having a problem with the validation of a model. In particular, I'm trying a linear readoff (a logistic regression attached to a middle layer of a neural network). If I train the ...
Alberto's user avatar
  • 1,217
1 vote
0 answers
7 views

Image classification metrics

I have been working on an image classification task using CNNs and getting some puzzling results. My training, validation and test loss keep going down with epochs and are comparable. So this might ...
Nithin's user avatar
  • 11
0 votes
0 answers
18 views

Leaving duplicated entries in a dataset at pretraining stage

I'm adopting a fine-tuning approach after having pretrained a deep learning model (transformer) on a source dataset (let's call it dataset A) and then fine-tuning it on a target dataset (B). Dataset A ...
James Arten's user avatar
