Questions tagged [validation]

The process of assessing whether the results of an analysis are likely to hold outside of the original research setting. DO NOT use this tag for discussing the 'validity' of a measurement or instrument (such as whether it measures what it purports to); use the [validity] tag instead.

875 questions

0 votes

0 answers

12 views

In X-learner uplift modeling, predictions from the 1st-stage models help train the 2nd-stage models. What data splits should these predictions be on?

In uplift modeling with an X-learner metalearner (Künzel et al. 2019), predictions from the two first-stage models are used in training the two second-stage models. Question: What datasets/splits ...

naive_bayesian

asked yesterday

0 votes

1 answer

20 views

Using a model to evaluate over or under-priced rental prices for the same apartments used in training

If I have a machine learning model which predicts the rental prices of apartments, can I use the model once complete to analyse the prediction for the same apartments I used to train the model so I ...

AWGIS

asked Jul 1 at 9:50

1 vote

1 answer

37 views

How should I split my dataset if I am applying oversampling?

Related: How can I apply multiple sampling techniques to a single dataset? Suppose I have a dataset called my_dataset.dat with a length of 1079134 rows. This ...

user366312

2,201

asked Jun 30 at 19:34

5 votes

1 answer

37 views

Validating binary prediction model

Suppose we have a model that predicts for binary event $e$ ($0$ or $1$) with a single output $p$ (the expected probability $e$ occurs). If we are able to compare $p$ with the true value of $e$ ($0$ or ...

shrizzy

asked Jun 5 at 19:58

5 votes

2 answers

290 views

Is domain knowledge external validation in clustering?

I have cluster results with good values on, e.g., Silhouette Width. The cluster sizes are 4998, 1, 1, which isn't good, since I know my customers don't have that particular partition (it's more balanced). I ...

ExchangedVisual111

asked Jun 2 at 16:08

2 votes

1 answer

32 views

Are there strategies for measuring accuracy of Euclidean distance-based similarity without ground truthing?

I have subjects with about 200 features each. These feature vectors are stored in a vector database, where similarity searching with Euclidean distance is used to find subjects that are similar to a ...

T_d

asked May 29 at 16:14

0 votes

0 answers

19 views

How were the asymmetric recovery ranges in Table A5 of Appendix F from AOAC determined?

I am trying to understand how the recovery ranges in Table A5 of Appendix F from AOAC (https://www.aoac.org/wp-content/uploads/2019/08/app_f.pdf) were determined. I did not understand how the ...

Éderson D'Martin Costa

asked May 29 at 13:41

0 votes

0 answers

11 views

What to do with features causing data drift?

I have labeled train data and unlabeled test data. I want to tune and validate a classifier on the train data in such a way that it performs well on the test data. By conducting some ...

Yann

asked May 16 at 23:43

0 votes

0 answers

9 views

Validation accuracy dip and recovery when restarting training

I was fine-tuning a large language model with stochastic gradient descent; mid-epoch, I stopped training and saved the model weights. Then, at a later time, I reloaded the weights and restarted the ...

clam

asked May 7 at 13:31

0 votes

1 answer

45 views

Are the p-values obtained on the same sample using synthetic AA tests (Monte Carlo) independent values?

Let's say we have the following procedure. We take a fixed sample of size n and repeat the following 1000 times: we divide (split) it equally into 2 groups; we calculate the p-value using the F function (...

Романов Андрей

asked May 7 at 12:03

4 votes

2 answers

113 views

How to split and sample "Panel Data" when training a Logistic Regression to predict future outcomes

Introduction I have panel data where customer behavior is observed over time. For each customer at a given reference date, I have a lookback window of 12 months for generating features, and a look ...

Esben Eickhardt

asked May 6 at 15:13

1 vote

0 answers

27 views

Evaluate hierarchical clustering with partial ground truth

I am performing hierarchical clustering, and I need to decide which agglomeration method to use. While I don't have a ground truth, I know that some datapoints should be closer together: for example, ...

Alexlok

asked Apr 25 at 12:14

1 vote

1 answer

21 views

Validation Accuracy higher if trained on training set than in validation set

I'm having a problem with the validation of a model: I'm trying a linear readoff (a logistic regression attached to a middle layer of a neural network). In particular, if I train the ...

Alberto

1,217

asked Apr 18 at 15:09

1 vote

0 answers

7 views

Image classification metrics

I have been working on an image classification task using CNNs and getting some puzzling results. My training, validation and test loss keep going down with epochs and are comparable. So this might ...

Nithin

asked Apr 17 at 15:55

0 votes

0 answers

18 views

Leaving duplicated entries in a dataset at pretraining stage

I'm adopting a fine-tuning approach after having pretrained a deep learning model (transformer) on a source dataset (let's call it dataset A) and then fine-tuning it on a target dataset (B). Dataset A ...

James Arten

asked Apr 10 at 8:59
