
Questions tagged [f1]

The F1 score, the harmonic mean of precision and recall, is a popular criterion for evaluating binary decision algorithms and classification models.

1 vote · 1 answer · 74 views

Logloss worse than random guessing with xgboost

I have a binary classification problem that I am currently trying to tackle with xgboost. This is a low signal-to-noise ratio situation dealing with time series. My out-of-sample AUC is 0.65, which is ...
asked by Baron Yugovich
0 votes · 0 answers · 10 views

Wilcoxon's Signed-Rank Test in the context of 2 algorithms and 1 domain

I'm trying to understand whether my analysis for a problem is in the right direction. I have 2 algorithms (3D object detectors) that I've applied to the same dataset to obtain TPs, FPs, and FNs for each ...
asked by neoavalon
0 votes · 0 answers · 35 views

Model evaluation approach and how it affects the performance of the model

The task I am working on is supervised video summarization, where the model tries to predict whether a video frame is important or not using its features, with labels given as annotations of frame scores. ...
asked by moha tech
2 votes · 1 answer · 112 views

Comparing probability threshold graphs for F1 score for different models

Below are two plots, side-by-side, for an imbalanced dataset. We have a very large imbalanced dataset that we are processing/transforming in different ways. After each transformation, we run an ...
asked by Ashok K Harnal
0 votes · 1 answer · 29 views

What is the F1 score for this diagram?

I have a Venn diagram that represents a dataset's predictions for identifying whether our products are classified under the "A41" standard or not. The blue circle represents a machine learning model ...
asked by asmgx
0 votes · 0 answers · 22 views

F1 score mismatch with publication

I'm trying to reproduce the results of the baseline model from the SEP28k paper, but I struggle to get the details. Most strikingly, the F1 score for random prediction doesn't match the paper. Here are the ...
asked by marekjg
1 vote · 1 answer · 16 views

Is there an equivalent of Yates' correction for confusion matrix-derived metrics?

Given the following table of predictions vs. actual states: ...
asked by Bryan
2 votes · 1 answer · 100 views

Binary classification metrics - Combining sensitivity and specificity?

The harmonic mean of precision and recall (the F1 score) is a common metric for evaluating binary classification. It is useful because it strikes a balance between precision (penalizing FPs) and recall (penalizing FNs). For ...
asked by usual me
1 vote · 2 answers · 145 views

Why don't we use the harmonic mean of sensitivity and specificity?

There is this question on the F1 score, asking why we compute the harmonic mean of precision and recall rather than their arithmetic mean. There were good arguments in the answers in favor of the ...
asked by user209974
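The two combinations discussed in the entries above are easy to compare side by side. This is a minimal sketch of my own (the function names and counts are illustrative): on an imbalanced sample, F1 (harmonic mean of precision and recall) and the harmonic mean of sensitivity and specificity can diverge noticeably, since only the latter uses true negatives.

```python
def harmonic_mean(a, b):
    return 2 * a * b / (a + b)

def compare(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision   = tp / (tp + fp)
    f1 = harmonic_mean(precision, sensitivity)
    hm_sens_spec = harmonic_mean(sensitivity, specificity)
    return f1, hm_sens_spec

# Imbalanced example: 10 positives, 90 negatives
f1, hm = compare(tp=8, fp=9, fn=2, tn=81)
print(round(f1, 3), round(hm, 3))  # the two summaries disagree
```

Here precision is dragged down by the 9 false positives (F1 ≈ 0.593), while specificity stays high (harmonic mean ≈ 0.847), which is the kind of behavior these two questions are probing.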
0 votes · 0 answers · 505 views

F1 score for validation and testing datasets is different

I have the following F1 score function that I use for the model, both as part of the metrics during training and during prediction: ...
asked by Avv
8 votes · 2 answers · 430 views

Calculating the Brier or log score from the confusion matrix, or from accuracy, sensitivity, specificity, F1 score, etc.

Suppose I have a confusion matrix, or alternatively any one or more of accuracy, sensitivity, specificity, recall, F1 score or friends for a binary classification problem. How can I calculate the ...
asked by Stephan Kolassa
22 votes · 2 answers · 2k views

Academic reference on the drawbacks of accuracy, F1 score, sensitivity and/or specificity

Accuracy, as a KPI for assessing binary classification models, has major drawbacks (see: Why is accuracy not the best measure for assessing classification models?). The exact same issues also plague the F1 ...
asked by Stephan Kolassa
1 vote · 0 answers · 64 views

Statistical significance of performance difference in classification models

Is it possible to assign a p-value to the mean performance difference among three classification models? The models use the same data, the same random seed, and 10-fold cross-validation. Model A has a ...
asked by Adam_G
2 votes · 1 answer · 224 views

Confidence interval of the average of F1 score samples

I have a number of individual F1 score samples, and right now I am measuring the average F1 score across this group. However, I would also like to present a confidence interval on it. It's a continuous ...
asked by SriK
3 votes · 1 answer · 204 views

Singular beta in the F-beta vs. threshold score?

Consider this plot of the $F_\beta$ score for different values of $\beta$. I have a hard time getting an intuition as to why they intersect at the same point. (Cf. this blog post.) In other words, why ...
asked by Tfovid
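On the last entry: the common intersection falls at the threshold where precision equals recall, because $F_\beta = (1+\beta^2)PR/(\beta^2 P + R)$ reduces to $P$ whenever $P = R$, independent of $\beta$. A small numerical sketch of my own (the precision/recall values are made up for illustration):

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Whenever precision == recall, F-beta equals that common value for every beta,
# so all the curves pass through the same point on a threshold plot.
p = r = 0.7
print([round(f_beta(p, r, b), 3) for b in (0.5, 1, 2)])  # [0.7, 0.7, 0.7]
```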
