Questions tagged [f1]
A popular criterion for evaluating binary decision algorithms and classification models, defined as the harmonic mean of precision and recall.
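For reference, here is the standard definition, written with precision $P$, recall $R$ and the confusion-matrix counts $TP$, $FP$, $FN$. True negatives do not appear anywhere in it:

```latex
F_1 = \frac{2PR}{P+R} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad P = \frac{TP}{TP+FP}, \qquad R = \frac{TP}{TP+FN}
```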
72 questions
1 vote, 1 answer, 74 views
Logloss worse than random guessing with xgboost
I have a binary classification problem that I am currently trying to tackle with xgboost. This is a low signal-to-noise situation dealing with time series. My out-of-sample AUC is 0.65, which is ...
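A minimal sketch of the baseline comparison, assuming "random guessing" means always predicting the class prevalence. The names `log_loss` and `base_rate_log_loss` are illustrative and do not come from the question. In the toy data, the model ranks the positive above most negatives (AUC of about 0.67), yet its log loss is worse than the constant baseline because a single confident mistake dominates the average.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def base_rate_log_loss(y_true):
    """Log loss of always predicting the observed prevalence (assumed baseline)."""
    prevalence = sum(y_true) / len(y_true)
    return log_loss(y_true, [prevalence] * len(y_true))


y = [0, 0, 0, 1]
p = [0.1, 0.2, 0.9, 0.6]  # hypothetical, overconfident on the third sample
print(round(base_rate_log_loss(y), 4))  # 0.5623
print(round(log_loss(y, p), 4))         # 0.7855
```

AUC only measures ranking, while log loss also penalizes miscalibration, so the two can disagree in this way.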
0 votes, 0 answers, 10 views
Wilcoxon's Signed-Rank Test in the context of 2 algorithms and 1 domain
I'm trying to understand whether my analysis of a problem is heading in the right direction.
I have 2 algorithms (3D object detectors) that I've applied to the same dataset to obtain TPs, FPs and FNs for each ...
0 votes, 0 answers, 35 views
Model evaluation approach and how it affects the performance of the model
The task I am working on is supervised video summarization: the model predicts whether a video frame is important, using the frame's features as input and annotated frame scores as labels.
...
2 votes, 1 answer, 112 views
Comparing probability threshold graphs for F1 score for different models
Below are two plots, side by side, for an imbalanced dataset.
We have a very large imbalanced dataset that we are processing/transforming in different ways. After each transformation, we run an ...
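A minimal sketch of how an F1-versus-threshold curve like the ones described can be computed from predicted scores. The function names and toy data are hypothetical, and returning 0.0 when there are neither actual nor predicted positives is a convention chosen here.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 when samples with score >= threshold are predicted positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # Convention used here: F1 = 0.0 when there are no actual or predicted positives.
    return 2 * tp / denom if denom else 0.0


def f1_curve(y_true, scores, thresholds):
    """List of (threshold, F1) pairs, ready to plot."""
    return [(t, f1_at_threshold(y_true, scores, t)) for t in thresholds]


y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
for t, f1 in f1_curve(y_true, scores, [0.0, 0.3, 0.5, 0.9]):
    print(f"threshold={t:.1f}  F1={f1:.3f}")
# threshold=0.0  F1=0.667
# threshold=0.3  F1=0.857
# threshold=0.5  F1=0.800
# threshold=0.9  F1=0.000
```

Running the same sweep on each transformed dataset gives one curve per transformation, which can then be overlaid for comparison.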
0 votes, 1 answer, 29 views
What is the F1 score for this diagram?
I have this Venn chart representing predictions on a dataset, identifying whether our products are classified as "A41" standard or not.
The blue circle represents a machine learning model ...
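A minimal sketch of reading F1 off a two-circle Venn diagram, assuming one circle holds the items the model flags as "A41" and the other holds the items that truly are "A41". The product IDs below are hypothetical, not the question's data.

```python
def f1_from_sets(predicted, actual):
    """F1 from two sets of item IDs.

    predicted: items the model flags as positive
    actual:    items that truly are positive
    Items outside both circles are true negatives and do not enter F1.
    """
    tp = len(predicted & actual)  # overlap of the two circles
    fp = len(predicted - actual)  # model circle only
    fn = len(actual - predicted)  # actual circle only
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


predicted = {"p1", "p2", "p3", "p4"}
actual = {"p3", "p4", "p5"}
print(round(f1_from_sets(predicted, actual), 3))  # 0.571
```

If only the region sizes are known, the same formula applies directly with TP as the overlap, FP as the model-only region and FN as the actual-only region.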
0 votes, 0 answers, 22 views
F1 score mismatch with publication
I'm trying to reproduce the results of the baseline model from the SEP28k paper, but I am struggling to get the details right. Most strikingly, the F1 score for random prediction doesn't match the paper. Here are the ...
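One common way to define a "random prediction" baseline assumes each sample is labeled positive with probability q, independently of its true label. Whether the SEP28k paper uses this definition is an assumption here, not something stated above. Plugging the expected counts into the F1 formula gives 2pq / (p + q) for prevalence p. This is the F1 of the expected counts, which approximates the expected F1 only on large test sets.

```python
def random_f1(prevalence, q):
    """F1 from expected per-sample confusion-matrix counts when every sample is
    predicted positive with probability q, independently of its true label.
    Requires prevalence > 0 or q > 0 (the denominator equals prevalence + q)."""
    tp = prevalence * q            # positive and predicted positive
    fp = (1 - prevalence) * q      # negative but predicted positive
    fn = prevalence * (1 - q)      # positive but predicted negative
    return 2 * tp / (2 * tp + fp + fn)  # simplifies to 2pq / (p + q)


for q in (0.5, 1.0):
    print(q, round(random_f1(0.2, q), 4))
# 0.5 0.2857
# 1.0 0.3333
```

The choice of q matters. A coin flip (q = 0.5) and "always predict positive" (q = 1) give different random baselines for the same prevalence.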
1 vote, 1 answer, 16 views
Is there an equivalent of Yates' correction for confusion-matrix-derived metrics?
Given the following table of predictions vs. actual states:
...
2 votes, 1 answer, 100 views
Binary classification metrics - Combining sensitivity and specificity?
The harmonic mean of precision and recall (F1 score) is a common metric for evaluating binary classification. It is useful because it strikes a balance between precision (affected by false positives, FP) and recall (affected by false negatives, FN).
For ...
1 vote, 2 answers, 145 views
Why don't we use the harmonic mean of sensitivity and specificity?
There is this question on the F1 score, asking why we compute the harmonic mean of precision and recall rather than their arithmetic mean. There were good arguments in the answers in favor of the ...
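A small sketch contrasting the two means on hypothetical confusion-matrix counts. The arithmetic mean of sensitivity and specificity is what is usually called balanced accuracy. The harmonic mean is dragged toward the smaller of the two values, so a trivial "everything positive" classifier scores 0 instead of 0.5.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    return tp / (tp + fn), tn / (tn + fp)


def arithmetic_mean(a, b):
    return (a + b) / 2


def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if a + b else 0.0


# Hypothetical counts: a classifier that labels everything positive.
sens, spec = sensitivity_specificity(tp=10, fp=90, fn=0, tn=0)
print(arithmetic_mean(sens, spec), harmonic_mean(sens, spec))  # 0.5 0.0

# Hypothetical counts: a classifier with unequal sensitivity and specificity.
sens, spec = sensitivity_specificity(tp=90, fp=50, fn=10, tn=50)
print(round(arithmetic_mean(sens, spec), 3), round(harmonic_mean(sens, spec), 3))  # 0.7 0.643
```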
0 votes, 0 answers, 505 views
F1 score for validation and testing datasets is different
I have the following F1 score function, which I use both as a metric when training the model and during prediction:
...
8 votes, 2 answers, 430 views
Calculating the Brier or log score from the confusion matrix, or from accuracy, sensitivity, specificity, F1 score etc.
Suppose I have a confusion matrix, or alternatively any one or more of accuracy, sensitivity, specificity, recall, F1 score or friends for a binary classification problem.
How can I calculate the ...
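A small sketch, with made-up probabilities, showing why the confusion matrix alone cannot determine the Brier score. Two hypothetical models produce identical counts at a 0.5 threshold but different Brier scores, because thresholding discards how far each probability sits from 0 or 1.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def confusion_counts(y_true, p_pred, threshold=0.5):
    """(TP, FP, FN, TN) after thresholding the probabilities."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, p_pred):
        pred = p >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


y_true = [1, 1, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.1]      # hypothetical probabilities
model_b = [0.55, 0.55, 0.45, 0.45]  # hypothetical probabilities

print(confusion_counts(y_true, model_a), confusion_counts(y_true, model_b))  # (2, 0, 0, 2) (2, 0, 0, 2)
print(round(brier_score(y_true, model_a), 4), round(brier_score(y_true, model_b), 4))  # 0.085 0.2025
```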
22 votes, 2 answers, 2k views
Academic reference on the drawbacks of accuracy, F1 score, sensitivity and/or specificity
Accuracy, as a KPI for assessing binary classification models, has major drawbacks (see "Why is accuracy not the best measure for assessing classification models?"). The exact same issues also plague the F1 ...
1 vote, 0 answers, 64 views
Statistical significance of performance difference in classification models
Is it possible to assign a p-value to the mean performance difference between three classification models? The models use the same data and the same random seed, and are evaluated with 10-fold cross-validation. Model A has a ...
2 votes, 1 answer, 224 views
Confidence interval of the average of F1 score samples
I have a number of individual F1 score samples, and I am currently measuring the average F1 score across this group. However, I would also like to present a confidence interval for it. It's a continuous ...
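One option is a percentile bootstrap for the mean. This is a swapped-in technique, not something the question specifies, and the sketch assumes the F1 samples are independent draws, for example scores on independent test sets. Correlated scores such as cross-validation folds violate that assumption. The scores below are hypothetical, and `statistics.fmean` needs Python 3.8+.

```python
import random
import statistics


def bootstrap_mean_ci(samples, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`.

    Assumes the samples are independent; resamples with replacement and
    reads the alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    """
    rng = random.Random(seed)
    n = len(samples)
    boot_means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lower = boot_means[int(alpha / 2 * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(samples), (lower, upper)


f1_samples = [0.70, 0.72, 0.68, 0.75, 0.71, 0.66, 0.74]  # hypothetical scores
mean, (lower, upper) = bootstrap_mean_ci(f1_samples)
print(f"mean F1 = {mean:.3f}, 95% CI ~ [{lower:.3f}, {upper:.3f}]")
```

With only a handful of samples the percentile interval tends to be too narrow, so the number of samples is worth reporting alongside it.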
3 votes, 1 answer, 204 views
Singular beta in the F-beta vs. threshold score?
Consider this plot of the $F_\beta$ score for different values of $\beta$. I have a hard time getting an intuition for why they intersect at the same point. (Cf. this blog post.) In other words, why ...
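Assuming the plot uses the standard definition of $F_\beta$ in terms of precision $P$ and recall $R$, the common point can be located algebraically. At any threshold where precision equals recall, every $F_\beta$ collapses to that shared value. Conversely, for two distinct values of $\beta$ (with $PR > 0$), the curves can only meet at such a threshold:

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}

P = R = x > 0 \;\Rightarrow\; F_\beta = \frac{(1+\beta^2)\,x^2}{(1+\beta^2)\,x} = x \quad \text{for every } \beta

F_{\beta_1} = F_{\beta_2},\; PR > 0 \;\Rightarrow\; (1+\beta_1^2)(\beta_2^2 P + R) = (1+\beta_2^2)(\beta_1^2 P + R) \;\Rightarrow\; (\beta_2^2 - \beta_1^2)(P - R) = 0
```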