Questions tagged [log-loss]
The log-loss tag has no usage guidance.
62 questions

0 votes · 0 answers · 9 views
Extremely high logloss in binary classification problem [duplicate]
I have a binary classification problem that I am currently trying to tackle with xgboost. This is a low signal-to-noise ratio situation dealing with time series. Per this answer "Dumb" log-...
1 vote · 1 answer · 77 views
Logloss worse than random guessing with xgboost
I have a binary classification problem that I am currently trying to tackle with xgboost. This is a low signal-to-noise ratio situation dealing with time series. My out-of-sample AUC is 0.65, which is ...
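This situation is easy to reproduce: ranking quality (AUC) and calibration (log loss) are different things, so overconfident probabilities can push log loss above the ln 2 ≈ 0.693 of a coin-flip baseline even when the AUC looks respectable. A hedged sketch on synthetic data (the numbers are illustrative, not from the question):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.5, size=2000)
scores = y + rng.normal(0, 1.5, size=2000)       # weakly informative scores
p = 1 / (1 + np.exp(-6 * (scores - 0.5)))        # overconfident mapping to [0, 1]

print("AUC:", roc_auc_score(y, scores))                          # decent ranking
print("model log-loss:", log_loss(y, p))                         # far above ln 2
print("baseline log-loss:", log_loss(y, np.full_like(p, y.mean())))
```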
2 votes · 1 answer · 41 views
SAS log likelihood fit statistic from SCORE statement in PROC LOGISTIC [closed]
When using a SCORE statement in PROC LOGISTIC in SAS, I can get fit statistics with FITSTAT. My response variable is binary.
I want to get log likelihood, but looking at this documentation, I'm ...
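One workaround, if only the scored probabilities are available: the total Bernoulli log likelihood is just −N times the mean log loss, so it can be recomputed from the scored dataset. A minimal sketch in Python with toy numbers (in SAS, the same sum could presumably be accumulated in a DATA step over the SCORE output):

```python
import numpy as np
from sklearn.metrics import log_loss

y = np.array([0, 1, 1, 0, 1])                 # binary response
p = np.array([0.2, 0.8, 0.6, 0.3, 0.9])       # scored P(Y=1)

loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
assert np.isclose(loglik, -len(y) * log_loss(y, p))   # LL == -N * mean log loss
print(loglik)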
4 votes · 1 answer · 274 views
Selecting the model by bootstrapping: AIC vs. log-loss?
I'm building a predictive model with potentially multiple predictors. To that end, I try different nested models, each with one more predictor than the previous one, and compare their AICs. The AIC ...
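For concreteness, a sketch of the comparison the question describes, with synthetic data and hypothetical nested models (only the first two predictors are informative): AIC is computed from the in-sample likelihood, while log loss is usually evaluated on held-out data.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
logits = 1.2 * X[:, 0] - 0.5 * X[:, 1]              # third predictor is pure noise
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 2, 3):                                 # nested models with k predictors
    res = sm.Logit(y_tr, sm.add_constant(X_tr[:, :k])).fit(disp=0)
    p_te = res.predict(sm.add_constant(X_te[:, :k]))
    print(f"k={k}: AIC={res.aic:.1f}, held-out log-loss={log_loss(y_te, p_te):.4f}")
```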
1 vote · 0 answers · 22 views
Log base in Cross Entropy Loss [duplicate]
What is the base of the logarithm used in the cross-entropy loss (during backpropagation for multiclass classification)? Is it e, 2, or 10?
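For what it's worth, most libraries use the natural logarithm, and the choice of base only rescales the loss by a constant factor, so it cannot change the gradient direction or the optimum. A quick numeric check:

```python
import numpy as np

q = 0.7                           # predicted probability of the true class
ce_nats = -np.log(q)              # base e, what most libraries compute
ce_bits = -np.log2(q)             # base 2, the information-theoretic version
assert np.isclose(ce_bits, ce_nats / np.log(2))   # bases differ by a constant
print(ce_nats, ce_bits)           # ~0.357 nats == ~0.515 bits
```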
2 votes · 0 answers · 54 views
Probabilistic interpretation for log-loss
Suppose I am modelling a binary response variable $Y \sim B(p)$ as a function of $p$ features $X_1, \dots, X_p$ by means of an equation of the form
$$
p(Y = 1 \, | \, X = x) = f(x,\theta),
$$
where $\...
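A standard way to see the interpretation (a textbook step, not taken from the truncated excerpt): with independent observations, the Bernoulli likelihood of the data is
$$
\mathcal{L}(\theta) = \prod_{i=1}^{n} f(x_i,\theta)^{y_i}\bigl(1 - f(x_i,\theta)\bigr)^{1-y_i},
$$
so the negative log-likelihood is
$$
-\log \mathcal{L}(\theta) = -\sum_{i=1}^{n}\Bigl[y_i \log f(x_i,\theta) + (1-y_i)\log\bigl(1-f(x_i,\theta)\bigr)\Bigr],
$$
which is exactly the (unnormalized) log loss; minimizing log loss is maximum-likelihood estimation under this model.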
18 votes · 2 answers · 3k views
Why not use evaluation metrics as the loss function?
Most algorithms use their own loss function for optimization. But these loss functions are always different from metrics used for actual evaluation. For example, for building binary classification ...
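A common reason is differentiability: a metric like accuracy is piecewise constant in the predicted probabilities, so gradient-based optimizers get no signal from it, while log loss changes smoothly. A tiny numeric illustration (made-up numbers):

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y = np.array([1, 0, 1])
for p1 in (0.70, 0.71):                      # nudge one probability slightly
    p = np.array([p1, 0.2, 0.9])
    acc = accuracy_score(y, (p >= 0.5).astype(int))   # flat: no gradient signal
    print(f"p1={p1}: accuracy={acc:.3f}, log-loss={log_loss(y, p):.5f}")
```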
3 votes · 3 answers · 194 views
Challenge an ICML Paper: For a given set of probability predictions and a log loss value, is the set of true labels giving such a loss unique?
Aggarwal's 2021 ICML paper "Label Inference Attacks from Log-loss Scores" seems to argue that the answer to the question in the title is "YES". The paper claims that, given ...
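The claim is easy to probe on toy cases by brute force. A sketch (not the paper's construction; note that complementary predicted probabilities such as 0.3 and 0.7 can make two different label sets collide):

```python
import itertools
import numpy as np
from sklearn.metrics import log_loss

p = np.array([0.3, 0.7, 0.6])                  # 0.3 and 0.7 are complementary
scores = {}
for labels in itertools.product([0, 1], repeat=len(p)):
    s = round(log_loss(labels, p, labels=[0, 1]), 12)
    scores.setdefault(s, []).append(labels)

collisions = {s: ls for s, ls in scores.items() if len(ls) > 1}
print(collisions)                              # (0,0,k) and (1,1,k) tie for each k
```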
2 votes · 1 answer · 2k views
Predicted Probability with XGBClassifier ranging only from 0.48 to 0.51 for either class
Why does my XGBClassifier predict probabilities only between 0.48 and 0.51 for either class?
I'm very new to XGBoost, so any suggestions are greatly appreciated! Here's ...
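A plausible cause, sketched below on synthetic data (this is an assumption about the setup, not taken from the question): with few boosting rounds and a small learning rate, each tree moves the logits only slightly from the 0.5 base score, so predicted probabilities stay in a narrow band; more rounds widen the range.

```python
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for n_est, lr in [(5, 0.01), (300, 0.1)]:
    model = XGBClassifier(n_estimators=n_est, learning_rate=lr,
                          eval_metric="logloss", random_state=0)
    model.fit(X, y)
    p = model.predict_proba(X)[:, 1]
    print(f"n_estimators={n_est}, lr={lr}: "
          f"probabilities span [{p.min():.3f}, {p.max():.3f}]")
```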
6 votes · 2 answers · 1k views
"Dumb" log-loss for a binary classifier
I am trying to understand how I can best compare a classifier that I have trained and tuned against a "dumb" classifier, particularly in the context of binary classification with imbalanced ...
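Assuming the "dumb" classifier predicts the class prevalence for every case, its log loss equals the entropy of the label distribution, which gives a natural reference point for imbalanced data. A short sketch:

```python
import numpy as np
from sklearn.metrics import log_loss

y = np.array([1] * 100 + [0] * 900)            # imbalanced: 10% positives
prev = y.mean()
dumb = np.full(len(y), prev)                   # constant-prevalence prediction
print(log_loss(y, dumb))                       # ~0.325
print(-(prev * np.log(prev) + (1 - prev) * np.log(1 - prev)))  # same value
```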
1 vote · 1 answer · 291 views
Mean logarithmic square error
I have the following problem: I'm training a neural network against a set of output values (a regression problem). The values range from −inf to +inf and I can't normalize them, because they come ...
3 votes · 1 answer · 362 views
exp(log_softmax) vs softmax as neural network activation
I have read that log_softmax is more numerically stable than softmax, since it avoids the explicit division. I need softmax probabilities between 0 and 1 for my neural network's loss function. So ...
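The usual trick, sketched here in plain NumPy under the assumption that this mirrors what library log_softmax implementations do: shift by the max before exponentiating, work in log space, then exponentiate at the end to recover proper probabilities.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)       # shift so the largest logit is 0
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

logits = np.array([1000.0, 1001.0, 1002.0])     # naive softmax would overflow
p = np.exp(log_softmax(logits))                 # exp(log_softmax) == softmax
print(p, p.sum())                               # valid probabilities summing to 1
```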
1 vote · 0 answers · 48 views
Is half the log loss twice as good?
Let's say I have two different models based on the same dataset.
Model A has a log loss of 0.30 on this dataset.
Model B has a log loss of 0.60 on this dataset.
If our scoring metric is log loss, is ...
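One hedged way to put the two numbers on an interpretable scale: exp(−logloss) is the geometric mean of the probabilities the model assigned to the true labels, so halving the log loss does not halve or double any natural quantity.

```python
import numpy as np

for loss in (0.30, 0.60):
    print(f"log loss {loss:.2f}: geometric-mean p(true label) = {np.exp(-loss):.3f}")
# 0.741 vs 0.549: model A is better, but not "twice as good" in any simple sense
```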
1 vote · 1 answer · 1k views
Understanding multiclass log-loss
I'm trying to understand the multiclass log-loss as described in scikit-learn's documentation.
The wording 'Let the true labels (Y) for a set of samples be encoded as a 1-of-K binary indicator matrix...', ...
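The quoted formula can be written out directly. A sketch (class labels and probabilities are made up) that builds the 1-of-K indicator matrix and checks the manual sum against sklearn.metrics.log_loss:

```python
import numpy as np
from sklearn.metrics import log_loss

y = np.array([0, 2, 1])                        # true class indices, K = 3
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.2, 0.6],
              [0.3, 0.5, 0.2]])
Y = np.eye(3)[y]                               # 1-of-K binary indicator matrix
manual = -(Y * np.log(P)).sum(axis=1).mean()   # -(1/N) sum_i sum_k Y_ik log P_ik
assert np.isclose(manual, log_loss(y, P))
print(manual)
```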
0 votes · 0 answers · 139 views
If you're trying to match a vector $p$ to $x$, why doesn't a divisive loss function $\frac{p}{x} + \frac{x}{p}$ work better than negative log loss? [duplicate]
Suppose you had a classification problem where you are trying to predict a class label (e.g., $[0 \: 1 \: 0]^T$) with a model. One way to do this is to use log loss:
$\Large \ell_{\log} = -\sum_i[y_i\...