All Questions

0 votes
0 answers
23 views

How to choose thresholds to discretize target for binary classification

My group is using logistic regression to investigate the most predictive features in a dataset. Our target variable is actually a continuous variable that we discretized using two cutoff thresholds (...
OstensiblyPutative
2 votes
0 answers
53 views

Why is cross-entropy increasing with accuracy?

I'm implementing softmax regression and I'm struggling to understand why the value of the cross-entropy is increasing: $H(y, p)=-\sum_{i=1}^C y_i \log(p_i)$, ...
JoshJohnson
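
A minimal sketch, not taken from the question itself, of why the two quantities in the title above can move in the same direction. Accuracy depends only on which class receives the largest probability, while cross-entropy depends on how much probability the true class receives. The arrays `probs_a` and `probs_b` are made-up illustrative predictions, and the helper names are hypothetical.

```python
import numpy as np

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Mean of -sum_i y_i * log(p_i) over samples (rows)."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))

def accuracy(y_onehot, probs):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(y_onehot, axis=1)))

# Three samples, two classes, the true class is always class 0 (illustrative data).
y = np.array([[1, 0], [1, 0], [1, 0]], dtype=float)

# Predictions A: two fairly confident hits and one miss.
probs_a = np.array([[0.6, 0.4], [0.6, 0.4], [0.4, 0.6]])
# Predictions B: all three correct, but each only barely.
probs_b = np.array([[0.51, 0.49]] * 3)

print(accuracy(y, probs_a), cross_entropy(y, probs_a))
print(accuracy(y, probs_b), cross_entropy(y, probs_b))
```

Here B classifies more samples correctly than A, yet its cross-entropy is also larger, because every correct prediction in B assigns barely more than half of the probability to the true class.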
0 votes
0 answers
76 views

PySpark Logistic regression model weights are inconsistent between runs

I am training a PySpark logistic regression model using PySpark MLlib. I have noticed that the weights are not consistent between runs. I have set the random seed in the training script and ...
hypothesisusable
0 votes
0 answers
23 views

How to estimate this variable in an MILP formulation

This is my first question here. I've thought about different methods to do it, but to no avail. I want to estimate a variable that is either 0 or a positive number. Then I want to use this ...
Mohammad Rajabdorri
0 votes
1 answer
2k views

How to calculate accuracy of a logistic regression?

A logistic regression involves a linear combination of features to predict the log-odds of a binary, yes/no-style event. That log-odds can then be transformed to a probability. If $\hat L_i$ is the ...
Dave
  • 3,979
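
A minimal sketch of the pipeline the excerpt above describes: log-odds are mapped to probabilities with the logistic function, and accuracy is then computed after thresholding. The 0.5 cutoff, the function names, and the sample values are assumptions for illustration and do not come from the question.

```python
import numpy as np

def log_odds_to_probability(log_odds):
    """Logistic (inverse-logit) transform: p = 1 / (1 + exp(-L))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(log_odds, dtype=float)))

def accuracy_at_threshold(log_odds, y_true, threshold=0.5):
    """Share of cases where the thresholded probability matches the label.

    The 0.5 default is an assumed cutoff, not something the model dictates.
    """
    probs = log_odds_to_probability(log_odds)
    y_pred = (probs >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Illustrative predicted log-odds and observed 0/1 outcomes.
log_odds = np.array([-2.0, -0.5, 0.3, 1.5])
y_true = np.array([0, 1, 1, 1])

acc = accuracy_at_threshold(log_odds, y_true)
print(acc)
```

Accuracy computed this way changes with the chosen threshold, so it describes the model and the cutoff together rather than the predicted probabilities alone.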
1 vote
1 answer
165 views

Probability distribution of probabilities

We can get the prediction probabilities of a binary classifier from sklearn's API using the predict_proba method. Is it reasonable to expect that the shape of a histogram plotted for the prediction ...
zebinx
  • 11
0 votes
1 answer
130 views

Quasi complete separation problem

I have some questions about the quasi-complete separation problem in logistic regression. I ran the model to predict credit risk, and it turned out to give a good prediction score (AUC around ...
Jovian Aditya
6 votes
1 answer
263 views

Logistic Regression Modeling & Interpretation [closed]

I'm building a logistic regression model to predict the credit risk of a lending company's customers. I'm using a dataset from Kaggle: https://www.kaggle.com/datasets/ranadeep/credit-risk-dataset/code ...
Jovian Aditya
0 votes
1 answer
52 views

Can I use clustering after classification to improve the performance of my classifier?

Say I have a classifier that segments my feature vectors (e.g. representing applicants) into 3 distinct segments A, B, C by assigning each applicant a score between 0 (worst) and 1 (best) with e.g. a ...
user63726
  • 101
0 votes
3 answers
961 views

Tweak machine learning algorithm in SciKit to optimize for recall

I have been given a dataset to detect fraud, similar to this one: https://www.kaggle.com/code/imgremlin/4th-place-in-fraud-detection-from-zindi The issue with SciKit machine learning algorithm is ...
Tequil
  • 1
0 votes
1 answer
106 views

How to find the optimal cut-off point to minimize both the FNR and FPR in R?

I need to find the optimal threshold that minimizes both the false positive rate and the false negative rate, assuming equal weight for the two rates. I wrote the following code: ...
ebrahimi
  • 1,307
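
A minimal sketch of one way to read the equal-weight criterion in the excerpt above: scan candidate cutoffs and keep the one with the smallest FPR + FNR, which is the same as maximizing Youden's J, since J = TPR - FPR = 1 - (FPR + FNR). The question asks about R and its code is elided, so this Python version, the function name, and the sample data are all assumptions for illustration.

```python
import numpy as np

def best_equal_weight_threshold(scores, y_true):
    """Return (threshold, cost) minimizing FPR + FNR over the observed scores.

    A case is predicted positive when score >= threshold. Ties keep the
    lowest threshold that reaches the minimum.
    """
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)

    best_t, best_cost = None, np.inf
    for t in np.unique(scores):  # np.unique returns sorted candidates
        y_pred = scores >= t
        fpr = np.sum(y_pred & (y_true == 0)) / n_neg   # false positives / negatives
        fnr = np.sum(~y_pred & (y_true == 1)) / n_pos  # false negatives / positives
        cost = fpr + fnr
        if cost < best_cost:
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost

# Illustrative predicted scores and observed 0/1 labels.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
y_true = [0, 0, 1, 0, 1, 1]

threshold, cost = best_equal_weight_threshold(scores, y_true)
print(threshold, cost)
```

Only the observed scores are tried as cutoffs, which is enough because the confusion matrix cannot change between two adjacent score values.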
8 votes
1 answer
213 views

Examples where simple classifier systems out-perform deep learning

I have been working on a problem where published results using deep learning are substantially worse than results I have obtained on the same task (using the same experimental protocol) using simple ...
Dikran Marsupial
0 votes
0 answers
29 views

Where do I draw the line at unbalanced datasets?

I have a problem where I am to construct a classification variable Yes/No based on another feature's value. We are interested in the Yes class in this case. I am told to use 10-fold cross validation. ...
PythonNewb
0 votes
1 answer
116 views

Spot Logistic Regression Training Error

My friend gave me this puzzle a while ago and I've never figured it out. ...
Muddy Penguin
0 votes
1 answer
46 views

Predict data using Pre-Trained Classification Model

I have a pre-trained classification model (saved as a pickle file) to predict employee attrition. My question is: when I use the pickle file to predict on a new dataset, do I need to do all the preprocessing steps (...
Mini Moon
