All Questions

0 votes
0 answers
34 views

PR-AUC vs F1 vs Balanced Accuracy

I'm trying to create a Random Forest Classifier for selecting ~ 700 features. I have a highly imbalanced dataset to select features from. There are significantly fewer positive cases (1%) compared ...
user155775's user avatar
1 vote
2 answers
222 views

Why do we need hyperparameter tuning in scikit-learn? Don't sklearn models give the best model by default?

When I can build a classifier directly like this, clf = RandomForestClassifier(), why do we perform tuning by restricting the parameters like this <...
Hola's user avatar
  • 13
1 vote
1 answer
107 views

How to know the confidence of a classification on unlabeled data generated after model training?

I have written the code (in Python) for a Random Forest classification model for a labeled dataset using sklearn. The model works very well. ...
Daniel Vieira's user avatar
2 votes
2 answers
309 views

Random Forest Classification model performing much better with 70:30 TEST:TRAIN rather than the opposite

I'm working on a Classification problem as a side project and I'm receiving results contrary to what I'd expect. With 100,000 records, each with 7 components for X, the model is performing much better ...
GroupTheory14's user avatar
0 votes
1 answer
20 views

How do I use a column with data of different layers for AI?

I am working with real estate data for an ML/DL project. In the csv file there is a column in which each cell contains data like the examples below: ...
Muhammad Usman's user avatar
0 votes
2 answers
70 views

Which random_state to use in train_test_split when deploying final model?

I have developed a Random Forest that gives varying results depending on the random state of the train/test split. This is normal, because a lot of the values in the data are extreme, without being ...
Nemo_the_scientist's user avatar
1 vote
2 answers
316 views

How to remove test set so that model uses all data as training data?

I have developed a RandomForest classification model and I am pretty satisfied with the results on the test set. Now, my next step is to deploy the model. Before ...
Nemo_the_scientist's user avatar
1 vote
1 answer
1k views

Is it advisable to save an ML model as a Joblib/Pickle file?

Part of our thesis project is to create a diabetes predictor web application, and there is something I would like to clarify. Is it common practice to save an ML model as a Joblib/Pickle file like this one? ...
mynameiswadey's user avatar
0 votes
1 answer
253 views

Binary classification performance difference between 0 and 1 class

I have trained a binary Random Forest classifier on a dataset containing 7M rows. I also set aside a holdout validation set of 1M rows that the training pipeline never sees. The dataset consists of ...
fendrbud's user avatar
0 votes
1 answer
43 views

How to find best range of independent variables in Random forest classification

I am running a binary random forest classification model and I need to know the optimal range of each of the independent variables used in the model that drives the best possible class ...
DwitiB's user avatar
  • 1
1 vote
1 answer
426 views

Assess overfitting - All model metrics or only specific metric?

I am working on a binary classification using random forest with 977 records and a 77:23 class proportion. I got the below performance on train and test data (AUC = 81) Train data Test data My metric ...
The Great's user avatar
  • 2,585
2 votes
0 answers
969 views

Sample size for SHAP explainer and range of a SHAP value

I am working on a binary classification with 977 records and a 77:23 class proportion. I used a random forest model. Based on my attempt to run the SHAP package, I got the below plots. I also see that ...
The Great's user avatar
  • 2,585
3 votes
1 answer
7k views

Why does GridSearchCV fit fail?

I already referred to this post here, but it has no answer. I am working on a binary classification using a random forest classifier. My dataset shape is (977,8) with a 77:23 class proportion. My system ...
The Great's user avatar
  • 2,585
2 votes
1 answer
4k views

How to interpret SHAP summary plot?

I already referred to these posts here and here, so please don't mark this as a duplicate. I am doing a binary classification using random forest, and the class labels are 1 and 0. What is the likelihood that ...
The Great's user avatar
  • 2,585
1 vote
0 answers
553 views

RFECV best n_features doesn't correspond to best gridscore

I am working on feature selection for a binary classification problem with 977 records (and a class proportion of 77:23). I already referred to these two related posts, here and here. step size = 1 and ...
The Great's user avatar
  • 2,585
