All Questions
Tagged with classification random-forest
120 questions
0 votes, 0 answers, 34 views
PR-AUC vs F1 vs Balanced Accuracy
I'm trying to create a Random Forest Classifier for selecting ~700 features.
I have a highly imbalanced dataset to select features from. There are significantly fewer positive cases (1%) compared ...
1 vote, 2 answers, 222 views
Why do we need hyperparameter tuning in scikit-learn? Don't scikit-learn models give the best model by default?
When I have the option to build a classifier like this directly
clf = RandomForestClassifier()
why do we perform tuning by restricting the parameters like this
<...
1 vote, 1 answer, 107 views
How to know the confidence of a classification on unlabeled data generated after model training?
I have written (in Python) the code for a Random Forest classification model for a labeled dataset using sklearn. The model works very well.
...
2 votes, 2 answers, 309 views
Random Forest Classification model performing much better with 70:30 TEST:TRAIN rather than the opposite
I'm working on a Classification problem as a side project and I'm receiving results contrary to what I'd expect.
With 100,000 records, each with 7 components for X, the model is performing much better ...
0 votes, 1 answer, 20 views
How do I use a column with data of different layers for AI?
I am working with real estate data for an ML/DL project. In the csv file there is a column in which each cell contains data like the examples below:
...
0 votes, 2 answers, 70 views
Which random_state to use in train_test_split when deploying the final model?
I have developed a Random Forest that gives varying results depending on the random state of the test train split. This is normal, because a lot of the values in the data are extreme, without being ...
1 vote, 2 answers, 316 views
How to remove test set so that model uses all data as training data?
I have developed a RandomForest classification model and I am pretty satisfied with the results on the test set.
Now, my next step is to deploy the model. Before ...
1 vote, 1 answer, 1k views
Is it advisable to save an ML model as a Joblib/Pickle file?
Part of our thesis project is to create a Diabetes predictor web application, and there is something I would like to clarify. Is it common practice to save an ML model as a Joblib/Pickle file like this one? ...
0 votes, 1 answer, 253 views
Binary classification performance difference between 0 and 1 class
I have trained a binary Random Forest classifier on a dataset containing 7M rows. I also set aside a holdout validation set of 1M rows that the training pipeline never sees. The dataset consists of ...
0 votes, 1 answer, 43 views
How to find the best range of independent variables in Random Forest classification
I am running a binary random forest classification model and I need to know the optimal range of each of the independent variables used in the model that drives the best possible class ...
1 vote, 1 answer, 426 views
Assess overfitting - All model metrics or only specific metric?
I am working on a binary classification problem using random forest, with 977 records and a 77:23 class proportion.
I got the performance below on the train and test data (AUC = 81)
Train data
Test data
My metric ...
2 votes, 0 answers, 969 views
Sample size for SHAP explainer and range of a SHAP value
I am working on a binary classification problem with 977 records and a 77:23 class proportion. I used a random forest model.
Based on my attempt to run the SHAP package, I got the plots below
And I also see that ...
3 votes, 1 answer, 7k views
Why does GridSearchCV fit fail?
I already referred to this post here, but there is no answer.
I am working on a binary classification problem using a random forest classifier. My dataset shape is (977,8) with a 77:23 class proportion. My system ...
2 votes, 1 answer, 4k views
How to interpret a SHAP summary plot?
I already referred to these posts here and here, so please don't mark this as a duplicate.
I am doing a binary classification using random forest, and the class labels are 1 and 0. What is the likelihood that ...
1 vote, 0 answers, 553 views
RFECV best n_features doesn't correspond to best gridscore
I am working on feature selection for a binary classification problem with 977 records (and a class proportion of 77:23). I already referred to these two related posts - here and here. step size = 1 and ...