All Questions
Tagged with classification python
428 questions
0 votes, 1 answer, 24 views
How can I improve an XGBoost classifier if overfitting starts from the initial epochs?
I am training an XGBoost multi-class classifier, but got very bad results. The train/val learning curve for multi-class logloss showed that overfitting started from the early epochs. What directions can I ...
1 vote, 0 answers, 20 views
How to apply CalibratedClassifierCV in external validation of a Random Forest model
I have a model trained on my data. I used joblib to save the model and shared it with other teams so they could evaluate its performance on their data. One of the teams came back and said that the models ...
0 votes, 1 answer, 52 views
Train/test split of data, stratified based on label, but ensuring no athletes are in both train/test sets
I’m working on a project that uses data from wearable tech for activity classification. However, I’m having trouble deciding on how to do the train/test split. I’m currently doing the split based on ...
1 vote, 1 answer, 39 views
Data binning for interval data
I am trying to create an ML model for salary classification into 5 categories (0-90k, 90-120k, 120-180k and so on).
The problem is that in my dataset almost all salary data is presented in intervals. ...
0 votes, 1 answer, 41 views
How to optimize my CNN classification architecture
I have this CNN-based model architecture that takes an RGB image. Now I'm trying to adapt it to classify the color of an object (10 color classes: white, black, yellow, etc.). This current ...
0 votes, 0 answers, 9 views
How to find the minimum data point that predicts the target class in longitudinal data
I am working on medical data where a screening is done regularly for 200 days. I need to know the minimum number of screenings that can predict the outcome. I also need to know the best time/times to ...
0 votes, 0 answers, 38 views
What feature selection method is ideal for a high-dimensional data frame resulting from one-hot encoding?
I am trying to solve a sports-related multi-class classification problem in Python. I aim to train a custom neural network and also an SVM. I have performed prior data cleaning and encoded my data ...
0 votes, 0 answers, 35 views
How can I identify coverage types in NFL games using computer vision
I am currently working on a project that classifies coverage types from sports highlights using advanced computer vision techniques. Next Gen Stats effectively utilizes tracking data to identify ...
0 votes, 0 answers, 12 views
Measuring Product Search effectiveness
I want to measure the effectiveness of my search engine. One way I can do that is by measuring the rate at which a customer reformulates the previous query. Hence, I need to quantify inter-...
1 vote, 2 answers, 222 views
Why do we need hyperparameter tuning in scikit-learn? Don't scikit-learn models give the best model by default?
When I have the option to build a classifier like this directly
clf = RandomForestClassifier()
why do we perform tuning by restricting the parameters like this
<...
0 votes, 0 answers, 19 views
RNN model for predicting sequences based on sequences of different lengths with Keras
I have data that are sequences of repeated values of different lengths. The value is categorical and can take values from 1 to 184. I used padding with 0 and masking:
...
0 votes, 1 answer, 49 views
Which Python library should I use to classify data without training any model?
I want to classify data without training any model (nor using neural networks?). Should I use scikit-learn or SciPy?
There are also others like PyTorch or Keras that also have a classify method. ...
0 votes, 0 answers, 56 views
ROC curve for multi-class classification - results do not look correct
I'm working on a multi-class classification task using an LSTM. I generated my ROC curve plots, but they give scores like 1, 0.99, 0.97; however, I have an accuracy of 0.97, Precision 0.65, Sensitivity/...
0 votes, 0 answers, 25 views
Is it okay to use a regression MLP for an ordinal classification problem when the target variable is numerical?
I have a target variable of 1-10 that represents difficulty level. These are individual classes represented by integers with 1 being the easiest and 10 most difficult. I have decided to use regression ...
1 vote, 1 answer, 107 views
How to know the confidence of a classification on unlabeled data generated after model training?
I have written (in Python) the code for a Random Forest classification model for a labeled dataset using sklearn. The model works very well.
...