All Questions

0 votes
1 answer
24 views

How can I improve an XGBoost classifier if overfitting starts from the initial epochs?

I am training an XGBoost multi-class classifier but got very bad results. The train/val learning curve (multi-class logloss) showed that overfitting started in the early epochs. What directions can I ...
jabberwoo • 101
1 vote
0 answers
20 views

How to apply CalibratedClassifierCV in external validation of a Random Forest model

I have a model trained on my data. I used joblib to save the model and shared it with other teams so they could evaluate its performance on their data. One of the teams came back and said that the models ...
user2704338
0 votes
1 answer
52 views

Train/test split of data, stratified based on label, but ensuring no athletes are in both train/test sets

I’m working on a project that uses data from wearable tech for activity classification. However, I’m having trouble deciding on how to do the train/test split. I’m currently doing the split based on ...
Shane O Mahony
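One possible way to approach the split described in this question, sketched in plain Python: group the rows by athlete and assign whole athletes to one side. The function name group_train_test_split and its parameters are hypothetical, and this sketch only guarantees that no athlete lands in both sets; it does not stratify by label. scikit-learn's GroupShuffleSplit and StratifiedGroupKFold cover the same idea, the latter also approximately balancing labels.

```python
import random
from collections import defaultdict


def group_train_test_split(groups, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no group appears in both.

    `groups` holds one group id (e.g. an athlete id) per row.
    Hypothetical helper for illustration only; labels are ignored.
    """
    # Collect the row indices that belong to each group.
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    # Shuffle the group ids reproducibly.
    unique_groups = sorted(by_group)
    random.Random(seed).shuffle(unique_groups)

    # Move whole groups into the test set until it reaches the target size.
    target = test_fraction * len(groups)
    train_idx, test_idx = [], []
    for group in unique_groups:
        if len(test_idx) < target:
            test_idx.extend(by_group[group])
        else:
            train_idx.extend(by_group[group])
    return sorted(train_idx), sorted(test_idx)


athletes = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]
train_idx, test_idx = group_train_test_split(athletes, test_fraction=0.2)
train_athletes = {athletes[i] for i in train_idx}
test_athletes = {athletes[i] for i in test_idx}
print(train_athletes & test_athletes)  # empty set: no athlete is in both
```

Because whole athletes are moved at once, the realised test size can overshoot the requested fraction when athletes contribute many rows each.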
1 vote
1 answer
39 views

Data binning for interval data

I am trying to create an ML model for salary classification into 5 categories (0-90k, 90-120k, 120-180k and so on). The problem is that in my dataset almost all salary data is presented in intervals. ...
pinkkdeerr
0 votes
1 answer
41 views

How to optimize my CNN classification architecture

I have this CNN-based model architecture that takes an RGB image. Now I'm trying to adapt it to classify the color of an object (10 color classes: white, black, yellow, etc). This current ...
Mary • 217
0 votes
0 answers
9 views

How to find the minimum data point that predicts the target class in longitudinal data

I am working on medical data where a screening is done regularly for 200 days. I need to know the minimum number of screenings that can predict the outcome. I also need to know the best time/times to ...
Ghof-90
0 votes
0 answers
38 views

What feature selection method is ideal for a high-dimensional data frame resulting from one-hot encoding?

I am trying to solve a sports-related multi-class classification problem in Python. I aim to train a custom neural network and also an SVM. I have performed prior data cleaning and encoded my data ...
pastybake2002
0 votes
0 answers
35 views

How can I identify coverage types in NFL games using computer vision

I am currently working on a project that classifies coverage types from sports highlights using advanced computer vision techniques. Next Gen Stats effectively utilizes tracking data to identify ...
Shah Zeb
0 votes
0 answers
12 views

Measuring Product Search effectiveness

I want to measure the effectiveness of my search engine. One way I can do that is by measuring the rate at which a customer reformulates the previous query. Hence, I need to quantify inter-...
ricardo • 23
1 vote
2 answers
222 views

Why do we need hyperparameter tuning in scikit-learn? Don't scikit-learn models give the best model by default?

When I can build a classifier directly like this: clf = RandomForestClassifier(), why do we perform tuning by restricting the parameters like this <...
Hola • 13
0 votes
0 answers
19 views

RNN model for predicting sequences based on sequences of different lengths with Keras

I have data that are sequences of repeated values of different lengths. The value is categorical and can take values from 1 to 184. I used padding with 0 and masking: ...
meyer • 1
0 votes
1 answer
49 views

Which Python library should I use to classify data without training any model?

I want to classify data without training any model (or using neural networks?). Should I use scikit-learn or SciPy? There are also others, like PyTorch or Keras, that also have a classify method. ...
SSSOF • 17
0 votes
0 answers
56 views

ROC curve for multi-class classification: results do not look correct

I'm working on a multi-class classification task using an LSTM. I generated my ROC curve plots, but they give scores like 1, 0.99, 0.97; however, I have an accuracy of 0.97, precision of 0.65, sensitivity/...
biihu • 21
0 votes
0 answers
25 views

Is it okay to use a regression MLP for an ordinal classification problem when the target variable is numerical?

I have a target variable of 1-10 that represents the difficulty level. These are individual classes represented by integers, with 1 being the easiest and 10 the most difficult. I have decided to use regression ...
Yoseph Ismail
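If a regression output is used for the 1-10 difficulty target, the continuous prediction still has to be mapped back to a class. A minimal sketch, assuming simple round-half-up followed by clipping to the valid range; the function name to_ordinal_class and its defaults are illustrative and not taken from the question.

```python
import math


def to_ordinal_class(prediction, lowest=1, highest=10):
    """Map a continuous regression output to an integer class label.

    Hypothetical helper: rounds half up (Python's built-in round() rounds
    halves to the nearest even number), then clips to [lowest, highest].
    """
    rounded = math.floor(prediction + 0.5)
    return max(lowest, min(highest, rounded))


raw_outputs = [0.2, 4.49, 4.5, 10.7]
print([to_ordinal_class(p) for p in raw_outputs])
```

Rounding treats all class boundaries as equally spaced, which may or may not hold for a difficulty scale; learned thresholds are an alternative if that assumption looks wrong.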
1 vote
1 answer
107 views

How to know the confidence of a classification on unlabeled data generated after model training?

I have created (in Python) the code for a Random Forest classification model for a labeled dataset using sklearn. The model works very well. ...
Daniel Vieira
