All Questions

0 votes
0 answers
13 views

Classification techniques for continuous arrays as inputs and scalar categorical variable as output

A newbie here. If you have any ideas about the following, that would be great. Suppose that, for a given data set, the T’s and Y’s are arrays with T = [0 1 2 3 5 6 7] and Y = [4 7 9 3 6 1], so at T=0, Y=4, and so on. Z =...
Ash Ketchump
1 vote
1 answer
53 views

How many samples do I need to be sure about my model metrics?

I have a dataset with 50 features (columns) and 100 samples (rows) for a binary classification problem. I have built an ML model using cross-validation, and it has model metrics like roc_auc=0.71 f1=0.75 ...
M.SEL
  • 13
0 votes
1 answer
162 views

How can I label a sequence of network traffic with one single classification?

I want to label network traffic (several .pcap files) with different classifications. But this network traffic is not just one entry; it is a sequence of entries (~50). So how is it possible to ...
user155518
1 vote
1 answer
13 views

Does a classifier learn from the model or from the data?

I'm new to data science. I read context. It says "The clf (for classifier) estimator instance is first fitted to the model; that is, it must learn from the model. ...." and the example shows: <...
Mike Liu
-1 votes
1 answer
124 views

Which is the best binary classification model? Train and Test Accuracy are similar

I am building a binary classification model where the classes are imbalanced, but I used SMOTE. I used 4 different models to compare performance and decide which to choose. They have the same train and test ...
Sarah
  • 1
0 votes
1 answer
22 views

Human Classification Error Detection

I'm working on a model to detect errors in human classifications. I already have a classifier model M: X -> Y, but I now need a model M': (X, Y) -> {0, 1}. Y contains a lot of classes (~5-6k). My ...
EzrielS
  • 323
1 vote
0 answers
37 views

ML Modeling Recommendation for Predicting Snake Encounters in Historical Journey Data

I have a dataset consisting of historical journey data where individuals travel from point A to point B. During their journeys, they may encounter varying numbers of animal sightings, including snakes....
Sita
  • 11
0 votes
0 answers
143 views

SVM taking too much time to train

I'm trying to train my ML model with svm.SVC from sklearn, but it is taking so much time that it won't even finish training once. This happens only when a kernel function is used. Currently I have selected 10 features ...
OctoCat
0 votes
1 answer
41 views

Is classification enough for this?

I have a DB with 2 tables connected by a one-to-many relationship. Let's say one row of A is linked to many rows of B. Both tables have two fields, for a date and some text. And both tables get new ...
denpal
  • 1
0 votes
1 answer
20 views

Applying the model to validation data achieves higher performance than on the test set. Is this possible?

I trained a binary cross-validated classification model and got high performance (about 90) on the test data, but when I apply the model to new unseen data to see how it performs, I get even higher ...
Din
  • 11
1 vote
0 answers
18 views

Looking at feature contribution after classifying groups using components

I have a lot of features and many are correlated, so I performed dimensionality reduction. I then used these components in binary classification and got high accuracy. I also performed feature ...
Din
  • 11
1 vote
1 answer
64 views

The sklearn train_test_split function creates training data and test data that are not similar

I am working on loan default data and my model is not able to make accurate predictions on the test set because the default percentage on the test set is very different from that of the training ...
J.Sriram
0 votes
1 answer
108 views

How to use 5 different datasets to re-train & test your ML Classification models multiple times

My R scripts and my 5 source datasets can be found in my GitHub Repository for this project, and I originally found this source data on Kaggle. This set of source data includes 5 datasets with over ...
Marlen
  • 167
1 vote
3 answers
2k views

Unable to build an XGBoost classifier that gives good precision and recall on highly imbalanced data

The XGBoost classifier I built is consistently returning an F1 score of 0, and I am unable to fix this despite experimenting with various hyperparameters. The data is heavily imbalanced and hence I feel ...
J.Sriram
1 vote
1 answer
42 views

ML for predictions that exist "in-between" classification targets

I have 5 broad metagenomic "ecoregion" categories (just think lots of DNA at different nice locations) which become the training targets for their complete (and augmented) metagenomic data. ...
M__
  • 216
