All Questions

0 votes
0 answers
12 views

Measures of efficacy for one classification model on the same data set with different numbers of classes?

I am currently doing a university project in supervised learning. The variable to be predicted varies across the integers [0,100] and my supervisor suggested splitting this range into different classes ...
Oliver
  • 1
1 vote
0 answers
32 views

How does ROC work with SVM?

Could someone please explain how ROC works with SVM? Specifically, I'm using RocCurveDisplay.from_predictions(y_test, y_pred, ax=ax[1]), which works fine. Since the ...
lemintare
15 votes
2 answers
711 views

Why does data science see class imbalance as a problem for supervised learning when statistics does not?

Why does data science see class imbalance as a problem in supervised learning when statistics says it is not? Data science seems to see class imbalance as problematic and needing special techniques ...
Dave
  • 3,979
0 votes
0 answers
12 views

Automating the task of figuring out if a task is classification or regression

When manually identifying if a given dataset and dependent variable are suitable for classification or regression I look at the type of variable (continuous or discrete) in which the name and values ...
str31
  • 13
0 votes
1 answer
2k views

How to calculate accuracy of a logistic regression?

A logistic regression involves a linear combination of features to predict the log-odds of a binary, yes/no-style event. That log-odds can then be transformed to a probability. If $\hat L_i$ is the ...
Dave
  • 3,979
1 vote
0 answers
26 views

Label Encoding for Target Classes: Any Integer or Consecutive Integers from Zero?

I'm handling a very conventional supervised classification task with three (mutually exclusive) target categories (not ordinal ones): ...
Hendrik
  • 8,677
1 vote
1 answer
118 views

High Performance Classification or Similarity Algorithm for Mixed Data Types?

I have a database holding 10-ish features that describe different breeds of dogs. They are mostly categorical features, but some provide ranges for values. Here's a demo representation of the database,...
CyberBully2003
1 vote
1 answer
47 views

What are the benefits of combining semi-supervised and supervised learning methods?

I've been looking into semi-supervised learning more, specifically label propagation and label spreading. When reading through tutorials and some papers I've seen it mentioned that often times the ...
lamyvista
2 votes
1 answer
222 views

ROC_AUC score is higher before tuning n_neighbors for KNN

This is for multiclass classification. Before tuning the n_neighbors for KNN, these were the results: ...
user2807477
0 votes
1 answer
382 views

Classification for two dimensional data

I have time series of about 500 data points of $(x,y)$ pairs, where $x$ = time in seconds and $y$ = signals. Each of these candidates/time series has an additional label, which tells about the nature of ...
Ayan Mitra
0 votes
2 answers
163 views

How to deal with temporal trend in ML

I am fitting a binary classifier and I observe a temporal trend in the response variable, meaning that the actual percentage of positives fluctuates with time: I can see periods where it is high and ...
Anatole
  • 181
2 votes
1 answer
49 views

Some questions about supervised learning, model evaluation and preprocessing [closed]

I've been trying to employ some basic techniques of supervised learning on a dataset that I have and I have several questions about the overall procedure (i.e. data preprocessing, model evaluation etc)...
ChrisNick92
0 votes
1 answer
50 views

What does "S" in Shannon's entropy stand for?

I see many machine learning texts using the following notation to represent Shannon's entropy in classification/supervised learning contexts: $$ H(S) = -\sum_{i \in Y} p_i \log(p_i) $$ where $p_i$ is ...
heresthebuzz
5 votes
2 answers
185 views

Dealing with unbalanced training set compared with real world data

I am working on a fraud detection model that prevents fraudulent users from using our solution. My model is performing well, but the issue I have is that the better the model performs, the less ...
Anatole
  • 181
0 votes
1 answer
357 views

Two-level (large category and small category) label classification problem

At present, there is an app classification task: the input is the function description of the app, and the two labels are the major category to which the app belongs and the small categories under the ...
Paul Ji
