Questions tagged [classification]
An instance of supervised learning that identifies the category or categories to which a new instance in a dataset belongs.
197
questions
265
votes
10
answers
435k
views
How to set class weights for imbalanced classes in Keras?
I know that Keras offers a class_weight dictionary parameter at fitting time, but I couldn't find any example. Would somebody be so kind to ...
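A minimal sketch of one common way to build such a dictionary, assuming integer class labels and the "balanced" heuristic weight = n_samples / (n_classes * class_count). The helper name `balanced_class_weights` and the toy labels are hypothetical and not taken from the question; the resulting mapping has the form that Keras accepts through the `class_weight` argument of `Model.fit`.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Map each class label to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical toy labels: class 0 is nine times more frequent than class 1.
y_train = [0] * 90 + [1] * 10
class_weight = balanced_class_weights(y_train)
print(class_weight)

# With a compiled Keras model, the dictionary would be passed at fit time:
# model.fit(X_train, y_train, class_weight=class_weight)
```

Rare classes get larger weights, so each of their samples contributes more to the loss.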
42
votes
6
answers
54k
views
Unbalanced multiclass data with XGBoost
I have 3 classes with this distribution:
Class 0: 0.1169
Class 1: 0.7668
Class 2: 0.1163
And I am using xgboost for ...
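One possible approach, sketched under assumptions: derive per-sample weights that are inversely proportional to the class proportions quoted above and pass them to the booster. The label list `y` and the helper `sample_weights` are hypothetical; the scikit-learn style `XGBClassifier.fit` accepts such a list through its `sample_weight` argument.

```python
# Class proportions quoted in the question.
class_freq = {0: 0.1169, 1: 0.7668, 2: 0.1163}

# Inverse-frequency weights, rescaled so the majority class has weight 1.
max_freq = max(class_freq.values())
class_weight = {cls: max_freq / freq for cls, freq in class_freq.items()}

def sample_weights(labels, weights):
    """Return one weight per sample, looked up by the sample's class label."""
    return [weights[y] for y in labels]

y = [1, 1, 0, 2, 1]  # hypothetical labels, for illustration only
w = sample_weights(y, class_weight)
print(class_weight)
print(w)

# With xgboost installed, the weights would be used as:
# XGBClassifier().fit(X, y, sample_weight=w)
```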
35
votes
4
answers
16k
views
Quick guide into training highly imbalanced data sets
I have a classification problem with approximately 1000 positive and 10000 negative samples in the training set. So this data set is quite unbalanced. Plain random forest is just trying to mark all test ...
31
votes
1
answer
33k
views
How is a splitting point chosen for continuous variables in decision trees?
I have two questions related to decision trees:
If we have a continuous attribute, how do we choose the splitting value?
Example: Age=(20,29,50,40....)
Imagine that we have a continuous attribute $f$...
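A common convention in CART-style trees, sketched here rather than asserted as the answer to this question: sort the attribute, take the midpoint between each pair of adjacent distinct values as a candidate threshold, and keep the one with the lowest weighted Gini impurity. The ages come from the example above; the labels in `bought` and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        threshold = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

ages = [20, 29, 50, 40]
bought = [0, 0, 1, 1]  # hypothetical target labels
threshold, impurity = best_threshold(ages, bought)
print(threshold, impurity)  # 34.5 0.0
```

Entropy or another impurity measure could be substituted for `gini` without changing the search over candidate thresholds.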
15
votes
2
answers
711
views
Why does data science see class imbalance as a problem for supervised learning when statistics does not?
Why does data science see class imbalance as a problem in supervised learning when statistics says it is not?
Data science seems to see class imbalance as problematic and needing special techniques ...
2
votes
1
answer
741
views
Can a decision in a node of a decision tree be based on comparison between 2 columns of the dataset?
Assume the features in the dataframe are columns - A,B,C and my target is Y
Can my decision tree have a decision node which looks for say, ...
12
votes
1
answer
5k
views
Using a pre-trained CNN classifier and applying it to a different image dataset
How would you optimize a pre-trained neural network to apply it to a separate problem? Would you just add more layers to the pre-trained model and test it on your ...
10
votes
1
answer
4k
views
Can linearly non-separable data be learned using polynomial features with logistic regression?
I know that Polynomial Logistic Regression can easily learn typical data like that in the following image:
I was wondering whether the following two data sets can also be ...
4
votes
2
answers
6k
views
Imbalanced Dataset: Train/test split before and after SMOTE
This question is similar but different from my previous one. I have a binary classification task related to customer churn for a bank. The dataset contains 10,000 instances and 11 features. The target ...
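A sketch of the ordering usually recommended for this situation: split first, then resample only the training portion, so that no resampled copy of a test row leaks into training. To stay self-contained it uses plain random oversampling as a stand-in for SMOTE, which is a different technique. The data and the names `split_train_test` and `oversample_minority` are hypothetical.

```python
import random

def split_train_test(rows, test_fraction, rng):
    """Shuffle a copy of the rows and cut off the first part as the test set."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def oversample_minority(rows, rng):
    """Duplicate random rows of smaller classes until all classes are equal in size."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend(group + extra)
    return balanced

rng = random.Random(0)
# Hypothetical data: (feature_vector, label) pairs with 10% positives.
data = [([i], 1 if i % 10 == 0 else 0) for i in range(1000)]

train, test = split_train_test(data, 0.2, rng)   # 1. split first
train_balanced = oversample_minority(train, rng)  # 2. resample the training set only
# The test set keeps its original class distribution for evaluation.
```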
16
votes
2
answers
37k
views
How to calculate VC-dimension?
I'm studying machine learning, and I would like to know how to calculate VC-dimension.
For example:
$h(x)=\begin{cases} 1 &\mbox{if } a\leq x \leq b \\
0 & \mbox{else } \end{cases} $, with ...
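For small point sets, shattering can be checked by brute force. The sketch below assumes $a$ and $b$ are free real parameters with $a \leq b$ (the truncated condition above is not reproduced here) and tests whether the interval classifier realizes every labeling of a given set of points. The function names are hypothetical.

```python
from itertools import product

def interval_can_realize(points, labeling):
    """True if some interval [a, b] labels exactly the 1-labeled points as 1."""
    positives = [x for x, y in zip(points, labeling) if y == 1]
    if not positives:
        return True  # pick an interval that contains none of the points
    a, b = min(positives), max(positives)
    # The tightest interval works iff no 0-labeled point lies inside it.
    return all(not (a <= x <= b) for x, y in zip(points, labeling) if y == 0)

def is_shattered(points):
    """True if every 0/1 labeling of the points is realizable by an interval."""
    return all(
        interval_can_realize(points, labeling)
        for labeling in product([0, 1], repeat=len(points))
    )

print(is_shattered([1.0, 2.0]))       # True
print(is_shattered([1.0, 2.0, 3.0]))  # False: labeling (1, 0, 1) is impossible
```

Under this assumption, any two distinct points are shattered, while no three points are, because the labeling (1, 0, 1) on sorted points cannot be produced by a single interval. That matches a VC-dimension of 2 for this hypothesis class.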
7
votes
2
answers
2k
views
Doesn't over(/under)sampling an imbalanced dataset cause issues?
I'm reading a lot about how to use different metrics specifically for imbalanced datasets (e.g. two classes present, but 80% of the data is one class) and how to tackle the issue of imbalanced ...
3
votes
1
answer
132
views
Class imbalance strategies
When dealing with the class imbalance problem in a binary classifier, there are three ways I know of to address it: over-sampling, under-sampling and using cost-sensitive methods.
Are there any ...
3
votes
4
answers
808
views
What is the difference between classification and regression?
I understand classification: a discrete response or category, such as whether an animal is a dog or a cat.
The author says..."Regression techniques predict continuous changes such as the change in temperature, power ...
3
votes
1
answer
589
views
When should I oversample data?
I am dealing with multi-class classifiers. My data is unbalanced. Hence, I need to apply sampling techniques before training (undersampling or oversampling). When I apply undersampling, ...
2
votes
2
answers
2k
views
Explain Binary Classification with output 0.5 (True)
What is the interpretation of output 0.5 of a typical classifier?
I made a prediction and the probability of that data point being from the True class is 0.5.