Questions tagged [one-hot-encoding]
141
questions
0
votes
1
answer
23
views
How to create consistent dummy variables in Inference code?
I am using pd.get_dummies on a categorical column to create dummy variables.
The training pipeline is something like this:
Normalization
Dummy variable Creation
...
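One common way to keep dummy columns consistent between training and inference is to save the training column list and reindex the inference frame against it. The sketch below is only an illustration under assumptions: the `color` column, its values, and the variable names are hypothetical and not taken from the question, and categories unseen during training are simply dropped.

```python
import pandas as pd

# Hypothetical training and inference data (not from the question).
train = pd.DataFrame({"color": ["red", "green", "blue"]})
new = pd.DataFrame({"color": ["green", "purple"]})

# Training time: create dummies and remember the resulting column order.
train_dummies = pd.get_dummies(train, columns=["color"], dtype=int)
train_columns = list(train_dummies.columns)  # persist this alongside the model

# Inference time: create dummies, then align to the training columns.
# Columns missing at inference are filled with 0; categories unseen in
# training (here "purple") are dropped by the reindex.
new_dummies = pd.get_dummies(new, columns=["color"], dtype=int)
new_aligned = new_dummies.reindex(columns=train_columns, fill_value=0)

print(new_aligned)
```

An alternative with the same goal is to fit scikit-learn's `OneHotEncoder` with `handle_unknown="ignore"` on the training data and reuse the fitted encoder at inference.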
0
votes
1
answer
30
views
SMOTE Oversampling for Text Classification with Multiple Input Features
I have a text classification problem where the input has 2 features, a text and a language:
the text is a string variable.
the ...
0
votes
0
answers
61
views
What feature selection method is best for a multi class classification problem with one-hot-encoded columns?
I am trying to solve a multi-class classification problem that involves predicting the outcome of a football match (target variable = Win, Lose or Draw). With a dataset of 2280 rows, which is 6 seasons of ...
0
votes
1
answer
78
views
Use prediction after using get_dummies in pandas?
I found similar questions on this topic, but none of the answers were helpful.
I had a data frame with a categorical column that has 5 different values. I used get_dummies and used linear regression for ...
1
vote
1
answer
77
views
Beginner basic clustering model and one-hot encoding?
I have a dataframe of natural disaster incidents in Afghanistan from 2016 - 2023.
Column names:
REGION (Northern, Eastern etc)
PROV_CODE (province)
PROV_NAME
DIST_CODE (district)
DIST_NAME
INC_DATE (...
1
vote
0
answers
42
views
sklearn - OneHotEncoding and SelectPercentile
In a sklearn example there is this code:
...
0
votes
0
answers
24
views
Numerical issue with softmax regression implementation on MNIST
I'm having numpy numerical issues with my implementation of softmax regression/multiclass logistic regression on the MNIST dataset.
The numpy exp and log numerical issue goes away when I divide the x ...
0
votes
0
answers
9
views
Error while using saved logistic regression model on scoring vector data -The columns of A don't match the number of elements of x. A: 6011, x: 232964
I'm getting an error while using a saved logistic regression model on scoring vector data.
SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (ProbabilisticClassificationModel$$...
0
votes
1
answer
198
views
Best practices on encoding on an increasing number of categorical variables
I'm currently using Gradient Boosting Regressor as my model to predict production risk based off a set number of features as a side-project. One of these features, ...
0
votes
0
answers
71
views
Hot-encoding warning when using gridsearch
I ran an experiment with the classical holdout method to predict price and hot-encoded categorical data. However, when optimising, I got the warning below even though I ignored the unknown ...
1
vote
0
answers
47
views
One-Hot encoded variables dominate importance among other variables
I am currently training some machine learning models to predict the 28-day compressive strength of cement, a continuous real-valued variable. The available dataset comprises samples from three ...
1
vote
1
answer
38
views
How to prepare data if each item has multiple categories (like tags)
I'm working on a recommender system that will recommend movies to users.
Movie ratings

Movie  User  Rating
100    201   5
105    256   8
...    ...   ...

Movie tags

Movie  Tag
100    1
100    2
100    8
105    2
105    5
....
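One way to represent items that carry several tags is a multi-hot vector per movie: one slot per known tag, set to 1 when the movie has that tag. The sketch below uses only the (movie, tag) pairs visible in the excerpt; the variable names are made up for illustration, and it is not necessarily the approach any answer recommends.

```python
from collections import defaultdict

# (movie, tag) pairs shown in the "Movie tags" table above.
movie_tags = [(100, 1), (100, 2), (100, 8), (105, 2), (105, 5)]

# Fix a stable position for every tag that appears in the data.
tag_ids = sorted({tag for _, tag in movie_tags})
tag_index = {tag: i for i, tag in enumerate(tag_ids)}

# Build one multi-hot vector per movie.
multi_hot = defaultdict(lambda: [0] * len(tag_ids))
for movie, tag in movie_tags:
    multi_hot[movie][tag_index[tag]] = 1

for movie, vector in sorted(multi_hot.items()):
    print(movie, vector)
```

With scikit-learn available, `MultiLabelBinarizer` produces the same kind of matrix from a list of tag sets.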
0
votes
2
answers
272
views
How is PCA applied to (one-hot encoded) DNA sequence data?
I realize some questions have been asked already about one-hot encoding for PCA. The answer seems to be along the lines of 'The PCA will run, but does not necessarily make sense.'
However, I have a ...
1
vote
2
answers
1k
views
Can decision trees handle Nominal Categorical variables?
I have read that decision trees can handle categorical columns without encoding them.
However, as decision trees make splits on the data, how do they handle nominal categorical variables?
Surely a ...
2
votes
1
answer
356
views
Multiple classes present in one-hot encoding
When dealing with classification for multiple classes present in the same sample, can the output layer have the form of one-hot encoding, but instead of only one hot, have multiple?
That is, in case ...
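A target with several ones is usually called a multi-hot (multi-label) encoding. A common setup, sketched below as an assumption rather than as the answer to this question, replaces the softmax with an independent sigmoid per class and uses binary cross-entropy. The class names, logits, and target are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


classes = ["cat", "dog", "bird"]  # hypothetical labels
logits = [2.0, -1.0, 0.5]         # hypothetical raw outputs of the last layer
target = [1, 0, 1]                # multi-hot target: "cat" and "bird" present

# Each class gets its own probability; they do not need to sum to 1.
probs = [sigmoid(z) for z in logits]

# Threshold each probability independently to get a multi-hot prediction.
predicted = [1 if p >= 0.5 else 0 for p in probs]

# Binary cross-entropy averaged over the classes.
loss = -sum(
    t * math.log(p) + (1 - t) * math.log(1 - p)
    for t, p in zip(target, probs)
) / len(target)

print(predicted, round(loss, 4))
```

The 0.5 threshold is a convention, not a requirement; it can be tuned per class on validation data.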