
Questions tagged [one-hot-encoding]

The tag has no usage guidance.

0 votes
1 answer
23 views

How to create consistent dummy variables in inference code?

I am using pd.get_dummies on a categorical column to create dummy variables. The training pipeline is something like this: normalization, dummy variable creation ...
Sociopath (reputation 1,253)
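One common way to keep dummy columns consistent between training and inference is to save the column list produced by `pd.get_dummies` at training time and `reindex` the inference frame to it. A minimal sketch, assuming a single categorical column; the helper `make_dummies` and the columns `color` and `size` are hypothetical, not taken from the question.

```python
import pandas as pd

def make_dummies(df, column, training_columns=None):
    """One-hot encode `column` (hypothetical helper); align to `training_columns` if given."""
    dummies = pd.get_dummies(df, columns=[column], dtype=int)
    if training_columns is not None:
        # Categories missing at inference become all-zero columns;
        # categories never seen in training are dropped.
        dummies = dummies.reindex(columns=training_columns, fill_value=0)
    return dummies

# Training data: save the encoded column list alongside the model.
train = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1.0, 2.0, 3.0]})
train_encoded = make_dummies(train, "color")
training_columns = list(train_encoded.columns)

# Inference batch: "blue" is absent and "purple" was never seen in training.
new = pd.DataFrame({"color": ["green", "purple"], "size": [4.0, 5.0]})
new_encoded = make_dummies(new, "color", training_columns)
print(new_encoded)
```

With this approach an unseen category such as `purple` is encoded as zeros in every dummy column, and the model always receives the same columns in the same order.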
0 votes
1 answer
30 views

SMOTE Oversampling for Text Classification with Multiple Input Features

I have a text classification problem where the input has 2 features, a text and a language. The text is a string variable. The ...
Sandra Sukarieh
0 votes
0 answers
61 views

What feature selection method is best for a multi-class classification problem with one-hot-encoded columns?

I am trying to solve a multi-class classification problem: predicting the outcome of a football match (target variable = Win, Lose or Draw). With a dataset of 2280 rows, which is 6 seasons of ...
pastybake2002
0 votes
1 answer
78 views

How to make predictions after using get_dummies in pandas?

I found a similar question on this topic, but none of its answers helped. I had a DataFrame with a categorical column that has 5 different values. I used get_dummies and then linear regression for ...
Ali.A (reputation 73)
1 vote
1 answer
77 views

Beginner basic clustering model and one-hot encoding?

I have a DataFrame of natural disaster incidents in Afghanistan from 2016 to 2023. Column names: REGION (Northern, Eastern, etc.), PROV_CODE (province), PROV_NAME, DIST_CODE (district), DIST_NAME, INC_DATE (...
Mas (reputation 55)
1 vote
0 answers
42 views

sklearn: OneHotEncoding and SelectPercentile

In an sklearn example there is some code ...
Maciej778
0 votes
0 answers
24 views

Numerical issue with softmax regression implementation on MNIST

I'm having numerical issues in NumPy with my implementation of softmax regression (multiclass logistic regression) on the MNIST dataset. The numerical issues with numpy exp and log go away when I divide the x ...
KaizerBox
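A standard remedy for `exp` overflow and `log(0)` in softmax regression is to subtract each row's maximum logit before exponentiating and to compute log-probabilities with the log-sum-exp identity instead of taking `log` of the softmax output. A minimal NumPy sketch; it is not the asker's code, and the function names and example logits are illustrative.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax; shifting by the row max keeps np.exp from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    # The largest shifted entry is 0, so the sum is >= 1 and the log is finite.
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def softmax(logits):
    return np.exp(log_softmax(logits))

def cross_entropy(logits, labels):
    """Mean negative log-likelihood; `labels` holds integer class indices."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(labels)), labels].mean()

# Very large logits (e.g. from unscaled inputs) would overflow a naive np.exp.
logits = np.array([[1000.0, 0.0, -1000.0],
                   [1.0, 2.0, 3.0]])
probs = softmax(logits)
loss = cross_entropy(logits, np.array([0, 2]))
```

Because the loss is computed directly from `log_softmax`, it never evaluates `log` of a probability that has underflowed to zero.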
0 votes
0 answers
9 views

Error when applying a saved logistic regression model to scoring vector data: "The columns of A don't match the number of elements of x. A: 6011, x: 232964"

I'm getting an error while using a saved logistic regression model on scoring vector data. SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (ProbabilisticClassificationModel$$...
Kunal Sinha
0 votes
1 answer
198 views

Best practices for encoding an increasing number of categorical variables

I'm currently using a Gradient Boosting Regressor in a side project to predict production risk based on a fixed set of features. One of these features, ...
Andrew Narvaez
0 votes
0 answers
71 views

One-hot encoding warning when using grid search

I ran an experiment with the classic holdout method to predict price, one-hot encoding the categorical data. However, when optimising, I got the warning below even though I ignored the unknown ...
Aze (reputation 1)
1 vote
0 answers
47 views

One-hot encoded variables dominate feature importance over other variables

I am currently training some machine learning models to predict the 28-day compressive strength of cement, a continuous real-valued variable. The available dataset comprises samples from three ...
Felipe (reputation 21)
1 vote
1 answer
38 views

How to prepare data if each item has multiple categories (like tags)

I'm working on a recommender system that will recommend movies to users.

Movie ratings:

Movie  User  Rating
100    201   5
105    256   8
...    ...   ...

Movie tags:

Movie  Tag
100    1
100    2
100    8
105    2
105    5
...    ...
Silver Light
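When each item can carry several tags, a common representation is multi-hot encoding: one binary column per tag, with as many 1s in a row as the item has tags. A minimal pandas sketch built from the sample rows in the question; the `tag_` column prefix and joining the tag vectors onto the ratings are illustrative choices, not the asker's schema.

```python
import pandas as pd

# Long-format tables: one row per (movie, tag) pair and one row per rating.
tags = pd.DataFrame({"Movie": [100, 100, 100, 105, 105],
                     "Tag":   [1, 2, 8, 2, 5]})
ratings = pd.DataFrame({"Movie": [100, 105], "User": [201, 256], "Rating": [5, 8]})

# Multi-hot matrix indexed by Movie: 1 if the movie has the tag, else 0.
# clip(upper=1) guards against duplicate (movie, tag) rows counting twice.
tag_matrix = pd.crosstab(tags["Movie"], tags["Tag"]).clip(upper=1).add_prefix("tag_")

# Attach each movie's tag vector to every rating of that movie.
features = ratings.join(tag_matrix, on="Movie")
print(features)
```

The tag matrix grows by one column per distinct tag, so for a very large tag vocabulary a sparse or embedding-based representation may be preferable.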
0 votes
2 answers
272 views

How is PCA applied to (one-hot encoded) DNA sequence data?

I realize some questions have been asked already about one-hot encoding for PCA. The answer seems to be along the lines of 'The PCA will run, but does not necessarily make sense.' However, I have a ...
Chris_abc
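A common way to run PCA on DNA is to one-hot encode each position of equal-length (aligned) sequences into four indicator columns, flatten each sequence into a single row, centre the columns, and take an SVD. A minimal NumPy sketch, assuming the sequences contain only A, C, G and T and are already aligned; the helper names and the sequences themselves are made up for illustration.

```python
import numpy as np

BASES = "ACGT"

def one_hot_dna(seq):
    """Flatten a DNA string into a vector of length 4 * len(seq), one 4-slot block per base."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), len(BASES)))
    for pos, base in enumerate(seq):
        encoded[pos, index[base]] = 1.0
    return encoded.ravel()

def pca(X, n_components):
    """PCA via SVD of the column-centred matrix; returns scores and explained-variance ratios."""
    centred = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # projections onto the principal axes
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Hypothetical aligned sequences of equal length.
seqs = ["ACGTAC", "ACGTTC", "TCGTAC", "ACGAAC"]
X = np.vstack([one_hot_dna(s) for s in seqs])
scores, explained = pca(X, n_components=2)
```

Each row of `X` sums to the sequence length, and the principal axes are driven by the positions where the sequences differ.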
1 vote
2 answers
1k views

Can decision trees handle Nominal Categorical variables?

I have read that decision trees can handle categorical columns without encoding them. However, since decision trees make splits on the data, how do they handle nominal categorical variables? Surely a ...
Connor (reputation 661)
2 votes
1 answer
356 views

Multiple classes present in one-hot encoding

When classifying samples in which multiple classes can be present at once, can the output layer take the form of a one-hot encoding, but with multiple hot entries instead of just one? That is, in case ...
smone (reputation 23)
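A common way to set up this multi-label case is a multi-hot target (several 1s per row) paired with an independent sigmoid per output unit and binary cross-entropy, rather than softmax, whose outputs must sum to 1 and therefore compete. A minimal NumPy sketch; the logits are made-up numbers standing in for a network's output layer, and the 0.5 decision threshold is an assumption.

```python
import numpy as np

def multi_hot(label_sets, n_classes):
    """Target matrix with a 1 for every class present in each sample."""
    Y = np.zeros((len(label_sets), n_classes))
    for row, labels in enumerate(label_sets):
        Y[row, list(labels)] = 1.0
    return Y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(logits, Y, eps=1e-12):
    """Mean BCE over all sample/class pairs: each output is an independent yes/no decision."""
    p = np.clip(sigmoid(logits), eps, 1 - eps)
    return -np.mean(Y * np.log(p) + (1 - Y) * np.log(1 - p))

# Three samples, four classes; sample 0 contains classes 0 and 2, and so on.
Y = multi_hot([{0, 2}, {1}, {0, 1, 3}], n_classes=4)
logits = np.array([[3.0, -2.0, 2.5, -3.0],
                   [-1.0, 4.0, -2.0, -2.0],
                   [2.0, 1.5, -1.0, 3.0]])
loss = binary_cross_entropy(logits, Y)
predicted = (sigmoid(logits) > 0.5).astype(int)  # threshold each class separately
```

Thresholding each output separately lets a single prediction contain any number of classes, including none.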
