Questions tagged [categorical-encoding]

Representing categorical variables as sets of numerical variables, which many types of analysis require in order to process categorical data. A common example is using a categorical predictor in regression/ANOVA via dummy coding, effect coding, Helmert coding, user-defined contrasts, etc.
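
As a rough illustration of two of the schemes named above, here is a minimal sketch of dummy (treatment) coding and effect (sum) coding for a three-level factor. The function names, the example levels, and the choice of the first level as the reference are assumptions made for this example; software packages differ on which level they treat as the reference.

```python
def dummy_code(values, levels):
    """Dummy (treatment) coding: one 0/1 column per non-reference level.

    Assumption for this sketch: the first entry of `levels` is the
    reference level, so it is coded as all zeros.
    """
    non_ref = levels[1:]
    return [[1 if v == lev else 0 for lev in non_ref] for v in values]


def effect_code(values, levels):
    """Effect (sum) coding: same columns as dummy coding, except that
    the reference level is coded -1 in every column.
    """
    ref = levels[0]
    non_ref = levels[1:]
    return [
        [-1 if v == ref else (1 if v == lev else 0) for lev in non_ref]
        for v in values
    ]


# Hypothetical factor with three levels; "red" is the reference level.
levels = ["red", "green", "blue"]
values = ["red", "green", "blue", "green"]

dummy_rows = dummy_code(values, levels)
effect_rows = effect_code(values, levels)

for v, d, e in zip(values, dummy_rows, effect_rows):
    print(f"{v:>5}  dummy={d}  effect={e}")
```

In both schemes a factor with k levels produces k - 1 columns. In a regression with an intercept, dummy coding makes each coefficient a difference from the reference level, while effect coding makes each coefficient a deviation from the unweighted mean of the level means.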

843 questions

0 votes

0 answers

23 views

Normalizing the embedding space of an encoder language model with respect to categorical data

Suppose we have a tree/hierarchy of categories (e.g. categories of products in an e-commerce website), each node being assigned a title. Assume that the title of each node is semantically accurate, ...

mtcicero

asked Jul 15 at 22:21

0 votes

0 answers

31 views

Multiple Dependent Variables, One Independent - with dummy variables

I am trying to run regression models and don't know what type of regression to run. I have one independent variable (binary variable) and 8 dependent variables (3 discrete, 3 categorical, 2 ...

Margot

asked Jun 30 at 10:24

0 votes

0 answers

23 views

Is the intercept of a complex sum-coded regression basically useless for interpretation? Maybe even for some simple models?

In regression analysis, one may choose to code categorical variables differently depending on interpretability considerations. One such coding scheme known as sum coding (a kind of effect coding ...

nsa

asked Jun 6 at 22:46

0 votes

0 answers

21 views

Understanding softmax as an activation function, and sparsity in data and gradients

I’m working on a project that includes a probabilistic model that uses one-hot vectors and also occasionally partially freezes weights or zeros gradients to specific regions of the weights. In some parts of ...

Danny

asked May 13 at 10:07

3 votes

1 answer

77 views

What is the connection between lift and logistic regression?

I have noticed that there is an interesting connection between two (apparently different) measures. I am under a market basket analysis framework (aka frequent itemset mining; both are common names), ...

Oscar Flores

asked May 12 at 19:10

0 votes

0 answers

19 views

Firm Fixed Effects Model dropping Sector Dummies? Potential Solution?

For my thesis, I am using panel data with stock returns and other firm data. I first used an event study to calculate abnormal returns (with event window of 7 days so 7 observations for 500 firms) ...

mek1401

asked May 12 at 12:31

0 votes

1 answer

48 views

Indicator variables/treatment variables as an independent variable?

Can a dummy variable or treatment variable be an independent variable? My independent variable takes the value 1 if a flood occurs in a specific country in a specific year and 0 if no flood happens. ...

zeinab hassano

asked May 9 at 18:12

0 votes

0 answers

12 views

Regression model on edge list

I would like to fit a regression in which my data are links (edges) from the network and the output is the weight of each link. Income level is a node attribute and for each link two nodes are involved, so ...

Jina

asked Apr 19 at 10:48

0 votes

0 answers

26 views

Why does removing the offset change the F-statistic of an anova model in R?

When a linear model with only a single categorical variable is defined without an offset, the F-statistic reported by summary() and ...

meta7

asked Apr 6 at 13:17

1 vote

1 answer

89 views

Small sample in categorical explanatory variable vs overall sample size

In a statistical model e.g. regression, we have to ensure the sample size is sufficient to estimate a given number of parameters. Rules of thumb e.g. n=10 per parameter, or a power analysis, will ...

user167591

asked Mar 26 at 4:23

4 votes

2 answers

217 views

Interpretation of dummy-coded variable

I have a dummy variable, with 1 meaning the years in which an historical event took place and 0 meaning the years in which it didn't take place. I used 0 as the reference category. When the regression ...

brian

asked Mar 25 at 9:46

1 vote

1 answer

66 views

Regression with single-observation dummies: F-test under heteroskedasticity

I have a linear regression model with an intercept and a few dummy variables. Each of the dummies indicates a single observation, so the fit is perfect for these observations. Having fit the model, the ...

Richard Hardy

asked Mar 24 at 16:54

3 votes

1 answer

117 views

Warning when using sparse categorical values with LightGBM

When training a LightGBM model with lgbm.train, I get the following warning: [LightGBM] [Warning] Met categorical feature which contains sparse values. Consider ...

DustByte

asked Mar 22 at 10:04

0 votes

0 answers

13 views

What is the standard performance metric for categorical data clustering?

I performed a categorical clustering with some selected UCI datasets. I one-hot encoded the features, then directly used Binomial Mixture Model and KModes using this one-hot encoded data. On the ...

NOT-A-CS-GUY

asked Mar 21 at 13:20

3 votes

1 answer

40 views

Logistic regression in R: Handling mixed numerical and categorical variables

I'm attempting to fit a logistic regression model in R and need some guidance on handling both numerical and categorical variables simultaneously, especially when looking for significant explanatory ...

kabin

asked Feb 29 at 15:53
