Questions tagged [categorical-encoding]

Representing categorical variables as sets of numerical variables, which many types of analysis require in order to process categorical data. A common example is using a categorical predictor in regression/ANOVA via dummy coding, effect coding, Helmert coding, user-defined contrasts, etc.
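
As a rough illustration of the first two schemes named above, the sketch below builds dummy (treatment) and effect (sum) codes for a small made-up variable. The function names `dummy_code` and `effect_code`, the `colors` data, and the choice of "blue" as the reference level are all assumptions for the example and do not come from any particular library.

```python
def dummy_code(values, reference):
    """Dummy (treatment) coding: one 0/1 column per non-reference level."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    rows = [[1 if v == lv else 0 for lv in levels] for v in values]
    return levels, rows


def effect_code(values, reference):
    """Effect (sum) coding: same columns, but reference rows are -1 everywhere."""
    levels, rows = dummy_code(values, reference)
    coded = [
        [-1] * len(levels) if v == reference else row
        for v, row in zip(values, rows)
    ]
    return levels, coded


# Hypothetical three-level variable; "blue" is the assumed reference level.
colors = ["red", "green", "blue", "green"]
dummy_levels, dummy_rows = dummy_code(colors, reference="blue")
effect_levels, effect_rows = effect_code(colors, reference="blue")

print(dummy_levels)   # column order
print(dummy_rows)     # reference rows are all zeros
print(effect_rows)    # reference rows are all -1
```

In both schemes a variable with k levels yields k - 1 columns. The two differ only in how the reference level is encoded, which changes what the intercept and coefficients represent in a regression.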

0 votes
0 answers
23 views

Normalizing the embedding space of an encoder language model with respect to categorical data

Suppose we have a tree/hierarchy of categories (e.g. categories of products in an e-commerce website), each node being assigned a title. Assume that the title of each node is semantically accurate, ...
mtcicero • 123
0 votes
0 answers
31 views

Multiple Dependent Variables, One Independent - with dummy variables

I am trying to run regression models and don't know what type of regression to run. I have one independent variable (binary variable) and 8 dependent variables (3 discrete, 3 categorical, 2 ...
Margot • 1
0 votes
0 answers
23 views

Is the intercept of a complex sum-coded regression basically useless for interpretation? Maybe even for some simple models?

In regression analysis, one may choose to code categorical variables differently depending on interpretability considerations. One such coding scheme known as sum coding (a kind of effect coding ...
nsa • 272
0 votes
0 answers
21 views

Understanding softmax as an activation function, and sparsity in data and gradients

I’m working on a project that includes a probabilistic model that uses one-hot vectors, and also occasionally partially freezes weights or zeros gradients to specific regions of the weights. In some parts of ...
Danny • 1
3 votes
1 answer
77 views

What is the connection between lift and logistic regression?

I have noticed that there is an interesting connection between two (apparently different) measures. I am working within a market basket analysis framework (aka frequent itemset mining, both are common names), ...
Oscar Flores
0 votes
0 answers
19 views

Firm Fixed Effects Model dropping Sector Dummies? Potential Solution?

For my thesis, I am using panel data with stock returns and other firm data. I first used an event study to calculate abnormal returns (with an event window of 7 days, so 7 observations for 500 firms) ...
mek1401
0 votes
1 answer
48 views

Indicator variables/treatment variables as an independent variable?

Can a dummy variable or treatment variable be an independent variable? My independent variable takes the value 1 if a flood occurs in a specific country in a specific year and 0 if no flood happens. ...
zeinab hassano
0 votes
0 answers
12 views

Regression model on edge list

I would like to fit a regression in which my data are links (edges) from a network and the output is the weight of each link. Income level is a node attribute and for each link two nodes are involved, so ...
Jina • 1
0 votes
0 answers
26 views

Why does removing the offset change the F-statistic of an anova model in R?

When a linear model with only a single categorical variable is defined without an offset, the F-statistic reported by summary() and ...
meta7 • 1
1 vote
1 answer
89 views

Small sample in categorical explanatory variable vs overall sample size

In a statistical model, e.g. regression, we have to ensure the sample size is sufficient to estimate a given number of parameters. Rules of thumb, e.g. n=10 per parameter, or a power analysis, will ...
user167591
4 votes
2 answers
217 views

Interpretation of dummy-coded variable

I have a dummy variable, with 1 meaning the years in which an historical event took place and 0 meaning the years in which it didn't take place. I used 0 as the reference category. When the regression ...
brian • 75
1 vote
1 answer
66 views

Regression with single-observation dummies: F-test under heteroskedasticity

I have a linear regression model with an intercept and a few dummy variables. Each of the dummies indicates a single observation, so the fit is perfect for these observations. Having fit the model, the ...
Richard Hardy
3 votes
1 answer
117 views

Warning when using sparse categorical values with LightGBM

When training a LightGBM model with lgbm.train, I get the following warning: [LightGBM] [Warning] Met categorical feature which contains sparse values. Consider ...
DustByte • 131
0 votes
0 answers
13 views

What is the standard performance metric for categorical data clustering?

I performed categorical clustering with some selected UCI datasets. I one-hot encoded the features, then directly used a Binomial Mixture Model and KModes on this one-hot encoded data. On the ...
NOT-A-CS-GUY
3 votes
1 answer
40 views

Logistic regression in R: Handling mixed numerical and categorical variables

I'm attempting to fit a logistic regression model in R and need some guidance on handling both numerical and categorical variables simultaneously, especially when looking for significant explanatory ...
kabin • 131
