Questions tagged [categorical-encoding]
Representing categorical variables as sets of numerical variables. Many types of analysis require this in order to process categorical data. A common example is using a categorical predictor in regression/ANOVA via dummy coding, effect coding, Helmert coding, user-defined contrasts, etc.
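As a rough illustration of two of the schemes named above, the following minimal sketch builds the code tables for dummy (treatment) coding and sum (effect) coding of a three-level factor. The function names `dummy_code` and `sum_code` and the example levels are hypothetical, chosen only for this illustration; statistical packages provide their own contrast functions, and their choice of reference level may differ from the one assumed here.

```python
def dummy_code(levels, reference=None):
    """Dummy (treatment) coding: k levels -> k-1 indicator columns.

    The reference level (assumed here to default to the first level)
    is coded as all zeros.
    """
    reference = levels[0] if reference is None else reference
    others = [lv for lv in levels if lv != reference]
    return {lv: [1 if lv == o else 0 for o in others] for lv in levels}


def sum_code(levels):
    """Sum (effect) coding: k levels -> k-1 columns.

    Assumes the last level is the one coded -1 in every column,
    so that each column sums to zero across levels.
    """
    *head, last = levels
    table = {lv: [1 if lv == h else 0 for h in head] for lv in head}
    table[last] = [-1] * len(head)
    return table


levels = ["red", "green", "blue"]  # hypothetical factor levels
dummy = dummy_code(levels)
effect = sum_code(levels)

# Print each level with its dummy-coded and sum-coded rows side by side.
for lv in levels:
    print(lv, dummy[lv], effect[lv])
```

Under dummy coding each coefficient is a contrast against the reference level, whereas under sum coding the columns sum to zero across levels, which changes what the intercept and coefficients represent.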
843
questions
0
votes
0
answers
23
views
Normalizing the embedding space of an encoder language model with respect to categorical data
Suppose we have a tree/hierarchy of categories (e.g. categories of products in an e-commerce website), each node being assigned a title. Assume that the title of each node is semantically accurate, ...
0
votes
0
answers
31
views
Multiple Dependent Variables, One Independent - with dummy variables
I am trying to run regression models and don't know what type of regression to run.
I have one independent variable (binary variable) and 8 dependent variables (3 discrete, 3 categorical, 2 ...
0
votes
0
answers
23
views
Is the intercept of a complex sum-coded regression basically useless for interpretation? Maybe even for some simple models?
In regression analysis, one may choose to code categorical variables differently depending on interpretability considerations. One such coding scheme known as sum coding (a kind of effect coding ...
0
votes
0
answers
21
views
Understanding softmax as an activation function, and sparsity in data and gradients
I’m working on a project that includes a probabilistic model that uses one-hot vectors, and also occasionally partially freezes weights or zeros the gradients for specific regions of the weights. In some parts of ...
3
votes
1
answer
77
views
What is the connection between lift and logistic regression?
I have noticed that there is an interesting connection between two (apparently different) measures. I am working within a market basket analysis framework (aka frequent itemset mining; both are common names), ...
0
votes
0
answers
19
views
Firm Fixed Effects Model dropping Sector Dummies? Potential Solution?
For my thesis, I am using panel data with stock returns and other firm data. I first used an event study to calculate abnormal returns (with an event window of 7 days, so 7 observations for 500 firms) ...
0
votes
1
answer
48
views
Indicator variables/treatment variables as an independent variable?
Can a dummy variable or treatment variable be an independent variable? My independent variable takes the value 1 if a flood occurs in a specific country in a specific year and 0 if no flood happens. ...
0
votes
0
answers
12
views
Regression model on edge list
I would like to fit a regression in which my data are the links (edges) of a network and the output is the weight of each link.
Income level is a node attribute and for each link two nodes are involved, so ...
0
votes
0
answers
26
views
Why does removing the offset change the F-statistic of an anova model in R?
When a linear model with only a single categorical variable is defined without an offset, the F-statistic reported by summary() and ...
1
vote
1
answer
89
views
Small sample in categorical explanatory variable vs overall sample size
In a statistical model, e.g. regression, we have to ensure the sample size is sufficient to estimate a given number of parameters. Rules of thumb, e.g. n=10 per parameter, or a power analysis, will ...
4
votes
2
answers
217
views
Interpretation of dummy-coded variable
I have a dummy variable, with 1 meaning the years in which an historical event took place and 0 meaning the years in which it didn't take place. I used 0 as the reference category. When the regression ...
1
vote
1
answer
66
views
Regression with single-observation dummies: F-test under heteroskedasticity
I have a linear regression model with an intercept and a few dummy variables. Each of the dummies indicates a single observation, so the fit is perfect for these observations. Having fit the model, the ...
3
votes
1
answer
117
views
Warning when using sparse categorical values with LightGBM
When training a LightGBM model with lgbm.train, I get the following warning:
[LightGBM] [Warning] Met categorical feature which contains sparse values. Consider ...
0
votes
0
answers
13
views
What is the standard performance metric for categorical data clustering?
I performed categorical clustering on some selected UCI datasets. I one-hot encoded the features, then directly applied a Binomial Mixture Model and KModes to this one-hot encoded data. On the ...
3
votes
1
answer
40
views
Logistic regression in R: Handling mixed numerical and categorical variables
I'm attempting to fit a logistic regression model in R and need some guidance on handling both numerical and categorical variables simultaneously, especially when looking for significant explanatory ...