Questions tagged [categorical-data]
Categorical data can take on a limited (usually fixed) number of possible values, called categories. Categorical values "label"; they do not "measure". Nominal and dichotomous/binary scale types are categorical. Some people consider the ordinal scale categorical too.
398 questions
0 votes, 0 answers, 6 views
What's the right machine learning approach to mark rubrics based on sequences of data?
I'm a teacher and I'm working on a pet project to help streamline some of my assessment workflows for my students. One of those workflows is gathering data on student progress in the form of a rubric ...
0 votes, 1 answer, 15 views
How does encoding categorical features with Target Encoding variants work when the Target feature is continuous?
I have been reading about target encoding and its variants like Leave One Out, James Stein, etc., and in all cases the target variable itself is usually binary (or can be divided into categories).
How ...
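A minimal sketch of how mean target encoding can carry over to a continuous target: each category is replaced by the mean of the target within that category, shrunk toward the global mean so that rare categories are not over-trusted. The function name `smoothed_target_encode`, the `smoothing` parameter, and the toy data are assumptions made for illustration; they do not come from the question or from any particular library.

```python
from collections import defaultdict


def smoothed_target_encode(categories, targets, smoothing=10.0):
    """Return ({category: encoded value}, global mean) for a continuous target.

    `smoothing` is a hypothetical prior weight, expressed as a number of
    pseudo-observations that pull each category mean toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


# Toy data (invented): a categorical feature and a continuous target.
cities = ["A", "A", "B", "B", "B", "C"]
prices = [10.0, 14.0, 30.0, 34.0, 32.0, 100.0]

encoding, fallback = smoothed_target_encode(cities, prices, smoothing=2.0)

# Categories never seen when fitting fall back to the global mean.
encoded = [encoding.get(c, fallback) for c in ["A", "C", "Z"]]
```

To avoid leaking the target into the feature, the encoding is normally fitted on training data only (or out-of-fold) and then applied to held-out rows.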
1 vote, 1 answer, 29 views
Chi-square test results interpretation
I am using a chi-square test to compare the distributions of two categorical variables. Both have the same number of classes. After counting each class per variable, I obtain very similar counts but the p-...
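One common way to set up this kind of comparison is a chi-square test of homogeneity on the 2 x k table of class counts. The sketch below computes only the test statistic and the degrees of freedom with the standard library; the function name `chi_square_homogeneity` and the counts are assumptions for illustration, not the asker's data.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic and degrees of freedom for two count vectors.

    Treats the input as a 2 x k contingency table (two samples, k classes).
    Classes that are empty in both samples are ignored.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must use the same classes")
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    used_classes = 0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # no information in this class
        used_classes += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, used_classes - 1


# Invented counts for three classes in each of two samples.
statistic, dof = chi_square_homogeneity([50, 30, 20], [48, 32, 20])
```

The statistic is then compared with a chi-square distribution with `dof` degrees of freedom; for 2 degrees of freedom the 5% critical value is about 5.99, so counts as similar as these would not lead to rejecting homogeneity.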
3 votes, 1 answer, 808 views
How does a Neural Net handle an unseen class for a Categorical Feature?
Let's say I train a Neural Net, and I have a Categorical Feature X.
During training, there are only 3 classes seen in feature X: A, B, C.
Now, let's say I want to make predictions from this trained ...
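A minimal sketch of one common convention for this situation: reserve an extra "unknown" slot in the encoding so that a category not seen during training still maps to a valid input vector. The names `build_vocab`, `one_hot`, and the `<UNK>` token are assumptions for illustration and are not tied to any specific framework.

```python
UNKNOWN = "<UNK>"  # hypothetical placeholder token for unseen categories


def build_vocab(training_values):
    """Assign an index to each category seen in training; index 0 is reserved."""
    vocab = {UNKNOWN: 0}
    for value in training_values:
        vocab.setdefault(value, len(vocab))
    return vocab


def one_hot(value, vocab):
    """One-hot encode a value, routing unseen categories to the reserved slot."""
    index = vocab.get(value, vocab[UNKNOWN])
    vector = [0] * len(vocab)
    vector[index] = 1
    return vector


vocab = build_vocab(["A", "B", "C", "A"])
seen = one_hot("B", vocab)    # a category present in training
unseen = one_hot("D", vocab)  # a category that never appeared in training
```

The reserved slot only helps if the network has seen it during training, for example by randomly mapping some training rows (or rare categories) to `<UNK>`; otherwise the weights attached to that input are never updated.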
0 votes, 0 answers, 18 views
Model Architecture for Time-Series Forecasting with Categorical and Multivariate Data
Context:
I was looking at using an LSTM model to forecast the amount of gold gained for each of 10 heroes in a game of Dota 2, a MOBA game, as a base model in some type of model architecture. The game ...
0 votes, 1 answer, 26 views
Dealing with only categorical features dataset
I'm trying to do multi-class classification on a labeled dataset with purely categorical features. There are around 30 features in total. 3 of the features in particular have around 100 unique values (...
0 votes, 0 answers, 3 views
Options for representing the following "conditional rules" in some data, and if categorical data science would be helpful?
I am new to data science, so I am hoping the following question is a reasonably elementary exercise for someone more experienced.
Let us say I have $n$ categories of data. Each category is a ...
0 votes, 0 answers, 28 views
Visualise intersections of group membership (several low-cardinality variables)
I need to visualise joint and marginal frequencies of several low-cardinality categorical variables. Equivalently, I want to visualise sizes of groups and their intersections, where membership in some ...
3 votes, 0 answers, 39 views
How to predict multi-variate time-series from different samples [closed]
I'm having issues seeing the best way to predict a time-series when training on a dataset with different samples.
I have a dataset that shows the weight of 10 rabbits from their first day to their ...
1 vote, 1 answer, 38 views
How to deal with categorical misalignment between test and train in binary classification problems
I have train and test datasets (600k observations) that have different categories for the same categorical variable.
For example, train has the categorical variable Letters having unique categories ...
0 votes, 0 answers, 19 views
Feature selection on datasets with both categorical and numerical features
I'm proposing a novel methodology for feature selection in the context of tabular datasets that contain both numerical and categorical features. In order to prove the efficacy of my methodology, I ...
4 votes, 1 answer, 853 views
Decision Tree only splits to the left
I can’t really understand why my decision tree only splits to the left.
I originally have 2 categorical features (further named feature 0 and 1), which I concatenate into one feature since feature 1 is ...
0 votes, 0 answers, 13 views
Classification using simple relationships between time series data
I am looking to predict which courses are taught by which university professors at my school. More specifically, for each semester and professor I want to know the probability breakdown of which ...
0 votes, 0 answers, 67 views
Combining Textual, Categorical and Numerical data for Semantic Search using SentenceTransformers model
I'm building a food semantic search model and I want to use a pre-trained SentenceTransformers model with cosine similarity. I'm using the Epicurious dataset for the corpus, which consists of textual (&...
0 votes, 0 answers, 60 views
Training Biased/Uneven Categorical Data with CatBoost, Unbalanced/Unseen Categories Handling
Summary:
I am training a discount eligibility model where the dataset represents historical data for products where people availed discounts based on simple features like product category, discount ...