Questions tagged [feature-selection]
Methods and principles of selecting a subset of attributes for use in further modelling
978
questions
0
votes
0
answers
14
views
Expecting your comments to structure my project?
I am developing a binary classification project. Initially I had a dataset of real data with 3290 rows and 15 columns. Then, using a CTGAN network, I generated a synthetic dataset with 100000 rows. ...
-1
votes
1
answer
33
views
How can I select subsets of features using a neural network?
This listing selects the best features from the 1000 available columns in a given dataset.
The first three columns are dropped because they are useless data.
The dataset is huge. So, they were read in ...
0
votes
0
answers
11
views
Cumulative feature importance in Random Forest taking into account past days' data
I have a dataframe with past days' data and current-day data. Example columns: [cases, mobility, temp, rh, cases_1, mobility_1, temp_1, rh_1, cases_2, mobility_2, temp_2, rh_2, and so on]. My ...
0
votes
0
answers
16
views
Understanding most important features from an additional column
I'm fairly new to data science in general and I'm doing some analysis. Let us say I have N rows and D features, and I have a ...
0
votes
0
answers
21
views
Dummy features have almost the same effect as actual features
4 dummy random features (using np.random.randn) and 4 new real features (brought from some ideas) show almost the same improvement.
In cross-validation, 4 dummies ...
0
votes
0
answers
26
views
How to select the best feature?
I have both binary (0 = is not, 1 = is) and numerical values in the x_train data. The target (y_train) is binary (0 = is not, 1 = is). What is the best method to do feature selection simultaneously?
I used ...
0
votes
0
answers
13
views
Estimating the Increase in Rademacher Complexity after Feature Selection
I'm trying to estimate how much the Rademacher complexity (or empirical Rademacher complexity) increases when performing feature selection using methods like Sequential Forward Selection or Genetic ...
2
votes
0
answers
25
views
How to properly select features for time series ML models
I've been trying to get good references on how to solve a problem that's been bothering me regarding the modelling techniques I've used. I'm currently interested in making forecasts using ML for ...
0
votes
1
answer
27
views
How to handle a variable number of feature-values (1:many) without one-hot
I am using CatBoost, and one thing I noticed in the guide is that it says not to preprocess with one-hot encoding.
My data has a single target per row; however, the feature can have both thousands of values ...
0
votes
0
answers
15
views
What's a suitable feature selection method for time series data across multiple files?
My problem is basically a higher-dimensional regression, where my input is (100 levels, 300 timesteps, 23 features).
My goal is to build a deep learning LSTM model that finds which level the data ...
0
votes
1
answer
14
views
Data splitting for OLS regression
This is what I have done:
Divided my dataset into training and testing sets --> got significant features via feature selection using the sequential feature selector (MLxtend) on the training set --> ...
1
vote
1
answer
43
views
Feature selection in binary classification
I have a dataset with two classes and am interested in learning which features are 'important' for predicting the class. There are a lot of features available and I want to find subset(s) that lead to ...
0
votes
0
answers
21
views
Variable Selection and model prediction
In a supervised problem, I used randomForest for variable selection to identify the most important features.
Question: am I required to use a random forest model for subsequent predictions, or can I ...
0
votes
1
answer
29
views
Missing data in train set and test set
I have a dataset of N columns. Now I'm able to preprocess data and find a subset of features that I can use to train a model and make predictions. In the case where the train data has missing feature ...
0
votes
0
answers
24
views
RFECV with Random Stratified K-Folds returns different features every time
I am trying to learn more about Feature Selection in machine learning. I am working on a dataset that contains 17 features, and I have achieved about 75% accuracy on a Random Forest model with no ...