
Questions tagged [feature-selection]

Methods and principles of selecting a subset of attributes for use in further modelling

0 votes
0 answers
14 views

Expecting your comments to structure my project?

I am developing a binary classification project. Initially I got a dataset of real data with 3290 rows and 15 columns. Then, using a CTGAN network, I generated a synthetic dataset with 100000 rows. ...
Lasantha Kulasooriya
-1 votes
1 answer
33 views

How can I select subsets of features using a neural network?

This listing selects the best features from the 1000 available columns in a given dataset. The first three columns are dropped because they are useless data. The dataset is huge. So, they were read in ...
user366312
0 votes
0 answers
11 views

Cumulative feature importance in Random Forest taking into account past days data

I have a dataframe with past days' data and current day data. Example columns: [cases, mobility, temp, rh, cases_1, mobility_1, temp_1, rh_1, cases_2, mobility_2, temp_2, rh_2, and so on]. My ...
SHARVARI WANJARI
0 votes
0 answers
16 views

Understanding most important features from an additional column

I'm fairly new to data science in general and I'm doing some analysis. Let us say I have N rows and D features, and I have a ...
OlorinIstari
0 votes
0 answers
21 views

Dummy features have almost the same effect as actual features

4 dummy random features (using np.random.randn) and 4 new real features (brought from some ideas) show almost the same improvement. In cross validation, 4 dummies ...
Crispy13
  • 133
0 votes
0 answers
26 views

How to select the best feature?

I have both binary (0=is not or 1=is) and numerical values in x_train data. The target (y_train) is binary (0=is not or 1=is). What is the best method to do feature selection on both types simultaneously? I used ...
Mahmoud
0 votes
0 answers
13 views

Estimating the Increase in Rademacher Complexity after Feature Selection

I'm trying to estimate how much the Rademacher complexity (or empirical Rademacher complexity) increases when performing feature selection using methods like Sequential Forward Selection or Genetic ...
x H
  • 1
2 votes
0 answers
25 views

How to properly select features for time series ML models

I've been trying to get good references on how to solve a problem that's been bothering me regarding the modelling techniques I've used. I'm currently interested in making forecasts using ML for ...
loguimaraes
0 votes
1 answer
27 views

How to handle a variable number of feature-values (1:many) without one-hot

I am using CatBoost, and one thing I notice in the guide is that it says not to preprocess with one-hot encoding. My data has a single target per row; however, the feature can have both thousands of values ...
tuj
  • 101
0 votes
0 answers
15 views

What's a suitable feature selection method for time series data across multiple files?

My problem is basically a higher-dimensional regression, where my input is (100 levels, 300 timesteps, 23 features). My goal is to build a deep learning LSTM model that finds which level the data ...
Youssef Badr
0 votes
1 answer
14 views

Data splitting for OLS regression

This is what I have done: divided my dataset into training and testing sets --> got significant features via feature selection using the sequential feature selector (MLxtend) on the training set --> ...
pomelo
  • 1
1 vote
1 answer
43 views

Feature selection in binary classification

I have a dataset with two classes and am interested in learning which features are 'important' for predicting the class. There are a lot of features available and I want to find subset(s) that lead to ...
Shawn
  • 35
0 votes
0 answers
21 views

Variable Selection and model prediction

In a supervised problem, I used randomForest for variable selection to identify the most important features. Question: am I required to use a random forest model for subsequent predictions, or can I ...
Zakaria Faouzi
0 votes
1 answer
29 views

Missing data in train set and test set

I have a dataset of N columns. Now I'm able to preprocess data and find a subset of features that I can use to train a model and make predictions. In the case where the train data has missing feature ...
0-0
  • 1
0 votes
0 answers
24 views

RFECV with Random Stratified K-Folds returns different features every time

I am trying to learn more about Feature Selection in machine learning. I am working on a dataset that contains 17 features, and I have achieved about 75% accuracy on a Random Forest model with no ...
rehanqb
