Questions tagged [feature-selection]
Methods and principles of selecting a subset of attributes for use in further modelling
2,424 questions
0 votes, 1 answer, 21 views
Would it be possible to use regularization methods as a feature selection method and then use machine learning models to analyse the data?
My dataset is RNA-seq data with more than 14,000 features, and the problem is binary classification. The total sample size is 50, so p >> n. When I use the elastic net method with training and test data, the ...
1 vote, 0 answers, 22 views
I have a dataset with 18 biomarker features and a target variable. I want to find the features that have the biggest impact on the target
I have some disease biomarker datasets that contain 18 biomarker readings from different samples and a target variable that indicates the presence or absence of disease (features are both categorical and ...
0 votes, 1 answer, 18 views
Why is mutual information acceptable for feature selection if it depends on the "scale" of the entropies?
It is common to use mutual information as feature selection method. However, I fail to see why this is the case, since the mutual information $I(X, Y)$ depends on both entropies $H(X)$ and $H(Y)$ via ...
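A small worked example may help with the scale concern. It assumes discrete variables with a known joint distribution (no estimation from samples), and the three features below are hypothetical. Since $I(X;Y) = H(Y) - H(Y \mid X) \le \min(H(X), H(Y))$, and $H(Y)$ is the same for every candidate when the target is fixed, a feature with more levels (higher $H(X)$) cannot score above $H(Y)$:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """Return (I(X;Y), H(X), H(Y)) in bits for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal of the feature X
    py = [sum(col) for col in zip(*joint)]    # marginal of the target Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi, entropy(px), entropy(py)

# Hypothetical features, all scored against the same balanced binary target Y
features = {
    "A": [[0.5, 0.0], [0.0, 0.5]],                              # 2 levels, determines Y
    "B": [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]],  # 4 levels, determines Y
    "C": [[0.25, 0.25], [0.25, 0.25]],                          # independent of Y
}
for name, joint in features.items():
    mi, hx, hy = mutual_information(joint)
    print(f"{name}: I(X;Y)={mi:.3f}  H(X)={hx:.3f}  H(Y)={hy:.3f}")
# A: I(X;Y)=1.000  H(X)=1.000  H(Y)=1.000
# B: I(X;Y)=1.000  H(X)=2.000  H(Y)=1.000
# C: I(X;Y)=0.000  H(X)=1.000  H(Y)=1.000
```

Features A and B both determine Y completely and receive the same score, even though their $H(X)$ differs by a factor of two. One caveat: when MI is estimated from finite samples with the plug-in estimator, estimates for many-valued features tend to be biased upward, which is a separate issue from the definition itself.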
-2 votes, 0 answers, 29 views
What is the basic difference between "feature selection" and "feature extraction" and "dimensionality reduction"? [duplicate]
I have one thousand features and one million samples in ...
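One common way to frame these terms: dimensionality reduction is the umbrella goal, and feature selection and feature extraction are two ways to get there. The sketch below contrasts them on random data; the chosen column indices are arbitrary, and PCA stands in for feature extraction in general:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))           # 200 samples, 6 original features

# Feature selection: keep a subset of the original columns (indices here are arbitrary)
selected = [0, 3]
X_sel = X[:, selected]

# Feature extraction (here PCA): build new features as linear combinations of all columns
Xc = X - X.mean(axis=0)                     # center each feature
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:2].T                       # scores on the first 2 principal components

print(X_sel.shape, X_pca.shape)             # (200, 2) (200, 2)
```

Both results reduce 6 columns to 2, but `X_sel` keeps two of the original, directly interpretable features, while each column of `X_pca` mixes all six.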
1 vote, 0 answers, 41 views
How many samples should I use at minimum for successful feature selection?
Suppose I have a million samples and a thousand columns in a tabular dataset.
I want to run a feature subset selection algorithm on the dataset.
Loading the full dataset will overwhelm my system, so I ...
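The excerpt does not say how the rows are read, but one way to draw a uniform random subsample without loading the whole file into memory is reservoir sampling (Algorithm R). The sketch below is a generic stdlib version; the subsample size `k` and the `seed` are assumptions, and it says nothing about how large `k` must be for the selected subset to be stable:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw k items uniformly at random from a stream of unknown length (Algorithm R).

    Only the k-item reservoir is held in memory, so the full dataset never has to be loaded.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)        # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item      # item i is kept with probability k / (i + 1)
    return reservoir

# Demo on a stand-in stream; any iterable of rows works the same way
sample = reservoir_sample(range(100_000), k=1_000)
print(len(sample))                       # 1000
```

For a CSV file, the stream could be `csv.reader(f)` on an open file handle, after skipping the header row with `next(reader)`. Repeating the selection on a few independent subsamples and comparing the selected subsets is one rough way to judge whether the subsample is large enough.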
0 votes, 0 answers, 11 views
How can I use a neural network to choose a subset of features from a dataset? [duplicate]
Suppose my dataset has 256 features.
Right now, what I can think of is this:
Create an NN model like this:
a. create a sequential model
b. add a Conv1D layer
c. add a flatten layer
d. add 1024 dense ...
1 vote, 0 answers, 11 views
Assessing Random Search Cross Validation: Tuning in ElasticNet with Large Feature Sets
I'm working on estimating an ElasticNet model for a large dataframe with over 100,000 variables, resulting in a well overidentified scenario. To tune my model, I've set up a grid of hyperparameters (...
0 votes, 0 answers, 21 views
What kinds of classifiers should we not use for feature selection?
Generally, I see that for feature selection, people use PSO (particle swarm optimization) as the optimizer and, inside the cost function, use less powerful classifiers such as SVC, logistic regression, KNN, etc.
Is there a reason ...
0 votes, 0 answers, 15 views
Feature selection for logistic regression and random forest (using Orange - no coding)
I’m using Orange to create a prediction model for the Indian liver patient dataset (binary target variable – either has or does not have liver disease – with 580 instances and 10 features). I’m using ...
2 votes, 2 answers, 59 views
Looking for a modification of variable importance in an ANCOVA-type GLMM
This question is about a statistical concept I think should exist. I would like to know if it has a name and hopefully an R package that will implement it. It is related to variable importance/...
0 votes, 0 answers, 30 views
PLS regression on data with a high number of zeros in the dependent variable
I want to perform a PLS regression on a data set coming from spectral images (NIRS). My goal is to relate the different spectra to the total amount of a compound. To do this, I have a dataset ...
3 votes, 1 answer, 37 views
What is the boundary curve for $\lambda_1$ and $\lambda_2$ that gives at least one zero component in the elastic net?
Define the elastic net estimate:
$
\hat{\beta}^{\lambda_1, \lambda_2} = \arg \min_{\beta \in \mathbb{R}^p} \left( \frac{1}{2n} \| y - X\beta \|_2^2 + \lambda_1 \ \frac{1}{2} \|\beta \|_2^2 + \lambda_2 ...
1 vote, 0 answers, 57 views
Least-bad stepwise procedure for a simulation that shows issues with stepwise regression
I am well aware of the issues that stepwise regression causes. I want to demonstrate some of them via simulation in a particular situation.
I am thinking of a regression where I have some categorical ...
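As a starting point for such a simulation, here is a minimal sketch of one simple stepwise variant (greedy forward selection on residual sum of squares, with an intercept) applied to a response that is pure noise. It uses only continuous predictors; the sample size, number of candidates and number of steps are arbitrary assumptions, and the categorical setup from the question is not reproduced:

```python
import numpy as np

def fit_rss(X, y, cols):
    """Residual sum of squares and R^2 of an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = float(resid @ resid)
    tss = float(np.sum((y - y.mean()) ** 2))
    return rss, 1.0 - rss / tss

def forward_select(X, y, k):
    """Greedy forward selection: at each step add the column that lowers the RSS the most."""
    selected = []
    for _ in range(k):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: fit_rss(X, y, selected + [j])[0])
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
n, p, k = 100, 50, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)                    # y is generated independently of every column of X

chosen = forward_select(X, y, k)
r2_selected = fit_rss(X, y, chosen)[1]
r2_fixed = fit_rss(X, y, list(range(k)))[1]   # k columns picked without looking at y
print(chosen, round(r2_selected, 3), round(r2_fixed, 3))
```

Because `y` is generated independently of `X`, any in-sample $R^2$ here is spurious; the greedily selected columns typically give a noticeably larger $R^2$ than the same number of columns picked without looking at `y`, which is one of the issues such a simulation can show.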
4 votes, 1 answer, 175 views
Motivation for automated variable selection in the case of p > n
I have written the following text as a motivation for using automated variable selection in cases where the number of variables (p) is greater than the number of observations (n). However, I am not ...
0 votes, 0 answers, 38 views
Evaluating the uniqueness of the lasso solution and its consequences in applications?
I've learned from a paper (https://www.stat.cmu.edu/%7Eryantibs/papers/lassounique.pdf) that the lasso may not yield a unique solution when the number of variables (p) exceeds the number of observations (...
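The simplest case of non-uniqueness can be checked numerically: two identical columns. The sketch below (the data values are made up, and the objective uses the $\frac{1}{2n}$ scaling as an assumption) solves the one-column lasso in closed form by soft-thresholding, then splits the coefficient between the duplicated columns in different proportions:

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """(1 / (2n)) * ||y - X beta||_2^2 + lam * ||beta||_1"""
    r = y - X @ beta
    return float(r @ r) / (2 * len(y)) + lam * float(np.sum(np.abs(beta)))

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

# Made-up data: the second column is an exact copy of the first
x = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = np.array([2.1, -3.9, 1.2, 6.0, -2.2])
X = np.column_stack([x, x])
lam = 0.1
n = len(y)

# Closed-form lasso solution when only the single column x is used
t_star = soft_threshold(x @ y / n, lam) / (x @ x / n)

# Split t_star between the two identical columns in different proportions
solutions = [np.array([a * t_star, (1 - a) * t_star]) for a in (0.0, 0.25, 0.5, 1.0)]
values = [lasso_objective(X, y, b, lam) for b in solutions]
for b, v in zip(solutions, values):
    print(np.round(b, 4), round(v, 6))
```

Why these are all minimizers: for any $\beta$, $X\beta = (\beta_1 + \beta_2)x$ and $|\beta_1| + |\beta_2| \ge |\beta_1 + \beta_2|$, so no $\beta$ can beat the one-column optimum, and every split with both entries of the same sign attains it. The fitted values agree across all of these solutions even though the coefficients do not.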