Questions tagged [scikit-learn]
scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
2,319
questions
0
votes
0
answers
8
views
Imbalanced Cost-Sensitive Learning Workflow - How to split the data, tune hyperparameters and apply a decision threshold?
I am facing a problem with an imbalanced dataset in which I would like to detect the rare event. My questions are more about general strategy for the whole workflow, and I would like to hear your thoughts ...
1
vote
2
answers
40
views
How do sklearn's trees evaluate NaNs on inference?
Imagine we have fitted a sklearn.tree.DecisionTreeClassifier object like this one:
If we wanted to predict the class of this observation:
...
0
votes
0
answers
10
views
What's the difference between my OLS from scratch vs sklearn's OLS?
I'm coding linear regression via OLS from scratch. When I compare the results to scikit-learn's implementation, the coefficients in my version appear to be twice the magnitude of scikit-learn's.
I'm ...
0
votes
1
answer
14
views
ValueError: Found input variables with inconsistent numbers of samples: [0, 6]
I am trying to fit an algorithm to some data, but I am getting this error:
ValueError: Found input variables with inconsistent numbers of samples: [0, 6]
How I ...
0
votes
1
answer
20
views
Sklearn EstimatorCV vs GridSearchCV
sklearn has the following description for EstimatorCV estimators:
https://scikit-learn.org/stable/glossary.html#term-cross-validation-estimator
An estimator that has built-in cross-validation ...
1
vote
2
answers
45
views
What is the most elegant way to produce Linear Regression metrics such as p-values after fitting with sklearn?
Excel provides a full printout of betas, p-values, F-values, R², etc. However, when I use sklearn's LinearRegression, it does not calculate p-values for me. I then have to use statsmodels to get all ...
0
votes
0
answers
33
views
How to predict price behavior according to model predictions for a week ahead?
I wrote the simplest linear regression model (I'm a noob, please don't scold me; this is my first model) to predict the price of Solana. I would like to get some advice or tips on how to improve. The ...
0
votes
0
answers
15
views
LeaveOneOut CV for Bandwidth selection of Kernel Density Estimation
I've taken this code to try optimizing bandwidth selection with GridSearchCV (while implementing LeaveOneOut logic within this CV: "LeaveOneOut() is equivalent to KFold(n_splits=n)&...
0
votes
1
answer
11
views
How to tune the classification threshold in a cost-sensitive manner?
I have trained a classifier outputting probabilities for each class. I want to tune the decision threshold in such a way that it accounts for different costs/gains assigned to false positives ($FP$), $...
1
vote
1
answer
32
views
Unexpected behaviour of Scikit-Learn SVR
I'm using Scikit-learn to fit a support vector regression on a really simple dataset of car stopping distances vs car speed.
My code for applying SVR to this dataset is:
...
4
votes
2
answers
94
views
Loss function in Isolation Forest
I recently came across this algorithm while working on my graduation project.
As I understand it, we create a subtree for each subsample. Then we calculate the scores for each ...
0
votes
0
answers
67
views
Confused with Isolation Forest
Let's say I have an anomaly detection (unsupervised learning) dataset with 10 observations (two features). The dataset looks like this:
After executing the model, the following are the results (anomalies ...
2
votes
2
answers
396
views
How can I fit sklearn.svm.SVC with three features, given that the features are actually arrays of lengths 128, 12 and 40?
To clarify, each instance of feature_1 is a 128-item array, each instance of feature_2 is a 12-item array, and each instance of feature_3 is a 40-item array. I am currently simply doing ...
1
vote
0
answers
26
views
Interpreting the SHAP values presented in layered violin plot in SHAP-library for Scikit binary RandomForestClassifier
I am using the SHAP library to compute feature Shapley values for a binary RandomForestClassifier, which naturally has two outputs, 0 or 1. The forest itself consists of 100 decision tree ...
0
votes
1
answer
51
views
How can I improve my predictive model?
Here is my interpretation of my model so far. I am investigating the relationship between ratings and followers for video games, but there is a problem. The more you get high ratings, the more you get ...