Questions tagged [scikit-learn]

scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

0 votes
0 answers
8 views

Imbalanced Cost-Sensitive Learning Workflow - How to split the data, tune hyperparameters and apply a decision threshold?

I am facing a problem with an imbalanced dataset in which I would like to detect the rare event. My questions are more about the general strategy for the whole workflow and I would like to hear your thoughts ...
GeorgeM
1 vote
2 answers
40 views

How do sklearn's trees evaluate NaNs on inference?

Imagine we have fitted a sklearn.tree.DecisionTreeClassifier object like this one: If we wanted to predict the class of this observation: ...
Tendero
  • 255
0 votes
0 answers
10 views

What's the difference between my OLS from scratch vs sklearn's OLS?

I'm coding linear regression via OLS from scratch. When I compare the results to scikit-learn's implementation, the coefficients in my version appear to be twice the magnitude of scikit-learn's. I'm ...
vxnuaj
  • 11
0 votes
1 answer
14 views

ValueError: Found input variables with inconsistent numbers of samples: [0, 6]

I am trying to fit some data with an algorithm, but I am getting this error: ValueError: Found input variables with inconsistent numbers of samples: [0, 6] How I ...
Filipy
  • 1
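The two numbers in that error message are the sample counts of the inputs passed to `fit`, so `[0, 6]` suggests the first input was empty while the second had six entries. A minimal sketch of that kind of length check follows; `check_same_length` is a hypothetical helper written for illustration, not scikit-learn's own code, and the data are made up.

```python
def sample_counts(*arrays):
    """Return the number of samples (first-axis length) of each input."""
    return [len(a) for a in arrays]


def check_same_length(*arrays):
    """Raise ValueError when the inputs do not all have the same length.

    Hypothetical helper that imitates the wording of the error in the
    question; it is not the library's implementation.
    """
    counts = sample_counts(*arrays)
    if len(set(counts)) > 1:
        raise ValueError(
            "Found input variables with inconsistent numbers of samples: "
            f"{counts}"
        )


# Made-up data: an empty feature list and six labels.
X = []
y = [0, 1, 0, 1, 1, 0]

try:
    check_same_length(X, y)
except ValueError as err:
    message = str(err)
    print(message)
```

Printing `len(X)` and `len(y)` right before the `fit` call is usually the quickest way to see which input ended up empty.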
0 votes
1 answer
20 views

Sklearn EstimatorCV vs GridSearchCV

sklearn has the following description for EstimatorCV estimators: https://scikit-learn.org/stable/glossary.html#term-cross-validation-estimator An estimator that has built-in cross-validation ...
wannabedatascientist
1 vote
2 answers
45 views

What is the most elegant way to produce Linear Regression metrics such as p-values after fitting with sklearn?

Excel provides a full printout of betas, p-values, F-values, R², etc. However, when I use sklearn's LinearRegression, it does not calculate p-values for me. I then have to use statsmodels to get all ...
user164819
0 votes
0 answers
33 views

How to predict price behavior according to model predictions for a week ahead?

I wrote the simplest linear regression model (I'm a noob, please don't scold me; this is my first model) to predict the price of Solana. I would like to get some advice or tips on how to improve. The ...
bobrya_ziben
0 votes
0 answers
15 views

LeaveOneOut CV for Bandwidth selection of Kernel Density Estimation

I've taken this code to try optimizing bandwidth_selection with GridSearchCV (while implementing LeaveOneOut logic within this CV: "LeaveOneOut() is equivalent to KFold(n_splits=n)&...
JeeyCi
  • 133
0 votes
1 answer
11 views

How to tune the classification threshold in a cost-sensitive manner?

I have trained a classifier outputting probabilities for each class. I want to tune the decision threshold in such a way that it accounts for different costs/gains assigned to false positives ($FP$), $...
MuhammedYunus
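One common way to approach this is to score every candidate threshold by the total misclassification cost it incurs on held-out data and keep the cheapest one. The sketch below is a minimal pure-Python version of that idea. The labels, probabilities, cost values, and function names are all illustrative assumptions, and it only models costs for false positives and false negatives (correct predictions are treated as free).

```python
def total_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Total cost when predicting positive for probabilities >= threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative
    return cost


def best_threshold(y_true, y_prob, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the distinct predicted probabilities, plus infinity
    for the "never predict positive" rule. Ties go to the lower threshold.
    """
    candidates = sorted(set(y_prob)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: (total_cost(y_true, y_prob, t, cost_fp, cost_fn), t),
    )


# Made-up validation labels and predicted positive-class probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]

# Expensive false negatives push the threshold down,
# expensive false positives push it up.
fn_heavy = best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=5.0)
fp_heavy = best_threshold(y_true, y_prob, cost_fp=5.0, cost_fn=1.0)
print(fn_heavy, fp_heavy)
```

The threshold should be chosen on data the classifier was not trained on, otherwise the selected value tends to be optimistic. Recent scikit-learn releases also provide `TunedThresholdClassifierCV` in `sklearn.model_selection`, which wraps a similar search inside cross-validation.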
1 vote
1 answer
32 views

Unexpected behaviour of Scikit-Learn SVR

I'm using Scikit-learn to fit a support vector regression on a really simple dataset of car stopping distances vs car speed. My code for applying SVR to this dataset is: ...
oweydd
  • 113
4 votes
2 answers
94 views

Loss function in Isolation Forest

I recently came across this algorithm while working on my graduation project. As per my understanding, we create subtrees for each subsample. Then we calculate the scores for each ...
Mayank Singh
0 votes
0 answers
67 views

Confused with Isolation Forest

Let's say I have an anomaly detection (unsupervised learning) dataset with 10 observations (two features). The dataset is shown below: After executing the model, the following are the results (anomalies ...
Bits
  • 131
2 votes
2 answers
396 views

How can I fit sklearn.svm.SVC with three features, given that the features are actually arrays of lengths 128, 12 and 40?

To clarify, each instance of feature_1 is a 128-item array, each instance of feature_2 is a 12-item array, and each instance of feature_3 is a 40-item array. I am currently simply doing ...
Karn Varshneya
1 vote
0 answers
26 views

Interpreting the SHAP values presented in the layered violin plot of the SHAP library for a scikit-learn binary RandomForestClassifier

I am using the SHAP library to compute feature Shapley values for a binary RandomForestClassifier, which naturally has two outputs, 0 or 1. The forest itself consists of 100 decision tree ...
jjepsuomi
  • 111
0 votes
1 answer
51 views

How can I improve my predictive model?

Here is my interpretation of my model so far. I am investigating the relationship between ratings and followers for video games, but there is a problem. The more high ratings you get, the more you get ...
Hugo Guay
