
Questions tagged [scikit-learn]

A machine-learning library for Python. Use this tag for any on-topic question that (a) involves scikit-learn either as a critical part of the question or expected answer, and (b) is not just about how to use scikit-learn.

1 vote
0 answers
22 views

I have a dataset with 18 biomarker features and a target variable. I want to find the features that have the biggest impact on the target

I have some disease biomarker datasets that contain 18 biomarker readings from different samples and a target variable that indicates the presence or absence of disease (features are both categorical and ...
Alex Keir
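One common way to rank features by their effect on a classifier is permutation importance on held-out data. The sketch below is not the asker's code: it uses a synthetic stand-in with 18 numeric features (the real data also has categorical features, which would need encoding first) and a random forest chosen as an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the biomarker data (assumption): 18 numeric features, binary target
X, y = make_classification(n_samples=300, n_features=18, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Permutation importance on held-out data: mean drop in accuracy when one feature is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking[:5]:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f} +/- {result.importances_std[idx]:.3f}")
```

Computing importance on the test split, rather than using impurity-based `feature_importances_`, avoids rewarding features the model merely memorised during training.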
0 votes
0 answers
21 views

How does Random Forest handle missing values in scikit-learn? [duplicate]

What technique does scikit-learn's Random Forest Regressor use to handle missing values? At first I thought that a Random Forest regressor could natively handle missing values during training ...
Maxime Charrière
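A minimal, version-independent sketch, assuming numeric features and synthetic data: put an imputer in a Pipeline so it is fitted only on the data passed to `fit`. Recent scikit-learn releases (1.4 onward) also let `RandomForestRegressor` accept NaN directly, but this pipeline does not rely on that.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

# Knock out roughly 10% of the entries to simulate missing values
X[rng.random(X.shape) < 0.1] = np.nan

# Median imputation happens inside the pipeline, so it is learned from the training data only
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(X, y)
preds = model.predict(X[:5])
```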
0 votes
0 answers
7 views

Why the different default parameters for scikit-learn gradient boosting classifiers? (GradientBoostingClassifier and HistGradientBoostingClassifier)

Why do gradient boosting classifiers (GradientBoostingClassifier) and histogram-based gradient boosting classifiers (HistGradientBoostingClassifier) have significantly different default hyperparameter ...
Grendel13G
0 votes
0 answers
18 views

scikit-learn CCA: x_loadings_ attribute

I'm doing a canonical correlation analysis using scikit-learn's CCA. After doing the usual steps and calling ca.x_loadings_, I see that I get values bigger than 1. ...
Hendrik
2 votes
1 answer
46 views

Meaning/interpretation of intercept_ in partial least squares

After using the sklearn library for Partial Least Squares, I have doubts about the interpretation of the model's "intercept". As you can see in the code that follows, and its corresponding ...
Francisco Angel
0 votes
0 answers
32 views

How to handle data normalization when a logarithmic scale is required?

Let's say we wish to build a regressor (e.g. a Support Vector Regressor) to predict the price of an asset within a given time span from now. However, what if the historical data we have ...
Juan Flautista De Torrepacheco
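One option, sketched here under the assumption that prices are strictly positive, is to fit the regressor on log(price) with `TransformedTargetRegressor`, while the features are standardised in an ordinary pipeline. Predictions are mapped back to the price scale automatically. The data below is synthetic.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
# Hypothetical positive, right-skewed "price" target
y = np.exp(2.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=300))

model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(C=10.0)),  # features scaled inside the pipeline
    func=np.log,          # the SVR is trained on log(price)
    inverse_func=np.exp,  # predictions are mapped back to the price scale
)
model.fit(X, y)
preds = model.predict(X[:5])
```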
1 vote
0 answers
26 views

What are the best options for imputing a time series that is missing many days? [closed]

I have many months of temperature data recorded roughly every ten minutes, except that it has gaps. If a gap is an hour or so, I can linearly interpolate, but if the gap is a few days this obviously ...
Tunneller
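A pandas sketch of the first step described in the question: interpolate only short gaps and leave long ones missing for a separate method. Note that pandas' `limit` argument alone would partially fill long gaps, so this computes each gap's length instead. The series and the one-hour threshold are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute temperature series with one short gap and one long gap
idx = pd.date_range("2024-01-01", periods=1000, freq="10min")
temp = pd.Series(15 + 5 * np.sin(np.arange(1000) * 2 * np.pi / 144), index=idx)
temp.iloc[100:106] = np.nan   # 1-hour gap (6 points)
temp.iloc[400:832] = np.nan   # 3-day gap (432 points)

# Length of the run of consecutive NaNs that each point belongs to (0 for observed points)
is_na = temp.isna()
run_id = (is_na != is_na.shift()).cumsum()
run_len = is_na.astype(int).groupby(run_id).transform("sum")

# Interpolate in time, then restore NaN inside runs longer than max_gap points
max_gap = 6  # 6 x 10 min = 1 hour (assumed threshold)
filled = temp.interpolate(method="time")
filled[is_na & (run_len > max_gap)] = np.nan
```

The remaining long gaps could then be filled by a model-based approach, which this sketch does not attempt.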
0 votes
1 answer
20 views

How does KNNImputer store the fitted values of the training set?

If anyone here is familiar with scikit-learn's KNNImputer implementation, I would be eager to learn from them. When you fit an Imputer transformer on your ...
Yann
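As far as I can tell from the source, `KNNImputer.fit` does not compute fill values. It keeps a copy of the training data (in a private attribute, `_fit_X`, which should not be relied on), and `transform` searches that stored data for each new row's neighbours. A small example makes the behaviour visible:

```python
import numpy as np
from sklearn.impute import KNNImputer

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [10.0, 100.0]])
imputer = KNNImputer(n_neighbors=2).fit(X_train)

# The missing second value is filled with the mean of the two nearest training rows,
# measured on the observed first feature: rows [2, 20] and [3, 30]
X_new = np.array([[2.5, np.nan]])
result = imputer.transform(X_new)
print(result)  # [[ 2.5 25. ]]
```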
0 votes
0 answers
14 views

GridSearchCV performs worse than the baseline

I'm working on a binary classification problem using scikit-learn. One of the models I've tested is KNeighborsClassifier, for ...
AndreaTerenz
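A sketch of a like-for-like comparison, using synthetic data as an assumption: if the grid contains the default settings and both are scored on the same CV splits, the best CV score cannot fall below the baseline's. Any drop that remains on a separate test set then points to noise in the CV estimate rather than a broken search.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Baseline: default n_neighbors=5, weights="uniform", on the same 5-fold stratified CV
baseline = cross_val_score(pipe, X, y, cv=5).mean()

param_grid = {
    "kneighborsclassifier__n_neighbors": [3, 5, 7, 9, 15],  # includes the default 5
    "kneighborsclassifier__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(f"baseline CV accuracy: {baseline:.3f}, best CV accuracy: {search.best_score_:.3f}")
```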
4 votes
2 answers
89 views

Finding the corners of noisy polygons

I have some polygons that look for example like this: If I zoom in very close on one side, you can see the noise. The data is a list of x coordinates and a corresponding list of y coordinates. I ...
sav
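One technique that fits this problem, named plainly as a swapped-in approach rather than anything from the question, is Ramer-Douglas-Peucker simplification. It keeps only points that deviate from a straight chord by more than a tolerance, so with a tolerance above the noise level only the corners survive. The noisy square and `epsilon=0.05` are assumptions.

```python
import numpy as np

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a polyline given as an (n, 2) array."""
    start, end = points[0], points[-1]
    seg = end - start
    seg_len = np.hypot(*seg)
    if seg_len == 0:
        # Closed polyline: measure distance from the shared start/end point
        dists = np.hypot(*(points - start).T)
    else:
        # Perpendicular distance of every point to the start-end chord
        dists = np.abs(seg[0] * (points[:, 1] - start[1]) - seg[1] * (points[:, 0] - start[0])) / seg_len
    idx = int(np.argmax(dists))
    if dists[idx] > epsilon:
        left = rdp(points[: idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return np.vstack([left[:-1], right])  # drop the duplicated split point
    return np.vstack([start, end])

# Hypothetical noisy unit square, traced counter-clockwise from (0, 0)
t = np.linspace(0, 1, 50, endpoint=False)
square = np.concatenate([
    np.c_[t, np.zeros_like(t)],     # bottom edge
    np.c_[np.ones_like(t), t],      # right edge
    np.c_[1 - t, np.ones_like(t)],  # top edge
    np.c_[np.zeros_like(t), 1 - t], # left edge
])
rng = np.random.default_rng(0)
noisy = square + rng.normal(scale=0.003, size=square.shape)
closed = np.vstack([noisy, noisy[:1]])  # repeat the first point to close the polygon

corners = rdp(closed, epsilon=0.05)
print(corners.round(3))
```

The tolerance has to sit between the noise amplitude and the smallest real feature of the polygon; with sharper or closer corners it would need tuning.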
2 votes
1 answer
74 views

Is my understanding/approach to nested cross-validation, final model tuning correct?

I am training an SVM on limited training data with imbalanced classes. Here are the things that I want to do: 1.) I want to make a statement about the generalizability ...
curious
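A common form of the workflow described in the question, sketched on synthetic imbalanced data with an assumed parameter grid: the outer loop estimates how well the whole tuning procedure generalises, and the final model comes from rerunning the same search on all the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small, imbalanced synthetic dataset (assumption)
X, y = make_classification(n_samples=150, n_features=10, weights=[0.8, 0.2], random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: hyperparameter search
search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="balanced_accuracy")

# Outer loop: performance estimate of "tune, then fit"
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="balanced_accuracy")
print(f"nested CV balanced accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")

# Final model: the same search on all data, keeping the refitted best estimator
final_model = search.fit(X, y).best_estimator_
```

The nested score describes the procedure, not the specific hyperparameters that `final_model` ends up with.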
1 vote
1 answer
38 views

Reason for high MSE and negative R-squared value

I am getting a really high MSE and a negative R-squared value. Dataset: https://docs.google.com/spreadsheets/d/1moTZS_LgOn6d74NC44i9lVcWchj-abVx/edit?usp=sharing&ouid=100514649347129021200&rtpof=...
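R-squared is 1 - SS_res / SS_tot, so it is 0 for a model that always predicts the mean of y and negative for any model whose squared errors exceed that. A toy example with hand-checkable numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])          # mean 2.5, SS_tot = 5
mean_pred = np.full_like(y_true, y_true.mean())  # always predicts the mean
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])        # anti-correlated predictions, SS_res = 20

r2_mean = r2_score(y_true, mean_pred)             # 0.0
r2_bad = r2_score(y_true, bad_pred)               # 1 - 20 / 5 = -3.0
mse_bad = mean_squared_error(y_true, bad_pred)    # 20 / 4 = 5.0
print(r2_mean, r2_bad, mse_bad)
```

A negative value on real data therefore means the model predicts worse than the constant mean, which often points to a mismatch between training and evaluation data or a target on a very different scale.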
2 votes
1 answer
32 views

How to interpret the results of a classifier when the train/test method gives much better results than cross-validation?

I need your help to understand a situation where using a train and test set produces perfect results (in terms of accuracy, precision, and recall) but when cross-validation is used, the accuracy on ...
letdatado
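One possible explanation, offered as an assumption since the question is truncated, is that a single train/test split is a noisy estimate on a small dataset. The sketch below scores the same model on ten different random splits of synthetic data and compares the spread with 5-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Small synthetic dataset with some label noise (assumption)
X, y = make_classification(n_samples=120, n_features=8, flip_y=0.1, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Accuracy of the same model on ten different random 80/20 splits
split_scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)
    split_scores.append(clf.fit(X_tr, y_tr).score(X_te, y_te))

cv_scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print(f"single splits: min={min(split_scores):.2f}, max={max(split_scores):.2f}")
print(f"5-fold CV: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
```

If every single split scores perfectly while CV does not, data leakage (for example, preprocessing fitted before splitting, or duplicated samples) is the other usual suspect.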
1 vote
0 answers
60 views

An error occurred when using XGBoost as a classifier for hiclass [closed]

Below is my example of using the XGBoost classifier with hiclass. My question is specifically about the hiClass Python package for hierarchical classification. I would like to model the ...
Ramzy
6 votes
1 answer
43 views

What is happening behind the scenes when we use CalibratedClassifierCV without prefit?

From what I understood from reading the sklearn Probability Calibration guide, when we run CalibratedClassifierCV we will fit "a regressor (called a calibrator) that maps the output of the classifier (as ...
andy mot
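A sketch of what can be observed on synthetic data, with GaussianNB as an assumed base model. Without prefit, and with the default ensemble behaviour, each CV fold clones the base estimator, fits it on that fold's training part, and fits a calibrator on its held-out part. `predict_proba` then averages the per-fold calibrated probabilities.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=8, random_state=0)

# Base estimator passed positionally (its keyword name changed across versions)
cal = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=3).fit(X, y)
print(len(cal.calibrated_classifiers_))  # 3: one (classifier, calibrator) pair per fold

# Averaging the per-fold calibrated probabilities reproduces predict_proba
manual = np.mean([c.predict_proba(X) for c in cal.calibrated_classifiers_], axis=0)
print(np.allclose(manual, cal.predict_proba(X)))  # True
```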
