Scikit-Learn LinearRegression doesn't perform well on a very simple data set

Question

When I use scikit-learn's LinearRegression, the model doesn't perform well. However, when I use SciPy's linear regression, it works perfectly. The datasets are very simple. Is there a flaw in the logic or in the code?

I tried several self-generated linear datasets, each consisting of one feature column and one label column.

importing libraries

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from scipy import stats

generating data

X = []
Y = []
for i in range(100):
    X.append(2 * i + 3)
    Y.append(1.8 * X[i] + 32)
X = np.array(X, dtype=float)
Y = np.array(Y, dtype=float)

creating a model and splitting the data into test and train

reg = LinearRegression()
X_train, Y_train, X_test, Y_test = train_test_split(X, Y, test_size=0.5, random_state=0)

reshaping test and train, since there is only a single feature column

X_train,X_test=(X_train.reshape(-1,1),X_test.reshape(-1,1))

fitting the training data and scoring it

reg.fit(X_train,Y_train)
reg.score(X_test,Y_test)

The score I get varies with the dataset size, but it was never good, and mostly negative.

However, when I use the SciPy model

slope, intercept, r_value, p_value, std_err = stats.linregress(X, Y)

it works perfectly and always finds a slope of 1.8 and an intercept of 32.

When you use stats.linregress you use the entire dataset. Why not use the same entire dataset for your sklearn model? — Mr_U4913, Commented Sep 3, 2019 at 23:29
@Mr_U4913 I did the same before I posted the question, with many different dataset splits and with the entire dataset used for training; the result was the same. — Serilena, Commented Sep 4, 2019 at 12:28

SQL Police · Accepted Answer · 2019-09-05 12:52:50Z


train_test_split returns the splits in the same order as the arrays you pass in: first the train and test parts of X, then the train and test parts of Y. In your code, the unpacking mixes up X and Y.
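To make the return order visible, here is a small sketch. The toy arrays are my own and are not the data from the question: the labels are offset by 100 so it is easy to see which returned array came from which input.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): features are 0..9,
# labels are the same values shifted by 100.
X = np.arange(10, dtype=float)
Y = X + 100.0

# For two input arrays, four arrays come back, in this order:
# X_train, X_test, Y_train, Y_test
parts = train_test_split(X, Y, test_size=0.5, random_state=0)
X_train, X_test, Y_train, Y_test = parts

print("X_train:", X_train)  # values below 100, so they came from X
print("X_test: ", X_test)   # values below 100, so they came from X
print("Y_train:", Y_train)  # values of 100 or more, so they came from Y
print("Y_test: ", Y_test)   # values of 100 or more, so they came from Y
```

The rows stay aligned across the split, so each entry of Y_train is still the label of the matching entry of X_train.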

Your problem will be solved if you do this:

X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.5,random_state=0)
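For reference, a sketch of the whole pipeline from the question with only the unpacking order changed. The data generation is condensed into a list comprehension, which should be equivalent to the original loop. Since the data are exactly linear, the score should come out at essentially 1.0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data as in the question: Y = 1.8 * X + 32
X = np.array([2 * i + 3 for i in range(100)], dtype=float)
Y = 1.8 * X + 32

# Corrected unpacking order: both X parts first, then both Y parts
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.5, random_state=0
)

# scikit-learn expects a 2-D feature array, so reshape the single column
X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)

reg = LinearRegression()
reg.fit(X_train, Y_train)

score = reg.score(X_test, Y_test)  # R^2 on the held-out half
slope = reg.coef_[0]
intercept = reg.intercept_

print("R^2:", score)
print("slope:", slope, "intercept:", intercept)
```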

SciPy works because stats.linregress was called directly on the whole dataset (X and Y), so it was never affected by the mixed-up split.
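As a sanity check, the two libraries can be compared on the same full dataset, with no split involved. This is a minimal sketch using the data from the question; both fits should recover the same line up to floating-point error.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Same data as in the question: Y = 1.8 * X + 32
X = np.array([2 * i + 3 for i in range(100)], dtype=float)
Y = 1.8 * X + 32

# SciPy takes 1-D arrays directly
result = stats.linregress(X, Y)

# scikit-learn needs the features as a 2-D column
reg = LinearRegression().fit(X.reshape(-1, 1), Y)

print("scipy:  ", result.slope, result.intercept)
print("sklearn:", reg.coef_[0], reg.intercept_)
```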

edited Sep 5, 2019 at 12:52

SQL Police


answered Sep 4, 2019 at 0:18

jjurado

