When I try to use scikit-learn's LinearRegression, the model doesn't perform well; however, when I try SciPy's linear regression, it works perfectly. The datasets are very simple, so is there a flaw in the logic or in the code?
I tried multiple self-generated linear datasets, all of which consisted of one column for features and one column for labels.
Importing libraries:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from scipy import stats
Generating the data:
X = []
Y = []
for i in range(100):
    X.append(2 * i + 3)
    Y.append(1.8 * X[i] + 32)
X = np.array(X, dtype=float)
Y = np.array(Y, dtype=float)
Creating a model and splitting the data into train and test sets:
reg = LinearRegression()
X_train, Y_train, X_test, Y_test = train_test_split(X, Y, test_size=0.5, random_state=0)
Reshaping the train and test features, since there is a single feature column:
X_train, X_test = (X_train.reshape(-1, 1), X_test.reshape(-1, 1))
Fitting the training data and scoring it:
reg.fit(X_train, Y_train)
reg.score(X_test, Y_test)
The score I get varies depending on the dataset size, but it is never good, and mostly negative (a negative R² means the model does worse than simply predicting the mean).
However, when I use the SciPy model
slope, intercept, r_value, p_value, std_err = stats.linregress(X, Y)
it works perfectly and always finds the slope of 1.8 and the intercept of 32.
With stats.linregress you use the entire dataset. Why not use the same entire dataset for your sklearn model?
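For instance, here is a minimal sketch of that comparison, reusing the X and Y from the question (reg_full is just an illustrative name):
# Fit sklearn's LinearRegression on the entire dataset, as linregress does
reg_full = LinearRegression()
reg_full.fit(X.reshape(-1, 1), Y)  # sklearn expects a 2D feature array
print(reg_full.coef_[0], reg_full.intercept_)  # should recover 1.8 and 32.0 on this exact linear data
print(reg_full.score(X.reshape(-1, 1), Y))  # R^2 should be 1.0
If this recovers the same slope and intercept, the model itself is fine, and the difference lies in how the train/test split is being handled.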