
I am building a neural network with my research data in two ways: with a statistical program (SPSS) and with Python, using scikit-learn's MLPRegressor. The problem is that although my code apparently works (it runs without errors), the results do not make sense. The R2 score should be around 0.70 (it is -4147.64), and the correlation shown in the graph should be almost linear (it is just a straight line at a constant distance from the X axis). Also, the x and y axes should have values ranging from 0 to 180, which is not the case (x goes from 20 to 100, y from -4100 to -3500).

If any of you can give me a hand, I would really appreciate it. Thank you!

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn import neighbors, datasets, preprocessing 
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

vhdata = pd.read_csv('vhrawdata.csv')
vhdata.head()

X = vhdata[['PA NH4', 'PH NH4', 'PA K', 'PH K', 'PA NH4 + PA K', 'PH NH4 + PH K', 'PA IS', 'PH IS']]
y = vhdata['PMI']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

from sklearn.preprocessing import Normalizer
scaler = Normalizer().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

nnref = MLPRegressor(hidden_layer_sizes = [4], activation = 'logistic', solver = 'sgd', alpha = 1, 
                     learning_rate= 'constant', learning_rate_init= 0.6, max_iter=40000, momentum= 
                     0.3).fit(X_train, y_train)

y_predictions= nnref.predict(X_test)

print('Accuracy of NN classifier on training set (R2 score): {:.2f}'.format(nnref.score(X_train_norm, y_train)))
print('Accuracy of NN classifier on test set (R2 score): {:.2f}'.format(nnref.score(X_test_norm, y_test)))

plt.figure()
plt.scatter(y_test,y_predictions, marker = 'o', color='red')
plt.xlabel('PMI expected (hrs)')
plt.ylabel('PMI predicted (hrs)')
plt.title('Correlation of PMI predicted by MLP regressor and the actual PMI')
plt.show()

2 Answers


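You have a couple of issues. First, it is important to use the right kind of scaling for an MLP. MLPs are sensitive to feature scaling and tend to train best when every input feature is on a similar, small range such as 0 to 1, so consider using sklearn's MinMaxScaler for this. Note that Normalizer does something different: it rescales each sample (row) to unit norm instead of rescaling each feature (column).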

So:

from sklearn.preprocessing import Normalizer
scaler = Normalizer().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

Should be:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_norm = scaler.fit_transform(X_train)  # learn each feature's min/max from the training set only
X_test_norm = scaler.transform(X_test)        # reuse the training min/max on the test set
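To make the difference between the two scalers concrete, here is a small sketch on a made-up array (toy numbers, not the PMI data). Normalizer rescales each row to length 1, while MinMaxScaler maps each column to [0, 1] using the minimum and maximum it learned from the training set. Test values can then land slightly outside [0, 1], which is expected and is why the test set should use `transform` rather than a second `fit_transform`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Toy data (hypothetical): 3 training samples, 2 features on very different scales
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])
X_test = np.array([[4.0, 200.0]])

# Normalizer works row by row: each sample is rescaled to unit L2 norm
row_norm = Normalizer().fit_transform(X_train)
print(np.linalg.norm(row_norm, axis=1))  # every row now has length 1 (up to rounding)

# MinMaxScaler works column by column: each feature is mapped to [0, 1]
# using the min and max learned from the training set only
scaler = MinMaxScaler().fit(X_train)
X_train_mm = scaler.transform(X_train)  # values: [[0, 0], [0.5, 0.5], [1, 1]]
X_test_mm = scaler.transform(X_test)    # values: [[1.5, 0.25]], outside [0, 1] is fine
print(X_train_mm)
print(X_test_mm)
```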

Next, you are training the model and predicting on the unscaled data, but then computing your scores on the scaled data. That is:

nnref = MLPRegressor(hidden_layer_sizes = [4], activation = 'logistic', solver = 'sgd', alpha = 1, 
                     learning_rate= 'constant', learning_rate_init= 0.6, max_iter=40000, momentum= 
                     0.3).fit(X_train, y_train)

should be:

nnref = MLPRegressor(hidden_layer_sizes=[4], activation='logistic', solver='sgd', alpha=1,
                     learning_rate='constant', learning_rate_init=0.6, max_iter=40000,
                     momentum=0.3).fit(X_train_norm, y_train)

And...

y_predictions= nnref.predict(X_test)

Should be:

y_predictions= nnref.predict(X_test_norm)
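One way to make this kind of mismatch impossible is to wrap the scaler and the model in a scikit-learn Pipeline, so that `fit`, `predict` and `score` all apply the same scaling automatically. The sketch below is only an illustration under some assumptions: it uses synthetic data from `make_regression` in place of `vhrawdata.csv`, and leaves MLPRegressor on its defaults apart from `hidden_layer_sizes`, `max_iter` and `random_state`, so the original solver, activation and learning-rate settings would need to be plugged back in.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 8 predictor columns and the PMI target (assumption)
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits MinMaxScaler on X_train only, then reuses that
# same scaling inside predict() and score()
model = make_pipeline(
    MinMaxScaler(),
    MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

# Raw (unscaled) X_test goes in; the pipeline scales it before predicting
y_predictions = model.predict(X_test)
print('Train R2: {:.2f}'.format(model.score(X_train, y_train)))
print('Test R2: {:.2f}'.format(model.score(X_test, y_test)))
```

Because the scaler lives inside the model object, there is no separate `X_test_norm` to forget; the train and test R2 are computed on consistently scaled inputs.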

Additional notes...

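  • Scoring on your training data tells you little about how well the model generalizes: it evaluates the model on the same data it learned from, so the score is optimistic. It is not a meaningful performance measure on its own; it is mainly useful next to the test score, where a high training score combined with a much lower test score is a sign of overfitting.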
  • First of all, THANK YOU very much for taking the time. Secondly, I used the Normalizer instead of the MinMaxScaler because, unless I am wrong, I think the MinMaxScaler standardises and I need to normalise (I have a reference study done on my samples, but with a statistical program rather than Python). Again, thank you. I will try the changes you suggested. Commented Jun 3, 2020 at 10:24
  • No problem, welcome to Stack Overflow! If this worked for you, please make sure to mark it as accepted @CovadongaPalacio
    – artemis
    Commented Jun 3, 2020 at 12:54

Well, I found a mistake:

You train the model on samples that weren't normalized:
nnref = MLPRegressor(...).fit(X_train, y_train)
But later you score it on normalized samples:
nnref.score(X_train_norm, y_train)


Also the x and y axis should have values ranging from 0 to 180, which is not the case ( X from 20 to 100, y from -4100 to -3500)
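Scikit-learn does not change your values by itself. The x-axis of your plot is y_test, which comes straight from your CSV, so 20 to 100 is simply the range of PMI in your test split (or maybe your expectation of those values is incorrect). The y-axis shows the model's predictions, so values around -4000 mean the fitted model is producing nonsense, not that your data was changed.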
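A flat line of predictions and a huge negative R2 go together. R2 is 1 - SS_res / SS_tot, so a model that always predicts the mean of the targets scores 0, while a model stuck at a constant far from the data scores far below 0. The sketch below uses toy values chosen to resemble the ranges in the question, not the actual PMI data:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical targets spread over 20-100, like the x-axis in the question
y_true = np.array([20.0, 40.0, 60.0, 80.0, 100.0])

# A model whose output is stuck at one value far below the data
# (this is what a flat line of predictions looks like)
y_stuck = np.full_like(y_true, -4000.0)

# A model that always predicts the mean of the targets
y_mean = np.full_like(y_true, y_true.mean())

print(r2_score(y_true, y_stuck))  # -20604.5
print(r2_score(y_true, y_mean))   # 0.0
```

The further the constant prediction sits from the data, the more negative R2 gets, which is consistent with the -4147.64 reported in the question.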

  • Thank you very much for taking the time! The normalisation problem has been pointed out, so I will change it. Commented Jun 3, 2020 at 10:27
