I currently have a dataset with variables and observations. I want to predict a variable (demand), which is continuous, so I need a regression model. I tried linear regression and evaluated it with the R² metric, which was around 0.85. I then wanted to compare it against other models, one of which was a neural network. I believe neural networks are better suited to other tasks, like classification; nevertheless, I wanted to give them a try.
I decided to use scikit-learn, mainly because it offers both models (LinearRegression and MLPRegressor). The problem is that the resulting R² score was far worse than linear regression's, so I concluded that I am missing many important configurations. Below you can see my code and what the data looks like.
My data has the following columns; only demand (which is my label), population, gdp, day, and year are numerical and continuous, while the rest are categorical:
['demand','holy','gdp','population', 'day','year', 'f0', 'f1', 'f2', 'f3', 'f4','f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'g0', 'g1', 'g2', 'g3', 'g4', 'g5', 'g6', 'g7', 'g8', 'g9', 'g10', 'g11']
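For completeness, this is roughly how I turn a categorical column into numeric dummies before fitting (a sketch with `pd.get_dummies` on a hypothetical toy frame; the values below are stand-ins, not my real data):

```python
import pandas as pd

# Hypothetical toy frame standing in for my real data
df = pd.DataFrame({
    "demand": [10.0, 12.5, 9.8],
    "holy": ["yes", "no", "no"],      # categorical column
    "population": [1.2e6, 1.3e6, 1.1e6],
})

# One-hot encode the categorical column; numeric columns pass through untouched
encoded = pd.get_dummies(df, columns=["holy"])
print(encoded.columns.tolist())
```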
This is what I actually do (I removed some outputs):
import pandas as pd
import numpy as np
import math as math
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score
# shuffle the rows, then take the first 80% for training and the rest for validation
training_data, validation_data = np.split(data.sample(frac=1), [int(.8 * len(data))])
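The split above is my own shuffle-and-slice; the same 80/20 split can also be done with scikit-learn's `train_test_split` (a sketch on a hypothetical frame, with a fixed seed so it is reproducible):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame standing in for my real data
data = pd.DataFrame({"demand": np.arange(100.0), "gdp": np.arange(100.0)})

# 80/20 shuffled split, reproducible via random_state
training_data, validation_data = train_test_split(data, test_size=0.2, random_state=42)
print(len(training_data), len(validation_data))
```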
# fit the linear model on every column except the label; passing the label
# as a Series (not a one-column DataFrame) keeps the predictions 1-D
linear_model = LinearRegression().fit(training_data[[c for c in data.columns if c != "demand"]], training_data["demand"])
validation_data_predictions = linear_model.predict(validation_data[[c for c in training_data.columns if c != "demand"]])
validation_predictions_pd = pd.DataFrame(data=validation_data_predictions,
                                         index=validation_data.index.values,
                                         columns=["prediction"])
# join the predictions back onto the validation frame
result_df = validation_data.join(validation_predictions_pd, how="inner")
r2_error = r2_score(y_true=result_df[["demand"]], y_pred=result_df[["prediction"]], multioutput="uniform_average")
print(r2_error) # outputs 0.85
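As a sanity check, `r2_score` computed this way should match the estimator's own `.score` on the same data, since `.score` is R² by definition for scikit-learn regressors (a sketch on hypothetical toy data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical toy regression problem
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X, y)
# .score computes R^2, so the two numbers should coincide
print(np.isclose(model.score(X, y), r2_score(y, model.predict(X))))
```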
# NN section
# default solver (adam); note the inputs are not scaled here
clf = MLPRegressor(hidden_layer_sizes=(10,), max_iter=100000)
neural_model = clf.fit(training_data[[c for c in training_data.columns if c != "demand"]], training_data["demand"])
validation_data_predictions = neural_model.predict(validation_data[[c for c in training_data.columns if c != "demand"]])
validation_predictions_pd = pd.DataFrame(data=validation_data_predictions,
                                         index=validation_data.index.values,
                                         columns=["prediction"])
result_df = validation_data.join(validation_predictions_pd, how="inner")
r2_error = r2_score(y_true=result_df[["demand"]], y_pred=result_df[["prediction"]], multioutput="uniform_average")
print(r2_error) # outputs 0.23
So, as you can see, the NN's performance is very poor, and I think it can be improved. Any hints?
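One configuration I suspect matters is feature scaling: unlike linear regression, the MLP is sensitive to the scale of its inputs, and population and gdp are orders of magnitude larger than the dummy columns. Here is a sketch of what I plan to try, wrapping StandardScaler and MLPRegressor in a pipeline (toy data, not my real columns):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Hypothetical toy data with wildly different feature scales
rng = np.random.default_rng(0)
X = pd.DataFrame({"population": rng.normal(1e6, 1e5, 200),
                  "gdp": rng.normal(5e4, 1e4, 200)})
# A simple target that depends linearly on the standardized features
y = (X["population"] - 1e6) / 1e5 + (X["gdp"] - 5e4) / 1e4

# Standardize the inputs before the MLP so gradient descent behaves
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(10,),
                                   max_iter=2000, random_state=0))
model.fit(X, y)
print(model.score(X, y))
```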