I want to plot the learning error curve of a neural net with respect to the number of training examples. Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

training_errors = []
test_errors = []
train_sizes = []

dataset = np.loadtxt("data", delimiter=",")
X = dataset[:, 0:6]
Y = dataset[:, 6]

clf = MLPClassifier(hidden_layer_sizes=(2, 3), activation='tanh')

# split the data between training and testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=33)

# begin with a small subset of the training data
X_eff = X_train[0:int(len(X_train) / 150), :]
Y_eff = Y_train[0:int(len(Y_train) / 150)]
k = int(len(X_train) / 150) - 1

for m in range(140):
    print(m)
    train_sizes.append(k)
    # train the model and store the training error
    A = clf.fit(X_eff, Y_eff)
    training_errors.append(1 - A.score(X_eff, Y_eff))
    # compute the test error
    test_errors.append(1 - A.score(X_test, Y_test))
    # add some more training data
    X_eff = np.vstack((X_eff, X_train[k + 1:k + 101, :]))
    Y_eff = np.hstack((Y_eff, Y_train[k + 1:k + 101]))
    k = k + 100

plt.figure(figsize=(8, 8))
plt.title("Training and test error")
plt.plot(train_sizes, training_errors, label="training error")
plt.plot(train_sizes, test_errors, label="test error")
plt.legend()
plt.show()
However, I get a very strange result: the curves fluctuate a lot, and the training error stays very close to the test error, which does not seem normal. Where is the mistake? I can't understand why there are so many ups and downs, or why the training error does not increase as it would be expected to. Any help would be appreciated!
EDIT: the dataset I am using is https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29, where I removed the classes having fewer than 1000 instances. I manually re-encoded the literal data as numbers.
In my experience, 'tanh' often gives this kind of curve. Maybe try changing that: use 'relu' or 'logistic' in its place.
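Also, rather than growing the training set by hand in a loop, you could let sklearn's built-in learning_curve do it: it retrains the model from scratch at each training size and averages scores over cross-validation folds, which smooths out much of the fluctuation you get from a single train/test split. A minimal sketch below, using a synthetic dataset from make_classification as a stand-in for your chess data (the 6 features / 3 classes are just placeholder values, not your actual file):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the chess dataset (hypothetical shapes).
X, Y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=3, random_state=33)

clf = MLPClassifier(hidden_layer_sizes=(2, 3), activation='relu',
                    max_iter=500, random_state=33)

# learning_curve refits the estimator at each of 8 training sizes and
# cross-validates with 5 folds, returning one score per (size, fold).
train_sizes, train_scores, test_scores = learning_curve(
    clf, X, Y, train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

# Convert accuracies to errors, averaged over the folds.
train_err = 1 - train_scores.mean(axis=1)
test_err = 1 - test_scores.mean(axis=1)

plt.plot(train_sizes, train_err, label="training error")
plt.plot(train_sizes, test_err, label="test error")
plt.legend()
plt.show()
```

With the fold-averaging you should see a much smoother pair of curves, and a clearer gap between training and test error at small training sizes.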