When I use sc.fit_transform(X), I get a huge drop in accuracy on the same model. Without scaling the values of my dataset, I get accuracy values of 80-82%; when I scale them with sc.fit_transform(X), accuracy drops to 70-74%.
What could be the reasons for this huge drop in accuracy?
EDIT:
Here is the code I am using:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense

seed = 7  # value assumed here; any fixed seed for reproducibility

# read the dataset file
basic_df = pd.read_csv('posts.csv', sep=';', encoding='ISO-8859-1', parse_dates=[2], dayfirst=True)
# One-Hot-Encoding for categorical (strings) features
basic_df = pd.get_dummies(basic_df, columns=['industry', 'weekday', 'category_name', 'page_name', 'type'])
# bring the label column to the end
cols = list(basic_df.columns.values) # Make a list of all of the columns in the df
cols.pop(cols.index('successful')) # Remove target column from list
basic_df = basic_df[cols+['successful']] # Add it at the end of dataframe
dataset = basic_df.values
# separate the data from the labels
X = dataset[:,0:45].astype(float)
Y = dataset[:,45]
# standardize the input features (note: fitted on all of X, before the train/test split)
sc = StandardScaler()
X = sc.fit_transform(X)
# evaluate model with standardized dataset
#estimator = KerasClassifier(build_fn=create_baseline, epochs=5, batch_size=5, verbose=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=seed)
#estimator.fit(X_train, Y_train)
#predictions = estimator.predict(X_test)
#list(predictions)
# build the model
model = Sequential()
model.add(Dense(100, input_dim=45, kernel_initializer='normal', activation='relu'))
model.add(Dense(50, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
history = model.fit(X_train, Y_train, validation_split=0.3, epochs=500, batch_size=10)
Part of the code is commented out, because I tried using KerasClassifier in the beginning. But both methods end up with much lower accuracy (as stated above) when I use fit_transform(X): without fit_transform(X) I get an accuracy of 80-82%, with it only 70-74%. How come? Am I doing something wrong? Shouldn't scaling the input data lead to better (or at least roughly the same) accuracy, and primarily to faster fitting? Why this big drop in accuracy when using it?
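(The commented-out KerasClassifier line refers to a create_baseline function not shown above; it is just a build function returning a compiled model. A minimal sketch, assuming it mirrors the Sequential model in the code:)
def create_baseline():
    # same architecture as the model built above
    model = Sequential()
    model.add(Dense(100, input_dim=45, kernel_initializer='normal', activation='relu'))
    model.add(Dense(50, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model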
PS: 'sc' is a StandardScaler instance, i.e. sc = StandardScaler().
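To illustrate what sc.fit_transform(X) computes, here is a minimal standalone sketch on toy data (default StandardScaler parameters): each column is shifted to zero mean and divided by its population standard deviation.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

sc = StandardScaler()
X_scaled = sc.fit_transform(X)  # learns per-column mean/std, then standardizes

# equivalent manual computation (population std, ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
assert np.allclose(X_scaled, manual)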
Here is the dataframe used (split across two screenshots, because it is too wide to capture in one), with the column 'successful' as the label column: