Your test frame needs to have the same columns as the training set. You can achieve this as follows:
This code creates sample data:
import pandas as pd

# initialize list of lists
data = [['tom', 10], ['carol', 15], ['juli', 14], ['winston', 20], ['carol', 11], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Test_Score'])
train = df.loc[0:2]
test = df.loc[3:]
train = pd.get_dummies(train)
test = pd.get_dummies(test)
print("test before")
print(test)
Code to adjust the test set so that its columns match those of the train set:
# Get the columns that are in the training set but missing from the test set
missing_cols = set( train.columns ) - set( test.columns )
# Add each missing column to the test set with a default value of 0
for c in missing_cols:
    test[c] = 0
# Ensure the columns in the test set are in the same order as in the train set
test = test[train.columns]
print("test after")
print(test)
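The code above identifies the columns in the training set that are missing from the test set (here, Name_tom) and adds them to the test set as all-zero columns. Selecting test[train.columns] then puts the columns in the training order and also drops any column that exists only in the test set (here, Name_winston).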
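The same alignment can probably be done in one step with `DataFrame.reindex`. This is a minimal sketch that is not part of the original answer. It rebuilds the same sample data and assumes that filling the missing dummy columns with 0 is what you want; the variable name `aligned` is only illustrative.

```python
import pandas as pd

# Same sample data as above
data = [['tom', 10], ['carol', 15], ['juli', 14],
        ['winston', 20], ['carol', 11], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Test_Score'])

train = pd.get_dummies(df.loc[0:2])
test = pd.get_dummies(df.loc[3:])

# reindex adds train-only columns filled with 0, drops test-only
# columns, and puts the result in the training column order
aligned = test.reindex(columns=train.columns, fill_value=0)
print(aligned)
```

Note that `reindex` returns a new frame and leaves `test` unchanged, so assign the result back to `test` if you want to replace it.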
Reference: