
When training my model, I get very different results when I use sklearn.model_selection.train_test_split(X, y, stratify=y, train_size=0.9) versus sklearn.model_selection.StratifiedKFold(n_splits=10). Is there a difference in how the two stratify the data? I'm almost certain I implemented everything according to the docs, but strangely enough, the latter gives much worse test accuracy than the former.

  • Can you post a minimal, complete code example that we can run to reproduce the behaviour? Commented Jun 15, 2017 at 0:55

1 Answer


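When stratify is not None, train_test_split uses StratifiedShuffleSplit internally, not StratifiedKFold. So yes, there is a big difference. StratifiedShuffleSplit shuffles the samples before splitting, whereas StratifiedKFold does not shuffle by default (shuffle=False), so its folds follow the original row order within each class. If your rows are ordered in some meaningful way, that alone may explain the gap in accuracy.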
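To see the difference concretely, here is a minimal sketch on made-up data (the toy X and y, sorted by class, are assumptions for illustration and not the asker's data). It only shows which rows land in the test set under each approach:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical toy data: 100 rows sorted by class, so row order is meaningful.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

# train_test_split with stratify=y draws a single shuffled, stratified split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, train_size=0.9, random_state=0
)
print("train_test_split test rows:", sorted(X_test.ravel().tolist()))

# StratifiedKFold with the default shuffle=False takes contiguous blocks
# of each class, in the original row order.
skf = StratifiedKFold(n_splits=10)
first_train_idx, first_test_idx = next(iter(skf.split(X, y)))
print("StratifiedKFold first test fold:", first_test_idx.tolist())
```

If the row order matters in your data, passing shuffle=True (optionally with random_state) to StratifiedKFold should bring its behaviour closer to what train_test_split does.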

  • @hyperdo In addition, an obvious difference is that StratifiedKFold gives 10 folds, each with different train and test data, while train_test_split gives only one split. Commented Jun 15, 2017 at 5:27
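The point in the comment above can be checked with a short sketch, again on hypothetical toy data: StratifiedKFold(n_splits=10) yields ten train/test pairs, and every row is used as test data exactly once, so any accuracy you report from it is typically an average over ten fits rather than a single estimate.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data; only the labels matter for how the folds are built.
X = np.zeros((100, 1))
y = np.array([0] * 50 + [1] * 50)

skf = StratifiedKFold(n_splits=10)
test_counts = np.zeros(len(y), dtype=int)  # times each row appears in a test fold
n_folds = 0
for train_idx, test_idx in skf.split(X, y):
    test_counts[test_idx] += 1
    n_folds += 1

print("number of folds:", n_folds)
print("each row tested exactly once:", bool((test_counts == 1).all()))
```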
