When training my model, I'm getting very different results from sklearn.model_selection.train_test_split(X, y, stratify=y, train_size=0.9) than from sklearn.model_selection.StratifiedKFold(n_splits=10), and I was wondering whether the two stratify the data differently. I'm fairly certain I implemented everything according to the docs, but strangely, the latter gives much worse test accuracy than the former.
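For reference, a minimal sketch of the two setups being compared might look like the following (the make_classification dataset and LogisticRegression classifier are illustrative stand-ins, not the original code):

    # Illustrative sketch only: dataset and classifier are assumptions,
    # not the original poster's code.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)

    # Approach 1: a single stratified 90/10 split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, stratify=y, train_size=0.9, random_state=0
    )
    acc_split = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print("train_test_split accuracy:", acc_split)

    # Approach 2: 10-fold stratified cross-validation, averaged over folds.
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    print("StratifiedKFold mean accuracy:", np.mean(scores))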
Can you post a minimal, complete example so we can try to duplicate your behaviour? – Vivek Kumar, Jun 15, 2017 at 0:55
1 Answer
When stratify is not None, train_test_split uses StratifiedShuffleSplit internally, not StratifiedKFold. So yeah, there is a big difference.
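As a rough illustration of that internal behaviour, the following sketch compares a stratified train_test_split with a single split drawn directly from StratifiedShuffleSplit (the parameter values are illustrative, and the indices matching on the same random_state is an implementation detail of current scikit-learn):

    # Sketch: with stratify=y, train_test_split draws one split from
    # StratifiedShuffleSplit under the hood.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

    X, y = make_classification(n_samples=100, random_state=0)

    # One stratified 90/10 split via train_test_split...
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, stratify=y, train_size=0.9, random_state=42
    )

    # ...and the equivalent split taken directly from the splitter.
    sss = StratifiedShuffleSplit(n_splits=1, train_size=0.9, random_state=42)
    train_idx, test_idx = next(sss.split(X, y))

    print(np.array_equal(X_tr, X[train_idx]))  # True on current scikit-learn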
@hyperdo In addition, an obvious difference is that StratifiedKFold will give 10 folds of different train and test data, whereas train_test_split will give only one split. – Commented Jun 15, 2017 at 5:27
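A quick sketch of that point: iterating over StratifiedKFold yields 10 distinct test folds, each preserving the overall class proportions (the imbalanced make_classification dataset is an assumed example):

    # Sketch: StratifiedKFold produces 10 distinct train/test partitions,
    # whereas train_test_split produces exactly one.
    from collections import Counter
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=100, weights=[0.7, 0.3], random_state=0)

    for i, (train_idx, test_idx) in enumerate(StratifiedKFold(n_splits=10).split(X, y)):
        # Each test fold keeps roughly the 70/30 class balance of the full data.
        print(f"fold {i}: test size={len(test_idx)}, classes={Counter(y[test_idx])}")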