I'm trying to understand what the shuffle parameter does in StratifiedKFold from sklearn.model_selection.
I've read the documentation but still don't understand what shuffle=True does. Can someone please explain it in plain English?
From the documentation:
shuffle: bool, default=False. Whether to shuffle each class’s samples before splitting into batches. Note that the samples within each split will not be shuffled.
The implementation is designed to:
- Generate test sets such that all contain the same distribution of classes, or as close as possible.
- Be invariant to class label: relabelling y = ["Happy", "Sad"] to y = [1, 0] should not change the indices generated.
- Preserve order dependencies in the dataset ordering, when shuffle=False: all samples from class k in some test set were contiguous in y, or separated in y by samples from classes other than k.
- Generate test sets where the smallest and largest differ by at most one sample.