I've been working on a U-Net model using training images stored on my local drive. To load these I have been using Keras' ImageDataGenerator.flow_from_dataframe
method and optionally applying some augmentations.
I have had no problems with this but noticed some odd behaviour when I retrieve batches of data from the flow.
In the simplified example below I am loading 8-bit RGB files from a directory and setting the seed. I've omitted the augmentation parameters here, but I get the same behaviour with and without them.
For QA/QC purposes I will typically get a batch and look at a random selection of images. However, when I get a batch and then generate some random image indices, I always get the same result. This only happens after a batch has been generated, not after initialising the flow generator object.
# Step 1
# Set up image data flow
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

img_generator = ImageDataGenerator(rescale=1/255.)
train_gen = img_generator.flow_from_dataframe(
    img_df,              # filenames are read from column "filename"
    img_dir,             # local directory containing image files
    y_col=None,
    target_size=(512, 512),
    class_mode=None,
    shuffle=False,       # I'm using separate mask images so no shuffling here
    batch_size=16,
    seed=42              # behaviour occurs when using seed
)
# Step 2
# Generate and print 8 random indices
# No batch of images retrieved yet; no use of seed
print(np.random.randint(16, size=8))
>>> [ 7 15 13 3 6 3 2 14] # always random
# Step 3
# Now get a batch of images; seed is used
batch = next(train_gen)
# Step 4
# Generate and print 8 random indices
print(np.random.randint(16, size=8))
>>> [ 6 1 3 8 11 13 1 9] # always the same result
Using a seed of 42, the output of Step 2 changes each time Steps 1 and 2 are executed. This is expected behaviour, since Step 1 should not affect Step 2. However, once a batch is retrieved from the generator in Step 3, Step 4 always returns the same indices.
This behaviour continues as new batches are yielded: the seed state advances on each yield, so each batch produces different indices, but those indices are identical on every run.
With the seed set to 42, the indices generated after the first few batches are:
>>> [ 6 1 3 8 11 13 1 9] # Batch 1
>>> [10 10 5 5 5 8 10 11] # Batch 2
>>> [ 5 3 0 10 4 9 15 2] # Batch 3
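The per-batch pattern above is exactly what you'd get if the iterator re-seeds the global RNG on every yield. Here is a plain-numpy sketch of that behaviour — the `seed + total_batches_seen` expression is my reading of `keras_preprocessing`'s `Iterator`, so treat it as an assumption rather than confirmed internals:

```python
import numpy as np

# Re-seed the global RNG once per "batch", the way I believe the
# iterator does (seed + total_batches_seen is an assumption, not
# confirmed against the Keras source here).
def post_batch_draws(seed, n_batches=3):
    draws = []
    for total_batches_seen in range(n_batches):
        np.random.seed(seed + total_batches_seen)  # happens on each yield
        # ...batch assembly would use the RNG here...
        draws.append(np.random.randint(16, size=8))  # my QA/QC sampling
    return draws

run1 = post_batch_draws(42)
run2 = post_batch_draws(42)
# Different indices per batch, but identical across runs --
# the same pattern as the Batch 1-3 outputs above.
```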
This suggests to me that retrieving a batch of images changes numpy's global seed. In practical terms, I end up always examining the same sample of images. When the seed parameter is not provided, the global state is left unmodified and the outputs differ on every run.
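As a workaround for the QA/QC sampling, a dedicated numpy `Generator` can be used instead of the global `np.random` functions; its stream is unaffected by any `np.random.seed` calls made elsewhere. A minimal sketch, using an explicit `np.random.seed(42)` to stand in for the global re-seed that fetching a batch appears to perform:

```python
import numpy as np

rng = np.random.default_rng()  # local generator with its own state

np.random.seed(42)             # stand-in for next(train_gen) resetting the global seed
sample_a = rng.integers(16, size=8)

np.random.seed(42)             # global state reset again by another "batch"
sample_b = rng.integers(16, size=8)

# The local generator's stream keeps advancing regardless of the
# global re-seeds, so the sampled indices stay random between batches.
print(sample_a)
print(sample_b)
```

The same local `rng` can be reused across the whole QA/QC loop, so every batch gets a fresh, independent selection of indices.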
I'm wondering if others have come across this - is this a bug or am I misunderstanding something?
Update: it turns out the numpy seed is set as soon as the ImageDataGenerator flow is used — Keras calls np.random.seed and therefore modifies the global random number generator, as my example suggests. There are active issues on the Keras repo to address this behaviour in other parts of the library, such as the dataset loaders: github.com/keras-team/keras/issues/12258.