6
$\begingroup$

Does the conv layer, the max pooling layer, or anything else do the job? In my opinion, the conv layer or max pooling layer can do the job only when the rotations or translations are not too large.

$\endgroup$

2 Answers

9
$\begingroup$

Convolutional layers are not equivariant to rotation, and pooling layers only help with invariance to small rotations. Rotation "invariance" of the whole classifier is not part of the inductive bias; it is instead learned, typically through heavy data augmentation.
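As a rough illustration of the first point, the sketch below checks numerically whether a plain cross-correlation (what a conv layer computes) commutes with a 90-degree rotation. The helper `correlate2d_valid` and the random 8x8 image and 3x3 filter are made up for this example; it is a toy check, not a statement about any particular framework.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation, as computed by a conv layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))    # hypothetical input
kernel = rng.normal(size=(3, 3))   # hypothetical learned filter

# Equivariance would mean: filter(rotate(x)) == rotate(filter(x)).
out_of_rotated = correlate2d_valid(np.rot90(image), kernel)
rotated_out = np.rot90(correlate2d_valid(image, kernel))
rotation_equivariant = np.allclose(out_of_rotated, rotated_out)

# The equality only comes back if the filter is rotated along with the image.
out_with_rotated_kernel = correlate2d_valid(np.rot90(image), np.rot90(kernel))
equivariant_if_kernel_rotates = np.allclose(out_with_rotated_kernel, rotated_out)

print(rotation_equivariant)           # False for a generic filter
print(equivariant_if_kernel_rotates)  # True
```

A fixed filter cannot follow the rotation, so the network has to learn rotated copies of its features from rotated training examples, which is where the augmentation comes in.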

However, for each group action there exists a corresponding group convolution operator which is equivariant to it. This concept is used, for example, in "3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data" by Weiler, Geiger, Welling, Boomsma and Cohen, 2018, to design layers which are equivariant to 3D rotations:

https://arxiv.org/pdf/1807.02547.pdf
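To make the idea of a group convolution concrete, here is a minimal sketch for the simplest case I can think of: the group of planar 90-degree rotations (often written C4). This is not the construction from the paper above, which deals with continuous 3D rotations; the function names `correlate2d_valid` and `c4_lifting_correlation` and the random inputs are assumptions for illustration only.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def c4_lifting_correlation(image, kernel):
    """Correlate with all four 90-degree rotations of one filter.

    Returns an array of shape (4, H', W'): one map per filter orientation.
    """
    return np.stack([correlate2d_valid(image, np.rot90(kernel, r))
                     for r in range(4)])

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))    # hypothetical input
kernel = rng.normal(size=(3, 3))   # hypothetical filter

features = c4_lifting_correlation(image, kernel)
features_rot = c4_lifting_correlation(np.rot90(image), kernel)

# Equivariance: rotating the input rotates every map spatially and
# cyclically shifts the orientation axis by one step.
expected = np.stack([np.rot90(features[(r - 1) % 4]) for r in range(4)])
is_equivariant = np.allclose(features_rot, expected)

# Pooling over orientations and positions then gives a descriptor that is
# invariant to 90-degree rotations of the input.
is_invariant = np.isclose(features.max(), features_rot.max())

print(is_equivariant, is_invariant)  # True True
```

The design choice is that the rotation is absorbed into a predictable permutation of the feature maps instead of being learned from data; invariance is only obtained at the end, by pooling over the group.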

$\endgroup$
5
$\begingroup$

For translational invariance, you can follow the discussion here. In general, the pooling layer is the main source of local translational invariance, because it reduces the spatial resolution. For instance, if an object moves slightly in some direction, max-pooling still picks up the same maximum element, so the pooled output is unchanged as long as the maximum stays inside the same pooling window. The convolutional layer, by contrast, is translation-equivariant rather than invariant: shifting the input shifts the output by the same amount.
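A small numerical check of both statements, as a sketch under simple assumptions: non-overlapping 2x2 max pooling, a single bright pixel as the "object", and shifts implemented with `np.roll` on images whose content stays away from the border. The helpers `max_pool_2x2` and `correlate2d_valid` are written just for this example.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def correlate2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Max pooling: invariant only to shifts that stay inside one pooling window.
image = np.zeros((4, 4))
image[0, 0] = 1.0
shift_inside = np.roll(image, 1, axis=1)   # peak moves to (0, 1), same window
shift_across = np.roll(image, 2, axis=1)   # peak moves to (0, 2), next window

same_after_small_shift = np.array_equal(max_pool_2x2(image),
                                        max_pool_2x2(shift_inside))
same_after_larger_shift = np.array_equal(max_pool_2x2(image),
                                         max_pool_2x2(shift_across))

# Convolution: shifting the input shifts the output (equivariance),
# but the output itself changes (no invariance).
rng = np.random.default_rng(0)
scene = np.zeros((10, 10))
scene[3:6, 3:6] = rng.normal(size=(3, 3))  # object away from the border
kernel = rng.normal(size=(3, 3))
shifted_scene = np.roll(scene, 1, axis=1)

out = correlate2d_valid(scene, kernel)
out_shifted = correlate2d_valid(shifted_scene, kernel)
conv_is_equivariant = np.allclose(out_shifted, np.roll(out, 1, axis=1))
conv_is_invariant = np.allclose(out_shifted, out)

print(same_after_small_shift, same_after_larger_shift)  # True False
print(conv_is_equivariant, conv_is_invariant)           # True False
```

The second pooling result is the reason the invariance is only local: once the shift crosses a window boundary, the pooled map changes too.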

Neither layer is rotation-invariant. However, the network as a whole can exhibit this behavior if the properties of the data and the overall architecture permit. A NIPS paper addresses this issue and uses Spatial Transformers to improve CNNs' invariance to rotation, scale and translation.

$\endgroup$
2
  • $\begingroup$ Given that neither of them provides rotation invariance, how do modern CNNs recognize rotated images using only combinations of conv, pooling, etc. layers? $\endgroup$
    – feynman
    Commented Mar 13, 2019 at 9:50
  • $\begingroup$ Rotation invariance is not built into the individual layers, but that doesn't mean CNNs can't learn it. Most likely, there is enough data with lots of variety, and enough layers to make sense of rotated objects. $\endgroup$
    – gunes
    Commented Mar 13, 2019 at 10:07
