Which layer in a CNN is able to detect rotated and translated objects?

Question

Does the convolutional layer, the max-pooling layer, or something else do the job? In my opinion, the convolutional or max-pooling layer can do the job only when the rotations or translations are not too large.

DeltaIV · Accepted Answer · 2019-03-15 15:32:02Z

Convolutional layers are not equivariant to rotation, and pooling layers only help with invariance to small rotations. Rotation "invariance" of the whole classifier is not part of the inductive bias; in practice it is learned, typically through heavy data augmentation.
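To make the first claim concrete, here is a small NumPy sketch (not taken from any paper) that checks whether a single convolution commutes with a 90-degree rotation of the input. The helper `correlate2d_valid`, the two kernels, and the random test image are all illustrative assumptions. The check fails for a generic filter and passes only when the filter itself happens to be rotation-symmetric.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate2d_valid(x, k):
    """'Valid' cross-correlation, i.e. what a conv layer computes (no padding, stride 1)."""
    windows = sliding_window_view(x, k.shape)      # shape: (H', W', kh, kw)
    return np.einsum("ijab,ab->ij", windows, k)


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))

# Hypothetical filters chosen for illustration only.
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)     # vertical-edge detector, not rotation-symmetric
blur_kernel = np.ones((3, 3)) / 9.0                # unchanged by 90-degree rotations


def commutes_with_rot90(kernel):
    """True if rotating the input and then convolving equals convolving and then rotating."""
    rotate_then_conv = correlate2d_valid(np.rot90(image), kernel)
    conv_then_rotate = np.rot90(correlate2d_valid(image, kernel))
    return np.allclose(rotate_then_conv, conv_then_rotate)


edge_commutes = commutes_with_rot90(edge_kernel)
blur_commutes = commutes_with_rot90(blur_kernel)
print("edge kernel commutes with rotation:", edge_commutes)
print("blur kernel commutes with rotation:", blur_commutes)
```

Learned filters are generally not rotation-symmetric, which is why the network has to see rotated examples during training if it is to handle them.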

However, for each group action there exists a corresponding group convolution operator that is equivariant to it. This concept is used, for example, in "3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data" by Weiler, Geiger, Welling, Boomsma and Cohen (2018) to design layers that are equivariant to 3D rotations:

https://arxiv.org/pdf/1807.02547.pdf
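As a much simpler illustration of the same idea, the sketch below uses the discrete group of planar 90-degree rotations (often called C4) rather than the continuous 3D rotations treated in the paper. It is an assumption-laden toy, not the paper's construction: `c4_lifting_conv` is a hypothetical helper that correlates the input with all four rotated copies of one filter. Rotating the input then rotates each feature map and cyclically shifts the orientation channels, and pooling over orientations and positions gives a rotation-invariant number.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate2d_valid(x, k):
    """'Valid' cross-correlation (no padding, stride 1)."""
    windows = sliding_window_view(x, k.shape)
    return np.einsum("ijab,ab->ij", windows, k)


def c4_lifting_conv(x, k):
    """Correlate x with the four 90-degree rotations of k.

    Returns an array of shape (4, H', W'): one feature map per filter orientation.
    """
    return np.stack([correlate2d_valid(x, np.rot90(k, r)) for r in range(4)])


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))      # square input, so rot90 keeps the shape
kernel = rng.normal(size=(3, 3))     # arbitrary, non-symmetric filter

features = c4_lifting_conv(image, kernel)
features_rot = c4_lifting_conv(np.rot90(image), kernel)

# Equivariance: orientation channel r of the rotated input equals
# channel r-1 of the original input, rotated by 90 degrees.
expected = np.stack([np.rot90(features[(r - 1) % 4]) for r in range(4)])
is_equivariant = np.allclose(features_rot, expected)

# Invariance follows by pooling over both orientations and positions.
is_invariant = np.isclose(features.max(), features_rot.max())

print("equivariant:", is_equivariant, "invariant after pooling:", is_invariant)
```

The design choice here is weight sharing across orientations: the layer has only one set of filter weights, but its output carries an extra orientation axis on which the rotation acts as a predictable permutation.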

gunes · Accepted Answer · 2019-03-13 06:30:26Z

For translational invariance, you can follow the discussion here. In general, the pooling layer is the main source of local translational invariance, because it reduces spatial resolution. With max-pooling, for instance, if an object moves slightly in some direction and its strongest activation stays inside the same pooling window, the same maximum is selected and the output after pooling is unchanged. The convolutional layer itself is translation-equivariant rather than invariant: shifting the input shifts its output by the same amount.
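Both statements can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: `correlate2d_valid` and `max_pool_2x2` are illustrative helpers (stride-1 unpadded convolution, non-overlapping 2x2 pooling), and the inputs are synthetic. It shows that a convolution output shifts along with the input, and that max-pooling hides a one-pixel shift only when the peak stays inside the same pooling window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate2d_valid(x, k):
    """'Valid' cross-correlation (no padding, stride 1)."""
    windows = sliding_window_view(x, k.shape)
    return np.einsum("ijab,ab->ij", windows, k)


def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling; assumes even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# --- Convolution is translation-equivariant -------------------------------
rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

shifted = np.roll(image, 1, axis=1)                 # move content one pixel right
out = correlate2d_valid(image, kernel)
out_shifted = correlate2d_valid(shifted, kernel)

# Away from the wrapped-around border column, the output is shifted by one pixel too.
conv_equivariant = np.allclose(out_shifted[:, 1:], out[:, :-1])

# --- Max-pooling is only locally translation-invariant --------------------
fmap = np.zeros((6, 6))
fmap[2, 2] = 1.0                                    # single strong activation

small_shift = np.roll(fmap, 1, axis=1)              # peak at (2, 3): same 2x2 window
large_shift = np.roll(fmap, 2, axis=1)              # peak at (2, 4): next window

pool_small_same = np.array_equal(max_pool_2x2(fmap), max_pool_2x2(small_shift))
pool_large_same = np.array_equal(max_pool_2x2(fmap), max_pool_2x2(large_shift))

print(conv_equivariant, pool_small_same, pool_large_same)
```

Note that the invariance depends on where the peak sits relative to the pooling grid: a one-pixel shift that crosses a window boundary does change the pooled output.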

Neither layer is rotation-invariant. The network can nevertheless exhibit this behavior if the properties of the data and the overall architecture permit. A NIPS paper addresses this issue and uses Spatial Transformers to improve CNNs' invariance to rotation, scale and translation.

Since neither of them provides rotation invariance, how do modern CNNs detect rotated images using only combinations of conv, pooling, and similar layers?
– feynman
Commented Mar 13, 2019 at 9:50
Rotation invariance is not built into the individual layers, but that doesn't mean CNNs can't learn it. Probably, with enough data with lots of variety and enough layers, the network can make sense of rotated objects.
– gunes
Commented Mar 13, 2019 at 10:07
