Showing 1–4 of 4 results for author: Trockman, A

  1. arXiv:2305.09828  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Mimetic Initialization of Self-Attention Layers

    Authors: Asher Trockman, J. Zico Kolter

    Abstract: It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point. We explore the weights of such pre-trained Transformers (particularly for vision) to attempt to find reasons for this discrepancy. Surprisingly, we find that simply initializing the weights of self-attention layers so that they "look" more like their pre-…

    Submitted 16 May, 2023; originally announced May 2023.

  2. arXiv:2210.03651  [pdf, other]

    cs.CV cs.AI cs.LG

    Understanding the Covariance Structure of Convolutional Filters

    Authors: Asher Trockman, Devin Willmott, J. Zico Kolter

    Abstract: Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly-structured operations like convolutions. Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters have notable structure; this presents an opportunity to study thei…

    Submitted 7 October, 2022; originally announced October 2022.

  3. arXiv:2201.09792  [pdf, other]

    cs.CV cs.AI cs.LG

    Patches Are All You Need?

    Authors: Asher Trockman, J. Zico Kolter

    Abstract: Although convolutional networks have been the dominant architecture for vision tasks for many years, recent experiments have shown that Transformer-based models, most notably the Vision Transformer (ViT), may exceed their performance in some settings. However, due to the quadratic runtime of the self-attention layers in Transformers, ViTs require the use of patch embeddings, which group together s…

    Submitted 24 January, 2022; originally announced January 2022.

  4. arXiv:2104.07167  [pdf, other]

    cs.LG stat.ML

    Orthogonalizing Convolutional Layers with the Cayley Transform

    Authors: Asher Trockman, J. Zico Kolter

    Abstract: Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipschitz constants. Although numerous methods exist for enforcing the orthogonality of fully-connected layers, those for convolutional layers are more heur…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: To appear in ICLR 2021