
Showing 1–50 of 117 results for author: Kolter, J Z

  1. arXiv:2406.14548  [pdf, other]

    cs.LG cs.CV

    Consistency Models Made Easy

    Authors: Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter

    Abstract: Consistency models (CMs) are an emerging class of generative models that offer faster sampling than traditional diffusion models. CMs enforce that all points along a sampling trajectory are mapped to the same initial point. But this target leads to resource-intensive training: for example, as of 2024, training a SoTA CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an alternative…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.09358  [pdf, other]

    cs.LG

    Understanding Hallucinations in Diffusion Models through Mode Interpolation

    Authors: Sumukh K Aithal, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter

    Abstract: Colloquially speaking, image generation models based upon diffusion processes are frequently said to exhibit "hallucinations," samples that could never occur in the training data. But where do such hallucinations come from? In this paper, we study a particular failure mode in diffusion models, which we term mode interpolation. Specifically, we find that diffusion models smoothly "interpolate" betw…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2405.19540  [pdf, other]

    cs.IT cs.CR

    Computing Low-Entropy Couplings for Large-Support Distributions

    Authors: Samuel Sokota, Dylan Sam, Christian Schroeder de Witt, Spencer Compton, Jakob Foerster, J. Zico Kolter

    Abstract: Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limita…

    Submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2404.15146  [pdf, other]

    cs.LG cs.CL

    Rethinking LLM Memorization through the Lens of Adversarial Compression

    Authors: Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter

    Abstract: Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we pro…

    Submitted 1 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: https://locuslab.github.io/acr-memorization

  5. arXiv:2404.07177  [pdf, other]

    cs.LG

    Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

    Authors: Sachin Goyal, Pratyush Maini, Zachary C. Lipton, Aditi Raghunathan, J. Zico Kolter

    Abstract: Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets. In recent times, data curation has gained prominence with several works developing strategies to retain 'high-quality' subsets of 'raw' scraped data. For instance, the LAION public dataset retained only 10% of the total crawled data. However, these strategies are typically developed agnostic of…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Published at CVPR 2024

  6. arXiv:2403.19103  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

    Authors: Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

    Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produc…

    Submitted 27 March, 2024; originally announced March 2024.

  7. arXiv:2403.03772  [pdf, other]

    cs.LG cs.DC stat.ML

    AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs

    Authors: Victor Akinwande, J. Zico Kolter

    Abstract: Existing causal discovery methods based on combinatorial optimization or search are slow, prohibiting their application on large-scale datasets. In response, more recent methods attempt to address this limitation by formulating causal discovery as structure learning with continuous optimization, but such approaches thus far provide no statistical guarantees. In this paper, we show that by efficient…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted at MLGenX @ ICLR 2024. Open source at https://github.com/Viktour19/culingam

  8. arXiv:2402.17762  [pdf, other]

    cs.CL cs.LG

    Massive Activations in Large Language Models

    Authors: Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu

    Abstract: We observe an empirical phenomenon in Large Language Models (LLMs) -- very few activations exhibit significantly larger values than others (e.g., 100,000 times larger). We call them massive activations. First, we demonstrate the widespread existence of massive activations across various LLMs and characterize their locations. Second, we find their values largely stay constant regardless of the inpu…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Website at https://eric-mingjie.github.io/massive-activations/index.html

  9. arXiv:2402.13410  [pdf, other]

    cs.LG stat.ML

    Bayesian Neural Networks with Domain Knowledge Priors

    Authors: Dylan Sam, Rattana Pukdee, Daniel P. Jeong, Yewon Byun, J. Zico Kolter

    Abstract: Bayesian neural networks (BNNs) have recently gained popularity due to their ability to quantify model uncertainty. However, specifying a prior for BNNs that captures relevant domain knowledge is often extremely challenging. In this work, we propose a framework for integrating general forms of domain knowledge (i.e., any knowledge that can be represented by a loss function) into a BNN prior throug…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 17 pages, 4 figures

  10. arXiv:2401.08639  [pdf, other]

    cs.CV cs.LG

    One-Step Diffusion Distillation via Deep Equilibrium Models

    Authors: Zhengyang Geng, Ashwini Pokle, J. Zico Kolter

    Abstract: Diffusion models excel at producing high-quality samples but naively require hundreds of iterations, prompting multiple attempts to distill the generation process into a faster network. However, many existing approaches suffer from a variety of challenges: the process for distillation training can be complex, often requiring multiple training stages, and the resulting models perform poorly when ut…

    Submitted 12 December, 2023; originally announced January 2024.

    Comments: NeurIPS 2023

  11. arXiv:2401.06890  [pdf, other]

    cs.LG

    An Axiomatic Approach to Model-Agnostic Concept Explanations

    Authors: Zhili Feng, Michal Moshkovitz, Dotan Di Castro, J. Zico Kolter

    Abstract: Concept explanation is a popular approach for examining how human-interpretable concepts impact the predictions of a model. However, most existing methods for concept explanations are tailored to specific models. To address this issue, this paper focuses on model-agnostic measures. Specifically, we propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivit…

    Submitted 12 January, 2024; originally announced January 2024.

  12. arXiv:2401.06121  [pdf, other]

    cs.LG cs.CL

    TOFU: A Task of Fictitious Unlearning for LLMs

    Authors: Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter

    Abstract: Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data, raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they resu…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: https://locuslab.github.io/tofu/

  13. arXiv:2312.00234  [pdf, other]

    cs.LG math.NA stat.ML

    Deep Equilibrium Based Neural Operators for Steady-State PDEs

    Authors: Tanya Marwah, Ashwini Pokle, J. Zico Kolter, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski

    Abstract: Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  14. arXiv:2311.16424  [pdf, other]

    cs.LG cs.AI cs.CV

    Manifold Preserving Guided Diffusion

    Authors: Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon

    Abstract: Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework that leverages pretrained diffusion models and off-the-shelf neural networks with minimal additional inference cost for a broad…

    Submitted 27 November, 2023; originally announced November 2023.

  15. arXiv:2311.14885  [pdf, other]

    cs.LG

    Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning

    Authors: Melrose Roderick, Gaurav Manek, Felix Berkenkamp, J. Zico Kolter

    Abstract: A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or distribution shift, between the dataset and the distribution over states and actions visited by the learned policy. This problem is exacerbated in the fully offline setting. The main approach to correct this shift has been through importance sampling, which leads to high-variance gradients. Other approaches, such as conser…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 10 pages

  16. arXiv:2310.18605  [pdf, other]

    cs.LG

    TorchDEQ: A Library for Deep Equilibrium Models

    Authors: Zhengyang Geng, J. Zico Kolter

    Abstract: Deep Equilibrium (DEQ) Models, an emerging class of implicit models that maps inputs to fixed points of neural networks, are of growing interest in the deep learning community. However, training and applying DEQ models is currently done in an ad-hoc fashion, with various techniques spread across the literature. In this work, we systematically revisit DEQs and present TorchDEQ, an out-of-the-box Py…

    Submitted 28 October, 2023; originally announced October 2023.

  17. arXiv:2310.14062  [pdf, other]

    cs.LG cs.AI

    On the Neural Tangent Kernel of Equilibrium Models

    Authors: Zhili Feng, J. Zico Kolter

    Abstract: This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model, a practical "infinite-depth" architecture which directly computes the infinite-depth limit of a weight-tied network via root-finding. Even though the NTK of a fully-connected neural network can be stochastic if its width and depth both tend to infinity simultaneously, we show that contrarily a DEQ model still…

    Submitted 21 October, 2023; originally announced October 2023.

  18. arXiv:2310.03957  [pdf, other]

    cs.LG cs.CV

    Understanding prompt engineering may not require rethinking generalization

    Authors: Victor Akinwande, Yiding Jiang, Dylan Sam, J. Zico Kolter

    Abstract: Zero-shot learning in prompted vision-language models, the practice of crafting prompts to build classifiers without an explicit training process, has achieved impressive performance in many settings. This success presents a seemingly surprising observation: these methods suffer relatively little from overfitting, i.e., when a prompt is manually engineered to achieve low error on a given training…

    Submitted 5 October, 2023; originally announced October 2023.

  19. arXiv:2310.01405  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Representation Engineering: A Top-Down Approach to AI Transparency

    Authors: Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

    Abstract: In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive p…

    Submitted 10 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Code is available at https://github.com/andyzoujm/representation-engineering

  20. arXiv:2307.15043  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

    Abstract: Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practic…

    Submitted 20 December, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Website: http://llm-attacks.org/

  21. arXiv:2307.09542  [pdf, other]

    cs.LG cs.CV

    Can Neural Network Memorization Be Localized?

    Authors: Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang

    Abstract: Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks memorize "hard" examples in the final few layers of the model. Memorization refers to the ability to correctly predict on atypical examples of the training set. In this work, we show that rather than being confined to individual lay…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted at ICML 2023

  22. arXiv:2307.04990  [pdf, other]

    cs.LG cs.AI

    Monotone deep Boltzmann machines

    Authors: Zhili Feng, Ezra Winston, J. Zico Kolter

    Abstract: Deep Boltzmann machines (DBMs), one of the first "deep" learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the restricted Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in or…

    Submitted 10 July, 2023; originally announced July 2023.

  23. arXiv:2307.04317  [pdf, other]

    cs.CV cs.LG

    Text Descriptions are Compressive and Invariant Representations for Visual Learning

    Authors: Zhili Feng, Anna Bair, J. Zico Kolter

    Abstract: Modern image classification is based upon directly predicting classes via large discriminative networks, which do not directly contain information about the intuitive visual features that may constitute a classification decision. Recently, work in vision-language models (VLM) such as CLIP has provided ways to specify natural language descriptions of image classes, but typically focuses on providin…

    Submitted 30 October, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

  24. arXiv:2307.03132  [pdf, other]

    cs.CV cs.CL cs.LG

    T-MARS: Improving Visual Representations by Circumventing Text Feature Learning

    Authors: Pratyush Maini, Sachin Goyal, Zachary C. Lipton, J. Zico Kolter, Aditi Raghunathan

    Abstract: Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only im…

    Submitted 18 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted to ICLR 2024. Oral at ICCV Datacomp 2023

  25. arXiv:2306.14636  [pdf, other]

    cs.CV

    Localized Text-to-Image Generation for Free via Cross Attention Control

    Authors: Yutong He, Ruslan Salakhutdinov, J. Zico Kolter

    Abstract: Despite the tremendous success in text-to-image generative models, localized text-to-image generation (that is, generating objects or features at specific locations in an image while maintaining a consistent overall generation) still requires either explicit training or substantial additional inference time. In this work, we show that localized generation can be achieved by simply controlling cros…

    Submitted 26 June, 2023; originally announced June 2023.

  26. arXiv:2306.14101  [pdf, other]

    cs.LG cs.AI

    Language models are weak learners

    Authors: Hariharan Manikandan, Yiding Jiang, J Zico Kolter

    Abstract: A central notion in practical and theoretical machine learning is that of a weak learner: a classifier that achieves better-than-random performance (on any given distribution over data), even by a small margin. Such weak learners form the practical basis for canonical machine learning methods such as boosting. In this work, we illustrate that prompt-based large language models can operate…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 23 pages, 6 figures

  27. arXiv:2306.11695  [pdf, other]

    cs.CL cs.AI cs.LG

    A Simple and Effective Pruning Approach for Large Language Models

    Authors: Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter

    Abstract: As their size increases, Large Language Models (LLMs) are natural candidates for network pruning methods: approaches that drop a subset of network weights while striving to preserve performance. Existing methods, however, require either retraining, which is rarely affordable for billion-scale LLMs, or solving a weight reconstruction problem reliant on second-order information, which may also be c…

    Submitted 6 May, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: ICLR 2024. Website at https://eric-mingjie.github.io/wanda/home.html

  28. arXiv:2306.05483  [pdf, other]

    cs.LG

    On the Importance of Exploration for Generalization in Reinforcement Learning

    Authors: Yiding Jiang, J. Zico Kolter, Roberta Raileanu

    Abstract: Existing approaches for improving generalization in deep reinforcement learning (RL) have mostly focused on representation learning, neglecting RL-specific aspects such as exploration. We hypothesize that the agent's exploration strategy plays a key role in its ability to generalize to new environments. Through a series of experiments in a tabular contextual MDP, we show that exploration is helpfu…

    Submitted 8 June, 2023; originally announced June 2023.

  29. arXiv:2306.04793  [pdf, other]

    cs.LG stat.ML

    On the Joint Interaction of Models, Data, and Features

    Authors: Yiding Jiang, Christina Baek, J. Zico Kolter

    Abstract: Learning features from data is one of the defining characteristics of deep learning, but our theoretical understanding of the role features play in deep learning is still rudimentary. To address this gap, we introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features. With the interaction tensor, we make several key observations a…

    Submitted 7 June, 2023; originally announced June 2023.

  30. arXiv:2305.13546  [pdf, other]

    cs.LG cs.AI

    Neural Functional Transformers

    Authors: Allan Zhou, Kaien Yang, Yiding Jiang, Kaylee Burns, Winnie Xu, Samuel Sokota, J. Zico Kolter, Chelsea Finn

    Abstract: The recent success of neural networks as implicit representation of data has driven growing interest in neural functionals: models that can process other neural networks as input by operating directly over their weight spaces. Nevertheless, constructing expressive and efficient neural functional architectures that can handle high-dimensional weight-space objects remains challenging. This paper use…

    Submitted 22 May, 2023; originally announced May 2023.

  31. arXiv:2305.09828  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Mimetic Initialization of Self-Attention Layers

    Authors: Asher Trockman, J. Zico Kolter

    Abstract: It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point. We explore the weights of such pre-trained Transformers (particularly for vision) to attempt to find reasons for this discrepancy. Surprisingly, we find that simply initializing the weights of self-attention layers so that they "look" more like their pre-…

    Submitted 16 May, 2023; originally announced May 2023.

  32. arXiv:2304.13138  [pdf, other]

    cs.AI cs.LG

    The Update-Equivalence Framework for Decision-Time Planning

    Authors: Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

    Abstract: The process of revising (or constructing) a policy at execution time -- known as decision-time planning -- has been key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to imperfect-information games, leading to superhuman performance in poker. However, these methods involve solving subgames whose sizes gr…

    Submitted 13 May, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

  33. arXiv:2303.14496  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning with Explanation Constraints

    Authors: Rattana Pukdee, Dylan Sam, J. Zico Kolter, Maria-Florina Balcan, Pradeep Ravikumar

    Abstract: As larger deep learning models are hard to interpret, there has been a recent focus on generating explanations of these black-box models. In contrast, we may have a priori explanations of how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning-theoretic framework to analyze how such explanations can improve the learning of ou…

    Submitted 22 December, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2023

  34. arXiv:2303.07675  [pdf, other]

    cs.LG cs.SI

    Sinkhorn-Flow: Predicting Probability Mass Flow in Dynamical Systems Using Optimal Transport

    Authors: Mukul Bhutani, J. Zico Kolter

    Abstract: Predicting how distributions over discrete variables vary over time is a common task in time series forecasting. But whereas most approaches focus on merely predicting the distribution at subsequent time steps, a crucial piece of information in many settings is to determine how this probability mass flows between the different elements over time. We propose a new approach to predicting such mass f…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: A prior version of the work appeared in the Optimal Transport Workshop at NeurIPS 2019

  35. arXiv:2303.07320  [pdf, other]

    cs.CL cs.LG

    Model-tuning Via Prompts Makes NLP Models Adversarially Robust

    Authors: Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachary C. Lipton, Danish Pruthi

    Abstract: In recent years, NLP practitioners have converged on the following practice: (i) import an off-the-shelf pretrained (masked) language model; (ii) append a multilayer perceptron atop the CLS token's hidden representation (with randomly initialized weights); and (iii) fine-tune the entire model on a downstream task (MLP-FT). This procedure has produced massive gains on standard NLP benchmarks, but t…

    Submitted 5 December, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to the EMNLP 2023 Conference

  36. arXiv:2303.00215  [pdf, other]

    cs.CV

    Single Image Backdoor Inversion via Robust Smoothed Classifiers

    Authors: Mingjie Sun, J. Zico Kolter

    Abstract: Backdoor inversion, a central step in many backdoor defenses, is a reverse-engineering process to recover the hidden backdoor trigger inserted into a machine learning model. Existing approaches tackle this problem by searching for a backdoor pattern that is able to flip a set of clean images into the target class, while the exact size needed of this support set is rarely investigated. In this work…

    Submitted 17 December, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

    Comments: CVPR 2023. v2: improved writing

  37. arXiv:2302.14040  [pdf, other]

    cs.LG cs.AI

    Permutation Equivariant Neural Functionals

    Authors: Allan Zhou, Kaien Yang, Kaylee Burns, Adriano Cardace, Yiding Jiang, Samuel Sokota, J. Zico Kolter, Chelsea Finn

    Abstract: This work studies the design of neural networks that can process the weights or gradients of other neural networks, which we refer to as neural functional networks (NFNs). Despite a wide range of potential applications, including learned optimization, processing implicit neural representations, network editing, and policy evaluation, there are few unifying principles for designing effective archit…

    Submitted 26 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: To appear in Neural Information Processing Systems (NeurIPS), 2023

  38. arXiv:2301.09159  [pdf, other]

    cs.GT cs.AI cs.LG

    Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

    Authors: Samuel Sokota, Ryan D'Orazio, Chun Kai Ling, David J. Wu, J. Zico Kolter, Noam Brown

    Abstract: In their seminal work, Nayyar et al. (2013) showed that imperfect information can be abstracted away from common-payoff games by having players publicly announce their policies as they play. This insight underpins sound solvers and decision-time planning algorithms for common-payoff games. Unfortunately, a naive application of the same insight to two-player zero-sum games fails because Nash equili…

    Submitted 31 July, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

  39. arXiv:2212.14431  [pdf, other]

    cs.GT cs.AI cs.MA

    Function Approximation for Solving Stackelberg Equilibrium in Large Perfect Information Games

    Authors: Chun Kai Ling, J. Zico Kolter, Fei Fang

    Abstract: Function approximation (FA) has been a critical component in solving large zero-sum games. Yet, little attention has been given towards FA in solving general-sum extensive-form games, despite them being widely regarded as being computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, no sim…

    Submitted 1 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: To appear in AAAI 2023

  40. arXiv:2212.06921  [pdf, other]

    cs.LG

    Losses over Labels: Weakly Supervised Learning via Direct Loss Construction

    Authors: Dylan Sam, J. Zico Kolter

    Abstract: Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the data. These weak labels are combined (typically via a graphical model) to form pseudolabels, which are then used to train a downstream model. In this work, we qu…

    Submitted 4 October, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 13 pages, 3 figures, AAAI 2023

  41. arXiv:2211.14503  [pdf, other]

    cs.LG

    Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth

    Authors: Filipe de Avila Belbute-Peres, J. Zico Kolter

    Abstract: Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions. Despite their promise, particularly for learning implicit models, their training behavior is not yet fully understood, leading to a number of empirical design choices that are not well justified. In this work, we first propose a simplified version of such sinusoidal n…

    Submitted 26 November, 2022; originally announced November 2022.

  42. arXiv:2210.15031  [pdf, other]

    cs.LG

    Characterizing Datapoints via Second-Split Forgetting

    Authors: Pratyush Maini, Saurabh Garg, Zachary C. Lipton, J. Zico Kolter

    Abstract: Researchers investigating example hardness have increasingly focused on the dynamics by which neural networks learn and forget examples throughout training. Popular metrics derived from these dynamics include (i) the epoch at which examples are first correctly classified; (ii) the number of times their predictions flip during training; and (iii) whether their prediction flips if they are held out.…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  43. arXiv:2210.14889  [pdf, other]

    cs.CR cs.AI cs.MM

    Perfectly Secure Steganography Using Minimum Entropy Coupling

    Authors: Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier

    Abstract: Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in developing scalable steganogr…

    Submitted 30 October, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

  44. arXiv:2210.03651  [pdf, other]

    cs.CV cs.AI cs.LG

    Understanding the Covariance Structure of Convolutional Filters

    Authors: Asher Trockman, Devin Willmott, J. Zico Kolter

    Abstract: Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly-structured operations like convolutions. Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters have notable structure; this presents an opportunity to study thei…

    Submitted 7 October, 2022; originally announced October 2022.

  45. arXiv:2208.05740  [pdf, other]

    cs.LG cs.CR cs.CV math.OC stat.ML

    General Cutting Planes for Bound-Propagation-Based Neural Network Verification

    Authors: Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, J. Zico Kolter

    Abstract: Bound propagation methods, when combined with branch and bound, are among the most effective methods to formally verify properties of deep neural networks such as correctness, robustness, and safety. However, existing works cannot handle the general form of cutting plane constraints widely accepted in traditional solvers, which are crucial for strengthening verifiers with tightened convex relaxati…

    Submitted 4 December, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: Accepted by NeurIPS 2022. GCP-CROWN is part of the alpha-beta-CROWN verifier, the VNN-COMP 2022 winner

  46. arXiv:2206.10550  [pdf, other]

    cs.LG cs.CR

    (Certified!!) Adversarial Robustness for Free!

    Authors: Nicholas Carlini, Florian Tramer, Krishnamurthy Dj Dvijotham, Leslie Rice, Mingjie Sun, J. Zico Kolter

    Abstract: In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models. To do so, we instantiate the denoised smoothing approach of Salman et al. 2020 by combining a pretrained denoising diffusion probabilistic model and a standard high-accuracy classifier. This allows us to certify 71% accura…

    Submitted 6 March, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

  47. arXiv:2206.05825  [pdf, other]

    cs.LG cs.AI cs.GT

    A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

    Authors: Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

    Abstract: This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equili…

    Submitted 11 April, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

  48. arXiv:2205.06154  [pdf, other]

    cs.LG cs.CV

    Smooth-Reduce: Leveraging Patches for Improved Certified Robustness

    Authors: Ameya Joshi, Minh Pham, Minsu Cho, Leonid Boytsov, Filipe Condessa, J. Zico Kolter, Chinmay Hegde

    Abstract: Randomized smoothing (RS) has been shown to be a fast, scalable technique for certifying the robustness of deep neural network classifiers. However, methods based on RS require augmenting data with large amounts of noise, which leads to significant drops in accuracy. We propose a training-free, modified smoothing approach, Smooth-Reduce, that leverages patching and aggregation to provide improved…

    Submitted 12 May, 2022; originally announced May 2022.

  49. arXiv:2204.08442  [pdf, other]

    cs.CV cs.AI cs.LG

    Deep Equilibrium Optical Flow Estimation

    Authors: Shaojie Bai, Zhengyang Geng, Yash Savani, J. Zico Kolter

    Abstract: Many recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms by encouraging iterative refinements toward a stable flow estimation. However, these RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation. They can converge poorly and thereby suffer from performance degrad…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  50. arXiv:2203.00806  [pdf, other]

    cs.RO

    Dojo: A Differentiable Physics Engine for Robotics

    Authors: Taylor A. Howell, Simon Le Cleac'h, Jan Brüdigam, J. Zico Kolter, Mac Schwager, Zachary Manchester

    Abstract: We present Dojo, a differentiable physics engine for robotics that prioritizes stable simulation, accurate contact physics, and differentiability with respect to states, actions, and system parameters. Dojo achieves stable simulation at low sample rates and conserves energy and momentum by employing a variational integrator. A nonlinear complementarity problem with second-order cones for friction…

    Submitted 30 March, 2023; v1 submitted 1 March, 2022; originally announced March 2022.