-
Post-Hoc Reversal: Are We Selecting Models Prematurely?
Authors:
Rishabh Ranjan,
Saurabh Garg,
Mrigank Raman,
Carlos Guestrin,
Zachary Chase Lipton
Abstract:
Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied only after the base models have already been finalized by standard means. In this paper, we challenge this practice with an extensive empirical st…
▽ More
Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied only after the base models have already been finalized by standard means. In this paper, we challenge this practice with an extensive empirical study. In particular, we demonstrate a phenomenon that we call post-hoc reversal, where performance trends are reversed after applying these post-hoc transforms. This phenomenon is especially prominent in high-noise settings. For example, while base models overfit badly early in training, both conventional ensembling and SWA favor base models trained for more epochs. Post-hoc reversal can also suppress the appearance of double descent and mitigate mismatches between test loss and test error seen in base models. Based on our findings, we propose post-hoc selection, a simple technique whereby post-hoc metrics inform model development decisions such as early stopping, checkpointing, and broader hyperparameter choices. Our experimental analyses span real-world vision, language, tabular and graph datasets from domains like satellite imaging, language modeling, census prediction and social network analysis. On an LLM instruction tuning dataset, post-hoc selection results in > 1.5x MMLU improvement compared to naive selection. Code is available at https://github.com/rishabh-ranjan/post-hoc-reversal.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
Turn Down the Noise: Leveraging Diffusion Models for Test-time Adaptation via Pseudo-label Ensembling
Authors:
Mrigank Raman,
Rohan Shah,
Akash Kannan,
Pranit Chawla
Abstract:
The goal of test-time adaptation is to adapt a source-pretrained model to a continuously changing target domain without relying on any source data. Typically, this is either done by updating the parameters of the model (model adaptation) using inputs from the target domain or by modifying the inputs themselves (input adaptation). However, methods that modify the model suffer from the issue of comp…
▽ More
The goal of test-time adaptation is to adapt a source-pretrained model to a continuously changing target domain without relying on any source data. Typically, this is either done by updating the parameters of the model (model adaptation) using inputs from the target domain or by modifying the inputs themselves (input adaptation). However, methods that modify the model suffer from the issue of compounding noisy updates whereas methods that modify the input need to adapt to every new data point from scratch while also struggling with certain domain shifts. We introduce an approach that leverages a pre-trained diffusion model to project the target domain images closer to the source domain and iteratively updates the model via pseudo-label ensembling. Our method combines the advantages of model and input adaptations while mitigating their shortcomings. Our experiments on CIFAR-10C demonstrate the superiority of our approach, outperforming the strongest baseline by an average of 1.7% across 15 diverse corruptions and surpassing the strongest input adaptation baseline by an average of 18%.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Authors:
Kousik Rajesh,
Mrigank Raman,
Mohammed Asad Karim,
Pranit Chawla
Abstract:
In recent times there has been a surge of multi-modal architectures based on Large Language Models, which leverage the zero shot generation capabilities of LLMs and project image embeddings into the text space and then use the auto-regressive capacity to solve tasks such as VQA, captioning, and image retrieval. We name these architectures as "bridge-architectures" as they project from the image sp…
▽ More
In recent times there has been a surge of multi-modal architectures based on Large Language Models, which leverage the zero shot generation capabilities of LLMs and project image embeddings into the text space and then use the auto-regressive capacity to solve tasks such as VQA, captioning, and image retrieval. We name these architectures as "bridge-architectures" as they project from the image space to the text space. These models deviate from the traditional recipe of training transformer based multi-modal models, which involve using large-scale pre-training and complex multi-modal interactions through co or cross attention. However, the capabilities of bridge architectures have not been tested on complex visual reasoning tasks which require fine grained analysis about the image. In this project, we investigate the performance of these bridge-architectures on the NLVR2 dataset, and compare it to state-of-the-art transformer based architectures. We first extend the traditional bridge architectures for the NLVR2 dataset, by adding object level features to faciliate fine-grained object reasoning. Our analysis shows that adding object level features to bridge architectures does not help, and that pre-training on multi-modal data is key for good performance on complex reasoning tasks such as NLVR2. We also demonstrate some initial results on a recently bridge-architecture, LLaVA, in the zero shot setting and analyze its performance.
△ Less
Submitted 30 July, 2023;
originally announced July 2023.
-
Model-tuning Via Prompts Makes NLP Models Adversarially Robust
Authors:
Mrigank Raman,
Pratyush Maini,
J. Zico Kolter,
Zachary C. Lipton,
Danish Pruthi
Abstract:
In recent years, NLP practitioners have converged on the following practice: (i) import an off-the-shelf pretrained (masked) language model; (ii) append a multilayer perceptron atop the CLS token's hidden representation (with randomly initialized weights); and (iii) fine-tune the entire model on a downstream task (MLP-FT). This procedure has produced massive gains on standard NLP benchmarks, but t…
▽ More
In recent years, NLP practitioners have converged on the following practice: (i) import an off-the-shelf pretrained (masked) language model; (ii) append a multilayer perceptron atop the CLS token's hidden representation (with randomly initialized weights); and (iii) fine-tune the entire model on a downstream task (MLP-FT). This procedure has produced massive gains on standard NLP benchmarks, but these models remain brittle, even to mild adversarial perturbations. In this work, we demonstrate surprising gains in adversarial robustness enjoyed by Model-tuning Via Prompts (MVP), an alternative method of adapting to downstream tasks. Rather than appending an MLP head to make output prediction, MVP appends a prompt template to the input, and makes prediction via text infilling/completion. Across 5 NLP datasets, 4 adversarial attacks, and 3 different models, MVP improves performance against adversarial substitutions by an average of 8% over standard methods and even outperforms adversarial training-based state-of-art defenses by 3.5%. By combining MVP with adversarial training, we achieve further improvements in adversarial robustness while maintaining performance on unperturbed examples. Finally, we conduct ablations to investigate the mechanism underlying these gains. Notably, we find that the main causes of vulnerability of MLP-FT can be attributed to the misalignment between pre-training and fine-tuning tasks, and the randomly initialized MLP parameters.
△ Less
Submitted 5 December, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Domain Generalization via Inference-time Label-Preserving Target Projections
Authors:
Prashant Pandey,
Mrigank Raman,
Sumanth Varambally,
Prathosh AP
Abstract:
Generalization of machine learning models trained on a set of source domains on unseen target domains with different statistics, is a challenging problem. While many approaches have been proposed to solve this problem, they only utilize source data during training but do not take advantage of the fact that a single target example is available at the time of inference. Motivated by this, we propose…
▽ More
Generalization of machine learning models trained on a set of source domains on unseen target domains with different statistics, is a challenging problem. While many approaches have been proposed to solve this problem, they only utilize source data during training but do not take advantage of the fact that a single target example is available at the time of inference. Motivated by this, we propose a method that effectively uses the target sample during inference beyond mere classification. Our method has three components - (i) A label-preserving feature or metric transformation on source data such that the source samples are clustered in accordance with their class irrespective of their domain (ii) A generative model trained on the these features (iii) A label-preserving projection of the target point on the source-feature manifold during inference via solving an optimization problem on the input space of the generative model using the learned metric. Finally, the projected target is used in the classifier. Since the projected target feature comes from the source manifold and has the same label as the real target by design, the classifier is expected to perform better on it than the true target. We demonstrate that our method outperforms the state-of-the-art Domain Generalization methods on multiple datasets and tasks.
△ Less
Submitted 18 July, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Centralized active tracking of a Markov chain with unknown dynamics
Authors:
Mrigank Raman,
Ojal Kumar,
Arpan Chattopadhyay
Abstract:
In this paper, selection of an active sensor subset for tracking a discrete time, finite state Markov chain having an unknown transition probability matrix (TPM) is considered. A total of N sensors are available for making observations of the Markov chain, out of which a subset of sensors are activated each time in order to perform reliable estimation of the process. The trade-off is between activ…
▽ More
In this paper, selection of an active sensor subset for tracking a discrete time, finite state Markov chain having an unknown transition probability matrix (TPM) is considered. A total of N sensors are available for making observations of the Markov chain, out of which a subset of sensors are activated each time in order to perform reliable estimation of the process. The trade-off is between activating more sensors to gather more observations for the remote estimation, and restricting sensor usage in order to save energy and bandwidth consumption. The problem is formulated as a constrained minimization problem, where the objective is the long-run averaged mean-squared error (MSE) in estimation, and the constraint is on sensor activation rate. A Lagrangian relaxation of the problem is solved by an artful blending of two tools: Gibbs sampling for MSE minimization and an on-line version of expectation maximization (EM) to estimate the unknown TPM. Finally, the Lagrange multiplier is updated using slower timescale stochastic approximation in order to satisfy the sensor activation rate constraint. The on-line EM algorithm, though adapted from literature, can estimate vector-valued parameters even under time-varying dimension of the sensor observations. Numerical results demonstrate approximately 1 dB better error performance than uniform sensor sampling and comparable error performance (within 2 dB bound) against complete sensor observation. This makes the proposed algorithm amenable to practical implementation.
△ Less
Submitted 30 October, 2020;
originally announced October 2020.
-
Learning Contextualized Knowledge Structures for Commonsense Reasoning
Authors:
Jun Yan,
Mrigank Raman,
Aaron Chan,
Tianyu Zhang,
Ryan Rossi,
Handong Zhao,
Sungchul Kim,
Nedim Lipka,
Xiang Ren
Abstract:
Recently, knowledge graph (KG) augmented models have achieved noteworthy success on various commonsense reasoning tasks. However, KG edge (fact) sparsity and noisy edge extraction/generation often hinder models from obtaining useful knowledge to reason over. To address these issues, we propose a new KG-augmented model: Hybrid Graph Network (HGN). Unlike prior methods, HGN learns to jointly context…
▽ More
Recently, knowledge graph (KG) augmented models have achieved noteworthy success on various commonsense reasoning tasks. However, KG edge (fact) sparsity and noisy edge extraction/generation often hinder models from obtaining useful knowledge to reason over. To address these issues, we propose a new KG-augmented model: Hybrid Graph Network (HGN). Unlike prior methods, HGN learns to jointly contextualize extracted and generated knowledge by reasoning over both within a unified graph structure. Given the task input context and an extracted KG subgraph, HGN is trained to generate embeddings for the subgraph's missing edges to form a "hybrid" graph, then reason over the hybrid graph while filtering out context-irrelevant edges. We demonstrate HGN's effectiveness through considerable performance gains across four commonsense reasoning benchmarks, plus a user study on edge validness and helpfulness.
△ Less
Submitted 4 June, 2021; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation
Authors:
Mrigank Raman,
Aaron Chan,
Siddhant Agarwal,
Peifeng Wang,
Hansen Wang,
Sungchul Kim,
Ryan Rossi,
Handong Zhao,
Nedim Lipka,
Xiang Ren
Abstract:
Knowledge graphs (KGs) have helped neural models improve performance on various knowledge-intensive tasks, like question answering and item recommendation. By using attention over the KG, such KG-augmented models can also "explain" which KG information was most relevant for making a given prediction. In this paper, we question whether these models are really behaving as we expect. We show that, th…
▽ More
Knowledge graphs (KGs) have helped neural models improve performance on various knowledge-intensive tasks, like question answering and item recommendation. By using attention over the KG, such KG-augmented models can also "explain" which KG information was most relevant for making a given prediction. In this paper, we question whether these models are really behaving as we expect. We show that, through a reinforcement learning policy (or even simple heuristics), one can produce deceptively perturbed KGs, which maintain the downstream performance of the original KG while significantly deviating from the original KG's semantics and structure. Our findings raise doubts about KG-augmented models' ability to reason about KG information and give sensible explanations.
△ Less
Submitted 3 May, 2021; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Discrepancy Minimization in Domain Generalization with Generative Nearest Neighbors
Authors:
Prashant Pandey,
Mrigank Raman,
Sumanth Varambally,
Prathosh AP
Abstract:
Domain generalization (DG) deals with the problem of domain shift where a machine learning model trained on multiple-source domains fail to generalize well on a target domain with different statistics. Multiple approaches have been proposed to solve the problem of domain generalization by learning domain invariant representations across the source domains that fail to guarantee generalization on t…
▽ More
Domain generalization (DG) deals with the problem of domain shift where a machine learning model trained on multiple-source domains fail to generalize well on a target domain with different statistics. Multiple approaches have been proposed to solve the problem of domain generalization by learning domain invariant representations across the source domains that fail to guarantee generalization on the shifted target domain. We propose a Generative Nearest Neighbor based Discrepancy Minimization (GNNDM) method which provides a theoretical guarantee that is upper bounded by the error in the labeling process of the target. We employ a Domain Discrepancy Minimization Network (DDMN) that learns domain agnostic features to produce a single source domain while preserving the class labels of the data points. Features extracted from this source domain are learned using a generative model whose latent space is used as a sampler to retrieve the nearest neighbors for the target data points. The proposed method does not require access to the domain labels (a more realistic scenario) as opposed to the existing approaches. Empirically, we show the efficacy of our method on two datasets: PACS and VLCS. Through extensive experimentation, we demonstrate the effectiveness of the proposed method that outperforms several state-of-the-art DG methods.
△ Less
Submitted 28 July, 2020;
originally announced July 2020.
-
Porting of eChronos RTOS on RISC-V Architecture
Authors:
Shubhendra Pal Singhal,
M. Sridevi,
N Sathya Narayanan,
M J Shankar Raman
Abstract:
eChronos is a formally verified Real Time Operating System(RTOS) designed for embedded micro-controllers. eChronos was targeted for tightly constrained devices without memory management units. Currently, eChronos is available on proprietary designs like ARM, PowerPC and Intel architectures. eChronos is adopted in safety critical systems like aircraft control system and medical implant devices. eCh…
▽ More
eChronos is a formally verified Real Time Operating System(RTOS) designed for embedded micro-controllers. eChronos was targeted for tightly constrained devices without memory management units. Currently, eChronos is available on proprietary designs like ARM, PowerPC and Intel architectures. eChronos is adopted in safety critical systems like aircraft control system and medical implant devices. eChronos is one of the very few system software not been ported to RISC-V. RISC-V is an open-source Instruction Set Architecture (ISA) that enables new era of processor development. Many standard Operating Systems, software tool chain have migrated to the RISC-V architecture. According to the latest trends, RISC-V is replacing many proprietary chips. As a secure RTOS, it is attractive to port on an open-source ISA. SHAKTI and PicoRV32 are some of the proven open-source RISC-V designs available. Now having a secure RTOS on an open-source hardware design, designed based on an open-source ISA makes it more interesting. In addition to this, the current architectures supported by eChronos are all proprietary designs, and porting eChronos to the RISC-V architecture increases the secure system development as a whole. This paper, presents an idea of porting eChronos on a chip which is open-source and effective, thus reducing the cost of embedded systems. Designing a open-source system that is completely open-source reduces the overall cost, increased the security and can be critically reviewed. This paper explores the design and architecture aspect involved in porting eChronos to RISC-V. The authors have successfully ported eChronos to RISC-V architecture and verified it on spike. The port of RISC-V to eChronos is made available open-source by authors. Along with that, the safe removal of architectural dependencies and subsequent changes in eChronos are also analyzed.
△ Less
Submitted 26 December, 2019; v1 submitted 30 August, 2019;
originally announced August 2019.