-
Learning Patterns from Biological Networks: A Compounded Burr Probability Model
Authors:
Tanujit Chakraborty,
Shraddha M. Naik,
Swarup Chattopadhyay,
Suchismita Das
Abstract:
Complex biological networks, comprising metabolic reactions, gene interactions, and protein interactions, often exhibit scale-free characteristics with power-law degree distributions. However, empirical studies have revealed discrepancies between observed biological network data and ideal power-law fits, highlighting the need for improved modeling approaches. To address this challenge, we propose…
▽ More
Complex biological networks, comprising metabolic reactions, gene interactions, and protein interactions, often exhibit scale-free characteristics with power-law degree distributions. However, empirical studies have revealed discrepancies between observed biological network data and ideal power-law fits, highlighting the need for improved modeling approaches. To address this challenge, we propose a novel family of distributions, building upon the baseline Burr distribution. Specifically, we introduce the compounded Burr (CBurr) distribution, derived from a continuous probability distribution family, enabling flexible and efficient modeling of node degree distributions in biological networks. This study comprehensively investigates the general properties of the CBurr distribution, focusing on parameter estimation using the maximum likelihood method. Subsequently, we apply the CBurr distribution model to large-scale biological network data, aiming to evaluate its efficacy in fitting the entire range of node degree distributions, surpassing conventional power-law distributions and other benchmarks. Through extensive data analysis and graphical illustrations, we demonstrate that the CBurr distribution exhibits superior modeling capabilities compared to traditional power-law distributions. This novel distribution model holds great promise for accurately capturing the complex nature of biological networks and advancing our understanding of their underlying mechanisms.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
Model Identifiability for Bivariate Failure Time Data with Competing Risks: Parametric Cause-specific Hazards and Non-parametric Frailty
Authors:
Biswadeep Ghosh,
Anup Dewanji,
Sudipta Das
Abstract:
One of the commonly used approaches to capture dependence in multivariate survival data is through the frailty variables. The identifiability issues should be carefully investigated while modeling multivariate survival with or without competing risks. The use of non-parametric frailty distribution(s) is sometimes preferred for its robustness and flexibility properties. In this paper, we consider m…
▽ More
One of the commonly used approaches to capture dependence in multivariate survival data is through the frailty variables. The identifiability issues should be carefully investigated while modeling multivariate survival with or without competing risks. The use of non-parametric frailty distribution(s) is sometimes preferred for its robustness and flexibility properties. In this paper, we consider modeling of bivariate survival data with competing risks through four different kinds of non-parametric frailty and parametric baseline cause-specific hazard functions to investigate the corresponding model identifiability. We make the common assumption of the frailty mean being equal to unity.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
Parametric Analysis of Bivariate Current Status data with Competing risks using Frailty model
Authors:
Biswadeep Ghosh,
Anup Dewanji,
Sudipta Das
Abstract:
Shared and correlated Gamma frailty models are widely used in the literature to model the association in multivariate current status data. In this paper, we have proposed two other new Gamma frailty models, namely shared cause-specific and correlated cause-specific Gamma frailty to capture association in bivariate current status data with competing risks. We have investigated the identifiability o…
▽ More
Shared and correlated Gamma frailty models are widely used in the literature to model the association in multivariate current status data. In this paper, we have proposed two other new Gamma frailty models, namely shared cause-specific and correlated cause-specific Gamma frailty to capture association in bivariate current status data with competing risks. We have investigated the identifiability of the bivariate models with competing risks for each of the four frailty variables. We have considered maximum likelihood estimation of the model parameters. Thorough simulation studies have been performed to study the finite sample behaviour of the estimated parameters. Also, we have analyzed a real data set on hearing loss in two ears using Exponential type and Weibull type cause-specific baseline hazard functions with the four different Gamma frailty variables and compare the fits using AIC.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Jacobi Prior: An Alternative Bayesian Method for Supervised Learning
Authors:
Sourish Das,
Shouvik Sardar
Abstract:
The `Jacobi prior' is an alternative Bayesian method for predictive models. It performs better than well-known methods such as Lasso, Ridge, Elastic Net, and MCMC-based Horse-Shoe Prior, particularly in terms of prediction accuracy and run-time. This method is implemented for Gaussian process classification, adeptly handling a nonlinear decision boundary. The Jacobi prior demonstrates its capabili…
▽ More
The `Jacobi prior' is an alternative Bayesian method for predictive models. It performs better than well-known methods such as Lasso, Ridge, Elastic Net, and MCMC-based Horse-Shoe Prior, particularly in terms of prediction accuracy and run-time. This method is implemented for Gaussian process classification, adeptly handling a nonlinear decision boundary. The Jacobi prior demonstrates its capability to manage partitioned data across global servers, making it highly useful in distributed computing environments. Additionally, we show that the Jacobi prior is more than a hundred times faster than these methods while maintaining similar predictive accuracy. As the method is both fast and accurate, it is advantageous for organisations looking to reduce their environmental impact and meet ESG standards. To demonstrate the effectiveness of the Jacobi prior, we conducted a detailed simulation study with four experiments focusing on statistical consistency, accuracy, and speed. We also present two empirical studies: the first evaluates credit risk by analysing default probability using data from the U.S. Small Business Administration (SBA), and the second uses the Jacobi prior for classifying stars, quasars, and galaxies in a three-class problem using multinomial logit regression on data from the Sloan Digital Sky Survey. Different filters were used as features in this study. All codes and datasets for this paper are available in the following GitHub repository : https://github.com/sourish-cmi/Jacobi-Prior/
△ Less
Submitted 4 June, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Thermometer: Towards Universal Calibration for Large Language Models
Authors:
Maohao Shen,
Subhro Das,
Kristjan Greenewald,
Prasanna Sattigeri,
Gregory Wornell,
Soumya Ghosh
Abstract:
We consider the issue of calibration in large language models (LLM). Recent studies have found that common interventions such as instruction tuning often result in poorly calibrated LLMs. Although calibration is well-explored in traditional applications, calibrating LLMs is uniquely challenging. These challenges stem as much from the severe computational requirements of LLMs as from their versatil…
▽ More
We consider the issue of calibration in large language models (LLM). Recent studies have found that common interventions such as instruction tuning often result in poorly calibrated LLMs. Although calibration is well-explored in traditional applications, calibrating LLMs is uniquely challenging. These challenges stem as much from the severe computational requirements of LLMs as from their versatility, which allows them to be applied to diverse tasks. Addressing these challenges, we propose THERMOMETER, a calibration approach tailored to LLMs. THERMOMETER learns an auxiliary model, given data from multiple tasks, for calibrating a LLM. It is computationally efficient, preserves the accuracy of the LLM, and produces better-calibrated responses for new tasks. Extensive empirical evaluations across various benchmarks demonstrate the effectiveness of the proposed method.
△ Less
Submitted 27 June, 2024; v1 submitted 19 February, 2024;
originally announced March 2024.
-
Binary Gaussian Copula Synthesis: A Novel Data Augmentation Technique to Advance ML-based Clinical Decision Support Systems for Early Prediction of Dialysis Among CKD Patients
Authors:
Hamed Khosravi,
Srinjoy Das,
Abdullah Al-Mamun,
Imtiaz Ahmed
Abstract:
The Center for Disease Control estimates that over 37 million US adults suffer from chronic kidney disease (CKD), yet 9 out of 10 of these individuals are unaware of their condition due to the absence of symptoms in the early stages. It has a significant impact on patients' quality of life, particularly when it progresses to the need for dialysis. Early prediction of dialysis is crucial as it can…
▽ More
The Center for Disease Control estimates that over 37 million US adults suffer from chronic kidney disease (CKD), yet 9 out of 10 of these individuals are unaware of their condition due to the absence of symptoms in the early stages. It has a significant impact on patients' quality of life, particularly when it progresses to the need for dialysis. Early prediction of dialysis is crucial as it can significantly improve patient outcomes and assist healthcare providers in making timely and informed decisions. However, developing an effective machine learning (ML)-based Clinical Decision Support System (CDSS) for early dialysis prediction poses a key challenge due to the imbalanced nature of data. To address this challenge, this study evaluates various data augmentation techniques to understand their effectiveness on real-world datasets. We propose a new approach named Binary Gaussian Copula Synthesis (BGCS). BGCS is tailored for binary medical datasets and excels in generating synthetic minority data that mirrors the distribution of the original data. BGCS enhances early dialysis prediction by outperforming traditional methods in detecting dialysis patients. For the best ML model, Random Forest, BCGS achieved a 72% improvement, surpassing the state-of-the-art augmentation approaches. Also, we present a ML-based CDSS, designed to aid clinicians in making informed decisions. CDSS, which utilizes decision tree models, is developed to improve patient outcomes, identify critical variables, and thereby enable clinicians to make proactive decisions, and strategize treatment plans effectively for CKD patients who are more likely to require dialysis in the near future. Through comprehensive feature analysis and meticulous data preparation, we ensure that the CDSS's dialysis predictions are not only accurate but also actionable, providing a valuable tool in the management and treatment of CKD.
△ Less
Submitted 1 March, 2024;
originally announced March 2024.
-
Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?
Authors:
Maohao Shen,
J. Jon Ryu,
Soumya Ghosh,
Yuheng Bu,
Prasanna Sattigeri,
Subhro Das,
Gregory W. Wornell
Abstract:
This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called \emph{evidential deep learning} (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their perceived strong empirical performance on downstream tasks, a line of recent studies…
▽ More
This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called \emph{evidential deep learning} (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their perceived strong empirical performance on downstream tasks, a line of recent studies by Bengs et al. identify limitations of the existing methods to conclude their learned epistemic uncertainties are unreliable, e.g., in that they are non-vanishing even with infinite data. Building on and sharpening such analysis, we 1) provide a sharper understanding of the asymptotic behavior of a wide class of EDL methods by unifying various objective functions; 2) reveal that the EDL methods can be better interpreted as an out-of-distribution detection algorithm based on energy-based-models; and 3) conduct extensive ablation studies to better assess their empirical effectiveness with real-world datasets. Through all these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.
△ Less
Submitted 12 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Robust Genomic Prediction and Heritability Estimation using Density Power Divergence
Authors:
Upama Paul Chowdhury,
Susmita Das,
Abhik Ghosh
Abstract:
This manuscript delves into the intersection of genomics and phenotypic prediction, focusing on the statistical innovation required to navigate the complexities introduced by noisy covariates and confounders. The primary emphasis is on the development of advanced robust statistical models tailored for genomic prediction from single nucleotide polymorphism (SNP) data collected from genome-wide asso…
▽ More
This manuscript delves into the intersection of genomics and phenotypic prediction, focusing on the statistical innovation required to navigate the complexities introduced by noisy covariates and confounders. The primary emphasis is on the development of advanced robust statistical models tailored for genomic prediction from single nucleotide polymorphism (SNP) data collected from genome-wide association studies (GWAS) in plant and animal breeding and multi-field trials. The manuscript explores the limitations of traditional marker-assisted recurrent selection, highlighting the significance of incorporating all estimated effects of marker loci into the statistical framework and aiming to reduce the high dimensionality of GWAS data while preserving critical information. This paper introduces a new robust statistical framework for genomic prediction, employing one-stage and two-stage linear mixed model analyses along with utilizing the popular robust minimum density power divergence estimator (MDPDE) to estimate genetic effects on phenotypic traits. The study illustrates the superior performance of the proposed MDPDE-based genomic prediction and associated heritability estimation procedures over existing competitors through extensive empirical experiments on artificial datasets and application to a real-life maize breeding dataset. The results showcase the robustness and accuracy of the proposed MDPDE-based approaches, especially in the presence of data contamination, emphasizing their potential applications in improving breeding programs and advancing genomic prediction of phenotyping traits.
△ Less
Submitted 14 January, 2024;
originally announced January 2024.
-
One step closer to unbiased aleatoric uncertainty estimation
Authors:
Wang Zhang,
Ziwen Ma,
Subhro Das,
Tsui-Wei Weng,
Alexandre Megretski,
Luca Daniel,
Lam M. Nguyen
Abstract:
Neural networks are powerful tools in various applications, and quantifying their uncertainty is crucial for reliable decision-making. In the deep learning field, the uncertainties are usually categorized into aleatoric (data) and epistemic (model) uncertainty. In this paper, we point out that the existing popular variance attenuation method highly overestimates aleatoric uncertainty. To address t…
▽ More
Neural networks are powerful tools in various applications, and quantifying their uncertainty is crucial for reliable decision-making. In the deep learning field, the uncertainties are usually categorized into aleatoric (data) and epistemic (model) uncertainty. In this paper, we point out that the existing popular variance attenuation method highly overestimates aleatoric uncertainty. To address this issue, we propose a new estimation method by actively de-noising the observed data. By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.
△ Less
Submitted 20 December, 2023; v1 submitted 16 December, 2023;
originally announced December 2023.
-
Applying Pre-Trained Deep-Learning Model on Wrist Angel Data -- An Analysis Plan
Authors:
Harald Vilhelm Skat-Rørdam,
Mia Hang Knudsen,
Simon Nørby Knudsen,
Nicole Nadine Lønfeldt,
Sneha Das,
Line Katrine Harder Clemmensen
Abstract:
We aim to investigate if we can improve predictions of stress caused by OCD symptoms using pre-trained models, and present our statistical analysis plan in this paper. With the methods presented in this plan, we aim to avoid bias from data knowledge and thereby strengthen our hypotheses and findings. The Wrist Angel study, which this statistical analysis plan concerns, contains data from nine part…
▽ More
We aim to investigate if we can improve predictions of stress caused by OCD symptoms using pre-trained models, and present our statistical analysis plan in this paper. With the methods presented in this plan, we aim to avoid bias from data knowledge and thereby strengthen our hypotheses and findings. The Wrist Angel study, which this statistical analysis plan concerns, contains data from nine participants, between 8 and 17 years old, diagnosed with obsessive-compulsive disorder (OCD). The data was obtained by an Empatica E4 wristband, which the participants wore during waking hours for 8 weeks. The purpose of the study is to assess the feasibility of predicting the in-the-wild OCD events captured during this period. In our analysis, we aim to investigate if we can improve predictions of stress caused by OCD symptoms, and to do this we have created a pre-trained model, trained on four open-source data for stress prediction. We intend to apply this pre-trained model to the Wrist Angel data by fine-tuning, thereby utilizing transfer learning. The pre-trained model is a convolutional neural network that uses blood volume pulse, heart rate, electrodermal activity, and skin temperature as time series windows to predict OCD events. Furthermore, using accelerometer data, another model filters physical activity to further improve performance, given that physical activity is physiologically similar to stress. By evaluating various ways of applying our model (fine-tuned, non-fine-tuned, pre-trained, non-pre-trained, and with or without activity classification), we contextualize the problem such that it can be assessed if transfer learning is a viable strategy in this domain.
△ Less
Submitted 14 December, 2023;
originally announced December 2023.
-
Concurrent Density Estimation with Wasserstein Autoencoders: Some Statistical Insights
Authors:
Anish Chakrabarty,
Arkaprabha Basu,
Swagatam Das
Abstract:
Variational Autoencoders (VAEs) have been a pioneering force in the realm of deep generative models. Amongst its legions of progenies, Wasserstein Autoencoders (WAEs) stand out in particular due to the dual offering of heightened generative quality and a strong theoretical backbone. WAEs consist of an encoding and a decoding network forming a bottleneck with the prime objective of generating new s…
▽ More
Variational Autoencoders (VAEs) have been a pioneering force in the realm of deep generative models. Amongst its legions of progenies, Wasserstein Autoencoders (WAEs) stand out in particular due to the dual offering of heightened generative quality and a strong theoretical backbone. WAEs consist of an encoding and a decoding network forming a bottleneck with the prime objective of generating new samples resembling the ones it was catered to. In the process, they aim to achieve a target latent representation of the encoded data. Our work is an attempt to offer a theoretical understanding of the machinery behind WAEs. From a statistical viewpoint, we pose the problem as concurrent density estimation tasks based on neural network-induced transformations. This allows us to establish deterministic upper bounds on the realized errors WAEs commit. We also analyze the propagation of these stochastic errors in the presence of adversaries. As a result, both the large sample properties of the reconstructed distribution and the resilience of WAE models are explored.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means
Authors:
Supratik Basu,
Jyotishka Ray Choudhury,
Debolina Paul,
Swagatam Das
Abstract:
Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These…
▽ More
Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These encompass a heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When confronted with data containing noisy or outlier-laden observations, the Median-of-Means (MoM) estimator emerges as a stabilizing force for any centroid-based clustering framework. On a different note, a prevalent constraint among existing clustering methodologies resides in the prerequisite knowledge of the number of clusters prior to analysis. Utilizing model-based methodologies, such as Bayesian nonparametric models, offers the advantage of infinite mixture models, thereby circumventing the need for such requirements. Motivated by these facts, in this article, we present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies that mitigates the effect of noise on the quality of clustering while ensuring that the number of clusters need not be specified in advance. Statistical guarantees on the upper bound of clustering error, and rigorous assessment through simulated and real datasets suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
Accelerating material discovery with a threshold-driven hybrid acquisition policy-based Bayesian optimization
Authors:
Ahmed Shoyeb Raihan,
Hamed Khosravi,
Srinjoy Das,
Imtiaz Ahmed
Abstract:
Advancements in materials play a crucial role in technological progress. However, the process of discovering and developing materials with desired properties is often impeded by substantial experimental costs, extensive resource utilization, and lengthy development periods. To address these challenges, modern approaches often employ machine learning (ML) techniques such as Bayesian Optimization (B…
▽ More
Advancements in materials play a crucial role in technological progress. However, the process of discovering and developing materials with desired properties is often impeded by substantial experimental costs, extensive resource utilization, and lengthy development periods. To address these challenges, modern approaches often employ machine learning (ML) techniques such as Bayesian Optimization (BO), which streamline the search for optimal materials by iteratively selecting experiments that are most likely to yield beneficial results. However, traditional BO methods, while beneficial, often struggle with balancing the trade-off between exploration and exploitation, leading to sub-optimal performance in material discovery processes. This paper introduces a novel Threshold-Driven UCB-EI Bayesian Optimization (TDUE-BO) method, which dynamically integrates the strengths of Upper Confidence Bound (UCB) and Expected Improvement (EI) acquisition functions to optimize the material discovery process. Unlike the classical BO, our method focuses on efficiently navigating the high-dimensional material design space (MDS). TDUE-BO begins with an exploration-focused UCB approach, ensuring a comprehensive initial sweep of the MDS. As the model gains confidence, indicated by reduced uncertainty, it transitions to the more exploitative EI method, focusing on promising areas identified earlier. The UCB-to-EI switching policy dictated guided through continuous monitoring of the model uncertainty during each step of sequential sampling results in navigating through the MDS more efficiently while ensuring rapid convergence. The effectiveness of TDUE-BO is demonstrated through its application on three different material datasets, showing significantly better approximation and optimization performance over the EI and UCB-based BO methods in terms of the RMSE scores and convergence efficiency, respectively.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
Classroom Community amid Covid-19: A Mixed-Methods Study of Undergraduate Students in Introductory Mathematics and Statistics
Authors:
Shira Viel,
Maria Tackett,
Sarwari Das,
Joseph Choo
Abstract:
A strong sense of classroom community is associated with many positive learning outcomes and is a critical contributor to undergraduate students' persistence in STEM, particularly for women and students of color. This chapter describes a mixed-methods investigation into the relationship between classroom community and course attributes in introductory undergraduate mathematics and statistics cours…
▽ More
A strong sense of classroom community is associated with many positive learning outcomes and is a critical contributor to undergraduate students' persistence in STEM, particularly for women and students of color. This chapter describes a mixed-methods investigation into the relationship between classroom community and course attributes in introductory undergraduate mathematics and statistics courses, mediated by student demographics. The project was motivated by and conducted amid the Covid-19 pandemic: data were collected from online courses in the 2021-21 academic year and from hybrid and in-person courses in the 2021-22 academic year. Quantitative data was gathered from both students and instructors and analyzed using structural equation modeling. The primary instrument was the validated Classroom Community Scale - Short Form. These quantitative results are complemented and contextualized by thematic and textual analyses of focus group data, gathered using a newly developed protocol piloted during the 2021-22 academic year. All data comes from a highly selective private university in the United States. Preliminary practical implications of the study include the value of synchronous participation in fostering connectedness and the importance of attending to students' personal identities in understanding their experiences of belonging.
△ Less
Submitted 21 December, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Orderings of extremes among dependent extended Weibull random variables
Authors:
Ramkrishna Jyoti Samanta,
Sangita Das,
N. Balakrishnan
Abstract:
In this work, we consider two sets of dependent variables $\{X_{1},\ldots,X_{n}\}$ and $\{Y_{1},\ldots,Y_{n}\}$, where $X_{i}\sim EW(α_{i},λ_{i},k_{i})$ and $Y_{i}\sim EW(β_{i},μ_{i},l_{i})$, for $i=1,\ldots, n$, which are coupled by Archimedean copulas having different generators. Also, let $N_{1}$ and $N_{2}$ be two non-negative integer-valued random variables, independent of $X_{i}'$s and…
▽ More
In this work, we consider two sets of dependent variables $\{X_{1},\ldots,X_{n}\}$ and $\{Y_{1},\ldots,Y_{n}\}$, where $X_{i}\sim EW(α_{i},λ_{i},k_{i})$ and $Y_{i}\sim EW(β_{i},μ_{i},l_{i})$, for $i=1,\ldots, n$, which are coupled by Archimedean copulas having different generators. Also, let $N_{1}$ and $N_{2}$ be two non-negative integer-valued random variables, independent of $X_{i}'$s and $Y_{i}'$s, respectively. We then establish different inequalities between two extremes, namely, $X_{1:n}$ and $Y_{1:n}$ and $X_{n:n}$ and $Y_{n:n}$, in terms of the usual stochastic, star, Lorenz, hazard rate, reversed hazard rate and dispersive orders. We also establish some ordering results between $X_{1:{N_{1}}}$ and $Y_{1:{N_{2}}}$ and $X_{N_{1}:{N_{1}}}$ and $Y_{N_{2}:{N_{2}}}$ in terms of the usual stochastic order. Several examples and counterexamples are presented for illustrating all the results established here. Some of the results here extend the existing results of Barmalzan et al. (2020).
△ Less
Submitted 2 July, 2023;
originally announced July 2023.
-
Conditional expectation using compactification operators
Authors:
Suddhasattwa Das
Abstract:
The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactificatio…
▽ More
The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that allow numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and their successful application to some real-world problems are also shown.
△ Less
Submitted 8 January, 2024; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Non-asymptotic System Identification for Linear Systems with Nonlinear Policies
Authors:
Yingying Li,
Tianpeng Zhang,
Subhro Das,
Jeff Shamma,
Na Li
Abstract:
This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies during the learning process are usually nonlinear and time-varying for satisfying the state and input const…
▽ More
This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies during the learning process are usually nonlinear and time-varying for satisfying the state and input constraints. In this paper, we provide a non-asymptotic error bound for least square estimation when the data trajectory is generated by any nonlinear and/or time-varying policies as long as the generated state and action trajectories are bounded. This significantly generalizes the existing non-asymptotic guarantees for linear system identification, which usually consider i.i.d. random inputs or linear policies. Interestingly, our error bound is consistent with that for linear policies with respect to the dependence on the trajectory length, system dimensions, and excitation levels. Lastly, we demonstrate the applications of our results by safe learning with robust model predictive control and provide numerical analysis.
△ Less
Submitted 17 June, 2023;
originally announced June 2023.
-
Blocked Gibbs sampler for hierarchical Dirichlet processes
Authors:
Snigdha Das,
Yabo Niu,
Yang Ni,
Bani K. Mallick,
Debdeep Pati
Abstract:
Posterior computation in hierarchical Dirichlet process (HDP) mixture models is an active area of research in nonparametric Bayes inference of grouped data. Existing literature almost exclusively focuses on the Chinese restaurant franchise (CRF) analogy of the marginal distribution of the parameters, which can mix poorly and is known to have a linear complexity with the sample size. A recently dev…
▽ More
Posterior computation in hierarchical Dirichlet process (HDP) mixture models is an active area of research in nonparametric Bayes inference of grouped data. Existing literature almost exclusively focuses on the Chinese restaurant franchise (CRF) analogy of the marginal distribution of the parameters, which can mix poorly and is known to have a linear complexity with the sample size. A recently developed slice sampler allows for efficient blocked updates of the parameters, but is shown to be statistically unstable in our article. We develop a blocked Gibbs sampler to sample from the posterior distribution of HDP, which produces statistically stable results, is highly scalable with respect to sample size, and is shown to have good mixing. The heart of the construction is to endow the shared concentration parameter with an appropriately chosen gamma prior that allows us to break the dependence of the shared mixing proportions and permits independent updates of certain log-concave random variables in a block. En route, we develop an efficient rejection sampler for these random variables leveraging piece-wise tangent-line approximations.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Using Geographic Location-based Public Health Features in Survival Analysis
Authors:
Navid Seidi,
Ardhendu Tripathy,
Sajal K. Das
Abstract:
Time elapsed till an event of interest is often modeled using the survival analysis methodology, which estimates a survival score based on the input features. There is a resurgence of interest in developing more accurate prediction models for time-to-event prediction in personalized healthcare using modern tools such as neural networks. Higher quality features and more frequent observations improv…
▽ More
Time elapsed till an event of interest is often modeled using the survival analysis methodology, which estimates a survival score based on the input features. There is a resurgence of interest in developing more accurate prediction models for time-to-event prediction in personalized healthcare using modern tools such as neural networks. Higher quality features and more frequent observations improve the predictions for a patient, however, the impact of including a patient's geographic location-based public health statistics on individual predictions has not been studied. This paper proposes a complementary improvement to survival analysis models by incorporating public health statistics in the input features. We show that including geographic location-based public health information results in a statistically significant improvement in the concordance index evaluated on the Surveillance, Epidemiology, and End Results (SEER) dataset containing nationwide cancer incidence data. The improvement holds for both the standard Cox proportional hazards model and the state-of-the-art Deep Survival Machines model. Our results indicate the utility of geographic location-based public health features in survival analysis.
△ Less
Submitted 15 April, 2023;
originally announced April 2023.
-
Optimal selection of the starting lineup for a football team
Authors:
Soudeep Deb,
Shubhabrata Das
Abstract:
The success of a football team depends on various individual skills and performances of the selected players as well as how cohesively they perform. We propose a two-stage process for selecting optimal playing eleven of a football team from its pool of available players. In the first stage a LASSO-induced modified multinomial logistic regression model is derived to analyse the probabilities of the…
▽ More
The success of a football team depends on various individual skills and performances of the selected players as well as how cohesively they perform. We propose a two-stage process for selecting optimal playing eleven of a football team from its pool of available players. In the first stage a LASSO-induced modified multinomial logistic regression model is derived to analyse the probabilities of the three possible outcomes. The model considers strengths of the players in the team as well as those of the opponent, home advantage, and also the effects of individual players and player combinations beyond the recorded performances of these players. In the second stage, a GRASP-type meta-heuristic is implemented for the team selection which maximises its probability of winning. The work is illustrated with English Premier League data from 2008/09 to 2015/16. The application demonstrates that the model in the first stage furnishes valuable insights about the deciding factors for different teams whereas the optimisation steps can be effectively used to determine the best possible starting lineup under various circumstances. We propose a measure of efficiency in team selection by the team management and analyse the performance of the teams on this front.
△ Less
Submitted 12 April, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Prevalence and Major Risk Factors of Non-communicable Diseases: A Machine Learning based Cross-Sectional Study
Authors:
Mrinmoy Roy,
Anica Tasnim Protity,
Srabonti Das,
Porarthi Dhar
Abstract:
Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was pre…
▽ More
Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was present in 83.56% of all participants. CVD was more common in male participants. Consequently, male participants had a higher blood pressure distribution than females. Diabetes mellitus (DM), on the other hand, did not have a gender-based inclination. Both CVD and DM had an age-based progression. Our study showed that chronic respiratory illness was more frequent in middle-aged participants than in younger or elderly individuals. Based on the data, every one in five hospitalized patients was obese. We analyzed the co-morbidities and found that 31.5% of the population has only one NCD, 30.1% has two NCDs, and 38.3% has more than two NCDs. Besides, 86.25% of all diabetic patients had cardiovascular issues. All thyroid patients in our study had CVD. Using a t-test, we found a relationship between CKD and thyroid (p-value 0.061). Males under 35 years have a statistically significant relationship between thyroid and chronic respiratory diseases (p-value 0.018). We also found an association between DM and CKD among patients over 65 (p-value 0.038). Moreover, there has been a statistically significant relationship between CKD and Thyroid (P < 0.05) for those below 35 and 35-65. We used a two-way ANOVA test to find the statistically significant interaction of heart issues and chronic respiratory illness, in combination with diabetes. The combination of DM and RTI also affected CKD in male patients over 65 years old.
△ Less
Submitted 18 May, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Variance-reduced Clipping for Non-convex Optimization
Authors:
Amirhossein Reisizadeh,
Haochuan Li,
Subhro Das,
Ali Jadbabaie
Abstract:
Gradient clipping is a standard training technique used in deep learning applications such as large-scale language modeling to mitigate exploding gradients. Recent experimental studies have demonstrated a fairly special behavior in the smoothness of the training objective along its trajectory when trained with gradient clipping. That is, the smoothness grows with the gradient norm. This is in clea…
▽ More
Gradient clipping is a standard training technique used in deep learning applications such as large-scale language modeling to mitigate exploding gradients. Recent experimental studies have demonstrated a fairly special behavior in the smoothness of the training objective along its trajectory when trained with gradient clipping. That is, the smoothness grows with the gradient norm. This is in clear contrast to the well-established assumption in folklore non-convex optimization, a.k.a. $L$--smoothness, where the smoothness is assumed to be bounded by a constant $L$ globally. The recently introduced $(L_0,L_1)$--smoothness is a more relaxed notion that captures such behavior in non-convex optimization. In particular, it has been shown that under this relaxed smoothness assumption, SGD with clipping requires $O(ε^{-4})$ stochastic gradient computations to find an $ε$--stationary solution. In this paper, we employ a variance reduction technique, namely SPIDER, and demonstrate that for a carefully designed learning rate, this complexity is improved to $O(ε^{-3})$ which is order-optimal. Our designed learning rate comprises the clipping technique to mitigate the growing smoothness. Moreover, when the objective function is the average of $n$ components, we improve the existing $O(nε^{-2})$ bound on the stochastic gradient complexity to $O(\sqrt{n} ε^{-2} + n)$, which is order-optimal as well. In addition to being theoretically optimal, SPIDER with our designed parameters demonstrates comparable empirical performance against variance-reduced methods such as SVRG and SARAH in several vision tasks.
△ Less
Submitted 2 June, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Monitoring the risk of a tailings dam collapse through spectral analysis of satellite InSAR time-series data
Authors:
Sourav Das,
Anuradha Priyadarshana,
Stephen Grebby
Abstract:
Slope failures possess destructive power that can cause significant damage to both life and infrastructure. Monitoring slopes prone to instabilities is therefore critical in mitigating the risk posed by their failure. The purpose of slope monitoring is to detect precursory signs of stability issues, such as changes in the rate of displacement with which a slope is deforming. This information can t…
▽ More
Slope failures possess destructive power that can cause significant damage to both life and infrastructure. Monitoring slopes prone to instabilities is therefore critical in mitigating the risk posed by their failure. The purpose of slope monitoring is to detect precursory signs of stability issues, such as changes in the rate of displacement with which a slope is deforming. This information can then be used to predict the timing or probability of an imminent failure in order to provide an early warning. In this study, a more objective, statistical-learning algorithm is proposed to detect and characterise the risk of a slope failure, based on spectral analysis of serially correlated displacement time series data. The algorithm is applied to satellite-based interferometric synthetic radar (InSAR) displacement time series data to retrospectively analyse the risk of the 2019 Brumadinho tailings dam collapse in Brazil. Two potential risk milestones are identified and signs of a definitive but emergent risk (27 February 2018 to 26 August 2018) and imminent risk of collapse of the tailings dam (27 June 2018 to 24 December 2018) are detected by the algorithm. Importantly, this precursory indication of risk of failure is detected as early as at least five months prior to the dam collapse on 25 January 2019. The results of this study demonstrate that the combination of spectral methods and second order statistical properties of InSAR displacement time series data can reveal signs of a transition into an unstable deformation regime, and that this algorithm can provide sufficient early warning that could help mitigate catastrophic slope failures.
△ Less
Submitted 3 February, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Model-Based and Model-Free point prediction algorithms for locally stationary random fields
Authors:
Srinjoy Das,
Yiwen Zhang,
Dimitris N. Politis
Abstract:
The Model-free Prediction Principle has been successfully applied to general regression problems, as well as problems involving stationary and locally stationary time series. In this paper we demonstrate how Model-Free Prediction can be applied to handle random fields that are only locally stationary, i.e., they can be assumed to be stationary only across a limited part over their entire region of…
▽ More
The Model-free Prediction Principle has been successfully applied to general regression problems, as well as problems involving stationary and locally stationary time series. In this paper we demonstrate how Model-Free Prediction can be applied to handle random fields that are only locally stationary, i.e., they can be assumed to be stationary only across a limited part over their entire region of definition. We construct one-step-ahead point predictors and compare the performance of Model-free to Model-based prediction using models that incorporate a trend and/or heteroscedasticity. Both aspects of the paper, Model-free and Model-based, are novel in the context of random fields that are locally (but not globally) stationary. We demonstrate the application of our Model-based and Model-free point prediction methods to synthetic data as well as images from the CIFAR-10 dataset and in the latter case show that our best Model-free point prediction results outperform those obtained using Model-based prediction.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.
-
Effective City Planning: A Data Driven Analysis of Infrastructure and Citizen Feedback in Bangalore
Authors:
Srishti Mishra,
Srinjoy Das
Abstract:
Leveraging civic data, divided into 3 categories spending, infrastructure and citizen feedback, can present a clear picture of the priorities, performance, and pain-points of a city. Data driven insights highlight the current issues faced by citizens as well as disparity between government spending and quality of work, and can aid in providing effective solutions. City infrastructure; footpaths, l…
▽ More
Leveraging civic data, divided into 3 categories spending, infrastructure and citizen feedback, can present a clear picture of the priorities, performance, and pain-points of a city. Data driven insights highlight the current issues faced by citizens as well as disparity between government spending and quality of work, and can aid in providing effective solutions. City infrastructure; footpaths, lighting, and parks, describe the living quality of citizens and can be compared to the annual spending in these sectors to track effectiveness. Analyzing complaints ensures citizen feedback is taken into account during both long-term planning and in short-term solutions to pinpoint critical areas of improvement. Integrating an analysis loop and data driven dashboards can help in improving performance of municipal corporations, while adding transparency between citizens and the city officials. In the paper, constituency rankings across the city infrastructure indicated a low importance towards greenery in terms of Parks, where each constituency has less than 2% of their area as a park. As populations in these areas are already high and increasing, this is likely to worsen in the coming years. Comparing the results with complaints, surprisingly the rankings of footpaths in constituencies were contrary to the number of complaints in these constituencies, with high ranking constituencies receiving the highest number of complaints, which would require further analysis. In terms of street lights, the areas with low quality lighting were associated with a large number of complaints from citizens, indicating that action needs to be taken immediately. Overall, a text analysis of complaints across constituencies reflected the everyday struggles of the city with the top keywords 'roads' and 'vehicles', followed by 'footpaths' and 'garbage', which are both critical problems in Bangalore City today.
△ Less
Submitted 6 November, 2022;
originally announced November 2022.
-
Hierarchical Bayes estimation of small area means using statistical linkage of disparate data sources
Authors:
Soumojit Das,
Partha Lahiri
Abstract:
We propose a Bayesian approach to estimate finite population means for small areas. The proposed methodology improves on the traditional sample survey methods because, unlike the traditional methods, our proposed method borrows strength from multiple data sources. Our approach is fundamentally different from the existing small area Bayesian approach to the finite population sampling, which typical…
▽ More
We propose a Bayesian approach to estimate finite population means for small areas. The proposed methodology improves on the traditional sample survey methods because, unlike the traditional methods, our proposed method borrows strength from multiple data sources. Our approach is fundamentally different from the existing small area Bayesian approach to the finite population sampling, which typically assumes a hierarchical model for all units of the finite population. We assume such model only for the units of the finite population in which the outcome variable is observed; because for these units, the assumed model can be checked using existing statistical tools. Modeling unobserved units of the finite population is challenging because the assumed model cannot be checked in the absence of data on the outcome variable. To make reasonable modeling assumptions, we propose to form several cells for each small area using factors that potentially influence the outcome variable of interest. This strategy is expected to bring some degree of homogeneity within a given cell and also among cells from different small areas that are constructed with the same factor level combination. Instead of modeling true probabilities for unobserved individual units, we assume that population means of cells with the same combination of factor levels are identical across small areas and the population mean of true probabilities for a cell is identical to the mean of true values for the observed units in that cell. We apply our proposed methodology to a real-life COVID-19 survey, linking information from multiple disparate data sources to estimate vaccine-hesitancy rates (proportions) for 50 US states and Washington, D.C. (small areas). We also provide practical ways of model selection that can be applied to a wider class of models under similar setting but for a diverse range of scientific problems.
△ Less
Submitted 30 April, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Neural Greedy Pursuit for Feature Selection
Authors:
Sandipan Das,
Alireza M. Javid,
Prakash Borpatra Gohain,
Yonina C. Eldar,
Saikat Chatterjee
Abstract:
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem. The features are selected one by one sequentially, in an iterative loss minimization procedure. We use neural networks as predictors in the algorithm to compute the loss and hence, we refer to our method as neural greedy pursuit (NGP). NGP is efficient in selecting $N$ featu…
▽ More
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem. The features are selected one by one sequentially, in an iterative loss minimization procedure. We use neural networks as predictors in the algorithm to compute the loss and hence, we refer to our method as neural greedy pursuit (NGP). NGP is efficient in selecting $N$ features when $N \ll P$, and it provides a notion of feature importance in a descending order following the sequential selection procedure. We experimentally show that NGP provides better performance than several feature selection methods such as DeepLIFT and Drop-one-out loss. In addition, we experimentally show a phase transition behavior in which perfect selection of all $N$ features without false positives is possible when the training data size exceeds a threshold.
△ Less
Submitted 19 July, 2022;
originally announced July 2022.
-
On Convergence of Gradient Descent Ascent: A Tight Local Analysis
Authors:
Haochuan Li,
Farzan Farnia,
Subhro Das,
Ali Jadbabaie
Abstract:
Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x};\mathbf{y})$ where $f$ is strongly-concave in $\mathbf{y}$ and possibly nonconvex in $\mathbf{x}$, (Lin et a…
▽ More
Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x};\mathbf{y})$ where $f$ is strongly-concave in $\mathbf{y}$ and possibly nonconvex in $\mathbf{x}$, (Lin et al., 2020) proved the convergence of GDA with a stepsize ratio $η_{\mathbf{y}}/η_{\mathbf{x}}=Θ(κ^2)$ where $η_{\mathbf{x}}$ and $η_{\mathbf{y}}$ are the stepsizes for $\mathbf{x}$ and $\mathbf{y}$ and $κ$ is the condition number for $\mathbf{y}$. While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the \emph{local convergence} of general \emph{nonconvex-nonconcave} minimax problems. We demonstrate that a stepsize ratio of $Θ(κ)$ is necessary and sufficient for local convergence of GDA to a Stackelberg Equilibrium, where $κ$ is the local condition number for $\mathbf{y}$. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient methods (EG). Finally, we conduct several numerical experiments to support our theoretical findings.
△ Less
Submitted 3 July, 2022;
originally announced July 2022.
-
Diverse Counterfactual Explanations for Anomaly Detection in Time Series
Authors:
Deborah Sulem,
Michele Donini,
Muhammad Bilal Zafar,
Francois-Xavier Aubet,
Jan Gasthaus,
Tim Januschowski,
Sanjiv Das,
Krishnaram Kenthapadi,
Cedric Archambeau
Abstract:
Data-driven methods that detect anomalies in times series data are ubiquitous in practice, but they are in general unable to provide helpful explanations for the predictions they make. In this work we propose a model-agnostic algorithm that generates counterfactual ensemble explanations for time series anomaly detection models. Our method generates a set of diverse counterfactual examples, i.e, mu…
▽ More
Data-driven methods that detect anomalies in times series data are ubiquitous in practice, but they are in general unable to provide helpful explanations for the predictions they make. In this work we propose a model-agnostic algorithm that generates counterfactual ensemble explanations for time series anomaly detection models. Our method generates a set of diverse counterfactual examples, i.e, multiple perturbed versions of the original time series that are not considered anomalous by the detection model. Since the magnitude of the perturbations is limited, these counterfactuals represent an ensemble of inputs similar to the original time series that the model would deem normal. Our algorithm is applicable to any differentiable anomaly detection model. We investigate the value of our method on univariate and multivariate real-world datasets and two deep-learning-based anomaly detection models, under several explainability criteria previously proposed in other data domains such as Validity, Plausibility, Closeness and Diversity. We show that our algorithm can produce ensembles of counterfactual examples that satisfy these criteria and thanks to a novel type of visualisation, can convey a richer interpretation of a model's internal mechanism than existing methods. Moreover, we design a sparse variant of our method to improve the interpretability of counterfactual explanations for high-dimensional time series anomalies. In this setting, our explanation is localised on only a few dimensions and can therefore be communicated more efficiently to the model's user.
△ Less
Submitted 21 March, 2022;
originally announced March 2022.
-
State-of-the-Art Review of Design of Experiments for Physics-Informed Deep Learning
Authors:
Sourav Das,
Solomon Tesfamariam
Abstract:
This paper presents a comprehensive review of the design of experiments used in the surrogate models. In particular, this study demonstrates the necessity of the design of experiment schemes for the Physics-Informed Neural Network (PINN), which belongs to the supervised learning class. Many complex partial differential equations (PDEs) do not have any analytical solution; only numerical methods ar…
▽ More
This paper presents a comprehensive review of the design of experiments used in the surrogate models. In particular, this study demonstrates the necessity of the design of experiment schemes for the Physics-Informed Neural Network (PINN), which belongs to the supervised learning class. Many complex partial differential equations (PDEs) do not have any analytical solution; only numerical methods are used to solve the equations, which is computationally expensive. In recent decades, PINN has gained popularity as a replacement for numerical methods to reduce the computational budget. PINN uses physical information in the form of differential equations to enhance the performance of the neural networks. Though it works efficiently, the choice of the design of experiment scheme is important as the accuracy of the predicted responses using PINN depends on the training data. In this study, five different PDEs are used for numerical purposes, i.e., viscous Burger's equation, Shrödinger equation, heat equation, Allen-Cahn equation, and Korteweg-de Vries equation. A comparative study is performed to establish the necessity of the selection of a DoE scheme. It is seen that the Hammersley sampling-based PINN performs better than other DoE sample strategies.
△ Less
Submitted 13 February, 2022;
originally announced February 2022.
-
Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification
Authors:
Saptarshi Chakraborty,
Debolina Paul,
Swagatam Das
Abstract:
The problem of linear predictions has been extensively studied for the past century under pretty generalized frameworks. Recent advances in the robust statistics literature allow us to analyze robust versions of classical linear models through the prism of Median of Means (MoM). Combining these approaches in a piecemeal way might lead to ad-hoc procedures, and the restricted theoretical conclusion…
▽ More
The problem of linear predictions has been extensively studied for the past century under pretty generalized frameworks. Recent advances in the robust statistics literature allow us to analyze robust versions of classical linear models through the prism of Median of Means (MoM). Combining these approaches in a piecemeal way might lead to ad-hoc procedures, and the restricted theoretical conclusions that underpin each individual contribution may no longer be valid. To meet these challenges coherently, in this study, we offer a unified robust framework that includes a broad variety of linear prediction problems on a Hilbert space, coupled with a generic class of loss functions. Notably, we do not require any assumptions on the distribution of the outlying data points ($\mathcal{O}$) nor the compactness of the support of the inlying ones ($\mathcal{I}$). Under mild conditions on the dual norm, we show that for misspecification level $ε$, these estimators achieve an error rate of $O(\max\left\{|\mathcal{O}|^{1/2}n^{-1/2}, |\mathcal{I}|^{1/2}n^{-1} \right\}+ε)$, matching the best-known rates in literature. This rate is slightly slower than the classical rates of $O(n^{-1/2})$, indicating that we need to pay a price in terms of error rates to obtain robust estimates. Additionally, we show that this rate can be improved to achieve so-called "fast rates" under additional assumptions.
△ Less
Submitted 11 March, 2022; v1 submitted 6 January, 2022;
originally announced January 2022.
-
Selective Regression Under Fairness Criteria
Authors:
Abhin Shah,
Yuheng Bu,
Joshua Ka-Wing Lee,
Subhro Das,
Rameswar Panda,
Prasanna Sattigeri,
Gregory W. Wornell
Abstract:
Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we redu…
▽ More
Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, than the proposed fairness criteria is met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.
△ Less
Submitted 14 July, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Uniform Concentration Bounds toward a Unified Framework for Robust Clustering
Authors:
Debolina Paul,
Saptarshi Chakraborty,
Swagatam Das,
Jason Xu
Abstract:
Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal m…
▽ More
Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal manner can result in ad hoc methods, and the limited theoretical results supporting each individual contribution may no longer hold. Toward addressing these issues in a principled way, this paper proposes a cohesive robust framework for center-based clustering under a general class of dissimilarity measures. In particular, we present a rigorous theoretical treatment within a Median-of-Means (MoM) estimation framework, showing that it subsumes several popular $k$-means variants. In addition to unifying existing methods, we derive uniform concentration bounds that complete their analyses, and bridge these results to the MoM framework via Dudley's chaining arguments. Importantly, we neither require any assumptions on the distribution of the outlying observations nor on the relative number of observations $n$ to features $p$. We establish strong consistency and an error rate of $O(n^{-1/2})$ under mild conditions, surpassing the best-known results in the literature. The methods are empirically validated thoroughly on real and synthetic datasets.
△ Less
Submitted 26 October, 2021;
originally announced October 2021.
-
Statistical Regeneration Guarantees of the Wasserstein Autoencoder with Latent Space Consistency
Authors:
Anish Chakrabarty,
Swagatam Das
Abstract:
The introduction of Variational Autoencoders (VAE) has been marked as a breakthrough in the history of representation learning models. Besides having several accolades of its own, VAE has successfully flagged off a series of inventions in the form of its immediate successors. Wasserstein Autoencoder (WAE), being an heir to that realm carries with it all of the goodness and heightened generative pr…
▽ More
The introduction of Variational Autoencoders (VAE) has been marked as a breakthrough in the history of representation learning models. Besides having several accolades of its own, VAE has successfully flagged off a series of inventions in the form of its immediate successors. Wasserstein Autoencoder (WAE), being an heir to that realm carries with it all of the goodness and heightened generative promises, matching even the generative adversarial networks (GANs). Needless to say, recent years have witnessed a remarkable resurgence in statistical analyses of the GANs. Similar examinations for Autoencoders, however, despite their diverse applicability and notable empirical performance, remain largely absent. To close this gap, in this paper, we investigate the statistical properties of WAE. Firstly, we provide statistical guarantees that WAE achieves the target distribution in the latent space, utilizing the Vapnik Chervonenkis (VC) theory. The main result, consequently ensures the regeneration of the input distribution, harnessing the potential offered by Optimal Transport of measures under the Wasserstein metric. This study, in turn, hints at the class of distributions WAE can reconstruct after suffering a compression in the form of a latent law.
△ Less
Submitted 8 October, 2021;
originally announced October 2021.
-
Tuning Confidence Bound for Stochastic Bandits with Bandit Distance
Authors:
Xinyu Zhang,
Srinjoy Das,
Ken Kreutz-Delgado
Abstract:
We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance tuning (UCB-DT) formulation enables improved performance as measured by expected regret by preventing the MAB algorithm from focusing on non-optimal bandits which…
▽ More
We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance tuning (UCB-DT) formulation enables improved performance as measured by expected regret by preventing the MAB algorithm from focusing on non-optimal bandits which is a well-known deficiency of standard UCB. "Distance tuning" of the standard UCB is done using a proposed distance measure, which we call bandit distance, that is parameterizable and which therefore can be optimized to control the transition rate from exploration to exploitation based on problem requirements. We empirically demonstrate increased performance of UCB-DT versus many existing state-of-the-art methods which use the UCB formulation for the MAB problem. Our contribution also includes the development of a conceptual tool called the "Exploration Bargain Point" which gives insights into the tradeoffs between exploration and exploitation. We argue that the Exploration Bargain Point provides an intuitive perspective that is useful for comparatively analyzing the performance of UCB-based methods.
△ Less
Submitted 6 October, 2021;
originally announced October 2021.
-
Kernel distance measures for time series, random fields and other structured data
Authors:
Srinjoy Das,
Hrushikesh Mhaskar,
Alexander Cloninger
Abstract:
This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is construct…
▽ More
This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. Also, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to self similarity allows for measures of similarity that are more robust to noise and partial occlusions of the relevant signals. Our proposed measure kdiff is a more general form of the well known kernel-based Maximum Mean Discrepancy (MMD) distance estimated over the embeddings. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems where the embedding distributions can be modeled as two component mixtures. Applications are demonstrated for clustering of synthetic and real-life time series and image data, and the performance of kdiff is compared to competing distance measures for clustering.
△ Less
Submitted 29 September, 2021;
originally announced September 2021.
-
PAC Mode Estimation using PPR Martingale Confidence Sequences
Authors:
Shubham Anand Jain,
Rohan Shah,
Sanit Gupta,
Denil Mehta,
Inderjeet Jayakumar Nair,
Jian Vora,
Sushil Khyalia,
Sourav Das,
Vinay J. Ribeiro,
Shivaram Kalyanakrishnan
Abstract:
We consider the problem of correctly identifying the \textit{mode} of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is tackled very well by prior-posterio…
▽ More
We consider the problem of correctly identifying the \textit{mode} of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is tackled very well by prior-posterior-ratio (PPR) martingale confidence sequences \citep{waudby-ramdas-ppr}, we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. To begin, we show that the "one-versus-one" principle to generalise from $K = 2$ to $K \geq 2$ classes is more efficient than the "one-versus-rest" alternative. We then prove that our resulting stopping rule, denoted PPR-1v1, is asymptotically optimal (as the mistake probability is taken to $0$). PPR-1v1 is parameter-free and computationally light, and incurs significantly fewer samples than competitors even in the non-asymptotic regime. We demonstrate its gains in two practical applications of sampling: election forecasting and verification of smart contracts in blockchains.
△ Less
Submitted 11 April, 2022; v1 submitted 10 September, 2021;
originally announced September 2021.
-
Querying multiple sets of $p$-values through composed hypothesis testing
Authors:
Tristan Mary-Huard,
Sarmistha Das,
Indranil Mukhopadhyay,
Stéphane Robin
Abstract:
Motivation: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis often comes as sets of p-values resulting from previous analyses, that need to be combined in a flexible way to explore complex hypotheses, while guaranteeing a low proportion of false discoveries…
▽ More
Motivation: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis often comes as sets of p-values resulting from previous analyses, that need to be combined in a flexible way to explore complex hypotheses, while guaranteeing a low proportion of false discoveries. Results: We introduce the generic concept of composed hypothesis, which corresponds to an arbitrary complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task, and show that finding items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classify the items according to their posterior probabilities. We show that inference can be efficiently performed and provide a thorough classification rule to control for type I error. The performance and the usefulness of the approach are illustrated on simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provided valuable biological insight on the considered application cases. Availability: The QCH methodology is implemented in the qch R package hosted on CRAN.
△ Less
Submitted 1 December, 2021; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Some new ordering results on stochastic comparisons of second largest order statistics from independent and interdependent heterogeneous distributions
Authors:
Sangita Das,
Suchandan Kayal
Abstract:
The second-largest order statistic is of special importance in reliability theory since it represents the time to failure of a $2$-out-of-$n$ system. Consider two $2$-out-of-$n$ systems with heterogeneous random lifetimes. The lifetimes are assumed to follow heterogeneous general exponentiated location-scale models. In this communication, the usual stochastic and reversed hazard rate orders betwee…
▽ More
The second-largest order statistic is of special importance in reliability theory since it represents the time to failure of a $2$-out-of-$n$ system. Consider two $2$-out-of-$n$ systems with heterogeneous random lifetimes. The lifetimes are assumed to follow heterogeneous general exponentiated location-scale models. In this communication, the usual stochastic and reversed hazard rate orders between the systems' lifetimes are established under two cases. For the case of independent random lifetimes, the usual stochastic order and the reversed hazard rate order between the second-largest order statistics are obtained by using the concept of vector majorization and related orders. For the dependent case, the conditions under which the usual stochastic order between the second-largest order statistics holds are investigated. To illustrate the theoretical findings, some special cases of the exponentiated location-scale model are considered.
△ Less
Submitted 17 April, 2021;
originally announced April 2021.
-
On comparison of the second-order statistics from independent and interdependent exponentiated location-scale distributed random variables
Authors:
Sangita Das,
Suchandan Kayal
Abstract:
Consider two batches of independent or interdependent exponentiated location-scale distributed heterogeneous random variables. This article investigates ordering results for the second-order statistics from these batches when a vector of parameters is switched to another vector of parameters in the specified model. Sufficient conditions for the usual stochastic order and the hazard rate order are…
▽ More
Consider two batches of independent or interdependent exponentiated location-scale distributed heterogeneous random variables. This article investigates ordering results for the second-order statistics from these batches when a vector of parameters is switched to another vector of parameters in the specified model. Sufficient conditions for the usual stochastic order and the hazard rate order are derived. Some applications of the established results are presented.
△ Less
Submitted 17 April, 2021;
originally announced April 2021.
-
Forecasting Elections from Partial Information Using a Bayesian Model for a Multinomial Sequence of Data
Authors:
Soudeep Deb,
Rishideep Roy,
Shubhabrata Das
Abstract:
Predicting the winner of an election is of importance to multiple stakeholders. To formulate the problem, we consider an independent sequence of categorical data with a finite number of possible outcomes in each. The data is assumed to be observed in batches, each of which is based on a large number of such trials and can be modeled via multinomial distributions. We postulate that the multinomial…
▽ More
Predicting the winner of an election is of importance to multiple stakeholders. To formulate the problem, we consider an independent sequence of categorical data with a finite number of possible outcomes in each. The data is assumed to be observed in batches, each of which is based on a large number of such trials and can be modeled via multinomial distributions. We postulate that the multinomial probabilities of the categories vary randomly depending on batches. The challenge is to predict accurately on cumulative data based on data up to a few batches as early as possible. On the theoretical front, we first derive sufficient conditions of asymptotic normality of the estimates of the multinomial cell probabilities and present corresponding suitable transformations. Then, in a Bayesian framework, we consider hierarchical priors using multivariate normal and inverse Wishart distributions and establish the posterior convergence. The desired inference is arrived at using these results and ensuing Gibbs sampling. The methodology is demonstrated with election data from two different settings -- one from India and the other from the United States of America. Additional insights of the effectiveness of the proposed methodology are attained through a simulation study.
△ Less
Submitted 21 February, 2024; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Robust Principal Component Analysis: A Median of Means Approach
Authors:
Debolina Paul,
Saptarshi Chakraborty,
Swagatam Das
Abstract:
Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philoso…
▽ More
Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called the \textbf{M}edian of \textbf{M}eans \textbf{P}rincipal \textbf{C}omponent \textbf{A}nalysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution via the aid of the Rademacher complexities while granting absolutely no assumption on the outlying observations. The derived concentration results are not dependent on the dimension because the analysis is conducted in a separable Hilbert space, and the results only depend on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is also thoroughly showcased through simulations and real data applications.
△ Less
Submitted 20 July, 2023; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm
Authors:
Saptarshi Chakraborty,
Debolina Paul,
Swagatam Das
Abstract:
Mean shift is a simple interactive procedure that gradually shifts data points towards the mode which denotes the highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode seeking, and finding the number of clusters in a dataset in an automated fashion. However, the merits of mean shift quickly fade away as the data dimensions increase…
▽ More
Mean shift is a simple interactive procedure that gradually shifts data points towards the mode which denotes the highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode seeking, and finding the number of clusters in a dataset in an automated fashion. However, the merits of mean shift quickly fade away as the data dimensions increase and only a handful of features contain useful information about the cluster structure of the data. We propose a simple yet elegant feature-weighted variant of mean shift to efficiently learn the feature importance and thus, extending the merits of mean shift to high-dimensional data. The resulting algorithm not only outperforms the conventional mean shift clustering procedure but also preserves its computational simplicity. In addition, the proposed method comes with rigorous theoretical convergence guarantees and a convergence rate of at least a cubic order. The efficacy of our proposal is thoroughly assessed through experimental comparison against baseline and state-of-the-art clustering methods on synthetic as well as real-world datasets.
△ Less
Submitted 10 May, 2021; v1 submitted 20 December, 2020;
originally announced December 2020.
-
Ordering results of extreme order statistics from multiple-outlier scale models with dependence
Authors:
Sangita Das,
Suchandan Kayal
Abstract:
In this paper, we focus on stochastic comparisons of extreme order statistics stemming from multiple-outlier scale models with dependence. Archimedean copula is used to model dependence structure among nonnegative random variables. Sufficient conditions are obtained for comparison of the largest order statistics in the sense of the usual stochastic, reversed hazard rate, star and Lorenz orders. Th…
▽ More
In this paper, we focus on stochastic comparisons of extreme order statistics stemming from multiple-outlier scale models with dependence. Archimedean copula is used to model dependence structure among nonnegative random variables. Sufficient conditions are obtained for comparison of the largest order statistics in the sense of the usual stochastic, reversed hazard rate, star and Lorenz orders. The smallest order statistics are also compared with respect to the usual stochastic, hazard rate, star and Lorenz orders. To illustrate the theoretical establishments, some examples are provided.
△ Less
Submitted 15 December, 2020;
originally announced December 2020.
-
Kernel k-Means, By All Means: Algorithms and Strong Consistency
Authors:
Debolina Paul,
Saptarshi Chakraborty,
Swagatam Das,
Jason Xu
Abstract:
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions t…
▽ More
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions to the kernel and multi-kernel settings. Called Kernel Power $k$-Means, our algorithm makes use of majorization-minimization (MM) to better solve this non-convex problem. We show the method implicitly performs annealing in kernel feature space while retaining efficient, closed-form updates, and we rigorously characterize its convergence properties both from computational and statistical points of view. In particular, we characterize the large sample behavior of the proposed method by establishing strong consistency guarantees. Its merits are thoroughly validated on a suite of simulated datasets and real data benchmarks that feature non-linear and multi-view separation.
△ Less
Submitted 12 November, 2020;
originally announced November 2020.
-
Boosting House Price Predictions using Geo-Spatial Network Embedding
Authors:
Sarkar Snigdha Sarathi Das,
Mohammed Eunus Ali,
Yuan-Fang Li,
Yong-Bin Kang,
Timos Sellis
Abstract:
Real estate contributes significantly to all major economies around the world. In particular, house prices have a direct impact on stakeholders, ranging from house buyers to financing companies. Thus, a plethora of techniques have been developed for real estate price prediction. Most of the existing techniques rely on different house features to build a variety of prediction models to predict hous…
▽ More
Real estate contributes significantly to all major economies around the world. In particular, house prices have a direct impact on stakeholders, ranging from house buyers to financing companies. Thus, a plethora of techniques have been developed for real estate price prediction. Most of the existing techniques rely on different house features to build a variety of prediction models to predict house prices. Perceiving the effect of spatial dependence on house prices, some later works focused on introducing spatial regression models for improving prediction performance. However, they fail to take into account the geo-spatial context of the neighborhood amenities such as how close a house is to a train station, or a highly-ranked school, or a shopping center. Such contextual information may play a vital role in users' interests in a house and thereby has a direct influence on its price. In this paper, we propose to leverage the concept of graph neural networks to capture the geo-spatial context of the neighborhood of a house. In particular, we present a novel method, the Geo-Spatial Network Embedding (GSNE), that learns the embeddings of houses and various types of Points of Interest (POIs) in the form of multipartite networks, where the houses and the POIs are represented as attributed nodes and the relationships between them as edges. Extensive experiments with a large number of regression techniques show that the embeddings produced by our proposed GSNE technique consistently and significantly improve the performance of the house price prediction task regardless of the downstream regression model.
△ Less
Submitted 1 September, 2020;
originally announced September 2020.
-
Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis
Authors:
Sankha Subhra Mullick,
Shounak Datta,
Sourish Gunesh Dhekane,
Swagatam Das
Abstract:
Indices quantifying the performance of classifiers under class-imbalance, often suffer from distortions depending on the constitution of the test set or the class-specific classification accuracy, creating difficulties in assessing the merit of the classifier. We identify two fundamental conditions that a performance index must satisfy to be respectively resilient to altering number of testing ins…
▽ More
Indices quantifying the performance of classifiers under class-imbalance, often suffer from distortions depending on the constitution of the test set or the class-specific classification accuracy, creating difficulties in assessing the merit of the classifier. We identify two fundamental conditions that a performance index must satisfy to be respectively resilient to altering number of testing instances from each class and the number of classes in the test set. In light of these conditions, under the effect of class imbalance, we theoretically analyze four indices commonly used for evaluating binary classifiers and five popular indices for multi-class classifiers. For indices violating any of the conditions, we also suggest remedial modification and normalization. We further investigate the capability of the indices to retain information about the classification performance over all the classes, even when the classifier exhibits extreme performance on some classes. Simulation studies are performed on high dimensional deep representations of subset of the ImageNet dataset using four state-of-the-art classifiers tailored for handling class imbalance. Finally, based on our theoretical findings and empirical evidence, we recommend the appropriate indices that should be used to evaluate the performance of classifiers in presence of class-imbalance.
△ Less
Submitted 26 August, 2020;
originally announced August 2020.
-
Statistical Mechanical Analysis of Neural Network Pruning
Authors:
Rupam Acharyya,
Ankani Chattoraj,
Boyu Zhang,
Shouman Das,
Daniel Stefankovic
Abstract:
Deep learning architectures with a huge number of parameters are often compressed using pruning techniques to ensure computational efficiency of inference during deployment. Despite multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We inspect different pruning techniques under the statistical mechanics formulation of a…
▽ More
Deep learning architectures with a huge number of parameters are often compressed using pruning techniques to ensure computational efficiency of inference during deployment. Despite multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We inspect different pruning techniques under the statistical mechanics formulation of a teacher-student framework and derive their generalization error (GE) bounds. It has been shown that Determinantal Point Process (DPP) based node pruning method is notably superior to competing approaches when tested on real datasets. Using GE bounds in the aforementioned setup we provide theoretical guarantees for their empirical observations. Another consistent finding in literature is that sparse neural networks (edge pruned) generalize better than dense neural networks (node pruned) for a fixed number of parameters. We use our theoretical setup to prove this finding and show that even the baseline random edge pruning method performs better than the DPP node pruning method. We also validate this empirically on real datasets.
△ Less
Submitted 11 June, 2021; v1 submitted 30 June, 2020;
originally announced June 2020.
-
A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm
Authors:
Orlando Romero,
Subhro Das,
Pin-Yu Chen,
Sérgio Pequito
Abstract:
Out of the recent advances in systems and control (S\&C)-based analysis of optimization algorithms, not enough work has been specifically dedicated to machine learning (ML) algorithms and its applications. This paper addresses this gap by illustrating how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimizat…
▽ More
Out of the recent advances in systems and control (S\&C)-based analysis of optimization algorithms, not enough work has been specifically dedicated to machine learning (ML) algorithms and its applications. This paper addresses this gap by illustrating how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based. The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM). Following first principles from dynamical systems stability theory, conditions for convergence of MAP-EM are developed. Furthermore, if additional assumptions are met, we show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S\&C approach. The convergence guarantees in this paper effectively expand the set of sufficient conditions for EM applications, thereby demonstrating the potential of similar S\&C-based convergence analysis of other ML algorithms.
△ Less
Submitted 12 February, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Distributional Individual Fairness in Clustering
Authors:
Nihesh Anderson,
Suman K. Bera,
Syamantak Das,
Yang Liu
Abstract:
In this paper, we initiate the study of fair clustering that ensures distributional similarity among similar individuals. In response to improving fairness in machine learning, recent papers have investigated fairness in clustering algorithms and have focused on the paradigm of statistical parity/group fairness. These efforts attempt to minimize bias against some protected groups in the population…
▽ More
In this paper, we initiate the study of fair clustering that ensures distributional similarity among similar individuals. In response to improving fairness in machine learning, recent papers have investigated fairness in clustering algorithms and have focused on the paradigm of statistical parity/group fairness. These efforts attempt to minimize bias against some protected groups in the population. However, to the best of our knowledge, the alternative viewpoint of individual fairness, introduced by Dwork et al. (ITCS 2012) in the context of classification, has not been considered for clustering so far. Similar to Dwork et al., we adopt the individual fairness notion which mandates that similar individuals should be treated similarly for clustering problems. We use the notion of $f$-divergence as a measure of statistical similarity that significantly generalizes the ones used by Dwork et al. We introduce a framework for assigning individuals, embedded in a metric space, to probability distributions over a bounded number of cluster centers. The objective is to ensure (a) low cost of clustering in expectation and (b) individuals that are close to each other in a given fairness space are mapped to statistically similar distributions.
We provide an algorithm for clustering with $p$-norm objective ($k$-center, $k$-means are special cases) and individual fairness constraints with provable approximation guarantee. We extend this framework to include both group fairness and individual fairness inside the protected groups. Finally, we observe conditions under which individual fairness implies group fairness. We present extensive experimental evidence that justifies the effectiveness of our approach.
△ Less
Submitted 22 June, 2020;
originally announced June 2020.