Showing 1–50 of 120 results for author: Das, S

  1. arXiv:2407.04465  [pdf, ps, other]

    stat.AP cs.SI physics.data-an

    Learning Patterns from Biological Networks: A Compounded Burr Probability Model

    Authors: Tanujit Chakraborty, Shraddha M. Naik, Swarup Chattopadhyay, Suchismita Das

    Abstract: Complex biological networks, comprising metabolic reactions, gene interactions, and protein interactions, often exhibit scale-free characteristics with power-law degree distributions. However, empirical studies have revealed discrepancies between observed biological network data and ideal power-law fits, highlighting the need for improved modeling approaches. To address this challenge, we propose…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.01631  [pdf, ps, other]

    stat.ME math.ST

    Model Identifiability for Bivariate Failure Time Data with Competing Risks: Parametric Cause-specific Hazards and Non-parametric Frailty

    Authors: Biswadeep Ghosh, Anup Dewanji, Sudipta Das

    Abstract: One of the commonly used approaches to capture dependence in multivariate survival data is through the frailty variables. The identifiability issues should be carefully investigated while modeling multivariate survival with or without competing risks. The use of non-parametric frailty distribution(s) is sometimes preferred for its robustness and flexibility properties. In this paper, we consider m…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2405.05773  [pdf, other]

    stat.ME stat.AP

    Parametric Analysis of Bivariate Current Status data with Competing risks using Frailty model

    Authors: Biswadeep Ghosh, Anup Dewanji, Sudipta Das

    Abstract: Shared and correlated Gamma frailty models are widely used in the literature to model the association in multivariate current status data. In this paper, we have proposed two other new Gamma frailty models, namely shared cause-specific and correlated cause-specific Gamma frailty to capture association in bivariate current status data with competing risks. We have investigated the identifiability o…

    Submitted 9 May, 2024; originally announced May 2024.

    MSC Class: 62F03; 62F10; 62N02; 62N03

  4. arXiv:2404.11345  [pdf, other]

    stat.ME

    Jacobi Prior: An Alternative Bayesian Method for Supervised Learning

    Authors: Sourish Das, Shouvik Sardar

    Abstract: The 'Jacobi prior' is an alternative Bayesian method for predictive models. It performs better than well-known methods such as Lasso, Ridge, Elastic Net, and MCMC-based Horse-Shoe Prior, particularly in terms of prediction accuracy and run-time. This method is implemented for Gaussian process classification, adeptly handling a nonlinear decision boundary. The Jacobi prior demonstrates its capabili…

    Submitted 4 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 29 pages, 10 figures

    MSC Class: 62 ACM Class: I.5; I.6

  5. arXiv:2403.08819  [pdf, other]

    cs.LG cs.CL stat.ML

    Thermometer: Towards Universal Calibration for Large Language Models

    Authors: Maohao Shen, Subhro Das, Kristjan Greenewald, Prasanna Sattigeri, Gregory Wornell, Soumya Ghosh

    Abstract: We consider the issue of calibration in large language models (LLM). Recent studies have found that common interventions such as instruction tuning often result in poorly calibrated LLMs. Although calibration is well-explored in traditional applications, calibrating LLMs is uniquely challenging. These challenges stem as much from the severe computational requirements of LLMs as from their versatil…

    Submitted 27 June, 2024; v1 submitted 19 February, 2024; originally announced March 2024.

    Comments: Camera ready version for ICML 2024

  6. arXiv:2403.00965  [pdf]

    stat.AP cs.AI cs.LG

    Binary Gaussian Copula Synthesis: A Novel Data Augmentation Technique to Advance ML-based Clinical Decision Support Systems for Early Prediction of Dialysis Among CKD Patients

    Authors: Hamed Khosravi, Srinjoy Das, Abdullah Al-Mamun, Imtiaz Ahmed

    Abstract: The Center for Disease Control estimates that over 37 million US adults suffer from chronic kidney disease (CKD), yet 9 out of 10 of these individuals are unaware of their condition due to the absence of symptoms in the early stages. It has a significant impact on patients' quality of life, particularly when it progresses to the need for dialysis. Early prediction of dialysis is crucial as it can…

    Submitted 1 March, 2024; originally announced March 2024.

  7. arXiv:2402.06160  [pdf, other]

    cs.LG stat.ML

    Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?

    Authors: Maohao Shen, J. Jon Ryu, Soumya Ghosh, Yuheng Bu, Prasanna Sattigeri, Subhro Das, Gregory W. Wornell

    Abstract: This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called evidential deep learning (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their perceived strong empirical performance on downstream tasks, a line of recent studies…

    Submitted 12 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 29 pages, 12 figures

  8. arXiv:2401.07344  [pdf, other]

    stat.ME q-bio.GN stat.AP

    Robust Genomic Prediction and Heritability Estimation using Density Power Divergence

    Authors: Upama Paul Chowdhury, Susmita Das, Abhik Ghosh

    Abstract: This manuscript delves into the intersection of genomics and phenotypic prediction, focusing on the statistical innovation required to navigate the complexities introduced by noisy covariates and confounders. The primary emphasis is on the development of advanced robust statistical models tailored for genomic prediction from single nucleotide polymorphism (SNP) data collected from genome-wide asso…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Under Review

  9. arXiv:2312.10469  [pdf, other]

    cs.LG stat.ML

    One step closer to unbiased aleatoric uncertainty estimation

    Authors: Wang Zhang, Ziwen Ma, Subhro Das, Tsui-Wei Weng, Alexandre Megretski, Luca Daniel, Lam M. Nguyen

    Abstract: Neural networks are powerful tools in various applications, and quantifying their uncertainty is crucial for reliable decision-making. In the deep learning field, the uncertainties are usually categorized into aleatoric (data) and epistemic (model) uncertainty. In this paper, we point out that the existing popular variance attenuation method highly overestimates aleatoric uncertainty. To address t…

    Submitted 20 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

  10. arXiv:2312.09052  [pdf, other]

    stat.AP stat.OT

    Applying Pre-Trained Deep-Learning Model on Wrist Angel Data -- An Analysis Plan

    Authors: Harald Vilhelm Skat-Rørdam, Mia Hang Knudsen, Simon Nørby Knudsen, Nicole Nadine Lønfeldt, Sneha Das, Line Katrine Harder Clemmensen

    Abstract: We aim to investigate if we can improve predictions of stress caused by OCD symptoms using pre-trained models, and present our statistical analysis plan in this paper. With the methods presented in this plan, we aim to avoid bias from data knowledge and thereby strengthen our hypotheses and findings. The Wrist Angel study, which this statistical analysis plan concerns, contains data from nine part…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Statistical Analysis Plan, 11 pages

  11. arXiv:2312.06591  [pdf, other]

    stat.ML cs.LG

    Concurrent Density Estimation with Wasserstein Autoencoders: Some Statistical Insights

    Authors: Anish Chakrabarty, Arkaprabha Basu, Swagatam Das

    Abstract: Variational Autoencoders (VAEs) have been a pioneering force in the realm of deep generative models. Amongst its legions of progenies, Wasserstein Autoencoders (WAEs) stand out in particular due to the dual offering of heightened generative quality and a strong theoretical backbone. WAEs consist of an encoding and a decoding network forming a bottleneck with the prime objective of generating new s…

    Submitted 11 December, 2023; originally announced December 2023.

  12. arXiv:2311.15384  [pdf, other]

    stat.ML cs.LG stat.ME

    Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means

    Authors: Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das

    Abstract: Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These…

    Submitted 26 November, 2023; originally announced November 2023.

  13. arXiv:2311.09591  [pdf]

    cs.LG cond-mat.mtrl-sci stat.ML

    Accelerating material discovery with a threshold-driven hybrid acquisition policy-based Bayesian optimization

    Authors: Ahmed Shoyeb Raihan, Hamed Khosravi, Srinjoy Das, Imtiaz Ahmed

    Abstract: Advancements in materials play a crucial role in technological progress. However, the process of discovering and developing materials with desired properties is often impeded by substantial experimental costs, extensive resource utilization, and lengthy development periods. To address these challenges, modern approaches often employ machine learning (ML) techniques such as Bayesian Optimization (B…

    Submitted 16 November, 2023; originally announced November 2023.

  14. arXiv:2309.11739  [pdf]

    stat.OT

    Classroom Community amid Covid-19: A Mixed-Methods Study of Undergraduate Students in Introductory Mathematics and Statistics

    Authors: Shira Viel, Maria Tackett, Sarwari Das, Joseph Choo

    Abstract: A strong sense of classroom community is associated with many positive learning outcomes and is a critical contributor to undergraduate students' persistence in STEM, particularly for women and students of color. This chapter describes a mixed-methods investigation into the relationship between classroom community and course attributes in introductory undergraduate mathematics and statistics cours…

    Submitted 21 December, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  15. Orderings of extremes among dependent extended Weibull random variables

    Authors: Ramkrishna Jyoti Samanta, Sangita Das, N. Balakrishnan

    Abstract: In this work, we consider two sets of dependent variables $\{X_{1},\ldots,X_{n}\}$ and $\{Y_{1},\ldots,Y_{n}\}$, where $X_{i}\sim EW(α_{i},λ_{i},k_{i})$ and $Y_{i}\sim EW(β_{i},μ_{i},l_{i})$, for $i=1,\ldots, n$, which are coupled by Archimedean copulas having different generators. Also, let $N_{1}$ and $N_{2}$ be two non-negative integer-valued random variables, independent of $X_{i}'$s and…

    Submitted 2 July, 2023; originally announced July 2023.

  16. arXiv:2306.10592  [pdf, other]

    stat.ML cs.LG math.FA math.PR

    Conditional expectation using compactification operators

    Authors: Suddhasattwa Das

    Abstract: The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactificatio…

    Submitted 8 January, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

    MSC Class: 46E27; 46E22; 62G07; 62G05

    Journal ref: Applied and Computational Harmonic Analysis, 2024

  17. arXiv:2306.10369  [pdf, other]

    math.OC eess.SY stat.ML

    Non-asymptotic System Identification for Linear Systems with Nonlinear Policies

    Authors: Yingying Li, Tianpeng Zhang, Subhro Das, Jeff Shamma, Na Li

    Abstract: This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies during the learning process are usually nonlinear and time-varying for satisfying the state and input const…

    Submitted 17 June, 2023; originally announced June 2023.

  18. arXiv:2304.09945  [pdf, other]

    stat.CO

    Blocked Gibbs sampler for hierarchical Dirichlet processes

    Authors: Snigdha Das, Yabo Niu, Yang Ni, Bani K. Mallick, Debdeep Pati

    Abstract: Posterior computation in hierarchical Dirichlet process (HDP) mixture models is an active area of research in nonparametric Bayes inference of grouped data. Existing literature almost exclusively focuses on the Chinese restaurant franchise (CRF) analogy of the marginal distribution of the parameters, which can mix poorly and is known to have a linear complexity with the sample size. A recently dev…

    Submitted 19 April, 2023; originally announced April 2023.

  19. Using Geographic Location-based Public Health Features in Survival Analysis

    Authors: Navid Seidi, Ardhendu Tripathy, Sajal K. Das

    Abstract: Time elapsed till an event of interest is often modeled using the survival analysis methodology, which estimates a survival score based on the input features. There is a resurgence of interest in developing more accurate prediction models for time-to-event prediction in personalized healthcare using modern tools such as neural networks. Higher quality features and more frequent observations improv…

    Submitted 15 April, 2023; originally announced April 2023.

    Journal ref: 2023 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2023, 80-91

  20. arXiv:2303.12385  [pdf, ps, other]

    stat.AP stat.ME

    Optimal selection of the starting lineup for a football team

    Authors: Soudeep Deb, Shubhabrata Das

    Abstract: The success of a football team depends on various individual skills and performances of the selected players as well as how cohesively they perform. We propose a two-stage process for selecting optimal playing eleven of a football team from its pool of available players. In the first stage a LASSO-induced modified multinomial logistic regression model is derived to analyse the probabilities of the…

    Submitted 12 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  21. arXiv:2303.04808  [pdf]

    q-bio.QM cs.LG stat.AP

    Prevalence and Major Risk Factors of Non-communicable Diseases: A Machine Learning based Cross-Sectional Study

    Authors: Mrinmoy Roy, Anica Tasnim Protity, Srabonti Das, Porarthi Dhar

    Abstract: Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was pre…

    Submitted 18 May, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: 25 pages, 10 figures, 3 tables

  22. arXiv:2303.00883  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-reduced Clipping for Non-convex Optimization

    Authors: Amirhossein Reisizadeh, Haochuan Li, Subhro Das, Ali Jadbabaie

    Abstract: Gradient clipping is a standard training technique used in deep learning applications such as large-scale language modeling to mitigate exploding gradients. Recent experimental studies have demonstrated a fairly special behavior in the smoothness of the training objective along its trajectory when trained with gradient clipping. That is, the smoothness grows with the gradient norm. This is in clea…

    Submitted 2 June, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  23. arXiv:2302.00781  [pdf, other]

    stat.ME stat.AP stat.ML

    Monitoring the risk of a tailings dam collapse through spectral analysis of satellite InSAR time-series data

    Authors: Sourav Das, Anuradha Priyadarshana, Stephen Grebby

    Abstract: Slope failures possess destructive power that can cause significant damage to both life and infrastructure. Monitoring slopes prone to instabilities is therefore critical in mitigating the risk posed by their failure. The purpose of slope monitoring is to detect precursory signs of stability issues, such as changes in the rate of displacement with which a slope is deforming. This information can t…

    Submitted 3 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

  24. arXiv:2212.03079  [pdf, other]

    stat.ME

    Model-Based and Model-Free point prediction algorithms for locally stationary random fields

    Authors: Srinjoy Das, Yiwen Zhang, Dimitris N. Politis

    Abstract: The Model-free Prediction Principle has been successfully applied to general regression problems, as well as problems involving stationary and locally stationary time series. In this paper we demonstrate how Model-Free Prediction can be applied to handle random fields that are only locally stationary, i.e., they can be assumed to be stationary only across a limited part over their entire region of…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:1712.02383

  25. arXiv:2211.03126  [pdf]

    cs.CY stat.AP

    Effective City Planning: A Data Driven Analysis of Infrastructure and Citizen Feedback in Bangalore

    Authors: Srishti Mishra, Srinjoy Das

    Abstract: Leveraging civic data, divided into three categories (spending, infrastructure, and citizen feedback), can present a clear picture of the priorities, performance, and pain-points of a city. Data driven insights highlight the current issues faced by citizens as well as disparity between government spending and quality of work, and can aid in providing effective solutions. City infrastructure; footpaths, l…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: 5 pages, Technical Article, Report originally written in 2018

  26. arXiv:2210.04980  [pdf, other]

    stat.ME

    Hierarchical Bayes estimation of small area means using statistical linkage of disparate data sources

    Authors: Soumojit Das, Partha Lahiri

    Abstract: We propose a Bayesian approach to estimate finite population means for small areas. The proposed methodology improves on the traditional sample survey methods because, unlike the traditional methods, our proposed method borrows strength from multiple data sources. Our approach is fundamentally different from the existing small area Bayesian approach to the finite population sampling, which typical…

    Submitted 30 April, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

  27. Neural Greedy Pursuit for Feature Selection

    Authors: Sandipan Das, Alireza M. Javid, Prakash Borpatra Gohain, Yonina C. Eldar, Saikat Chatterjee

    Abstract: We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem. The features are selected one by one sequentially, in an iterative loss minimization procedure. We use neural networks as predictors in the algorithm to compute the loss and hence, we refer to our method as neural greedy pursuit (NGP). NGP is efficient in selecting $N$ featu…

    Submitted 19 July, 2022; originally announced July 2022.

    Journal ref: 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1-7

  28. arXiv:2207.00957  [pdf, other]

    math.OC cs.LG stat.ML

    On Convergence of Gradient Descent Ascent: A Tight Local Analysis

    Authors: Haochuan Li, Farzan Farnia, Subhro Das, Ali Jadbabaie

    Abstract: Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x};\mathbf{y})$ where $f$ is strongly-concave in $\mathbf{y}$ and possibly nonconvex in $\mathbf{x}$, (Lin et a…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: Accepted by ICML 2022

  29. arXiv:2203.11103  [pdf, other]

    cs.LG stat.ML

    Diverse Counterfactual Explanations for Anomaly Detection in Time Series

    Authors: Deborah Sulem, Michele Donini, Muhammad Bilal Zafar, Francois-Xavier Aubet, Jan Gasthaus, Tim Januschowski, Sanjiv Das, Krishnaram Kenthapadi, Cedric Archambeau

    Abstract: Data-driven methods that detect anomalies in time series data are ubiquitous in practice, but they are in general unable to provide helpful explanations for the predictions they make. In this work we propose a model-agnostic algorithm that generates counterfactual ensemble explanations for time series anomaly detection models. Our method generates a set of diverse counterfactual examples, i.e., mu…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: 24 pages, 11 figures

  30. arXiv:2202.06416  [pdf, other]

    stat.ML cs.LG

    State-of-the-Art Review of Design of Experiments for Physics-Informed Deep Learning

    Authors: Sourav Das, Solomon Tesfamariam

    Abstract: This paper presents a comprehensive review of the design of experiments used in the surrogate models. In particular, this study demonstrates the necessity of the design of experiment schemes for the Physics-Informed Neural Network (PINN), which belongs to the supervised learning class. Many complex partial differential equations (PDEs) do not have any analytical solution; only numerical methods ar…

    Submitted 13 February, 2022; originally announced February 2022.

  31. arXiv:2201.01973  [pdf, other]

    stat.ML cs.LG math.ST

    Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification

    Authors: Saptarshi Chakraborty, Debolina Paul, Swagatam Das

    Abstract: The problem of linear predictions has been extensively studied for the past century under pretty generalized frameworks. Recent advances in the robust statistics literature allow us to analyze robust versions of classical linear models through the prism of Median of Means (MoM). Combining these approaches in a piecemeal way might lead to ad-hoc procedures, and the restricted theoretical conclusion…

    Submitted 11 March, 2022; v1 submitted 6 January, 2022; originally announced January 2022.

  32. arXiv:2110.15403  [pdf, other]

    cs.LG stat.ML

    Selective Regression Under Fairness Criteria

    Authors: Abhin Shah, Yuheng Bu, Joshua Ka-Wing Lee, Subhro Das, Rameswar Panda, Prasanna Sattigeri, Gregory W. Wornell

    Abstract: Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we redu…

    Submitted 14 July, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

  33. arXiv:2110.14148  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Uniform Concentration Bounds toward a Unified Framework for Robust Clustering

    Authors: Debolina Paul, Saptarshi Chakraborty, Swagatam Das, Jason Xu

    Abstract: Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal m…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: To appear (spotlight) in the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), 2021

  34. arXiv:2110.03995  [pdf, ps, other]

    stat.ML cs.LG

    Statistical Regeneration Guarantees of the Wasserstein Autoencoder with Latent Space Consistency

    Authors: Anish Chakrabarty, Swagatam Das

    Abstract: The introduction of Variational Autoencoders (VAE) has been marked as a breakthrough in the history of representation learning models. Besides having several accolades of its own, VAE has successfully flagged off a series of inventions in the form of its immediate successors. Wasserstein Autoencoder (WAE), being an heir to that realm, carries with it all of the goodness and heightened generative pr…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted for Spotlight Presentation at NeurIPS 2021

  35. arXiv:2110.02690  [pdf, ps, other]

    stat.ML cs.LG

    Tuning Confidence Bound for Stochastic Bandits with Bandit Distance

    Authors: Xinyu Zhang, Srinjoy Das, Ken Kreutz-Delgado

    Abstract: We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance tuning (UCB-DT) formulation enables improved performance as measured by expected regret by preventing the MAB algorithm from focusing on non-optimal bandits which…

    Submitted 6 October, 2021; originally announced October 2021.

  36. arXiv:2109.14752  [pdf, other]

    stat.ML cs.LG

    Kernel distance measures for time series, random fields and other structured data

    Authors: Srinjoy Das, Hrushikesh Mhaskar, Alexander Cloninger

    Abstract: This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is construct…

    Submitted 29 September, 2021; originally announced September 2021.

  37. arXiv:2109.05047  [pdf, other]

    stat.ME math.ST stat.AP stat.ML

    PAC Mode Estimation using PPR Martingale Confidence Sequences

    Authors: Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet Jayakumar Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, Shivaram Kalyanakrishnan

    Abstract: We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is tackled very well by prior-posterio…

    Submitted 11 April, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

  38. Querying multiple sets of $p$-values through composed hypothesis testing

    Authors: Tristan Mary-Huard, Sarmistha Das, Indranil Mukhopadhyay, Stéphane Robin

    Abstract: Motivation: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis often comes as sets of p-values resulting from previous analyses, that need to be combined in a flexible way to explore complex hypotheses, while guaranteeing a low proportion of false discoveries…

    Submitted 1 December, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

  39. arXiv:2104.08611  [pdf, ps, other]

    math.ST stat.ME

    Some new ordering results on stochastic comparisons of second largest order statistics from independent and interdependent heterogeneous distributions

    Authors: Sangita Das, Suchandan Kayal

    Abstract: The second-largest order statistic is of special importance in reliability theory since it represents the time to failure of a $2$-out-of-$n$ system. Consider two $2$-out-of-$n$ systems with heterogeneous random lifetimes. The lifetimes are assumed to follow heterogeneous general exponentiated location-scale models. In this communication, the usual stochastic and reversed hazard rate orders betwee…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 23 pages, 7 figures

    MSC Class: 60E15; 90B25

  40. arXiv:2104.08525  [pdf, ps, other]

    math.ST stat.ME

    On comparison of the second-order statistics from independent and interdependent exponentiated location-scale distributed random variables

    Authors: Sangita Das, Suchandan Kayal

    Abstract: Consider two batches of independent or interdependent exponentiated location-scale distributed heterogeneous random variables. This article investigates ordering results for the second-order statistics from these batches when a vector of parameters is switched to another vector of parameters in the specified model. Sufficient conditions for the usual stochastic order and the hazard rate order are…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 27 pages, 4 figures

    MSC Class: 60E15; 90B25

  41. Forecasting Elections from Partial Information Using a Bayesian Model for a Multinomial Sequence of Data

    Authors: Soudeep Deb, Rishideep Roy, Shubhabrata Das

    Abstract: Predicting the winner of an election is of importance to multiple stakeholders. To formulate the problem, we consider an independent sequence of categorical data with a finite number of possible outcomes in each. The data is assumed to be observed in batches, each of which is based on a large number of such trials and can be modeled via multinomial distributions. We postulate that the multinomial…

    Submitted 21 February, 2024; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: 36 pages including a coverpage, 4 figures, 13 tables including 4 in the appendix

    MSC Class: 62H10; 62P99; 65C05 ACM Class: G.3

  42. arXiv:2102.03403  [pdf, other]

    stat.ML cs.LG math.ST

    Robust Principal Component Analysis: A Median of Means Approach

    Authors: Debolina Paul, Saptarshi Chakraborty, Swagatam Das

    Abstract: Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philoso…

    Submitted 20 July, 2023; v1 submitted 5 February, 2021; originally announced February 2021.

  43. arXiv:2012.10929  [pdf, other]

    cs.LG stat.ML

    Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm

    Authors: Saptarshi Chakraborty, Debolina Paul, Swagatam Das

    Abstract: Mean shift is a simple iterative procedure that gradually shifts data points towards the mode which denotes the highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode seeking, and finding the number of clusters in a dataset in an automated fashion. However, the merits of mean shift quickly fade away as the data dimensions increase…

    Submitted 10 May, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

    Comments: To appear at the 35-th AAAI Conference on Artificial Intelligence, February 2-9, 2021

  44. arXiv:2012.08257  [pdf, other]

    math.ST stat.ME stat.OT

    Ordering results of extreme order statistics from multiple-outlier scale models with dependence

    Authors: Sangita Das, Suchandan Kayal

    Abstract: In this paper, we focus on stochastic comparisons of extreme order statistics stemming from multiple-outlier scale models with dependence. Archimedean copula is used to model dependence structure among nonnegative random variables. Sufficient conditions are obtained for comparison of the largest order statistics in the sense of the usual stochastic, reversed hazard rate, star and Lorenz orders. Th…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: 25 pages, 6 figures

    MSC Class: 60E15; 62G30; 60K10

  45. arXiv:2011.06461  [pdf, other]

    stat.ML cs.LG

    Kernel k-Means, By All Means: Algorithms and Strong Consistency

    Authors: Debolina Paul, Saptarshi Chakraborty, Swagatam Das, Jason Xu

    Abstract: Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions t…

    Submitted 12 November, 2020; originally announced November 2020.

  46. arXiv:2009.00254  [pdf, ps, other]

    cs.LG stat.ML

    Boosting House Price Predictions using Geo-Spatial Network Embedding

    Authors: Sarkar Snigdha Sarathi Das, Mohammed Eunus Ali, Yuan-Fang Li, Yong-Bin Kang, Timos Sellis

    Abstract: Real estate contributes significantly to all major economies around the world. In particular, house prices have a direct impact on stakeholders, ranging from house buyers to financing companies. Thus, a plethora of techniques have been developed for real estate price prediction. Most of the existing techniques rely on different house features to build a variety of prediction models to predict hous…

    Submitted 1 September, 2020; originally announced September 2020.

    Comments: 23 pages, 5 figures, 5 tables

  47. Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis

    Authors: Sankha Subhra Mullick, Shounak Datta, Sourish Gunesh Dhekane, Swagatam Das

    Abstract: Indices quantifying the performance of classifiers under class-imbalance, often suffer from distortions depending on the constitution of the test set or the class-specific classification accuracy, creating difficulties in assessing the merit of the classifier. We identify two fundamental conditions that a performance index must satisfy to be respectively resilient to altering number of testing ins…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: Published in Pattern Recognition (Elsevier)

    Journal ref: Pattern Recognition, 102, p.107197 (2020)

  48. arXiv:2006.16617  [pdf, other]

    cs.LG stat.ML

    Statistical Mechanical Analysis of Neural Network Pruning

    Authors: Rupam Acharyya, Ankani Chattoraj, Boyu Zhang, Shouman Das, Daniel Stefankovic

    Abstract: Deep learning architectures with a huge number of parameters are often compressed using pruning techniques to ensure computational efficiency of inference during deployment. Despite multitude of empirical advances, there is a lack of theoretical understanding of the effectiveness of different pruning methods. We inspect different pruning techniques under the statistical mechanics formulation of a…

    Submitted 11 June, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: Authors Ankani Chattoraj and Boyu Zhang made an equal contribution

  49. arXiv:2006.12690  [pdf, other]

    cs.LG math.OC stat.ML

    A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm

    Authors: Orlando Romero, Subhro Das, Pin-Yu Chen, Sérgio Pequito

    Abstract: Out of the recent advances in systems and control (S&C)-based analysis of optimization algorithms, not enough work has been specifically dedicated to machine learning (ML) algorithms and its applications. This paper addresses this gap by illustrating how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimizat…

    Submitted 12 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

  50. arXiv:2006.12589  [pdf, other]

    cs.LG cs.DS stat.ML

    Distributional Individual Fairness in Clustering

    Authors: Nihesh Anderson, Suman K. Bera, Syamantak Das, Yang Liu

    Abstract: In this paper, we initiate the study of fair clustering that ensures distributional similarity among similar individuals. In response to improving fairness in machine learning, recent papers have investigated fairness in clustering algorithms and have focused on the paradigm of statistical parity/group fairness. These efforts attempt to minimize bias against some protected groups in the population…

    Submitted 22 June, 2020; originally announced June 2020.