Newest 'statistics+machine-learning' Questions

0 votes

0 answers

22 views

Does the probability flow ODE trajectory (in the context of diffusion models) represent a bijective mapping between any distribution and a Gaussian? [closed]

I have read several papers about diffusion models in the context of deep learning, especially this one. As explained in the paper, by learning the score function $\nabla \log p_t(x)$, probability ...

saleh

113

asked Jul 5 at 7:54

0 votes

0 answers

21 views

Sample complexity bounds of $L_S(h)$

Fix $\mathscr{H} \subset \mathscr{Y}^\mathscr{X}$ and a loss $\ell : \hat{Y} \times Y \to [0,1]$. Fix $S \in (\mathscr{X} \times \mathscr{Y})^{2m}$. Assume for now that $S$ is not random. Suppose we ...

isaac

41

asked Jul 2 at 19:42

0 votes

0 answers

25 views

Harmonizing Classification and Regression [closed]

I have recently been encountering explanations of classification and regression which start with discrete label values as defining the former and continuous label values as defining the latter. I have ...

user10478

1,912

asked Jun 29 at 4:07

1 vote

0 answers

67 views

Relation between values of $\xi_i$ and $\alpha_i$ in SVM?

I have a question about a property of the support vectors of an SVM, which is stated in subsection "12.2.1 Computing the Support Vector Classifier" of "The Elements of Statistical Learning" ...

hasanghaforian

485

asked Jun 24 at 18:26

0 votes

0 answers

10 views

Paired bootstrap test p-value formula in binary classification

Background: For a binary classification task, let $M(A, Z)$ denote an evaluation metric, such as accuracy, for classifier $A$ and test examples $Z$. Then, let $$ \delta(Z) = M(A, Z) - M(B, Z) $$ denote ...

sunspots

802

asked Jun 23 at 19:50

0 votes

0 answers

39 views

Least squares minimum test error solution

Assume we want to learn a model $y = x^T \beta + \varepsilon$, where $\beta \in \mathbb{R}^d$ is constant and $x \in \mathbb{R}^d$ is the input vector with Gaussian distribution $\mathcal{N}(0,\Sigma_x)$ ...

Elad Elmakias

165

asked Jun 22 at 17:12

2 votes

0 answers

20 views

Would like to validate whether the AUC equation is correct or not

I found a paper: Chapi, Kamran, et al. "A novel hybrid artificial intelligence approach for flood susceptibility assessment." Environmental Modelling & Software 95 (2017): 229-245 ...

Simon

95

asked Jun 7 at 3:37

0 votes

1 answer

16 views

Understanding the Reasoning Behind the Growth Function $m_{\mathcal{H}}(N)=2^N$ for Convex Sets

I am currently reading Learning from Data by Abu-Mostafa et al. and I am struggling to understand the reasoning behind the growth function $m_{\mathcal{H}}(N)=2^N$ for convex sets. Here is the ...

bruno

425

asked May 30 at 10:53

0 votes

1 answer

36 views

Estimating the conditional entropy of a discrete variable conditioning on continuous variable

I am doing a machine learning project and I am trying to select the best features by computing their mutual information and selecting the ones with the highest information gain. I was looking at this ...

Ishigami

1,655

asked May 24 at 17:59

0 votes

0 answers

32 views

How to Upper Bound the Spectral Norm of $\left(XX^T\right)^{-1}\left(XX^T\right)^{-1}X$?

I have an observation matrix $ X \in \mathbb{R}^{n \times n}$. Considering $XX^T$, this matrix can be seen as a correlation matrix between individuals, so it generally has elements close to the ...

Tool

1

asked May 16 at 3:06

1 vote

1 answer

41 views

How to expand the double integral in variational objective function?

I am reading John Paisley's lecture notes on variational inference. In lecture 6, p. 3, he wrote the objective function as follows: $$ \mathcal{L}(a', b', \mu', \Sigma') = \int_{0}^{\infty} \int_{...

doraemon

135

asked May 14 at 2:19

0 votes

0 answers

21 views

How to understand likelihood function bayesian

$\mathcal{N}(W^T \cdot X, \beta^{-1})$ This is the likelihood distribution for Bayesian linear regression, right? So, the thing is, if I'm doing batch mode Bayesian regression, then: Weights (W): Size:...

Need_MathHelp

1,237

asked May 12 at 12:03

2 votes

1 answer

34 views

How to derive likelihood function

I have been struggling a lot with the concept of likelihood and I'd really appreciate it if someone could verify whether my understanding is correct and give input. If I understand this correctly, we pick ...

Need_MathHelp

1,237

asked May 11 at 13:43

0 votes

0 answers

22 views

Bayesian linear regression about finding the likelihood

Pick a single data point $(x,t)$ and calculate and plot the likelihood for this single data point across all $w$ in your parameter space $(w_0 \times w_1)$ (for a single data point it is a univariate ...

User

59

asked May 9 at 19:55

1 vote

0 answers

36 views

Bayes classifiers with cost of misclassification

A minimum ECM classifier assigns the features $\underline{x}$ to class $t$ ($\delta(\underline{x}) = t$) if $\forall j \ne t$: $$\sum_{k\ne t} c(t|k) f_k(\underline{x})p_k \le \sum_{k\ne ...

BiasedBayes

11

asked Apr 30 at 13:18

2 votes

1 answer

39 views

Bayesian Inference Intractability

When looking at Bayesian posteriors $$ p(z \mid x) = \frac{p(x \mid z)p(z)}{\int p(x \mid z')p(z')dz'} $$ the denominator is commonly intractable. I understand this is due to the possibility of high ...

Lehmann

331

asked Apr 14 at 18:03

5 votes

1 answer

209 views

Rigorous Mathematical foundations of Machine Learning / Deep Learning / Neural Networks

I am an Engineering Graduate (with a strong background in Probability/Measure Theory, Linear Algebra and Calculus) wanting to dig deep into Deep Learning and Neural Networks, and I'm looking for ...

Michel H

322

asked Apr 10 at 7:26

1 vote

1 answer

54 views

How to visualize conditional maximum likelihood estimation?

In Probabilistic Machine Learning (Murphy, 2022, p. 8) I'm stuck on this part: 1.2.1.6 Maximum likelihood estimation. When fitting probabilistic models, it is common to use the negative log ...

filip augusto

121

asked Apr 9 at 2:59

0 votes

0 answers

18 views

Express the regularized weight in ridge regression in terms of the linear regression solution.

We would like to minimize the quantity $E_{in}(\vec{w})=\frac{1}{N}\sum_{n=1}^N(\vec{w}^{T}\vec{x}_n-y_n)^2$ under the constraint $\vec{w}^T\Gamma^T\Gamma\vec{w}\leq C$ where $\Gamma$ is a matrix, $C$ ...

lcthaha

1

asked Mar 30 at 17:01

0 votes

0 answers

31 views

Expected squared error (bagging)

I'm studying from the Deep Learning book (Ian Goodfellow et al.). On page 256 the text explains that, considering a set of $k$ regression models, each produces an error $\epsilon_i$ for every example, drawn ...

Federico Mondaini

49

asked Mar 29 at 11:43

2 votes

1 answer

59 views

Expected value and variance of Sigmoid and SiLU on a normally distributed random variable for variational approximation

I am trying to apply Assumed Density Filtering (ADF) according to the paper Lightweight Probabilistic Deep Networks to my own model, and I need to implement the variational approximation layer of ...

Mr Amoeba

23

asked Mar 19 at 3:53

0 votes

1 answer

23 views

Logistic map type function with controlled steepness on either side

I am looking for a function that satisfies the following requirements: $f(1) = f(-1) = 0$; $f(x) > 0$ for all $x \in (-1,1)$; $f$ is differentiable. Additionally, I would like it to be ...

arao

3

asked Mar 11 at 15:33

0 votes

1 answer

33 views

Maximum Likelihood - Information Matrix Identity Derivation

I am trying to derive the information matrix equality for the Poisson distribution with the log-likelihood: $$\mathcal{L}(\lambda; x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \left[-\lambda + x_i \log(\lambda) -...

Marlon Brando

213

asked Mar 7 at 8:41

1 vote

0 answers

72 views

Is every convex cone a subset of a half space?

I have come across a proof of this statement (link to paper at end), which, however, I do not understand. It makes use of Lemma 5.5, which I've also included. Lemma 1. The interior of the complement of ...

user1141170

11

asked Feb 28 at 14:38

3 votes

0 answers

68 views

A self-proof of the Vapnik-Chervonenkis theorem

Theorem: For every $\varepsilon >0$, with probability greater than $1-\varepsilon$ \begin{align*} R_p(\hat{g}_{n,\mathcal{G}}) - R_{p}(g^*_{p,\mathcal{G}}) \le 2 \sqrt{\dfrac{2V_{\mathcal{G}...

Eto

57

asked Feb 25 at 23:07

0 votes

0 answers

9 views

Methods for Efficient Feature Aggregation Maintaining Prediction Accuracy

Given a high-dimensional dataset X containing potentially redundant features, how can we efficiently aggregate and/or select features to achieve accurate prediction of target variable Y while reducing ...

Mohit Anand

1

asked Feb 23 at 15:35

0 votes

1 answer

34 views

Why does a shift in the probability distribution over a label space naturally trigger a shift in the distribution over input space?

Can anyone explain this statement? "Firstly, let’s define the input space as X (sensory observations) and the label space as Y (semantic categories). The data distribution is represented by the ...

StackExchanger

1

asked Feb 22 at 21:00

0 votes

0 answers

45 views

When to use the chi-square law for confidence intervals with Mahalanobis distance?

Right now I'm reading this paper: Distance-based detection of out-of-distribution silent failures for Covid-19 lung lesion segmentation, available here: https://arxiv.org/abs/2208.03217 In brief, ...

levo-str

1

asked Feb 22 at 15:32

0 votes

1 answer

67 views

Understanding Friedman’s H-statistic

In "Interpretable Machine Learning: A Guide For Making Black Box Models Explainable", I found the following for Friedman's H-statistic: $$PD_{jk}(x_j, x_k) = PD_j (x_j) + PD_k (x_k),$$ where ...

SineOfTheTimes

51

asked Feb 11 at 21:05

0 votes

0 answers

19 views

Gaussian Kernel outputs a 2*m feature map?

I am currently writing my master's thesis on the Double Descent Curve in Neural Networks and as I was doing some research, I came across the paper "On the Double Descent of Random Features Models ...

Silvio

21

asked Feb 11 at 11:49

0 votes

0 answers

116 views

Posterior probabilities in a GMM

This is a statistics/probability question formulated in the context of machine learning (problem 6.17 in Bishop's 'Deep Learning' book). We are modelling the conditional distribution $p(\mathbf{t}|\...

Mat Dyl

511

asked Feb 1 at 18:24

1 vote

0 answers

37 views

When does the optimal model exist in learning theory?

In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...

rick

41

asked Jan 24 at 2:34

0 votes

0 answers

68 views

Expectation of the product of Gaussian kernels and their input

I was wondering if anybody knows how to solve: $$\mathbb{E}_{\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[ (\mathbf{x}_{i} - \mathbf{z})(\mathbf{x}_{j} - \mathbf{z})^{\top} \exp\left( - (\...

wsz_fantasy

1,690

asked Jan 9 at 22:14

0 votes

0 answers

61 views

Known relations between mutual information and covering number?

This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...

Tanishq Kumar

681

asked Dec 31, 2023 at 7:43

1 vote

1 answer

61 views

Interpreting a concentration inequality

In the following paper I am slightly confused about the way they use a concentration inequality derived in Lemma A1. In Lemma A1, under the assumption that $(n ,p)$ satisfies $\log p/n^{1/4} \to 0$ as ...

WeakLearner

6,096

asked Dec 6, 2023 at 18:44

1 vote

0 answers

50 views

OLS and Conditional Expectation Assumption

dbzadnen khiari

11

asked Dec 2, 2023 at 12:41

0 votes

0 answers

46 views

Loss Equation for Training DPMs vs DDPMs

I'm currently trying to wrap my head around the training loss functions for DPMs and how they differ from those for DDPMs; however, there are differences in how the papers describe the processes, making it ...

Tomas Premoli Muniagurria

55

asked Nov 23, 2023 at 17:33

0 votes

0 answers

50 views

Right inverse of a linear, bounded, nonnegative, self-adjoint and trace-class operator on a closed subspace of a separable Hilbert space

This question is related to Corollary 3 of the paper: Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces by Kenji Fukumizu, et al. Basically, they first defined the ...

allen i

361

asked Nov 17, 2023 at 4:02

0 votes

0 answers

22 views

How to prove the relation involving difference between value functions of two different policies and the sum of advantage function over time?

In reinforcement learning, how do you prove the following relation between the difference in value functions of two policies? The value function $V^\pi(s)$ represents the expected cumulative reward ...

Michael

1

asked Nov 13, 2023 at 16:04

0 votes

0 answers

23 views

Does marginal likelihood on the training set always weakly increase for GPs when adding new features, irrespective of the kernel/hyperparams?

I've recently been introducing myself to Gaussian Processes. In Bayesian linear regression, one would expect that when adding new features, the likelihood on the training set would weakly increase due ...

Qazaz

41

asked Nov 3, 2023 at 16:23

0 votes

0 answers

9 views

What determines the length of the confidence interval for the mean response?

Consider a simple OLS predictor $\hat\beta = (X^TX)^{-1}X^TY$, where $X$ is the design matrix and $Y$ is the response. Given a new observation's covariate $x$, I can estimate the mean response ...

maskeran

573

asked Oct 31, 2023 at 20:00

0 votes

0 answers

22 views

Bias and Variance Decomposition to find an optimal tuning parameter - Population vs Empirical

I have a question regarding the bias and variance decomposition in The Elements of Statistical Learning. Section 7.2 gives $\operatorname{Err}\left(x_0\right)=$ $$E\left[\left(Y-\hat{f}\...

maskeran

573

asked Oct 31, 2023 at 18:16

0 votes

0 answers

26 views

Generic Chaining/Dudley's integral for supremum of average of indicator random variables in sup norm metric space?

I want to use generic chaining / Dudley's integral to bound the stochastic process $\mathbb{E}\sup_{t\in T}\frac{1}{n}\sum_{i=1}^n X_{i,t}$, where $X_{i,t}$ takes value either 1 or 0 (binary random ...

Ankita Ghosh

1

asked Oct 31, 2023 at 17:22

2 votes

1 answer

50 views

What is the Rademacher complexity of kernel-based hypotheses with offset?

Let $\mathcal{H}=\{ x\rightarrow \langle \mathbf{w},\Phi(x)\rangle+b : \| \mathbf{w}\|_{\mathbb{H}} \le \Lambda, b\in \mathbb{R}\}$ be a function family, where $\Phi$ is a feature mapping, and $\...

Harry

699

asked Oct 17, 2023 at 11:03

0 votes

0 answers

21 views

Relation between VC-dimension and pseudo-dimension

I am thinking about the relation between the VC-dimension and the pseudo-dimension, and I am confused about them. Let $H$ be a family of real-valued functions. We can define a function $c(h,t):x\rightarrow ...

Harry

699

asked Oct 17, 2023 at 7:05

0 votes

0 answers

16 views

Does Gaussian Process solve specified distribution drift problem?

While reading this lecture, I found the posterior predictive distribution introduced as follows: $$ P(Y \mid D, X)=\int_{\mathbf{w}} P(Y, \mathbf{w} \mid D, X) d \mathbf{w}=\int_{...

maskeran

573

asked Oct 16, 2023 at 17:03

1 vote

0 answers

367 views

What is the correct formula for the Within Cluster Sum of Squares?

I am studying clustering with the K-Means algorithm and I got stuck on the "inertia", or "within cluster sum of squares" part. First, I would appreciate it if anyone could explain to me ...

Artur Juan Dantas

11

asked Oct 12, 2023 at 16:20

1 vote

1 answer

187 views

Proof that $KL(p||q) =0 \iff p(x) = q(x)$

I am studying Machine Learning, and I came across a proof in section 1.6.1 of Bishop's textbook Pattern Recognition and Machine Learning which I couldn't quite understand. The claim is that $KL(p||q) ...

Nikolay

13

asked Oct 4, 2023 at 17:40

0 votes

1 answer

46 views

Interpreting notation for binary loss function

I am taking an advanced machine learning class, and in the class notes I noticed a notation that I did not recognize. It is the notation for the binary loss function (please note that it is not binary ...

Redwanul Sourav

151

asked Oct 2, 2023 at 19:11

0 votes

0 answers

36 views

Show bagging helps under squared-error loss

This question is about Section 8.7, Bagging, from the textbook The Elements of Statistical Learning (ESL). Assume our training observations $\left(x_i, y_i\right), i=1, \ldots, N$ are independently drawn from a ...

maskeran

573

asked Oct 2, 2023 at 18:54
