
All Questions

0 votes
0 answers
22 views

Does the probability flow ODE trajectory (in the context of diffusion models) represent a bijective mapping between any distribution and a Gaussian? [closed]

I have read several papers about diffusion models in the context of deep learning, especially this one. As explained in the paper, by learning the score function $\nabla \log(p_t(x))$, probability ...
saleh • 113
0 votes
0 answers
21 views

Sample complexity bounds of $L_S(h)$

Fix $\mathscr{H} \subset \mathscr{Y}^\mathscr{X}$ and a loss $\ell : \hat{Y} \times Y \to [0,1]$. Fix $S \in (\mathscr{X} \times \mathscr{Y})^{2m}$. Assume for now that $S$ is not random. Suppose we ...
isaac • 41
0 votes
0 answers
25 views

Harmonizing Classification and Regression [closed]

I have recently been encountering explanations of classification and regression which start with discrete label values as defining the former and continuous label values as defining the latter. I have ...
user10478 • 1,912
1 vote
0 answers
67 views

Relation between values of $\xi_i$ and $\alpha_i$ in SVM?

I have a question about a property of support vectors in SVMs which is stated in subsection "12.2.1 Computing the Support Vector Classifier" of "The Elements of Statistical Learning" ...
hasanghaforian
0 votes
0 answers
10 views

Paired bootstrap test p-value formula in binary classification

Background For a binary classification task, let $M(A, Z)$ denote an evaluation metric, such as accuracy, for classifier $A$ and test examples $Z.$ Then, let $$ \delta(Z) = M(A, Z) - M(B, Z) $$ denote ...
sunspots • 802
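A minimal sketch of one common paired-bootstrap variant, assuming accuracy as the metric $M$ and 0/1 per-example correctness vectors; the centering-at-$\hat\delta$ null and the replicate count are illustrative choices, not the only convention:

```python
import numpy as np

def paired_bootstrap_pvalue(correct_a, correct_b, n_boot=10_000, seed=0):
    """Two-sided paired-bootstrap p-value for delta(Z) = M(A, Z) - M(B, Z).

    correct_a, correct_b: 0/1 arrays of per-example correctness for A and B.
    Resamples test examples with replacement; under the shifted null, replicate
    deltas are centered at the observed delta before comparison.
    """
    rng = np.random.default_rng(seed)
    correct_a = np.asarray(correct_a, dtype=float)
    correct_b = np.asarray(correct_b, dtype=float)
    n = len(correct_a)
    delta_hat = correct_a.mean() - correct_b.mean()    # observed delta(Z)
    idx = rng.integers(0, n, size=(n_boot, n))         # paired resampling
    deltas = correct_a[idx].mean(axis=1) - correct_b[idx].mean(axis=1)
    return np.mean(np.abs(deltas - delta_hat) >= np.abs(delta_hat))
```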
0 votes
0 answers
39 views

Least squares minimum test error solution

Assume we want to learn a model $y = x^T \beta + \varepsilon$, where $\beta \in \mathbb{R}^d$ is constant and $x \in \mathbb{R}^d$ is the input vector with Gaussian distribution $\mathcal{N}(0,\Sigma_x)$ ...
Elad Elmakias
2 votes
0 answers
20 views

Would like to validate whether the AUC equation is correct or not

I found the paper Chapi, Kamran, et al. "A novel hybrid artificial intelligence approach for flood susceptibility assessment." Environmental Modelling & Software 95 (2017): 229-245 ...
Simon • 95
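Without the paper at hand, one standard sanity check is that the Mann–Whitney formulation of AUC matches the trapezoidal ROC area; a sketch against scikit-learn (the synthetic labels and scores are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_mann_whitney(y_true, scores):
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
s = rng.normal(size=200) + y              # scores correlated with the label
print(auc_mann_whitney(y, s))             # should agree with the line below
print(roc_auc_score(y, s))
```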
0 votes
1 answer
16 views

Understanding the Reasoning Behind the Growth Function $m_{\mathcal{H}}(N)=2^N$ for Convex Sets

I am currently reading Learning from Data by Abu-Mostafa et al. and I am struggling to understand the reasoning behind the growth function $m_{\mathcal{H}}(N)=2^N$ for convex sets. Here is the ...
bruno • 425
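A hedged numerical companion to the usual argument: put $N$ points on a circle; the convex hull of any positive subset then contains no negative point, so convex sets realize all $2^N$ dichotomies. The polygon test below is one simple way to verify this, not the book's own construction:

```python
import numpy as np
from itertools import product

def strictly_inside(q, poly):
    """Strict interior test for a convex polygon with CCW-ordered vertices."""
    e = np.roll(poly, -1, axis=0) - poly          # edge vectors
    d = q - poly                                  # vertex-to-query vectors
    cross = e[:, 0] * d[:, 1] - e[:, 1] * d[:, 0]
    return bool(np.all(cross > 0))

N = 8
theta = 2 * np.pi * np.arange(N) / N
pts = np.column_stack([np.cos(theta), np.sin(theta)])  # N points on a circle

# Dichotomies with < 3 positives are trivially realizable, so skip them.
shattered = all(
    not any(strictly_inside(q, pts[np.array(lab) == 1])
            for q in pts[np.array(lab) == 0])
    for lab in product([0, 1], repeat=N) if 3 <= sum(lab) < N
)
print("all 2^N dichotomies realized by convex sets:", shattered)  # True
```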
0 votes
1 answer
36 views

Estimating the conditional entropy of a discrete variable conditioning on continuous variable

I am doing a machine learning project and I am trying to select the best features by computing their mutual information and select the ones with the highest information gain. I was looking at this ...
Ishigami • 1,655
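One practical route, offered as a sketch rather than a recommendation: estimate $H(Y|X) = H(Y) - I(X;Y)$, taking $I(X;Y)$ from scikit-learn's kNN-based estimator; the synthetic data and noise scale are arbitrary:

```python
import numpy as np
from scipy.stats import entropy
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 3, 1000)                 # discrete label
x = y + rng.normal(scale=0.8, size=1000)     # continuous feature

mi = mutual_info_classif(x.reshape(-1, 1), y, random_state=0)[0]  # I(X;Y), nats
h_y = entropy(np.bincount(y) / len(y))       # H(Y) from label frequencies, nats
print("estimated H(Y|X):", h_y - mi)
```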
0 votes
0 answers
32 views

How to Upper Bound the Spectral Norm of $\left(XX^T\right)^{-1}\left(XX^T\right)^{-1}X$?

I have an observation matrix $ X \in \mathbb{R}^{n \times n}$. Considering $XX^T$, this matrix can be seen as a correlation matrix between individuals, so it generally has elements close to the ...
Tool • 1
1 vote
1 answer
41 views

How to expand the double integral in variational objective function?

I am reading John Paisley's lecture notes on variational inference. In lecture 6, p. 3, he writes the objective function as follows: $$ \mathcal{L}(a', b', \mu', \Sigma') = \int_{0}^{\infty} \int_{...
doraemon • 135
0 votes
0 answers
21 views

How to understand the likelihood function in Bayesian linear regression

$\mathcal{N}(W^T X, \beta^{-1})$: this is the likelihood distribution for Bayesian linear regression, right? So, the thing is, if I'm doing batch-mode Bayesian regression, then: Weights ($W$): Size: ...
Need_MathHelp
2 votes
1 answer
34 views

How to derive the likelihood function

I have been struggling a lot with the concept of likelihood and I'd really appreciate it if someone could verify whether my understanding is correct and give input. If I understand this correctly, we pick ...
Need_MathHelp
0 votes
0 answers
22 views

Bayesian linear regression: finding the likelihood

Pick a single data point $(x,t)$ and calculate and plot the likelihood for this single data point across all $w$ in your parameter space $(w_0 \times w_1)$ (for a single data point it is a univariate ...
User • 59
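A minimal sketch of the computation described, assuming the model $t \sim \mathcal{N}(w_0 + w_1 x,\ \beta^{-1})$; the data point, precision $\beta$, and grid bounds are hypothetical values for illustration:

```python
import numpy as np
from scipy.stats import norm

x, t, beta = 0.5, 0.3, 25.0                   # hypothetical point and precision
w0, w1 = np.linspace(-1, 1, 200), np.linspace(-1, 1, 200)
W0, W1 = np.meshgrid(w0, w1)

# Likelihood of the single point at every (w0, w1): N(t | w0 + w1*x, 1/beta).
L = norm.pdf(t, loc=W0 + W1 * x, scale=beta ** -0.5)
# L is a surface over parameter space; visualize with plt.contourf(W0, W1, L).
```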
1 vote
0 answers
36 views

Bayes classifiers with cost of misclassification

A minimum ECM classifier assigns the features $\underline{x}$ to class $t$ ($\delta(\underline{x}) = t$) if $\forall j \ne t$: $$\sum_{k\ne t} c(t|k) f_k(\underline{x})p_k \le \sum_{k\ne ...
BiasedBayes
2 votes
1 answer
39 views

Bayesian Inference Intractability

When looking at Bayesian posteriors $$ p(z \mid x) = \frac{p(x \mid z)p(z)}{\int p(x \mid z')p(z')dz'} $$ the denominator is commonly intractable. I understand this is due to the possibility of high ...
Lehmann • 331
5 votes
1 answer
209 views

Rigorous Mathematical foundations of Machine Learning / Deep Learning / Neural Networks

I am an Engineering Graduate (with a strong background in Probability/Measure Theory, Linear Algebra and Calculus) wanting to dig deep into Deep Learning and Neural Networks, and I'm looking for ...
Michel H • 322
1 vote
1 answer
54 views

How to visualize conditional maximum likelihood estimation?

In Probabilistic Machine Learning (Murphy, 2022, p. 8) I'm stuck on this part: "1.2.1.6 Maximum likelihood estimation: When fitting probabilistic models, it is common to use the negative log ...
filip augusto
0 votes
0 answers
18 views

Express the regularized weight in ridge regression in terms of the linear regression solution.

We would like to minimize the quantity $E_{in}(\vec{w})=\frac{1}{N}\sum_{n=1}^N(\vec{w}^{T}\vec{x}_n-y_n)^2$ under the constraint $\vec{w}^T\Gamma^T\Gamma\vec{w}\leq C$ where $\Gamma$ is a matrix, $C$ ...
lcthaha
0 votes
0 answers
31 views

Expected squared error (bagging)

I'm studying from a Deep Learning book (Ian Goodfellow et al.). On page 256 the text explains that, considering a set of $k$ regression models, each produces an error $\epsilon_i$ for every example, drawn ...
Federico Mondaini
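A quick simulation of that setup, assuming (as an illustrative choice) jointly Gaussian per-model errors with variance $v$ and pairwise correlation $c$; the averaged error's mean squared value should land near $cv + (1-c)v/k$, below the single-model value $v$:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, v, c = 25, 100_000, 1.0, 0.4
cov = v * (c * np.ones((k, k)) + (1 - c) * np.eye(k))    # Var=v, pairwise corr=c
eps = rng.multivariate_normal(np.zeros(k), cov, size=n)  # per-model errors

print(np.mean(eps[:, 0] ** 2))         # single model: ~ v = 1.0
print(np.mean(eps.mean(axis=1) ** 2))  # ensemble average: ~ c*v + (1-c)*v/k = 0.424
```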
2 votes
1 answer
59 views

Expected value and variance of Sigmoid and SiLU on a normally distributed random variable for variational approximation

I am trying to apply Assumed Density Filtering (ADF) according to the paper Lightweight Probabilistic Deep Networks to my own model, and I need to implement the variational approximation layer of ...
Mr Amoeba
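Not the paper's closed-form approximation, but a generic numerical baseline: $\mathbb{E}[f(X)]$ and $\mathrm{Var}[f(X)]$ for $X \sim \mathcal{N}(\mu, \sigma^2)$ via Gauss–Hermite quadrature, useful for checking any analytic formulas (node count and test values are arbitrary):

```python
import numpy as np

def gauss_hermite_moments(f, mu, sigma, n=64):
    """E[f(X)] and Var[f(X)] for X ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(n)   # nodes/weights for exp(-t^2)
    x = mu + np.sqrt(2.0) * sigma * t           # change of variables
    fx = f(x)
    mean = (w * fx).sum() / np.sqrt(np.pi)
    second = (w * fx ** 2).sum() / np.sqrt(np.pi)
    return mean, second - mean ** 2

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
silu = lambda z: z * sigmoid(z)
print(gauss_hermite_moments(sigmoid, mu=0.2, sigma=1.5))
print(gauss_hermite_moments(silu, mu=0.2, sigma=1.5))
```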
0 votes
1 answer
23 views

Logistic map type function with controlled steepness on either side

I am looking for a function that must satisfy the following requirements: $f(1) = f(-1) = 0$; $f(x) > 0,\ \forall x \in (-1,1)$; $f$ is differentiable. Additionally, I would like it to be ...
arao • 3
0 votes
1 answer
33 views

Maximum Likelihood - Information Matrix Identity Derivation

I am trying to derive the information matrix equality for the Poisson distribution with the log-likelihood: $$\mathcal{L}(\lambda; x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \left[-\lambda + x_i \log(\lambda) -...
Marlon Brando
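For reference, the identity can be checked directly from the excerpt's log-likelihood, using $\mathbb{E}[x_i] = \operatorname{Var}(x_i) = \lambda$:

$$\frac{\partial \mathcal{L}}{\partial \lambda} = \sum_{i=1}^{n}\left(\frac{x_i}{\lambda} - 1\right), \qquad \frac{\partial^2 \mathcal{L}}{\partial \lambda^2} = -\sum_{i=1}^{n}\frac{x_i}{\lambda^2},$$

so, since the score has mean zero,

$$-\mathbb{E}\left[\frac{\partial^2 \mathcal{L}}{\partial \lambda^2}\right] = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda} = \frac{\operatorname{Var}\left(\sum_{i} x_i\right)}{\lambda^2} = \mathbb{E}\left[\left(\frac{\partial \mathcal{L}}{\partial \lambda}\right)^{2}\right],$$

and the two sides of the information matrix equality coincide.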
1 vote
0 answers
72 views

Is every convex cone a subset of a half-space?

I have come across a proof of this statement (link to paper at end), which however I do not understand. It makes use of Lemma 5.5, which I've also included. Lemma 1. The interior of the complement of ...
user1141170
3 votes
0 answers
68 views

A self-written proof of the Vapnik–Chervonenkis theorem

Theorem: For every $\varepsilon >0$, with probability greater than $1-\varepsilon$ \begin{align*} R_p(\hat{g}_{n,\mathcal{G}}) - R_{p}(g^*_{p,\mathcal{G}}) \le 2 \sqrt{\dfrac{2V_{\mathcal{G}...
Eto • 57
0 votes
0 answers
9 views

Methods for Efficient Feature Aggregation While Maintaining Prediction Accuracy

Given a high-dimensional dataset X containing potentially redundant features, how can we efficiently aggregate and/or select features to achieve accurate prediction of target variable Y while reducing ...
Mohit Anand
0 votes
1 answer
34 views

Why does a shift in the probability distribution over a label space naturally trigger a shift in the distribution over input space?

Can anyone explain this statement? "Firstly, let’s define the input space as X (sensory observations) and the label space as Y (semantic categories). The data distribution is represented by the ...
StackExchanger
0 votes
0 answers
45 views

When to use the chi-square law for confidence intervals with Mahalanobis distance?

So right now I'm reading this paper: Distance-based detection of out-of-distribution silent failures for Covid-19 lung lesion segmentation, available here: https://arxiv.org/abs/2208.03217 In brief, ...
levo-str
0 votes
1 answer
67 views

Understanding Friedman’s H-statistic

In "Interpretable Machine Learning: A Guide For Making Black Box Models Explainable", I found the following for Friedman's H-statistic: $$PD_{jk}(x_j, x_k) = PD_j (x_j) + PD_k (x_k),$$ where ...
SineOfTheTimes
0 votes
0 answers
19 views

Gaussian kernel outputs a $2m$-dimensional feature map?

I am currently writing my masters thesis on the Double Descent Curve in Neural Networks and as I was doing some research, I came across the paper "On the Double Descent of Random Features Models ...
Silvio • 21
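The $2m$ very likely comes from a random Fourier feature map, where each of $m$ sampled directions contributes one cosine and one sine coordinate; a sketch under that assumption (a Rahimi & Recht-style map; `gamma` and the sizes are arbitrary):

```python
import numpy as np

def random_fourier_features(X, m, gamma=1.0, seed=0):
    """Map X (n, d) to 2m features whose inner products approximate
    the Gaussian kernel exp(-gamma * ||x - x'||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], m))
    P = X @ W
    return np.hstack([np.cos(P), np.sin(P)]) / np.sqrt(m)  # 2m columns

X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, m=2000)
approx = Z @ Z.T                                            # approximate kernel
exact = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))
print(np.max(np.abs(approx - exact)))                       # small for large m
```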
0 votes
0 answers
116 views

Posterior probabilities in a GMM

This is a statistics/probability question formulated in the context of machine learning (problem 6.17 in Bishop's 'Deep Learning' book). We are modelling the conditional distribution $p(\mathbf{t}|\...
Mat Dyl • 511
1 vote
0 answers
37 views

When does the optimal model exist in learning theory?

In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
rick • 41
0 votes
0 answers
68 views

Expectation of the product of Gaussian kernels and their input

I was wondering if anybody knows how to solve: $$\mathbb{E}_{\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[ (\mathbf{x}_{i} - \mathbf{z})(\mathbf{x}_{j} - \mathbf{z})^{\top} \exp\left( - (\...
wsz_fantasy • 1,690
0 votes
0 answers
61 views

Known relations between mutual information and covering number?

This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
Tanishq Kumar
1 vote
1 answer
61 views

Interpreting a concentration inequality

In the following paper I am slightly confused about the way they use a concentration inequality derived in Lemma A1. In Lemma A1, under the assumption that $(n ,p)$ satisfies $\log p/n^{1/4} \to 0$ as ...
WeakLearner • 6,096
1 vote
0 answers
50 views

OLS and Conditional Expectation Assumption

In OLS we assume that, given the model $Y|X = F(X) + U|X$, where $U$ is the residual, we then ASSUME $E(U|X) = 0$ in order to have $\text{prediction} = F(X) = E(Y|X)$. So $E(U|X) = 0$ is an assumption ...
dbzadnen khiari
0 votes
0 answers
46 views

Loss Equation for Training DPMs vs DDPMs

I'm currently trying to wrap my head around the training loss functions for DPMs and how they vary from DDPMs; however, there are differences in how the papers describe the processes, making it ...
Tomas Premoli Muniagurria
0 votes
0 answers
50 views

Right inverse of a linear, bounded, nonnegative, self-adjoint and trace-class operator on a closed subspace of a separable Hilbert space

This question is related to Corollary 3 of the paper: Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces by Kenji Fukumizu, et al. Basically, they first defined the ...
allen i • 361
0 votes
0 answers
22 views

How to prove the relation involving the difference between value functions of two different policies and the sum of the advantage function over time?

In reinforcement learning, how do you prove the following relation between the difference in value functions of two policies? The value function $V^\pi(s)$ represents the expected cumulative reward ...
Michael
0 votes
0 answers
23 views

Does marginal likelihood on the training set always weakly increase for GPs when adding new features, irrespective of the kernel/hyperparams?

I've recently been introducing myself to Gaussian Processes. In Bayesian linear regression, one would expect that when adding new features, the likelihood on the training set would weakly increase due ...
Qazaz • 41
0 votes
0 answers
9 views

What determines the length of the confidence interval for the mean response

Consider a simple OLS estimator $\hat\beta = (X^TX)^{-1}X^TY$, where $X$ is the design matrix and $Y$ is the response. Given a new observation's covariate $x$, I can estimate the mean response ...
maskeran • 573
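A sketch of the quantities that set the interval's length, assuming the usual normal-error OLS interval with half-width $t_{\alpha/2,\,n-p}\,\hat\sigma\sqrt{x^\top(X^\top X)^{-1}x}$; all data below are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p))   # residual standard error

x_new = np.array([1.0, 0.5, -0.2])             # hypothetical new covariate
se = sigma_hat * np.sqrt(x_new @ np.linalg.solve(X.T @ X, x_new))
print(stats.t.ppf(0.975, df=n - p) * se)       # half-width: grows with sigma_hat
                                               # and with the leverage of x_new
```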
0 votes
0 answers
22 views

Bias and Variance Decomposition to find an optimal tuning parameter - Population vs Empirical

I have a question regarding this bias and variance decomposition in The Elements of Statistical Learning. In chapter 7.2, it mentions $\operatorname{Err}\left(x_0\right)=$ $$E\left[\left(Y-\hat{f}\...
maskeran • 573
0 votes
0 answers
26 views

Generic Chaining/Dudley's integral for supremum of average of indicator random variables in sup norm metric space?

I want to use generic chaining/Dudley's integral to bound the stochastic process below: $$\mathbb{E}\sup_{t\in T}\frac{1}{n}\sum_{i=1}^n X_{i,t},$$ where $X_{i,t}$ takes value either 1 or 0 (binary random ...
Ankita Ghosh
2 votes
1 answer
50 views

What is the Rademacher complexity of kernel-based hypotheses with offset?

Let $\mathcal{H}=\{ x\rightarrow \langle \mathbf{w},\Phi(x)\rangle+b : \| \mathbf{w}\|_{\mathbb{H}} \le \Lambda, b\in \mathbb{R}\}$ be a function family, where $\Phi$ is a feature mapping, and $\...
Harry • 699
0 votes
0 answers
21 views

Relation between VC-dimension and pseudo-dimension

I am thinking about the relation between the VC-dimension and the pseudo-dimension, and am confused about them. Let $H$ be a family of real-valued functions. We can define a function $c(h,t):x\rightarrow ...
Harry • 699
0 votes
0 answers
16 views

Does a Gaussian process solve the specified distribution drift problem?

When I was reading this lecture, the concept of the posterior predictive distribution was introduced as follows: $$ P(Y \mid D, X)=\int_{\mathbf{w}} P(Y, \mathbf{w} \mid D, X) d \mathbf{w}=\int_{...
maskeran • 573
1 vote
0 answers
367 views

What is the correct formula for the Within-Cluster Sum of Squares?

I am studying clustering with the K-Means algorithm and I stumbled on the "inertia", or "within-cluster sum of squares", part. First, I would appreciate it if anyone could explain to me ...
Artur Juan Dantas
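One way to pin the formula down is to compute $\sum_k \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$ by hand and compare it with scikit-learn's `inertia_`, which implements exactly that quantity; a small sketch on made-up data:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# WCSS: squared distance of each point to its assigned centroid, summed.
wcss = sum(
    np.sum((X[km.labels_ == k] - km.cluster_centers_[k]) ** 2)
    for k in range(km.n_clusters)
)
print(wcss, km.inertia_)   # the two values should match
```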
1 vote
1 answer
187 views

Proof that $KL(p||q) =0 \iff p(x) = q(x)$

I am studying Machine Learning, and I came across a proof in section 1.6.1 of Bishop's textbook Pattern Recognition & Machine Learning which I couldn't quite understand. The claim is that $KL(p||q) ...
Nikolay • 13
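A numeric companion to that proof, offered as a sketch: a direct implementation of discrete $KL(p\|q)$ showing it vanishes exactly at $q = p$ and is positive otherwise (the example distributions are arbitrary):

```python
import numpy as np

def kl(p, q):
    """KL(p||q) for discrete distributions, with the convention 0*log(0/q) = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.2, 0.5, 0.3])
print(kl(p, p))                          # 0.0
print(kl(p, np.array([0.3, 0.4, 0.3])))  # strictly positive
```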
0 votes
1 answer
46 views

Interpreting notation for binary loss function

I am taking an advanced machine learning class, and in the class notes I noticed a notation that I did not recognize. It is the notation for the binary loss function (please note that it is not binary ...
Redwanul Sourav
0 votes
0 answers
36 views

Show bagging helps under squared-error loss

This question is about chapter 8.7, Bagging, from the Elements of Statistical Learning (ESL) textbook. Assume our training observations $\left(x_i, y_i\right), i=1, \ldots, N$ are independently drawn from a ...
maskeran • 573
