All Questions

1 vote
0 answers
25 views

Bayesian classifier and linear regression on dummy variables

EDIT: I am actually not so sure about one thing: they say regression, but not linear regression. I may have misunderstood the whole paragraph. In the book Elements of Statistical Learning (Hastie-...
Plop • 2,719
3 votes
0 answers
98 views

Normalizing Flow Penalization

I am looking to fit a normalizing flow, specifically a Masked Autoregressive Flow model. However, this model leads to high variance on lower-dimensional, less complex data. I am using a neural network ...
user2793618
1 vote
0 answers
13 views

How is this R squared calculated in the context of clustering?

I was reading the paper "Consistent Individualized Feature Attribution for Tree Ensembles" by Scott Lundberg et al. and cannot understand how the calculation of the $R^2$ works here - see ...
Penguines
0 votes
2 answers
74 views

How can I perform polynomial regression with this dataset?

I have the training set $\{(0,0), (1,1), (2,1), (1,2)\}$. I want to find the best quadratic polynomial of the form $f(x) = a + bx + cx^2$ that minimizes the sum of squared errors between $y$ and the ...
jem do • 185
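Since the excerpt sets up an ordinary least-squares problem, here is a minimal sketch of the fit in Python/NumPy, assuming the four points above are the whole training set (note the two different targets at $x = 1$, so no quadratic can interpolate all four points exactly):

```python
import numpy as np

# Training set from the question.
x = np.array([0.0, 1.0, 2.0, 1.0])
y = np.array([0.0, 1.0, 1.0, 2.0])

# Design matrix for f(x) = a + b*x + c*x^2.
A = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares solution minimizes the sum of squared errors ||y - A w||^2.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print("a, b, c =", w)
```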
1 vote
1 answer
125 views

Binary classification problem

This problem is in the context of binary classification. Let $f_\omega(t) = \mathcal{I}[\sin(\omega t) \geq 0]$ for $t \in \mathbb{R}$, and $\mathcal{F} = \{ f_{\omega} : \omega \in \mathbb{R}\}$. For any given $m = ...
Keio203 • 561
0 votes
1 answer
721 views

What does w.p. mean in formulas?

I'm checking Facebook's paper about the Prophet algorithm. I don't understand part of a formula: "w.p.". It's hard to search on Google. Could anyone help me understand this? https://peerj.com/...
dmjy • 103
0 votes
0 answers
70 views

Question about KL divergence and this formula.

I was reading the paper https://arxiv.org/pdf/1503.03585.pdf to understand https://arxiv.org/pdf/2006.11239.pdf (this paper is about the denoising diffusion model), but I can't figure out how the author ...
NeverneverNever
1 vote
0 answers
33 views

How to calculate the uniform entropy or VC dimension of the following class of functions?

When dealing with U-processes I encounter the following uniform entropy calculation. For any $\eta>0$, function class $\mathcal{F}$ containing functions $f=\left(f_{i, j}\right)_{1 \leq i \neq j \leq n}: \...
leslie zhang
2 votes
1 answer
81 views

What does this math notation mean? $\min_{i} \|x_{i}\|$ [closed]

$\qquad\min_{i} \|x_{i}\|$ I'm doing some machine learning problems and I ran into this notation. I don't understand what "mini" means in this case. Is it the smallest element in the norm of ...
hagendatz1113
0 votes
0 answers
34 views

How do I stop my normals from spreading out under perturbation?

In many AI generative models you start with a so-called latent vector $\mathbf{z}$ of high dimension $d$ such that each $z_i \sim \mathcal{N}(0,1)$. I'd like to randomly perturb this distribution in ...
Hooked • 6,697
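One standard construction that keeps the $\mathcal{N}(0,1)$ marginals intact under perturbation is to mix in independent noise with weights on the unit circle; a minimal sketch, assuming a perturbation strength `eps` (not necessarily what the asker has in mind):

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 512, 0.1

z = rng.standard_normal(d)      # latent vector, z_i ~ N(0, 1)
noise = rng.standard_normal(d)  # independent perturbation

# sqrt(1 - eps^2) * N(0,1) + eps * N(0,1) is again N(0,1), so the
# marginals do not spread out, unlike the naive z + eps * noise.
z_new = np.sqrt(1 - eps**2) * z + eps * noise
print(z_new.std())  # remains close to 1
```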
0 votes
1 answer
244 views

Clarification about the KL divergence between a continuous and a discrete distribution

I was reading this blog post on Bayesian neural networks, where the author shows that if we use as a variational distribution a product of delta functions, then minimizing the loss function of a BNN is ...
Alucard • 284
4 votes
0 answers
390 views

Multiclass Linear Discriminant Analysis

This question is based on the Multiclass Linear Discriminant Analysis (MLDA) described in lecture slides by Olga Veksler, which is a generalization of Fisher's Linear Discriminant. My use of MLDA is ...
Triceratops
0 votes
0 answers
213 views

Training, validation, and test datasets and the i.i.d. assumption

I wonder whether we should distinguish the validation and test datasets based on the i.i.d. assumption. According to statistical learning theory, the i.i.d. assumption is required to affect ...
Minsik Seo
0 votes
0 answers
54 views

Likelihood in MAP and MLE for linear regression

In MAP estimation for a linear regression task, the posterior of the weights given the data is written as $p(w|X,Y)=\frac{p(Y|X,w)p(w)}{p(Y|X)}$; why is the likelihood not $p(X,Y|w)$? From my ...
William Lin
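A one-line factorization answers this, under the standard assumption that the input distribution does not depend on $w$:

$$p(X, Y \mid w) = p(Y \mid X, w)\, p(X \mid w) = p(Y \mid X, w)\, p(X),$$

so $p(X)$ is constant in $w$ and cancels between the numerator and the denominator of the posterior, leaving $p(Y \mid X, w)$ as the effective likelihood.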
0 votes
0 answers
45 views

Find the variance of a random vector

I have a question about random vectors. A random vector $(X, Y)$ has a continuous distribution with density function $$f(x, y) = \begin{cases}c \cdot x &\text{for}& 0 \leq x \leq 2,\ \max(0, 1 - x) \leq y \leq ...
user1074987
2 votes
2 answers
173 views

Mathematical notation in a machine learning problem, majority rule

(I apologise that the title may be a bit confusing and I don't know if this is the right community to ask my question.) This is a mathematical notation problem in the field of machine learning. A ...
rrchtr • 23
1 vote
1 answer
57 views

Understanding a Simple Proof with Integrals

In this machine learning paper, the following lemma is stated (and proven in Appendix A, cf. page 11): Lemma A.1 For random variables $X$, $Y$ and function $f(x, y)$ under suitable regularity ...
Hermi • 702
0 votes
1 answer
71 views

Can someone clarify the use of the width of the ellipsoid regarding Mahalanobis distance?

My knowledge of Math is limited. I was looking up Mahalanobis distance out of curiosity, after seeing a reference. From Wikipedia: Mahalanobis distance - Intuitive explanation Putting this on a ...
0 votes
0 answers
58 views

Using Chi-squared Tests for Feature Selection with Big Data?

When dealing with non-linear data, since the reliability of chi-squared tests diminishes with the number of samples, is it reasonable to divide a large dataset into sample spaces of, say, 20 or 40, ...
midmath • 63
1 vote
0 answers
21 views

Determining Viability of Chi-squared for Feature Selection

How does one determine the likelihood that the chi-squared test is accurately selecting features during feature selection on categorical data? To summarize the rest of this post, the chi-squared test doesn't ...
midmath • 63
2 votes
1 answer
186 views

Closure of balls in Reproducing Kernel Hilbert Space (RKHS)

Let $X \subset \mathbb{R}^m$ be compact, and $k: X\times X \rightarrow \mathbb{R}$ be a universal kernel function, in the sense that the corresponding RKHS $\mathcal{H}_k$ is dense in $C(X)$ under the ...
masala • 23
0 votes
1 answer
124 views

How to combine various measures into a single measure?

So I'm trying to understand the intuition behind the accepted answer here, which is used to combine several scores into a single score. Namely, this part: ...
aqibjr1 • 237
1 vote
0 answers
70 views

Preventing rare extreme values in linear regression prediction

I am trying to train a model with a lot of input variables using linear regression. For technical reasons, my training data is obtained from a simulation that closely but not perfectly mirrors the ...
poisonDartFrog
1 vote
0 answers
36 views

Area Under Precision-Recall and Area Under ROC curve for different numbers of observations

I am doing research comparing some algorithms for binary classification. It is worth mentioning that the data set is highly imbalanced, i.e., the minority class is only 0.2%. Notation: Area Under ...
Gaussen • 81
4 votes
1 answer
183 views

Can Machine Learning models be considered as "Approximate Dynamic Programming"?

In the context of certain statistical/machine learning models, such as models that are trying to estimate "optimal policies" (e.g. reinforcement learning) - can we consider these models as "...
stats_noob • 3,268
0 votes
0 answers
48 views

Complexity of Lebesgue measurable spaces

Consider a discrete finite set $\Omega = X \times Y \subset \mathbb{R}^{m\times n}$ for finite $m,n$. Let $(\Omega,\Sigma,\mu)$ be the measure space. ($\Sigma$ is the power set and $\mu$ is $\sigma$-finite ...
rookie • 1,728
0 votes
0 answers
21 views

"Manipulating" Normal Distributions

I am reading the following book https://algorithmsbook.com/optimization/files/optimization.pdf on page 281: I am trying to understand how to manipulate the matrix terms to verify the following 2 ...
stats_noob • 3,268
2 votes
2 answers
1k views

Relationship Between Bayesian Optimization and Gaussian Process

In Bayesian Optimization, the function (i.e. objective function) that we are trying to optimize is modelled using some surrogate function - this surrogate function usually turns out to be a Gaussian ...
stats_noob • 3,268
0 votes
1 answer
151 views

Box-Muller Transformation: Polar Coordinates Interpretation

I am aware that the Box-Muller transform leverages polar coordinates to arrive at the final transformations by plotting two uniform random variables, $(u, v)$ in the Cartesian plane. I have not seen ...
TipsyMath
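For readers who want to see the polar picture concretely, here is a minimal sketch of the Box-Muller transform in Python/NumPy (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = 1.0 - rng.uniform(size=n)  # shift to (0, 1] so log(u) is finite
v = rng.uniform(size=n)

# Polar reading: -2 ln(u) is a squared radius (exponentially distributed)
# and 2*pi*v is a uniform angle; converting the polar pair (r, theta)
# back to Cartesian coordinates yields two independent N(0,1) variables.
r = np.sqrt(-2.0 * np.log(u))
theta = 2.0 * np.pi * v
z0, z1 = r * np.cos(theta), r * np.sin(theta)
print(z0.mean(), z0.std())  # approximately 0 and 1
```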
3 votes
1 answer
182 views

Convergence of $M$-estimators when the argmin is not unique

Let $(X_i)_{1\le i\le n}$ be i.i.d. random variables taking values in a compact set $\mathcal X\subseteq \mathbb R^d$, and let $\mathcal P_n = \mathcal P_n(\cdot\mid X_1,\ldots,X_n)$ and $\mathcal P$ ...
Stratos supports the strike
1 vote
0 answers
43 views

Reducing variance in linear regression

In The Elements of Statistical Learning the author states that by shrinking the coefficients of a linear regression you raise the bias while lowering the variance and thus, sometimes, ...
Guilherme Takata
1 vote
1 answer
73 views

Parameter estimation in a linear model: why does the standard deviation of a parameter increase as the $X$ matrix gets wider?

Intro: Let $Y = X\beta + \epsilon$, where $X$ is randomly generated data from a normal distribution arranged into an $n \times m$ matrix and $\epsilon$ is a vector of normal random errors. Say that the first 5 ...
Brzoskwinia
2 votes
1 answer
209 views

Are Machine Learning Optimization Problems ever Categorized as "P" or "NP"?

In the context of Computer Science and Optimization, I have heard that different problems can be classified using the "P vs NP" framework. Essentially, there is a hierarchy of problems based ...
stats_noob • 3,268
0 votes
1 answer
99 views

Deriving the Bayes Optimal Classifier (Mitchell, Machine Learning)

I am trying to recreate the Bayes Optimal Classifier result given in the Machine Learning textbook by Mitchell. Below, I've added the desired result from the text and my work. I think I've taken the right ...
takeaseat123
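For reference, the result being reconstructed (Mitchell's Bayes optimal classifier, Chapter 6) is the posterior-weighted vote over hypotheses:

$$v_{OB} = \operatorname*{arg\,max}_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D).$$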
0 votes
1 answer
81 views

Mean squared error minimization

I'm studying machine learning right now and I have found the following exercise: We define the mean squared error of a number $x \in \mathbb{R}$, where $a_{1}, \ldots, a_{n} \in \mathbb{R}$: $$f(x)= \frac{1}{...
Herrpeter • 1,324
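Assuming the truncated definition is the usual $f(x) = \frac{1}{n}\sum_{i=1}^{n}(x - a_{i})^{2}$, the minimizer follows from a single derivative:

$$f'(x) = \frac{2}{n}\sum_{i=1}^{n}(x - a_{i}) = 0 \iff x = \frac{1}{n}\sum_{i=1}^{n} a_{i},$$

i.e., the mean squared error is minimized by the sample mean of $a_{1}, \ldots, a_{n}$ (and $f''(x) = 2 > 0$ confirms it is a minimum).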
0 votes
0 answers
130 views

Textbook recommendation on rigorous machine learning results

I am looking for textbook(s) in machine learning theory that satisfy the following: The text should be graduate level. It assumes all undergraduate-level mathematics and early-graduate-level ...
温泽海 • 2,497
1 vote
0 answers
35 views

How to tell geometrically/graphically the statistical properties of a ring distribution?

I pulled this distribution of a 2D random variable from page 5: https://arxiv.org/pdf/1606.05908.pdf I want to know: How can we infer the covariance matrix from the plot? How can we infer the ...
user3180 • 729
1 vote
0 answers
23 views

On the bounds of estimated conditional correlations and a follow-up question on the inferred properties of underlying structural parameters.

Framework: It is assumed that the data is Gaussian and follows the structural equation model/additive noise model $$ Y = \sum_{j=1}^{p} X_j \theta_j + \epsilon$$ $$ \|\theta\|_0 = s \ll n $$...
Jorge de la Cal
1 vote
1 answer
113 views

Can't we just use PCA to solve the problem of linear regression?

From my intuitive understanding so far: if I have, let's say, a set of 2D points, then performing PCA will give me the direction that drastically reduces the variance along one direction, right? But ...
Abhishek Mittal
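A quick numerical contrast between the two objectives (vertical versus perpendicular errors); a sketch on synthetic data, with all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)  # noisy linear relation

# OLS slope: minimizes vertical errors sum((y - b*x)^2).
b_ols = (x @ y) / (x @ x)

# First principal component: maximizes variance, equivalently minimizes
# perpendicular distances. A different objective gives a different slope.
Z = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
b_pca = Vt[0, 1] / Vt[0, 0]

print(b_ols, b_pca)  # generally not equal
```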
1 vote
0 answers
35 views

Why are latent spaces able to learn representations - autoencoder?

As the title states, why are latent spaces even able to intelligently learn representations? There's no guarantee that we learn the most important features since it's all done automatically in ...
user2793618
1 vote
0 answers
51 views

An efficient stopping rule to determine the sign of the mean of an i.i.d. sequence of random variables.

Does there exist a family of measurable functions $(f_t^\delta)_{t \in \mathbb{N}, \delta \in (0,1)}$ and constants $C,c>0$ such that, for each $t \in \mathbb{N}$ and $\delta \in (0,1)$, we have that $...
Bob • 5,783
0 votes
1 answer
28 views

Problem with conditional expected value

In the book The Elements of Statistical Learning, the author says that the Expected Prediction Error for an arbitrary test point $x_0$ is: $$EPE(x_0) = E_{y_0 | x_0}E_\mathcal{T}(y_0 -\hat{y}_0)^2$$ where ...
Federico Mondaini
2 votes
3 answers
397 views

Implementing multiclass logistic regression from scratch

This is a sequel to a previous question about implementing binary logistic regression from scratch. Background knowledge: To train a logistic regression model for a classification problem with $K$ ...
littleO • 52.5k
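For orientation, a minimal from-scratch sketch of softmax (multiclass logistic) regression trained by gradient descent on the cross-entropy loss; function names and the synthetic data are illustrative, not taken from the question:

```python
import numpy as np

def softmax(Z):
    # Row-wise softmax with the usual max-subtraction for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_multiclass_logreg(X, y, K, lr=0.1, iters=2000):
    """Gradient descent on the mean cross-entropy; y holds labels 0..K-1."""
    n, d = X.shape
    W = np.zeros((d, K))
    Y = np.eye(K)[y]                   # one-hot targets, shape (n, K)
    for _ in range(iters):
        P = softmax(X @ W)             # predicted class probabilities
        W -= lr * X.T @ (P - Y) / n    # gradient of the mean cross-entropy
    return W

# Usage on synthetic data; the bias is handled via a ones column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(150), rng.normal(size=(150, 2))])
W_true = rng.normal(size=(3, 3))
y = (X @ W_true).argmax(axis=1)
W = fit_multiclass_logreg(X, y, K=3)
print((softmax(X @ W).argmax(axis=1) == y).mean())  # high training accuracy
```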
1 vote
1 answer
356 views

How does the bias weight $w_0$ get computed during ridge regression?

I am given a full-rank feature matrix $\mathbf{X}$ for which I am supposed to provide a closed-form solution for the weights $\hat{\mathbf{w}}_{\text{ridge}}$ of a ridge-regression optimization problem. The ...
Nero • 73
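A common convention, and one plausible reading of the question, is to leave the bias unpenalized; a minimal closed-form sketch (the helper name and data are hypothetical):

```python
import numpy as np

def ridge_with_bias(X, y, lam):
    """Closed-form ridge where the bias column is not penalized.

    Prepends a ones column to X and zeroes out the penalty on w_0,
    solving (A^T A + lam * D) w = A^T y with D = diag(0, 1, ..., 1).
    """
    n, d = X.shape
    A = np.column_stack([np.ones(n), X])  # bias column first
    D = np.eye(d + 1)
    D[0, 0] = 0.0                         # do not shrink the bias w_0
    return np.linalg.solve(A.T @ A + lam * D, A.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=50)
print(ridge_with_bias(X, y, lam=1.0))  # first entry approximates the bias 3.0
```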
3 votes
1 answer
196 views

Implementing binary logistic regression from scratch

Background knowledge: To train a logistic regression model for a classification problem with two classes (called class $0$ and class $1$), we are given a training dataset consisting of feature vectors ...
littleO • 52.5k
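A minimal from-scratch sketch of one standard approach to this, Newton's method (IRLS) on the mean negative log-likelihood; the function names and data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_newton(X, y, iters=25, ridge=1e-8):
    """Newton/IRLS for binary logistic regression.

    X has shape (n, d) with a ones column for the bias; y holds 0/1 labels.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ w)                  # P(y_i = 1 | x_i, w)
        g = X.T @ (p - y) / n               # gradient of the mean log-loss
        h = p * (1.0 - p)                   # per-sample Hessian weights
        H = X.T @ (X * h[:, None]) / n + ridge * np.eye(d)
        w -= np.linalg.solve(H, g)          # Newton step
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
w_true = np.array([-0.5, 2.0, -1.0])
y = (sigmoid(X @ w_true) > rng.uniform(size=200)).astype(float)
print(fit_logreg_newton(X, y))  # roughly recovers w_true
```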
6 votes
1 answer
251 views

Proof of $\frac{1}{n}\mathrm{E} \left[ \| \mathbf{X}\mathbf{\hat{w}} - \mathbf{X}\mathbf{w}^{*} \|^{2}_{2} \right] = \sigma^{2}\frac{d}{n}$

I am trying to find a proof for the MSE of a linear regression: \begin{gather} \frac{1}{n}\mathrm{E} \left[ \| \mathbf{X}\mathbf{\hat{w}} - \mathbf{X}\mathbf{w}^{*} \|^{2}_{2} \right] = \sigma^{2}\...
Nero • 73
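A standard route, assuming the usual fixed-design setup $\mathbf{y} = \mathbf{X}\mathbf{w}^{*} + \boldsymbol{\varepsilon}$ with $\mathrm{E}[\boldsymbol{\varepsilon}] = 0$, $\mathrm{Cov}(\boldsymbol{\varepsilon}) = \sigma^{2}\mathbf{I}$, $\mathbf{X} \in \mathbb{R}^{n \times d}$ of full rank, and $\hat{\mathbf{w}}$ the least-squares estimator, goes through the hat matrix: from $\hat{\mathbf{w}} = \mathbf{w}^{*} + (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\boldsymbol{\varepsilon}$,

$$\mathbf{X}\hat{\mathbf{w}} - \mathbf{X}\mathbf{w}^{*} = \mathbf{H}\boldsymbol{\varepsilon}, \qquad \mathbf{H} = \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top},$$

and since $\mathbf{H}$ is symmetric and idempotent with $\operatorname{tr}(\mathbf{H}) = d$,

$$\mathrm{E}\left[\|\mathbf{H}\boldsymbol{\varepsilon}\|_{2}^{2}\right] = \mathrm{E}\left[\boldsymbol{\varepsilon}^{\top}\mathbf{H}\boldsymbol{\varepsilon}\right] = \sigma^{2}\operatorname{tr}(\mathbf{H}) = \sigma^{2} d.$$

Dividing by $n$ gives the claim.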
4 votes
1 answer
123 views

How to compute the dual of an optimization problem defined on a function space?

I am interested in one result in the first version of the paper titled "On the Margin Theory of Feedforward Neural Networks" by Colin Wei, Jason D. Lee, Qiang Liu and Tengyu Ma. In Equation ...
Stratos supports the strike
2 votes
1 answer
1k views

Difference between EPE and MSE

In the book ESL (The Elements of Statistical Learning), the author introduces the EPE (Expected Prediction Error) and the MSE (Mean Squared Error). I know that the EPE is defined as: $$EPE(f)=E(Y-f(X))^2$$ ...
Federico Mondaini
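On one common reading of ESL's definitions, the contrast is a population average versus a sample average:

$$EPE(f) = E(Y - f(X))^{2} = \int (y - f(x))^{2}\, p(x, y)\, dx\, dy,$$

an expectation over the joint distribution of $(X, Y)$, whereas the training MSE $\frac{1}{N}\sum_{i=1}^{N}(y_i - f(x_i))^{2}$ averages over a finite sample and is only an estimate of that population quantity.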
2 votes
1 answer
173 views

Growth function $\tau_{\mathcal{H}}(m)$ lower bound

I have been working on this problem for a long time and I would like some help. They ask me to find, for each $n$, a hypothesis class $\mathcal{H} \subset \{\pm 1\}^{\mathbb{N}}$ with $n$ ...
bravoralph
1 vote
1 answer
103 views

Generalization in Neural Networks: Can one Impose Conditions on the Data?

There is a well-developed theory on generalization bounds for deep neural networks, using VC dimensions and Rademacher Complexities. They work for any underlying "true" distribution $\...
Claudio Moneo
