All Questions
Tagged with learning-theory pr.probability
37
questions
3
votes
2
answers
314
views
Minimax optimal multiple hypothesis test
Let us consider the following two-player game
between Chooser and Guesser.
There is a finite set $\Omega$
and $k$ probability distributions
on $\Omega$, denoted by $
\mathcal{P}
=\{P_1,\ldots,P_k\}
$.
...
3
votes
1
answer
100
views
When does the optimal model exist in learning theory?
In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
4
votes
0
answers
140
views
Known relations between mutual information and covering number?
This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
1
vote
2
answers
216
views
Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class
Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
0
votes
1
answer
91
views
Is it reasonable to consider the subgaussian property of the logarithm of the Gaussian pdf?
Let $Y$ denote a Gaussian random variable characterized by a mean $\mu$ and a variance $\sigma^2$. Consider $N$ independent and identically distributed (i.i.d.) copies of $Y$, denoted as $Y_1, Y_2, \...
1
vote
1
answer
189
views
Rademacher complexity for a family of bounded, nondecreasing functions?
Let $\{\phi_k\}_{k=1}^K$ be a family of functions mapping from an interval $[a, b]$ to $[-1, 1]$.
That is, $\phi_k \colon[ a,b] \to [-1, 1]$ are nondecreasing maps on some finite interval $[a, b] \...
1
vote
1
answer
163
views
Tight upper-bounds for the Gaussian width of intersection of intersection of hyper-ellipsoid and unit-ball
Let $\Lambda$ be a positive-definite matrix of size $n$ and let $R \ge 0$, which may depend on $n$. Consider the set $S := \{x \in \mathbb R^n \mid \|x\|_2 \le R,\,\|x\|_{\Lambda^{-1}} \le 1\}$ where $...
1
vote
0
answers
24
views
Minimax statistical estimation of proximal transform $\mbox{prox}_g(\theta_0)$, from linear model data $y_i := x_i^\top \theta_0 + \epsilon_i$
tl;dr: My question pertains the subject of minimax estimation theory (mathematical statistics), in the context of linear regression.
Given a vector $\theta_0 \in \mathbb R^d$, consider the linear ...
1
vote
0
answers
368
views
Conditions for equivalence of RKHS norm and $L^2(P)$ norm
Let $K$ be a psd kernel on an abstract space $X$ and let $H_K$ be the induced Reproducing Kernel Hilbert Space (RKHS). Let $P$ be a probability measure on $X$ such that $H_K \subseteq L^2(P_X)$ and ...
0
votes
0
answers
95
views
Verification of a certain computation of VC dimension
Disclaimer: I'm not very familiar with the concept of VC dimensions and how to manipulate such objects. I'd be grateful if expects on the subject (learning theory, probability), could kindly proof ...
0
votes
1
answer
203
views
VC dimension of a certain derived class of binary functions
Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
1
vote
1
answer
201
views
Rademacher complexity of function class $(x,y) \mapsto 1[|yf(x)-\alpha| \ge \beta]$ in terms of $\alpha$, $\beta$, and Rademacher complexity of $F$
Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
0
votes
0
answers
171
views
Upper-bound for bracketing number in terms of VC-dimension
Let $P$ be a probability distribution on a measurable space $\mathcal X$ (e.g;, some euclidean $\mathbb R^m$) and let $F$ be a class of funciton $f:\mathcal X \to \mathbb R$. Given, $f_1,f_2 \in F$, ...
1
vote
0
answers
95
views
$L_1$ convergence rates for multivariate kernel density estimation
Let $X$ be a random variable on $\mathbb R^d$ with probability density function $f$, and let $X_1,\ldots,X_n$ of $X$ be $n$ iid copies of $X$. Given a bandwidth parameter $h=h_n > 0$ and a kernel $...
4
votes
0
answers
161
views
Convergence rates for kernel empirical risk minimization, i.e empirical risk minimization (ERM) with kernel density estimation (KDE)
Let $\Theta$ be an open subset of some $\mathbb R^m$ and let $P$ be a probability distribution on $\mathbb R^d$ with density $f$ in a Sobolev space $W_p^s(\mathbb R^d)$, i.e all derivatives of $f$ ...