All Questions
Tagged with learning-theory st.statistics
44
questions
2
votes
0
answers
64
views
How to naturally define an output space with certain properties
Consider the following regression problem $v=A(u) + \varepsilon$
for some operator $A:\mathcal{U} \rightarrow \mathcal{V}$ and some function spaces $\mathcal{U},\mathcal{V}$, mapping from $\mathcal{X}$...
3
votes
2
answers
314
views
Minimax optimal multiple hypothesis test
Let us consider the following two-player game
between Chooser and Guesser.
There is a finite set $\Omega$
and $k$ probability distributions
on $\Omega$, denoted by $
\mathcal{P}
=\{P_1,\ldots,P_k\}
$.
...
3
votes
1
answer
100
views
When does the optimal model exist in learning theory?
In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
7
votes
2
answers
448
views
Upper bound on VC-dimension of partitioned class
Fix $n,k\in \mathbb{N}_+$.
Let $\mathcal{H}$ be a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ with finite VC-dimension $d\in \mathbb{N}$. Let $\mathcal{H}_k$ denote the set of maps of the ...
4
votes
0
answers
140
views
Known relations between mutual information and covering number?
This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
1
vote
0
answers
79
views
Approximation of continuous function by multilayer Relu neural network
For continuous/holder function $f$ defined on a compact set K, a fix $L$ and $m_1,m_2,\dots,m_L$, can we find a multilayer Relu fully connected network g with depth $L$ and each $i$-th layer has width ...
1
vote
2
answers
216
views
Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class
Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
2
votes
1
answer
84
views
VC-based risk bounds for classifiers on finite set
Let $X$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{ 0,1 \}^{\mathcal{X}}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $X\times \{0,1\}$ with law $\mathbb{P}$. ...
1
vote
1
answer
189
views
Rademacher complexity for a family of bounded, nondecreasing functions?
Let $\{\phi_k\}_{k=1}^K$ be a family of functions mapping from an interval $[a, b]$ to $[-1, 1]$.
That is, $\phi_k \colon[ a,b] \to [-1, 1]$ are nondecreasing maps on some finite interval $[a, b] \...
1
vote
1
answer
163
views
Tight upper-bounds for the Gaussian width of intersection of intersection of hyper-ellipsoid and unit-ball
Let $\Lambda$ be a positive-definite matrix of size $n$ and let $R \ge 0$, which may depend on $n$. Consider the set $S := \{x \in \mathbb R^n \mid \|x\|_2 \le R,\,\|x\|_{\Lambda^{-1}} \le 1\}$ where $...
1
vote
0
answers
24
views
Minimax statistical estimation of proximal transform $\mbox{prox}_g(\theta_0)$, from linear model data $y_i := x_i^\top \theta_0 + \epsilon_i$
tl;dr: My question pertains the subject of minimax estimation theory (mathematical statistics), in the context of linear regression.
Given a vector $\theta_0 \in \mathbb R^d$, consider the linear ...
1
vote
0
answers
368
views
Conditions for equivalence of RKHS norm and $L^2(P)$ norm
Let $K$ be a psd kernel on an abstract space $X$ and let $H_K$ be the induced Reproducing Kernel Hilbert Space (RKHS). Let $P$ be a probability measure on $X$ such that $H_K \subseteq L^2(P_X)$ and ...
0
votes
0
answers
95
views
Verification of a certain computation of VC dimension
Disclaimer: I'm not very familiar with the concept of VC dimensions and how to manipulate such objects. I'd be grateful if expects on the subject (learning theory, probability), could kindly proof ...
1
vote
1
answer
201
views
Rademacher complexity of function class $(x,y) \mapsto 1[|yf(x)-\alpha| \ge \beta]$ in terms of $\alpha$, $\beta$, and Rademacher complexity of $F$
Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
0
votes
0
answers
171
views
Upper-bound for bracketing number in terms of VC-dimension
Let $P$ be a probability distribution on a measurable space $\mathcal X$ (e.g;, some euclidean $\mathbb R^m$) and let $F$ be a class of funciton $f:\mathcal X \to \mathbb R$. Given, $f_1,f_2 \in F$, ...