Questions tagged [learning-theory]
This tag is for questions related to the following areas: statistical learning theory, machine learning, Vapnik–Chervonenkis (VC) theory, and other branches of learning theory that draw on various kinds of mathematics.
100 questions
2 votes · 0 answers · 64 views
How to naturally define an output space with certain properties
Consider the following regression problem $v=A(u) + \varepsilon$ for some operator $A:\mathcal{U} \rightarrow \mathcal{V}$ and some function spaces $\mathcal{U},\mathcal{V}$, mapping from $\mathcal{X}$...
0 votes · 0 answers · 63 views
VC dimension of full-dimensional closed polyhedral cone in $\mathbb R^d$
Consider a fixed set of vectors $\{x_i\}_{i\in[n]}$ in $\mathbb R^d$ and the closed polyhedral cone $C = \{w \in \mathbb R^d : w^\top x_i \geq 0, \forall i \in [n]\}$ with full dimension, i.e., $C$ contains ...
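As a side note on the object in this excerpt: membership in the cone $C$ is just a finite system of linear inequalities, so it can be checked directly. A minimal sketch (the constraint vectors `xs` and the test points are made-up illustrations, not from the question):

```python
import numpy as np

def in_cone(w, xs):
    """Check whether w lies in C = {w : <w, x_i> >= 0 for all i}."""
    return bool(np.all(np.asarray(xs) @ np.asarray(w) >= 0))

# Hypothetical example in R^2: constraints x_1 = (1, 0), x_2 = (0, 1),
# so C is the nonnegative quadrant (which is full-dimensional).
xs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(in_cone([2.0, 3.0], xs))    # (2, 3) satisfies both inequalities
print(in_cone([-1.0, 3.0], xs))   # violates <w, x_1> >= 0
```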
3 votes · 2 answers · 314 views
Minimax optimal multiple hypothesis test
Let us consider the following two-player game between Chooser and Guesser. There is a finite set $\Omega$ and $k$ probability distributions on $\Omega$, denoted by $\mathcal{P}=\{P_1,\ldots,P_k\}$. ...
3 votes · 1 answer · 100 views
When does the optimal model exist in learning theory?
In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
1 vote · 0 answers · 64 views
Is learning easy in balls where all candidate hypotheses agree on the query?
Let $\mathcal{H}$ be a hypothesis class, $h\in \mathcal{H}$ be a model that maps an input space $\mathcal{X}$ to $\{0,1\}$, and $\epsilon > 0$; let $\mathcal{D}$ denote the ...
7 votes · 2 answers · 448 views
Upper bound on VC-dimension of partitioned class
Fix $n,k\in \mathbb{N}_+$.
Let $\mathcal{H}$ be a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ with finite VC-dimension $d\in \mathbb{N}$. Let $\mathcal{H}_k$ denote the set of maps of the ...
4 votes · 0 answers · 140 views
Known relations between mutual information and covering number?
This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
2 votes · 1 answer · 57 views
Non-linear transforms of RKHS question
I was reading the paper Norm Inequalities in Nonlinear Transforms (referenced in this question) but ran into difficulties, so I was wondering if anyone could help?
I think I follow the paper until I ...
56 votes · 10 answers · 8k views
A clear map of mathematical approaches to Artificial Intelligence
I have recently become interested in Machine Learning and AI as a student of theoretical physics and mathematics, and have gone through some of the recommended resources dealing with statistical ...
1 vote · 0 answers · 79 views
Approximation of continuous functions by multilayer ReLU neural networks
For a continuous/Hölder function $f$ defined on a compact set $K$, a fixed $L$, and $m_1,m_2,\dots,m_L$, can we find a multilayer ReLU fully connected network $g$ with depth $L$ whose $i$-th layer has width ...
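For reference, the object in this question, a depth-$L$ fully connected ReLU network with per-layer widths, in a minimal numpy sketch (the widths and random weights are purely illustrative, not a construction answering the question):

```python
import numpy as np

def relu_mlp(x, weights, biases):
    """Forward pass of a fully connected ReLU network.
    weights[i] maps layer i (width m_i) to layer i+1 (width m_{i+1});
    ReLU is applied after every hidden layer, the output layer is linear."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)   # ReLU activation
    return weights[-1] @ h + biases[-1]  # linear output layer

rng = np.random.default_rng(0)
widths = [3, 8, 8, 1]   # illustrative: input dim 3, two hidden layers, scalar output
Ws = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
bs = [rng.standard_normal(widths[i + 1]) for i in range(3)]
y = relu_mlp(np.ones(3), Ws, bs)   # a scalar output, shape (1,)
```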
0 votes · 0 answers · 29 views
The hardness of active learning with fixed budget
I have been looking for theoretical papers studying this question of the fundamental hardness of PAC active learning algorithms. I found a few papers studying the problem from a fixed perspective (...
1 vote · 2 answers · 216 views
Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class
Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
0 votes · 1 answer · 91 views
Is it reasonable to consider the subgaussian property of the logarithm of the Gaussian pdf?
Let $Y$ denote a Gaussian random variable characterized by a mean $\mu$ and a variance $\sigma^2$. Consider $N$ independent and identically distributed (i.i.d.) copies of $Y$, denoted as $Y_1, Y_2, \...
2 votes · 1 answer · 84 views
VC-based risk bounds for classifiers on finite set
Let $X$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{ 0,1 \}^{\mathcal{X}}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $X\times \{0,1\}$ with law $\mathbb{P}$. ...
1 vote · 1 answer · 189 views
Rademacher complexity for a family of bounded, nondecreasing functions?
Let $\{\phi_k\}_{k=1}^K$ be a family of functions mapping from an interval $[a, b]$ to $[-1, 1]$.
That is, $\phi_k \colon[ a,b] \to [-1, 1]$ are nondecreasing maps on some finite interval $[a, b] \...
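As a numerical aside: for a finite family evaluated on a fixed sample, the empirical Rademacher complexity $\hat{\mathfrak{R}}_S = \mathbb{E}_\sigma \sup_k \frac{1}{m}\sum_i \sigma_i \phi_k(x_i)$ can be estimated by Monte Carlo over the sign vectors. A sketch with a made-up family of nondecreasing step functions (not the asker's class):

```python
import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma max_k (1/m) sum_i sigma_i * phi_k(x_i).
    `values` is a (num_functions, m) array holding phi_k(x_i)."""
    rng = np.random.default_rng(seed)
    _, m = values.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m))  # Rademacher signs
    sups = (sigma @ values.T / m).max(axis=1)           # sup over the family, per draw
    return sups.mean()

# Made-up class: 3 nondecreasing +/-1 step functions on 10 sample points
x = np.linspace(-1, 1, 10)
vals = np.array([np.sign(x - c) for c in (-0.5, 0.0, 0.5)])
r_hat = empirical_rademacher(vals)   # a value in (0, 1)
```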
1 vote · 0 answers · 116 views
Distribution-free learning vs distribution-dependent learning
I came across some papers studying the problem of distribution-free learning, and I am interested in knowing the exact definition of distribution-free learning.
I have searched some literature:
In ...
4 votes · 0 answers · 120 views
Progress on "Un-Alching" ML?
So, a couple of years ago I watched both Ali Rahimi's NIPS speech "Machine Learning is Alchemy",
(where he talks about how the field lacks a solid, overarching, theoretical foundation) and ...
1 vote · 1 answer · 163 views
Tight upper bounds for the Gaussian width of the intersection of a hyper-ellipsoid and the unit ball
Let $\Lambda$ be a positive-definite matrix of size $n$ and let $R \ge 0$, which may depend on $n$. Consider the set $S := \{x \in \mathbb R^n \mid \|x\|_2 \le R,\,\|x\|_{\Lambda^{-1}} \le 1\}$ where $...
1 vote · 0 answers · 24 views
Minimax statistical estimation of the proximal transform $\mbox{prox}_g(\theta_0)$ from linear model data $y_i := x_i^\top \theta_0 + \epsilon_i$
tl;dr: My question pertains to minimax estimation theory (mathematical statistics), in the context of linear regression.
Given a vector $\theta_0 \in \mathbb R^d$, consider the linear ...
2 votes · 0 answers · 264 views
Covering/Bracketing number of monotone functions on $\mathbb{R}$ with uniformly bounded derivatives
I am interested in the $\| \cdot \|_{\infty}$-norm bracketing number or covering number of some collection of distribution functions on $\mathbb{R}$.
Let $\mathcal{F}$ consist of all distribution ...
1 vote · 0 answers · 368 views
Conditions for equivalence of RKHS norm and $L^2(P)$ norm
Let $K$ be a psd kernel on an abstract space $X$ and let $H_K$ be the induced Reproducing Kernel Hilbert Space (RKHS). Let $P$ be a probability measure on $X$ such that $H_K \subseteq L^2(P_X)$ and ...
1 vote · 1 answer · 74 views
How far from a sparse parity function can a function be and still look like such a function on small sets?
Let $\mathbb F_2^n$ denote the set of binary vectors of length $n$. A $k$-sparse parity function is a linear function $h:\mathbb F_2^n\to\mathbb F_2$ of the form $h(x)=u\cdot x$ for some $u$ of ...
0 votes · 0 answers · 34 views
Normalizing a parameter in a regression
I am thinking about the possibility of making a parameter in my regression, let's say the $\lambda$ in a ridge regression, somehow, inside a range : $\lambda \in [0,1]$. Do you have any ideas how I ...
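One common trick for this kind of constraint (a sketch of one option, not necessarily what the asker intends): tune a surrogate parameter $t \in (0,1)$ and map it to $\lambda \in (0,\infty)$ through a bijection, e.g. $t = \lambda/(1+\lambda)$ with inverse $\lambda = t/(1-t)$:

```python
def to_unit(lam):
    """Map lambda in (0, inf) to t in (0, 1) via t = lam / (1 + lam)."""
    return lam / (1.0 + lam)

def from_unit(t):
    """Inverse map: lam = t / (1 - t), recovering the ridge penalty."""
    return t / (1.0 - t)

lam = 10.0
t = to_unit(lam)                         # 10/11, about 0.909
assert abs(from_unit(t) - lam) < 1e-9    # round trip recovers lambda
```

The map is strictly increasing, so searching over $t \in (0,1)$ is equivalent to searching over all $\lambda > 0$.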
0 votes · 0 answers · 95 views
Verification of a certain computation of VC dimension
Disclaimer: I'm not very familiar with the concept of VC dimension and how to manipulate such objects. I'd be grateful if experts on the subject (learning theory, probability) could kindly proof ...
0 votes · 1 answer · 203 views
VC dimension of a certain derived class of binary functions
Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
1 vote · 1 answer · 201 views
Rademacher complexity of function class $(x,y) \mapsto 1[|yf(x)-\alpha| \ge \beta]$ in terms of $\alpha$, $\beta$, and Rademacher complexity of $F$
Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
0 votes · 0 answers · 171 views
Upper-bound for bracketing number in terms of VC-dimension
Let $P$ be a probability distribution on a measurable space $\mathcal X$ (e.g., some Euclidean $\mathbb R^m$) and let $F$ be a class of functions $f:\mathcal X \to \mathbb R$. Given $f_1,f_2 \in F$, ...
1 vote · 0 answers · 95 views
$L_1$ convergence rates for multivariate kernel density estimation
Let $X$ be a random variable on $\mathbb R^d$ with probability density function $f$, and let $X_1,\ldots,X_n$ be $n$ i.i.d. copies of $X$. Given a bandwidth parameter $h=h_n > 0$ and a kernel $...
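For concreteness, the estimator in this question in a minimal sketch: the multivariate kernel density estimate $\hat f_n(x) = \frac{1}{n h^d} \sum_i K\!\big(\frac{x - X_i}{h}\big)$ with a Gaussian kernel. The bandwidth, dimension, and sample below are illustrative assumptions:

```python
import numpy as np

def gaussian_kde(x, data, h):
    """Evaluate f_hat(x) = (1 / (n h^d)) * sum_i K((x - X_i) / h),
    where K is the standard multivariate Gaussian kernel."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    diffs = (np.asarray(x, dtype=float) - data) / h              # shape (n, d)
    kvals = np.exp(-0.5 * np.sum(diffs**2, axis=1)) / (2 * np.pi)**(d / 2)
    return kvals.sum() / (n * h**d)

rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 2))        # illustrative: d = 2, n = 500
est = gaussian_kde([0.0, 0.0], sample, h=0.5)
# for comparison, the true N(0, I_2) density at the origin is 1/(2*pi), about 0.159
```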
4 votes · 0 answers · 161 views
Convergence rates for kernel empirical risk minimization, i.e., empirical risk minimization (ERM) with kernel density estimation (KDE)
Let $\Theta$ be an open subset of some $\mathbb R^m$ and let $P$ be a probability distribution on $\mathbb R^d$ with density $f$ in a Sobolev space $W_p^s(\mathbb R^d)$, i.e., all derivatives of $f$ ...
4 votes · 2 answers · 312 views
Bounds on the number of samples needed to learn a real-valued function class
See Theorem 6.8 in this book: https://www.cs.huji.ac.il/w~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
It gives us a lower bound (and also an ...