Questions tagged [learning-theory]

This tag is for questions related to the following branches: statistical learning theory, machine learning, Vapnik–Chervonenkis (VC) theory, and other mathematically oriented areas of learning theory.

1 vote
0 answers
116 views

Distribution-free learning vs distribution-dependent learning

I came across some papers studying the problem of distribution-free learning, and I am interested in knowing the exact definition of distribution-free learning. I have searched some literature: In ...
yinan • 11
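
For context (a sketch of the standard PAC-style phrasing, not necessarily the exact definition used in the papers the asker refers to): in distribution-free learning the guarantee and the sample size must hold uniformly over all data distributions,
$$
\forall D:\qquad \Pr_{S\sim D^{\,m(\varepsilon,\delta)}}\Bigl[\operatorname{err}_D(\hat h_S)\ \le\ \min_{h\in\mathcal H}\operatorname{err}_D(h)+\varepsilon\Bigr]\ \ge\ 1-\delta,
$$
with $m(\varepsilon,\delta)$ not depending on $D$, whereas in distribution-dependent learning both the guarantee and the sample size $m(\varepsilon,\delta,D)$ may depend on the particular (possibly partially known) distribution $D$.
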
4 votes
0 answers
120 views

Progress on "Un-Alching" ML?

So, a couple of years ago I watched both Ali Rahimi's NIPS speech "Machine Learning is Alchemy" (where he talks about how the field lacks a solid, overarching theoretical foundation) and ...
dicaes • 41
1 vote
1 answer
163 views

Tight upper bounds for the Gaussian width of the intersection of a hyper-ellipsoid and the unit ball

Let $\Lambda$ be a positive-definite matrix of size $n$ and let $R \ge 0$, which may depend on $n$. Consider the set $S := \{x \in \mathbb R^n \mid \|x\|_2 \le R,\,\|x\|_{\Lambda^{-1}} \le 1\}$ where $...
dohmatob • 6,824
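
For context only (a crude baseline, not the tight bound the question asks for): with $w(S):=\mathbb E\sup_{x\in S}\langle g,x\rangle$ for $g\sim\mathcal N(0,I_n)$, intersecting the ball and the ellipsoid gives at least
$$
w(S)\ \le\ \min\bigl(R\,\mathbb E\|g\|_2,\ \mathbb E\|\Lambda^{1/2}g\|_2\bigr)\ \le\ \min\bigl(R\sqrt n,\ \sqrt{\operatorname{tr}\Lambda}\bigr),
$$
since the width of $\{\|x\|_2\le R\}$ is $R\,\mathbb E\|g\|_2$ and the width of $\{\|x\|_{\Lambda^{-1}}\le 1\}$ is $\mathbb E\|\Lambda^{1/2}g\|_2\le\sqrt{\operatorname{tr}\Lambda}$ by Jensen's inequality.
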
1 vote
0 answers
24 views

Minimax statistical estimation of the proximal transform $\mbox{prox}_g(\theta_0)$ from linear model data $y_i := x_i^\top \theta_0 + \epsilon_i$

tl;dr: My question pertains to the subject of minimax estimation theory (mathematical statistics), in the context of linear regression. Given a vector $\theta_0 \in \mathbb R^d$, consider the linear ...
dohmatob • 6,824
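
For readers unfamiliar with the object in the title, the proximal transform is the standard one (assuming $g$ is proper, closed and convex so that the minimizer is unique):
$$
\operatorname{prox}_g(\theta_0)\ :=\ \arg\min_{\theta\in\mathbb R^d}\ \tfrac12\|\theta-\theta_0\|_2^2+g(\theta),
$$
and the minimax risk in question has the usual form $\inf_{\hat T}\sup_{\theta_0}\mathbb E\,\|\hat T\bigl((x_i,y_i)_{i\le n}\bigr)-\operatorname{prox}_g(\theta_0)\|_2^2$, the infimum running over all estimators built from the data.
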
2 votes
0 answers
264 views

Covering/Bracketing number of monotone functions on $\mathbb{R}$ with uniformly bounded derivatives

I am interested in the $\| \cdot \|_{\infty}$-norm bracketing number or covering number of some collection of distribution functions on $\mathbb{R}$. Let $\mathcal{F}$ consist of all distribution ...
masala • 93
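
As a reminder (standard empirical-process definitions): an $\varepsilon$-bracket is a pair of functions $l\le u$ with $\|u-l\|\le\varepsilon$, and
$$
N_{[\,]}(\varepsilon,\mathcal F,\|\cdot\|)\ :=\ \min\Bigl\{N:\ \exists\,[l_1,u_1],\dots,[l_N,u_N],\ \|u_j-l_j\|\le\varepsilon,\ \mathcal F\subseteq\textstyle\bigcup_j\{f:\ l_j\le f\le u_j\}\Bigr\},
$$
while the covering number $N(\varepsilon,\mathcal F,\|\cdot\|)$ is the minimal number of $\varepsilon$-balls covering $\mathcal F$; in particular $N(\varepsilon,\mathcal F,\|\cdot\|)\le N_{[\,]}(2\varepsilon,\mathcal F,\|\cdot\|)$, so bracketing bounds imply covering bounds.
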
1 vote
0 answers
368 views

Conditions for equivalence of RKHS norm and $L^2(P)$ norm

Let $K$ be a psd kernel on an abstract space $X$ and let $H_K$ be the induced Reproducing Kernel Hilbert Space (RKHS). Let $P$ be a probability measure on $X$ such that $H_K \subseteq L^2(P)$ and ...
dohmatob • 6,824
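
A sketch of why this equivalence is restrictive, assuming $X$ compact, $K$ continuous and $P$ of full support so that Mercer's theorem applies with eigenpairs $(\mu_i,e_i)$ of the integral operator $T_Kf=\int K(\cdot,y)f(y)\,dP(y)$ on $L^2(P)$: for $f=\sum_i a_i e_i\in H_K$,
$$
\|f\|_{L^2(P)}^2=\sum_i a_i^2,\qquad \|f\|_{H_K}^2=\sum_i \frac{a_i^2}{\mu_i},
$$
so $\|f\|_{L^2(P)}\le\sqrt{\mu_1}\,\|f\|_{H_K}$ always holds, while a reverse inequality $\|f\|_{H_K}\le C\|f\|_{L^2(P)}$ on all of $H_K$ forces $\mu_i\ge 1/C^2$ for every nonzero eigenvalue; combined with $\sum_i\mu_i<\infty$ this leaves only finitely many nonzero eigenvalues, i.e. a finite-dimensional $H_K$.
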
1 vote
1 answer
74 views

How far from a sparse parity function can a function be and still look like such a function on small sets?

Let $\mathbb F_2^n$ denote the set of binary vectors of length $n$. A $k$-sparse parity function is a linear function $h:\mathbb F_2^n\to\mathbb F_2$ of the form $h(x)=u\cdot x$ for some $u$ of ...
Jack M • 633
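
For context (standard facts, stated as a sketch): the relevant distance is typically $d(f,h):=\Pr_{x\sim\mathrm{Unif}(\mathbb F_2^n)}[f(x)\ne h(x)]$, and any two distinct parities $h_u(x)=u\cdot x$ and $h_v(x)=v\cdot x$ satisfy
$$
d(h_u,h_v)\ =\ \Pr_x\bigl[(u+v)\cdot x\ne 0\bigr]\ =\ \tfrac12,
$$
since a nonzero linear form on $\mathbb F_2^n$ is nonzero on exactly half of the points.
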
0 votes
0 answers
34 views

Normalizing a parameter in a regression

I would like to constrain a parameter in my regression, say the $\lambda$ in ridge regression, to lie in a range: $\lambda \in [0,1]$. Do you have any ideas how I ...
SUMQXDT
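
One common way to do this (a sketch, not necessarily what the asker has in mind) is to tune a bounded parameter $t\in[0,1)$ and map it to the ridge penalty through a monotone bijection such as $\lambda(t)=t/(1-t)$. A minimal illustration, assuming NumPy and scikit-learn are available:

    import numpy as np
    from sklearn.linear_model import Ridge

    def fit_ridge_normalized(X, y, t):
        """Fit ridge regression with the penalty parameterized by t in [0, 1).

        lambda = t / (1 - t) maps [0, 1) monotonically onto [0, infinity):
        t = 0 means no shrinkage, t close to 1 means very strong shrinkage.
        """
        lam = t / (1.0 - t)
        model = Ridge(alpha=lam)  # scikit-learn calls the ridge penalty "alpha"
        return model.fit(X, y)

    # hypothetical usage on synthetic data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
    model = fit_ridge_normalized(X, y, t=0.3)

Any other bounded reparameterization (e.g. $\lambda=\lambda_{\max}\,t$ for a problem-dependent $\lambda_{\max}$) works the same way; the point is only that the search space for the tuning parameter becomes a bounded interval.
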
0 votes
0 answers
95 views

Verification of a certain computation of VC dimension

Disclaimer: I'm not very familiar with the concept of VC dimension and how to manipulate such objects. I'd be grateful if experts on the subject (learning theory, probability) could kindly proof ...
dohmatob • 6,824
0 votes
1 answer
203 views

VC dimension of a certain derived class of binary functions

Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
dohmatob • 6,824
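
As a reminder (standard definitions): a set $\{x_1,\dots,x_m\}\subseteq X$ is shattered by a class $H$ of $\pm1$-valued functions if all $2^m$ sign patterns are realized on it, and
$$
\mathrm{VC}(H)\ :=\ \max\{m:\ \text{some set of }m\text{ points is shattered by }H\};
$$
by the Sauer–Shelah lemma, a class with $\mathrm{VC}(H)=d$ realizes at most $\sum_{i=0}^{d}\binom{m}{i}$ sign patterns on any $m$ points, which is what typically makes VC bounds for derived classes tractable.
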
1 vote
1 answer
201 views

Rademacher complexity of function class $(x,y) \mapsto 1[|yf(x)-\alpha| \ge \beta]$ in terms of $\alpha$, $\beta$, and Rademacher complexity of $F$

Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
dohmatob • 6,824
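
For context, the (empirical) Rademacher complexity in the title is, in its standard form,
$$
\widehat{\mathfrak R}_n(G)\ :=\ \mathbb E_{\sigma}\Bigl[\sup_{g\in G}\frac1n\sum_{i=1}^n\sigma_i\,g(z_i)\Bigr],\qquad \sigma_1,\dots,\sigma_n\ \text{i.i.d. uniform on }\{\pm1\},
$$
where $z_i=(x_i,y_i)$ are the sample points, here applied to the derived class $G=\{(x,y)\mapsto \mathbf 1[\,|yf(x)-\alpha|\ge\beta\,]:f\in F\}$; note that $t\mapsto\mathbf 1[|t-\alpha|\ge\beta]$ is not Lipschitz, so relating $\widehat{\mathfrak R}_n(G)$ to that of $F$ is not a direct application of the contraction inequality.
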
0 votes
0 answers
171 views

Upper-bound for bracketing number in terms of VC-dimension

Let $P$ be a probability distribution on a measurable space $\mathcal X$ (e.g., some Euclidean space $\mathbb R^m$) and let $F$ be a class of functions $f:\mathcal X \to \mathbb R$. Given $f_1,f_2 \in F$, ...
dohmatob • 6,824
1 vote
0 answers
95 views

$L_1$ convergence rates for multivariate kernel density estimation

Let $X$ be a random variable on $\mathbb R^d$ with probability density function $f$, and let $X_1,\ldots,X_n$ be $n$ i.i.d. copies of $X$. Given a bandwidth parameter $h=h_n > 0$ and a kernel $...
dohmatob • 6,824
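
For context, the estimator in question is presumably the standard multivariate kernel density estimator
$$
\hat f_{n,h}(x)\ :=\ \frac{1}{n h^d}\sum_{i=1}^n K\!\Bigl(\frac{x-X_i}{h}\Bigr),
$$
and the question concerns the rate at which $\mathbb E\,\|\hat f_{n,h_n}-f\|_{L^1(\mathbb R^d)}$ can be driven to zero under smoothness assumptions on $f$ and a suitable bandwidth sequence $h_n$.
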
4 votes
0 answers
161 views

Convergence rates for kernel empirical risk minimization, i.e., empirical risk minimization (ERM) with kernel density estimation (KDE)

Let $\Theta$ be an open subset of some $\mathbb R^m$ and let $P$ be a probability distribution on $\mathbb R^d$ with density $f$ in a Sobolev space $W_p^s(\mathbb R^d)$, i.e., all derivatives of $f$ ...
dohmatob • 6,824
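
A sketch of the basic reduction, under the assumption (mine, since the excerpt is truncated) that the risk integrates a bounded loss against the density: with $R(\theta)=\int\ell(\theta,x)f(x)\,dx$ and the plug-in surrogate $\hat R_{n,h}(\theta)=\int\ell(\theta,x)\hat f_{n,h}(x)\,dx$,
$$
\sup_{\theta\in\Theta}\bigl|\hat R_{n,h}(\theta)-R(\theta)\bigr|\ \le\ \Bigl(\sup_{\theta\in\Theta}\|\ell(\theta,\cdot)\|_\infty\Bigr)\,\|\hat f_{n,h}-f\|_{L^1(\mathbb R^d)},
$$
so $L^1$ rates for the KDE (as in the previous question) translate into uniform risk-approximation rates and hence, by the usual ERM argument, into excess-risk rates for the kernel ERM.
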
4 votes
2 answers
312 views

Bounds on the number of samples needed to learn a real valued function class

Consider Theorem 6.8 in this book: https://www.cs.huji.ac.il/w~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf. It gives us a lower bound (and also an ...
Student • 555
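
For reference (stated from memory, so the exact form and constants should be checked against the book): Theorem 6.8 there concerns binary-valued classes and characterizes the agnostic PAC sample complexity of a class of VC dimension $d$ up to constants as
$$
m(\varepsilon,\delta)\ =\ \Theta\!\Bigl(\frac{d+\log(1/\delta)}{\varepsilon^2}\Bigr);
$$
for real-valued function classes the analogous role is played by scale-sensitive notions such as the fat-shattering dimension (or the pseudo-dimension), which is presumably the kind of bound the question is after.
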
