All Questions

3 votes
1 answer
100 views

When does the optimal model exist in learning theory?

In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
rick's user avatar
  • 179
7 votes
2 answers
448 views

Upper bound on VC-dimension of partitioned class

Fix $n,k\in \mathbb{N}_+$. Let $\mathcal{H}$ be a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ with finite VC-dimension $d\in \mathbb{N}$. Let $\mathcal{H}_k$ denote the set of maps of the ...
Math_Newbie's user avatar
4 votes
0 answers
140 views

Known relations between mutual information and covering number?

This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
Tanishq Kumar's user avatar
2 votes
1 answer
57 views

Non-linear transforms of RKHS question

I was reading the paper Norm Inequalities in Nonlinear Transforms (referenced in this question) but ran into difficulties, so I was wondering if anyone could help? I think I follow the paper until I ...
Mat's user avatar
  • 41
56 votes
10 answers
8k views

A clear map of mathematical approaches to Artificial Intelligence

I have recently become interested in Machine Learning and AI as a student of theoretical physics and mathematics, and have gone through some of the recommended resources dealing with statistical ...
1 vote
0 answers
79 views

Approximation of a continuous function by a multilayer ReLU neural network

For a continuous/Hölder function $f$ defined on a compact set $K$, a fixed $L$ and $m_1,m_2,\dots,m_L$, can we find a multilayer ReLU fully connected network $g$ with depth $L$ in which each $i$-th layer has width ...
Hao Yu's user avatar
  • 781
1 vote
2 answers
216 views

Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class

Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
dohmatob's user avatar
  • 6,824
2 votes
1 answer
84 views

VC-based risk bounds for classifiers on finite set

Let $\mathcal{X}$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{ 0,1 \}^{\mathcal{X}}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $\mathcal{X}\times \{0,1\}$ with law $\mathbb{P}$. ...
Math_Newbie's user avatar
1 vote
0 answers
116 views

Distribution-free learning vs distribution-dependent learning

I came across some papers studying the problem of distribution-free learning, and I am interested in knowing the exact definition of distribution-free learning. I have searched some literature: In ...
yinan's user avatar
  • 11
4 votes
0 answers
120 views

Progress on "Un-Alching" ML?

So, a couple of years ago I watched both Ali Rahimi's NIPS speech "Machine Learning is Alchemy" (where he talks about how the field lacks a solid, overarching theoretical foundation) and ...
dicaes's user avatar
  • 41
2 votes
0 answers
264 views

Covering/Bracketing number of monotone functions on $\mathbb{R}$ with uniformly bounded derivatives

I am interested in the $\| \cdot \|_{\infty}$-norm bracketing number or covering number of some collection of distribution functions on $\mathbb{R}$. Let $\mathcal{F}$ consist of all distribution ...
masala's user avatar
  • 93
0 votes
0 answers
34 views

Normalizing a parameter in a regression

I am thinking about the possibility of constraining a parameter in my regression, say the $\lambda$ in a ridge regression, to a range: $\lambda \in [0,1]$. Do you have any ideas how I ...
SUMQXDT's user avatar
2 votes
1 answer
156 views

Representer theorem for a loss / functional of the form $L(h) := \sum_{i=1}^n (|h(x_i)-y_i|+t\|h\|)^2$

Let $K:X \times X \to \mathbb R$ be a (positive-definite) kernel and let $H$ be the induced reproducing kernel Hilbert space (RKHS). Fix $(x_1,y_1),\ldots,(x_n,y_n) \in X \times \mathbb R$. For $t \ge ...
dohmatob's user avatar
  • 6,824
6 votes
1 answer
458 views

Why is this nonlinear transformation of an RKHS also an RKHS?

I came across this paper (beginning of page 6) where they stated that if $f,h\in \mathcal{H}$, where $\mathcal{H}$ is an RKHS, then $l_{h,f}=\left|f(x)-h(x)\right|^q$ where $q\geq 1$ also belongs to ...
Kashif's user avatar
  • 373
3 votes
1 answer
293 views

Games and the right mathematical framework for GANs

Generative Adversarial Networks were introduced in http://papers.nips.cc/paper/5423-generative-adversarial-nets, which has more than 20000 citations. It is an important topic within deep learning. Are ...
Turbo's user avatar
  • 13.8k
