
Questions tagged [learning-theory]

This tag is for questions related to statistical learning theory, machine learning, Vapnik–Chervonenkis (VC) theory, and other branches of learning theory that involve various kinds of mathematics.

2 votes
0 answers
47 views

How to naturally define an output space with certain properties

Consider the following regression problem $v = A(u) + \varepsilon$ for some operator $A:\mathcal{U} \rightarrow \mathcal{V}$ between function spaces $\mathcal{U},\mathcal{V}$ whose elements map $\mathcal{X}$ to $\mathcal{Y}$. ...
emma bernd
0 votes
0 answers
61 views

VC dimension of full-dimensional closed polyhedral cone in $\mathbb R^d$

Consider a fixed set of vectors $\{x_i\}_{i\in[n]}$ in $\mathbb R^d$ and the closed polyhedral cone $C = \{w \in \mathbb R^d : w^\top x_i \geq 0 \ \forall i \in [n]\}$ of full dimension, i.e. $C$ contains ...
Neophyte
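For reference, the standard definition such questions rely on (which induced binary class on the cone is meant lies past the excerpt, so it is left abstract here): for a class $\mathcal{H}\subseteq\{0,1\}^{\mathcal{X}}$,
$$\mathrm{VCdim}(\mathcal{H}) = \max\big\{ m : \exists\, x_1,\dots,x_m \in \mathcal{X} \text{ with } |\{(h(x_1),\dots,h(x_m)) : h\in\mathcal{H}\}| = 2^m \big\},$$
i.e. the largest cardinality of a point set shattered by $\mathcal{H}$.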
3 votes
2 answers
312 views

Minimax optimal multiple hypothesis test

Let us consider the following two-player game between Chooser and Guesser. There is a finite set $\Omega$ and $k$ probability distributions on $\Omega$, denoted by $ \mathcal{P} =\{P_1,\ldots,P_k\} $. ...
Aryeh Kontorovich
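The excerpt cuts off before the protocol is fully specified; under one natural formalization (an assumption here), Chooser picks $j\in[k]$, Guesser observes $n$ i.i.d. samples from $P_j$ and outputs $\hat{\jmath}$, and the quantity of interest is the minimax error
$$\inf_{\hat{\jmath}}\; \max_{j\in[k]} \; P_j^{\otimes n}\big(\hat{\jmath}\neq j\big),$$
for which Le Cam-type (for $k=2$) and Fano-type arguments give the usual lower bounds.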
3 votes
1 answer
100 views

When does the optimal model exist in learning theory?

In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
rick
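A worked statement of the kind of condition usually invoked (phrased for a generic loss $\ell$, which the excerpt does not fix): the population risk minimizer
$$f^\star \in \operatorname*{arg\,min}_{f\in\mathcal{F}} \; \mathbb{E}_{(x,y)\sim P}\big[\ell(f(x),y)\big]$$
is guaranteed to exist when $\mathcal{F}$ is compact in a topology for which $f \mapsto \mathbb{E}[\ell(f(x),y)]$ is lower semicontinuous (for instance, $\mathcal{F}$ a closed bounded subset of a finite-dimensional space with $\ell$ continuous and bounded below); without some such condition the infimum need not be attained.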
1 vote
0 answers
64 views

Is learning easy in balls where all candidate hypotheses agree on the query?

Let $\mathcal{H}$ be a hypothesis class, let $h\in \mathcal{H}$ be a model mapping an input space $\mathcal{X}$ to $\{0,1\}$, and let $\epsilon > 0$; let $\mathcal{D}$ denote the ...
rivana
7 votes
2 answers
446 views

Upper bound on VC-dimension of partitioned class

Fix $n,k\in \mathbb{N}_+$. Let $\mathcal{H}$ be a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ with finite VC-dimension $d\in \mathbb{N}$. Let $\mathcal{H}_k$ denote the set of maps of the ...
Math_Newbie
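The standard tool for bounds of this kind is the Sauer–Shelah lemma, stated here for reference (how it is applied to the partitioned class $\mathcal{H}_k$ depends on details past the excerpt): if $\mathrm{VCdim}(\mathcal{H}) = d$, then the number of labelings $\mathcal{H}$ induces on any $m$ points is at most
$$\sum_{i=0}^{d}\binom{m}{i} \le \Big(\frac{em}{d}\Big)^{d} \qquad (m\ge d),$$
and bounding the growth function of a derived class by a product of such terms is the usual route to $O(dk\log k)$-type upper bounds.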
4 votes
0 answers
137 views

Known relations between mutual information and covering number?

This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
Tanishq Kumar
2 votes
1 answer
56 views

Non-linear transforms of RKHS question

I was reading the paper Norm Inequalities in Nonlinear Transforms (referenced in this question) but ran into difficulties, so I was wondering if anyone could help? I think I follow the paper until I ...
Mat
56 votes
10 answers
8k views

A clear map of mathematical approaches to Artificial Intelligence

I have recently become interested in Machine Learning and AI as a student of theoretical physics and mathematics, and have gone through some of the recommended resources dealing with statistical ...
1 vote
0 answers
79 views

Approximation of continuous functions by multilayer ReLU neural networks

For a continuous/Hölder function $f$ defined on a compact set $K$, a fixed $L$, and widths $m_1,m_2,\dots,m_L$, can we find a multilayer ReLU fully connected network $g$ with depth $L$ whose $i$-th layer has width ...
Hao Yu
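The architecture in the question is only partially specified; below is a minimal NumPy sketch of the kind of network $g$ being asked about, under the assumption of fully connected layers with prescribed hidden widths $m_1,\dots,m_L$, ReLU activations, and a linear output layer (the widths and input dimension used here are illustrative, not from the question).

import numpy as np

def relu(z):
    # elementwise ReLU activation
    return np.maximum(z, 0.0)

def relu_mlp(x, weights, biases):
    # Fully connected network: ReLU on every hidden layer, linear output layer.
    # weights[i] has shape (m_{i+1}, m_i); biases[i] has shape (m_{i+1},).
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]

# Illustrative depth L = 3 with hypothetical widths m_1 = m_2 = m_3 = 16,
# input dimension 2 and scalar output; random (untrained) parameters.
rng = np.random.default_rng(0)
dims = [2, 16, 16, 16, 1]
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
print(relu_mlp(np.array([0.3, -0.7]), weights, biases))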
0 votes
0 answers
29 views

The hardness of active learning with fixed budget

I have been looking for theoretical papers studying this question of the fundamental hardness of PAC active learning algorithms. I found a few papers studying the problem from a fixed perspective (...
rivana
1 vote
2 answers
214 views

Beating the $1/\sqrt n$ rate of uniform convergence over a linear function class

Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
dohmatob
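For context, the $1/\sqrt n$ rate in the title is the one delivered by the standard Rademacher-complexity calculation for the unit ball of linear functionals (the loss structure hidden in the truncated excerpt is not used here): with $\sigma_1,\dots,\sigma_n$ i.i.d. Rademacher signs,
$$\mathbb{E}_\sigma \sup_{\|w\|_2\le 1} \frac1n\sum_{i=1}^n \sigma_i \langle w, x_i\rangle \;=\; \frac1n\,\mathbb{E}_\sigma\Big\|\sum_{i=1}^n \sigma_i x_i\Big\|_2 \;\le\; \frac{\sqrt{\sum_{i=1}^n \|x_i\|_2^2}}{n} \;\le\; \frac{\max_i \|x_i\|_2}{\sqrt n},$$
so beating $1/\sqrt n$ requires exploiting structure beyond worst-case uniform convergence.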
0 votes
1 answer
91 views

Is it reasonable to consider the subgaussian property of the logarithm of the Gaussian pdf?

Let $Y$ denote a Gaussian random variable characterized by a mean $\mu$ and a variance $\sigma^2$. Consider $N$ independent and identically distributed (i.i.d.) copies of $Y$, denoted as $Y_1, Y_2, \...
Math_Y
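One concrete computation relevant here (just the log-density of the Gaussian in the question, nothing beyond the excerpt): for $Y_i \sim \mathcal{N}(\mu,\sigma^2)$,
$$\log p(Y_i) = -\tfrac12\log(2\pi\sigma^2) - \frac{(Y_i-\mu)^2}{2\sigma^2},$$
and since $(Y_i-\mu)^2/\sigma^2 \sim \chi^2_1$, the variable $\log p(Y_i)$ is subexponential rather than subgaussian; that distinction is usually the crux of questions of this type.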
2 votes
1 answer
84 views

VC-based risk bounds for classifiers on finite set

Let $X$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{0,1\}^{X}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $X\times \{0,1\}$ with law $\mathbb{P}$. ...
Math_Newbie
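For a finite base set the VC machinery can also be bypassed (a standard observation, recorded here for reference): since $X$ is finite, $|\mathcal{H}|\le 2^{|X|}$, and the finite-class bound gives, with probability at least $1-\delta$ over the i.i.d. sample,
$$\sup_{h\in\mathcal{H}}\Big|\frac1N\sum_{n=1}^N \mathbf{1}\{h(X_n)\neq L_n\} - \mathbb{P}\big(h(X_1)\neq L_1\big)\Big| \;\le\; \sqrt{\frac{\log(2|\mathcal{H}|/\delta)}{2N}},$$
by Hoeffding's inequality plus a union bound; VC-based risk bounds replace $\log|\mathcal{H}|$ by the logarithm of the growth function.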
1 vote
1 answer
187 views

Rademacher complexity for a family of bounded, nondecreasing functions?

Let $\{\phi_k\}_{k=1}^K$ be a family of functions mapping from an interval $[a, b]$ to $[-1, 1]$. That is, $\phi_k \colon [a,b] \to [-1, 1]$ are nondecreasing maps on some finite interval $[a, b] \...
Drew Brady
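For reference, the quantity being asked about (definition only; whether the empirical or expected version is intended is not visible in the excerpt): the empirical Rademacher complexity of $\Phi=\{\phi_1,\dots,\phi_K\}$ on points $x_1,\dots,x_n\in[a,b]$ is
$$\widehat{\mathcal{R}}_n(\Phi) = \mathbb{E}_\sigma\Big[\sup_{k\in[K]} \frac1n\sum_{i=1}^n \sigma_i\,\phi_k(x_i)\Big],$$
and for a finite family of $[-1,1]$-valued functions Massart's lemma already gives $\widehat{\mathcal{R}}_n(\Phi)\le \sqrt{2\log K/n}$; the monotonicity assumption matters when one wants bounds that do not depend on $K$.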
