Questions tagged [learning-theory]
This tag is for questions related to the following areas: statistical learning theory, machine learning, Vapnik–Chervonenkis (VC) theory, and other branches of learning theory that draw on various kinds of mathematics.
100 questions
2 votes · 0 answers · 64 views
How to naturally define an output space with certain properties
Consider the following regression problem $v=A(u) + \varepsilon$ for some operator $A:\mathcal{U} \rightarrow \mathcal{V}$ and some function spaces $\mathcal{U},\mathcal{V}$, mapping from $\mathcal{X}$...
0 votes · 0 answers · 63 views
VC dimension of full-dimensional closed polyhedral cone in $\mathbb R^d$
Consider a fixed set of vectors $\{x_i\}_{i\in[n]}$ in $\mathbb R^d$ and the closed polyhedral cone $C = \{w \in \mathbb R^d : w^\top x_i \geq 0, \forall i \in [n]\}$ with full dimension, i.e., $C$ contains ...
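As a side note on the object in this excerpt: membership in the cone $C$ is just a finite system of linear inequalities, so it can be checked directly. A minimal sketch (the constraint vectors `xs` and the test points are made-up illustrations, not from the question):

```python
import numpy as np

def in_cone(w, xs):
    """Check whether w lies in C = {w : <w, x_i> >= 0 for all i}."""
    return bool(np.all(np.asarray(xs) @ np.asarray(w) >= 0))

# Hypothetical example in R^2: constraints x_1 = (1, 0), x_2 = (0, 1),
# so C is the nonnegative quadrant (which is full-dimensional).
xs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(in_cone([2.0, 3.0], xs))    # (2, 3) satisfies both inequalities
print(in_cone([-1.0, 3.0], xs))   # violates <w, x_1> >= 0
```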
3 votes · 2 answers · 314 views
Minimax optimal multiple hypothesis test
Let us consider the following two-player game between Chooser and Guesser. There is a finite set $\Omega$ and $k$ probability distributions on $\Omega$, denoted by $\mathcal{P}=\{P_1,\ldots,P_k\}$. ...
3 votes · 1 answer · 100 views
When does the optimal model exist in learning theory?
In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
1 vote · 0 answers · 64 views
Is learning easy in balls where all candidate hypotheses agree on the query?
Let $\mathcal{H}$ be a hypothesis class, $h\in \mathcal{H}$ be a model that maps an input space $\mathcal{X}$ to $\{0,1\}$, and $\epsilon > 0$; let $\mathcal{D}$ denote the ...
7 votes · 2 answers · 448 views
Upper bound on VC-dimension of partitioned class
Fix $n,k\in \mathbb{N}_+$.
Let $\mathcal{H}$ be a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ with finite VC-dimension $d\in \mathbb{N}$. Let $\mathcal{H}_k$ denote the set of maps of the ...
4 votes · 0 answers · 140 views
Known relations between mutual information and covering number?
This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
2 votes · 1 answer · 57 views
Non-linear transforms of RKHS question
I was reading the paper Norm Inequalities in Nonlinear Transforms (referenced in this question) but ran into difficulties, so I was wondering if anyone could help?
I think I follow the paper until I ...
56 votes · 10 answers · 8k views
A clear map of mathematical approaches to Artificial Intelligence
I have recently become interested in Machine Learning and AI as a student of theoretical physics and mathematics, and have gone through some of the recommended resources dealing with statistical ...
1 vote · 0 answers · 79 views
Approximation of continuous functions by multilayer ReLU neural networks
For a continuous/Hölder function $f$ defined on a compact set $K$, a fixed $L$, and $m_1,m_2,\dots,m_L$, can we find a multilayer ReLU fully connected network $g$ with depth $L$ whose $i$-th layer has width ...
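For reference, the object in this question, a depth-$L$ fully connected ReLU network with per-layer widths, in a minimal numpy sketch (the widths and random weights are purely illustrative, not a construction answering the question):

```python
import numpy as np

def relu_mlp(x, weights, biases):
    """Forward pass of a fully connected ReLU network.
    weights[i] maps layer i (width m_i) to layer i+1 (width m_{i+1});
    ReLU is applied after every hidden layer, the output layer is linear."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)   # ReLU activation
    return weights[-1] @ h + biases[-1]  # linear output layer

rng = np.random.default_rng(0)
widths = [3, 8, 8, 1]   # illustrative: input dim 3, two hidden layers, scalar output
Ws = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
bs = [rng.standard_normal(widths[i + 1]) for i in range(3)]
y = relu_mlp(np.ones(3), Ws, bs)   # a scalar output, shape (1,)
```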
0 votes · 0 answers · 29 views
The hardness of active learning with fixed budget
I have been looking for theoretical papers studying this question of the fundamental hardness of PAC active learning algorithms. I found a few papers studying the problem from a fixed perspective (...
1 vote · 2 answers · 216 views
Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class
Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
0 votes · 1 answer · 91 views
Is it reasonable to consider the subgaussian property of the logarithm of the Gaussian pdf?
Let $Y$ denote a Gaussian random variable characterized by a mean $\mu$ and a variance $\sigma^2$. Consider $N$ independent and identically distributed (i.i.d.) copies of $Y$, denoted as $Y_1, Y_2, \...
2 votes · 1 answer · 84 views
VC-based risk bounds for classifiers on finite set
Let $X$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{ 0,1 \}^{\mathcal{X}}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $X\times \{0,1\}$ with law $\mathbb{P}$. ...
1 vote · 1 answer · 189 views
Rademacher complexity for a family of bounded, nondecreasing functions?
Let $\{\phi_k\}_{k=1}^K$ be a family of functions mapping from an interval $[a, b]$ to $[-1, 1]$.
That is, $\phi_k \colon[ a,b] \to [-1, 1]$ are nondecreasing maps on some finite interval $[a, b] \...
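As a numerical aside: for a finite family evaluated on a fixed sample, the empirical Rademacher complexity $\hat{\mathfrak{R}}_S = \mathbb{E}_\sigma \sup_k \frac{1}{m}\sum_i \sigma_i \phi_k(x_i)$ can be estimated by Monte Carlo over the sign vectors. A sketch with a made-up family of nondecreasing step functions (not the asker's class):

```python
import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma max_k (1/m) sum_i sigma_i * phi_k(x_i).
    `values` is a (num_functions, m) array holding phi_k(x_i)."""
    rng = np.random.default_rng(seed)
    _, m = values.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m))  # Rademacher signs
    sups = (sigma @ values.T / m).max(axis=1)           # sup over the family, per draw
    return sups.mean()

# Made-up class: 3 nondecreasing +/-1 step functions on 10 sample points
x = np.linspace(-1, 1, 10)
vals = np.array([np.sign(x - c) for c in (-0.5, 0.0, 0.5)])
r_hat = empirical_rademacher(vals)   # a value in (0, 1)
```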
1 vote · 0 answers · 116 views
Distribution-free learning vs distribution-dependent learning
I came across some papers studying the problem of distribution-free learning, and I am interested in knowing the exact definition of distribution-free learning.
I have searched some literature:
In ...
4 votes · 0 answers · 120 views
Progress on "Un-Alching" ML?
So, a couple of years ago I watched both Ali Rahimi's NIPS speech "Machine Learning is Alchemy",
(where he talks about how the field lacks a solid, overarching, theoretical foundation) and ...
1 vote · 1 answer · 163 views
Tight upper bounds for the Gaussian width of the intersection of a hyper-ellipsoid and the unit ball
Let $\Lambda$ be a positive-definite matrix of size $n$ and let $R \ge 0$, which may depend on $n$. Consider the set $S := \{x \in \mathbb R^n \mid \|x\|_2 \le R,\,\|x\|_{\Lambda^{-1}} \le 1\}$ where $...
1 vote · 0 answers · 24 views
Minimax statistical estimation of the proximal transform $\mbox{prox}_g(\theta_0)$ from linear model data $y_i := x_i^\top \theta_0 + \epsilon_i$
tl;dr: My question pertains to minimax estimation theory (mathematical statistics), in the context of linear regression.
Given a vector $\theta_0 \in \mathbb R^d$, consider the linear ...
2 votes · 0 answers · 264 views
Covering/Bracketing number of monotone functions on $\mathbb{R}$ with uniformly bounded derivatives
I am interested in the $\| \cdot \|_{\infty}$-norm bracketing number or covering number of some collection of distribution functions on $\mathbb{R}$.
Let $\mathcal{F}$ consist of all distribution ...
1 vote · 0 answers · 368 views
Conditions for equivalence of RKHS norm and $L^2(P)$ norm
Let $K$ be a psd kernel on an abstract space $X$ and let $H_K$ be the induced Reproducing Kernel Hilbert Space (RKHS). Let $P$ be a probability measure on $X$ such that $H_K \subseteq L^2(P_X)$ and ...
1 vote · 1 answer · 74 views
How far from a sparse parity function can a function be and still look like such a function on small sets?
Let $\mathbb F_2^n$ denote the set of binary vectors of length $n$. A $k$-sparse parity function is a linear function $h:\mathbb F_2^n\to\mathbb F_2$ of the form $h(x)=u\cdot x$ for some $u$ of ...
0 votes · 0 answers · 34 views
Normalizing a parameter in a regression
I am thinking about the possibility of making a parameter in my regression, let's say the $\lambda$ in a ridge regression, somehow, inside a range : $\lambda \in [0,1]$. Do you have any ideas how I ...
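One common trick for this kind of constraint (a sketch of one option, not necessarily what the asker intends): tune a surrogate parameter $t \in (0,1)$ and map it to $\lambda \in (0,\infty)$ through a bijection, e.g. $t = \lambda/(1+\lambda)$ with inverse $\lambda = t/(1-t)$:

```python
def to_unit(lam):
    """Map lambda in (0, inf) to t in (0, 1) via t = lam / (1 + lam)."""
    return lam / (1.0 + lam)

def from_unit(t):
    """Inverse map: lam = t / (1 - t), recovering the ridge penalty."""
    return t / (1.0 - t)

lam = 10.0
t = to_unit(lam)                         # 10/11, about 0.909
assert abs(from_unit(t) - lam) < 1e-9    # round trip recovers lambda
```

The map is strictly increasing, so searching over $t \in (0,1)$ is equivalent to searching over all $\lambda > 0$.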
0 votes · 0 answers · 95 views
Verification of a certain computation of VC dimension
Disclaimer: I'm not very familiar with the concept of VC dimension and how to manipulate such objects. I'd be grateful if experts on the subject (learning theory, probability) could kindly proof ...
0 votes · 1 answer · 203 views
VC dimension of a certain derived class of binary functions
Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
1 vote · 1 answer · 201 views
Rademacher complexity of function class $(x,y) \mapsto 1[|yf(x)-\alpha| \ge \beta]$ in terms of $\alpha$, $\beta$, and Rademacher complexity of $F$
Let $X$ be a measurable space and let $P$ be a probability distribution on $X \times \{\pm 1\}$. Let $F$ be a function class on $X$, i.e., a collection of (measurable) functions from $X$ to $\mathbb R$...
0 votes · 0 answers · 171 views
Upper-bound for bracketing number in terms of VC-dimension
Let $P$ be a probability distribution on a measurable space $\mathcal X$ (e.g., some Euclidean $\mathbb R^m$) and let $F$ be a class of functions $f:\mathcal X \to \mathbb R$. Given $f_1,f_2 \in F$, ...
1 vote · 0 answers · 95 views
$L_1$ convergence rates for multivariate kernel density estimation
Let $X$ be a random variable on $\mathbb R^d$ with probability density function $f$, and let $X_1,\ldots,X_n$ be $n$ i.i.d. copies of $X$. Given a bandwidth parameter $h=h_n > 0$ and a kernel $...
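For concreteness, the estimator in this question in a minimal sketch: the multivariate kernel density estimate $\hat f_n(x) = \frac{1}{n h^d} \sum_i K\!\big(\frac{x - X_i}{h}\big)$ with a Gaussian kernel. The bandwidth, dimension, and sample below are illustrative assumptions:

```python
import numpy as np

def gaussian_kde(x, data, h):
    """Evaluate f_hat(x) = (1 / (n h^d)) * sum_i K((x - X_i) / h),
    where K is the standard multivariate Gaussian kernel."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    diffs = (np.asarray(x, dtype=float) - data) / h              # shape (n, d)
    kvals = np.exp(-0.5 * np.sum(diffs**2, axis=1)) / (2 * np.pi)**(d / 2)
    return kvals.sum() / (n * h**d)

rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 2))        # illustrative: d = 2, n = 500
est = gaussian_kde([0.0, 0.0], sample, h=0.5)
# for comparison, the true N(0, I_2) density at the origin is 1/(2*pi), about 0.159
```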
4 votes · 0 answers · 161 views
Convergence rates for kernel empirical risk minimization, i.e., empirical risk minimization (ERM) with kernel density estimation (KDE)
Let $\Theta$ be an open subset of some $\mathbb R^m$ and let $P$ be a probability distribution on $\mathbb R^d$ with density $f$ in a Sobolev space $W_p^s(\mathbb R^d)$, i.e., all derivatives of $f$ ...
4 votes · 2 answers · 312 views
Bounds on the number of samples needed to learn a real-valued function class
See Theorem 6.8 in this book: https://www.cs.huji.ac.il/w~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
It gives us a lower bound (and also an ...