Questions tagged [learning-theory]

Ask Question

This tag is used for questions that are related with following branches: Statistical learning theory, Machine learning, Vapnik–Chervonenkis theory (VC theory) and all other branches that are studied and applied in the area of learning theory that involves various kinds of mathematics.

100 questions

4 votes

1 answer

387 views

Minimize the variance of a Boltzmann distribution

N.B.: Sorry for cross-posting from https://stats.stackexchange.com/posts/347804/edit (I realized it was the wrong venue for the question, but couldn't find an easy way to transfer the question here). ...

dohmatob

6,824

asked May 25, 2018 at 9:04

3 votes

1 answer

271 views

Concentration inequalities specialized for log-likelihood / log-density functions

Let $P$ be a probability measure and $f$ be some probability density function (not necessarily related to $P$). Consider the function $$ L(X_1,\ldots,X_n) =\frac1n\sum_{i=1}^n\log f(X_i), \quad X_i\...

JohnA

asked May 4, 2018 at 11:57

38 votes

4 answers

3k views

Is there research on human-oriented theorem proving?

I know there is already a research community that is working on automatic theorem proving mostly using logic (and things like Coq and ACL2). However, I came across a lecture from a fields medalist W.T....

Charlie Parker

asked Nov 4, 2017 at 0:10

2 votes

0 answers

436 views

Relation between pseudo-dimension and Rademacher complexity

With techniques of Dudley's entropy bound and Haussler's upper bound one can show that there exists a constant $C$ such that any class of $\{0,1\}$ indicator functions with Vapnik-Chervonenkis ...

axk

asked May 29, 2017 at 23:18

3 votes

1 answer

209 views

Is this generalization bound proof wrong?

This is an ICML02 paper by Garg, Har-Peled & Roth: http://sarielhp.org/p/01/bounds/bounds.pdf The equation after eq. (3) is the well-known symmetrization trick for $\sup_{h\in {\mathcal H}} |E(h)-...

axk

asked May 2, 2017 at 2:10

1 vote

1 answer

82 views

Clarification on margin bound uniform w.r.t. the margin parameter

Theorem 4.5. in the book "Foundations of Machine Learning" by Mohri et al: http://prlab.tudelft.nl/sites/default/files/Foundations_of_Machine_Learning.pdf derives a generalization bound to hold ...

axk

asked Apr 24, 2017 at 23:41

2 votes

1 answer

1k views

Rademacher complexity of composition of functions

I am looking for a bound on the empirical Rademacher complexity of the following class: $G=\left\{x \rightarrow \frac{h^T f(x)}{\|h\|_2 \cdot \|f(x)\|_2} : h\in R^d, f()=(f_1(),\ldots,f_d()), f_j \in ...

axk

asked Apr 15, 2017 at 21:39

2 votes

1 answer

621 views

Extension of Talagrand contraction lemma (on empirical Rademacher complexity)

Is the following true? Let $(x_1,...,x_N)$ be a set of points on the unit sphere $S^{d-1}$. Let $\ell_x: [-1,1]\rightarrow [0,1]$ be a family of Lipschitz functions indexed by $x\in S^{d-1}$, with ...

axk

asked Apr 6, 2017 at 18:31

5 votes

1 answer

978 views

VC dimension of axis-parallel boxes on the torus

First the basic definitions: Let $H$ be a family of sets, and let $P$ be a set of points. Then $H$ is said to shatter $P$ if $\{ h \cap P:~h \in H\}=2^P$, that is, if every subset of $P$ can be ...

Kurisuto Asutora

2,199

asked Jan 24, 2017 at 15:53

1 vote

0 answers

94 views

Approximating or calculating the mutual information of certain binary random vectors

In my studies of applied probability I have recently met the following problem on which I need help: We consider two binary random (column) vectors $ X,Y \in \{0,1\}^d $ where the mutual ...

groupoid

asked Nov 20, 2016 at 9:00

1 vote

0 answers

72 views

Determining when specific gradient descent converges to singular or critical points

In my research on neural networks and learning theory I have recently come across the following problem dealing with gradient descent: We consider a given column vector $ x=[x_1,x_2,...,x_{d}]^T \...

groupoid

asked Nov 7, 2016 at 13:28

1 vote

1 answer

255 views

VC dimension of cone-restricted linear classifiers

Let $\mathcal{C}$ be a pointed, salient cone in $\mathbb{R}^d$. We may also assume that $\mathcal{C}$ is full-dimensional. Consider the set of binary classifiers $$\mathcal{H} = \{\boldsymbol{x}\...

S.B.

asked Aug 29, 2016 at 20:23

8 votes

0 answers

685 views

The function space defined by deep neural nets

Given a deep net graph and the activation functions on the hidden vertices do we have a description of the function space spanned by it? (even if for some specific architectures and activation ...

gradstudent

2,186

asked Aug 1, 2016 at 22:53

1 vote

1 answer

357 views

What is the shatter coefficient / VC - dimension of some hypothesis set?

Let $H:=\{h:\mathbb{N}_0^n \rightarrow \{0,1\}| h(x_1,\cdots,x_n) = \mathbb{1}_0(\sum_{i\in I}{x_i}-\sum_{j \notin I}{x_j}) \text{ for some } I \subset \{1,\cdots,n\}\}$ where $\mathbb{1}$ is the ...

user6671

asked Jul 18, 2016 at 10:12

0 votes

0 answers

147 views

choosing regularization constant in compressive sensing

Given a compressive sensing formulation, $$\left\| {Ax - b} \right\|_2^2 + \mu {\left\| x \right\|_1}$$ And given curves (a) $\left\| {Ax - b} \right\|_2^2$ plotted against $\log \left( \mu \...

user40780

asked Jul 5, 2016 at 15:14

15 30 50 per page

Prev 1

…

3 4

6 7 Next

Questions tagged [learning-theory]

Related Tags