Skip to main content

Questions tagged [learning-theory]

This tag is used for questions that are related with following branches: Statistical learning theory, Machine learning, Vapnik–Chervonenkis theory (VC theory) and all other branches that are studied and applied in the area of learning theory that involves various kinds of mathematics.

4 votes
1 answer
387 views

Minimize the variance of a Boltzmann distribution

N.B.: Sorry for cross-posting from https://stats.stackexchange.com/posts/347804/edit (I realized it was the wrong venue for the question, but couldn't find an easy way to transfer the question here). ...
dohmatob's user avatar
  • 6,824
3 votes
1 answer
271 views

Concentration inequalities specialized for log-likelihood / log-density functions

Let $P$ be a probability measure and $f$ be some probability density function (not necessarily related to $P$). Consider the function $$ L(X_1,\ldots,X_n) =\frac1n\sum_{i=1}^n\log f(X_i), \quad X_i\...
JohnA's user avatar
  • 700
38 votes
4 answers
3k views

Is there research on human-oriented theorem proving?

I know there is already a research community that is working on automatic theorem proving mostly using logic (and things like Coq and ACL2). However, I came across a lecture from a fields medalist W.T....
Charlie Parker's user avatar
2 votes
0 answers
436 views

Relation between pseudo-dimension and Rademacher complexity

With techniques of Dudley's entropy bound and Haussler's upper bound one can show that there exists a constant $C$ such that any class of $\{0,1\}$ indicator functions with Vapnik-Chervonenkis ...
axk's user avatar
  • 517
3 votes
1 answer
209 views

Is this generalization bound proof wrong?

This is an ICML02 paper by Garg, Har-Peled & Roth: http://sarielhp.org/p/01/bounds/bounds.pdf The equation after eq. (3) is the well-known symmetrization trick for $\sup_{h\in {\mathcal H}} |E(h)-...
axk's user avatar
  • 517
1 vote
1 answer
82 views

Clarification on margin bound uniform w.r.t. the margin parameter

Theorem 4.5. in the book "Foundations of Machine Learning" by Mohri et al: http://prlab.tudelft.nl/sites/default/files/Foundations_of_Machine_Learning.pdf derives a generalization bound to hold ...
axk's user avatar
  • 517
2 votes
1 answer
1k views

Rademacher complexity of composition of functions

I am looking for a bound on the empirical Rademacher complexity of the following class: $G=\left\{x \rightarrow \frac{h^T f(x)}{\|h\|_2 \cdot \|f(x)\|_2} : h\in R^d, f()=(f_1(),\ldots,f_d()), f_j \in ...
axk's user avatar
  • 517
2 votes
1 answer
621 views

Extension of Talagrand contraction lemma (on empirical Rademacher complexity)

Is the following true? Let $(x_1,...,x_N)$ be a set of points on the unit sphere $S^{d-1}$. Let $\ell_x: [-1,1]\rightarrow [0,1]$ be a family of Lipschitz functions indexed by $x\in S^{d-1}$, with ...
axk's user avatar
  • 517
5 votes
1 answer
978 views

VC dimension of axis-parallel boxes on the torus

First the basic definitions: Let $H$ be a family of sets, and let $P$ be a set of points. Then $H$ is said to shatter $P$ if $\{ h \cap P:~h \in H\}=2^P$, that is, if every subset of $P$ can be ...
Kurisuto Asutora's user avatar
1 vote
0 answers
94 views

Approximating or calculating the mutual information of certain binary random vectors

In my studies of applied probability I have recently met the following problem on which I need help: We consider two binary random (column) vectors $ X,Y \in \{0,1\}^d $ where the mutual ...
groupoid's user avatar
  • 620
1 vote
0 answers
72 views

Determining when specific gradient descent converges to singular or critical points

In my research on neural networks and learning theory I have recently come across the following problem dealing with gradient descent: We consider a given column vector $ x=[x_1,x_2,...,x_{d}]^T \...
groupoid's user avatar
  • 620
1 vote
1 answer
255 views

VC dimension of cone-restricted linear classifiers

Let $\mathcal{C}$ be a pointed, salient cone in $\mathbb{R}^d$. We may also assume that $\mathcal{C}$ is full-dimensional. Consider the set of binary classifiers $$\mathcal{H} = \{\boldsymbol{x}\...
S.B.'s user avatar
  • 215
8 votes
0 answers
685 views

The function space defined by deep neural nets

Given a deep net graph and the activation functions on the hidden vertices do we have a description of the function space spanned by it? (even if for some specific architectures and activation ...
gradstudent's user avatar
  • 2,186
1 vote
1 answer
357 views

What is the shatter coefficient / VC - dimension of some hypothesis set?

Let $H:=\{h:\mathbb{N}_0^n \rightarrow \{0,1\}| h(x_1,\cdots,x_n) = \mathbb{1}_0(\sum_{i\in I}{x_i}-\sum_{j \notin I}{x_j}) \text{ for some } I \subset \{1,\cdots,n\}\}$ where $\mathbb{1}$ is the ...
user avatar
0 votes
0 answers
147 views

choosing regularization constant in compressive sensing

Given a compressive sensing formulation, $$\left\| {Ax - b} \right\|_2^2 + \mu {\left\| x \right\|_1}$$ And given curves (a) $\left\| {Ax - b} \right\|_2^2$ plotted against $\log \left( \mu \...
user40780's user avatar
  • 867

15 30 50 per page
1
3 4
5
6 7