Questions tagged [learning-theory]
This tag is used for questions that are related with following branches: Statistical learning theory, Machine learning, Vapnik–Chervonenkis theory (VC theory) and all other branches that are studied and applied in the area of learning theory that involves various kinds of mathematics.
100
questions
4
votes
1
answer
387
views
Minimize the variance of a Boltzmann distribution
N.B.: Sorry for cross-posting from https://stats.stackexchange.com/posts/347804/edit (I realized it was the wrong venue for the question, but couldn't find an easy way to transfer the question here).
...
3
votes
1
answer
271
views
Concentration inequalities specialized for log-likelihood / log-density functions
Let $P$ be a probability measure and $f$ be some probability density function (not necessarily related to $P$). Consider the function
$$
L(X_1,\ldots,X_n)
=\frac1n\sum_{i=1}^n\log f(X_i),
\quad
X_i\...
38
votes
4
answers
3k
views
Is there research on human-oriented theorem proving?
I know there is already a research community that is working on automatic theorem proving mostly using logic (and things like Coq and ACL2). However, I came across a lecture from a fields medalist W.T....
2
votes
0
answers
436
views
Relation between pseudo-dimension and Rademacher complexity
With techniques of Dudley's entropy bound and Haussler's upper bound one can show that there exists a constant $C$ such that any class of $\{0,1\}$ indicator functions with Vapnik-Chervonenkis ...
3
votes
1
answer
209
views
Is this generalization bound proof wrong?
This is an ICML02 paper by Garg, Har-Peled & Roth:
http://sarielhp.org/p/01/bounds/bounds.pdf
The equation after eq. (3) is the well-known symmetrization trick for $\sup_{h\in {\mathcal H}} |E(h)-...
1
vote
1
answer
82
views
Clarification on margin bound uniform w.r.t. the margin parameter
Theorem 4.5. in the book "Foundations of Machine Learning" by Mohri et al:
http://prlab.tudelft.nl/sites/default/files/Foundations_of_Machine_Learning.pdf
derives a generalization bound to hold ...
2
votes
1
answer
1k
views
Rademacher complexity of composition of functions
I am looking for a bound on the empirical Rademacher complexity of the following class:
$G=\left\{x \rightarrow \frac{h^T f(x)}{\|h\|_2 \cdot \|f(x)\|_2} : h\in R^d, f()=(f_1(),\ldots,f_d()), f_j \in ...
2
votes
1
answer
621
views
Extension of Talagrand contraction lemma (on empirical Rademacher complexity)
Is the following true?
Let $(x_1,...,x_N)$ be a set of points on the unit sphere $S^{d-1}$.
Let $\ell_x: [-1,1]\rightarrow [0,1]$ be a family of Lipschitz functions indexed by $x\in S^{d-1}$, with ...
5
votes
1
answer
978
views
VC dimension of axis-parallel boxes on the torus
First the basic definitions: Let $H$ be a family of sets, and let $P$ be a set of points. Then $H$ is said to shatter $P$ if $\{ h \cap P:~h \in H\}=2^P$, that is, if every subset of $P$ can be ...
1
vote
0
answers
94
views
Approximating or calculating the mutual information of certain binary random vectors
In my studies of applied probability I have recently met the following problem on which I need help:
We consider two binary random (column) vectors $ X,Y \in \{0,1\}^d $ where the mutual ...
1
vote
0
answers
72
views
Determining when specific gradient descent converges to singular or critical points
In my research on neural networks and learning theory I have recently come across the following problem dealing with gradient descent:
We consider a given column vector $ x=[x_1,x_2,...,x_{d}]^T \...
1
vote
1
answer
255
views
VC dimension of cone-restricted linear classifiers
Let $\mathcal{C}$ be a pointed, salient cone in $\mathbb{R}^d$. We may also assume that $\mathcal{C}$ is full-dimensional. Consider the set of binary classifiers $$\mathcal{H} = \{\boldsymbol{x}\...
8
votes
0
answers
685
views
The function space defined by deep neural nets
Given a deep net graph and the activation functions on the hidden vertices do we have a description of the function space spanned by it? (even if for some specific architectures and activation ...
1
vote
1
answer
357
views
What is the shatter coefficient / VC - dimension of some hypothesis set?
Let $H:=\{h:\mathbb{N}_0^n \rightarrow \{0,1\}| h(x_1,\cdots,x_n) = \mathbb{1}_0(\sum_{i\in I}{x_i}-\sum_{j \notin I}{x_j}) \text{ for some } I \subset \{1,\cdots,n\}\}$
where $\mathbb{1}$ is the ...
0
votes
0
answers
147
views
choosing regularization constant in compressive sensing
Given a compressive sensing formulation,
$$\left\| {Ax - b} \right\|_2^2 + \mu {\left\| x \right\|_1}$$
And given curves
(a) $\left\| {Ax - b} \right\|_2^2$ plotted against $\log \left( \mu \...