All Questions
Tagged with machine-learning pr.probability
36
questions
2
votes
0
answers
158
views
How to choose N policemen positions to catch a drunk driver in the most effective way (on a Cayley graph of a finite group)?
Consider a Cayley graph of some big finite group. Consider random walk on such a graph - think of it as drunk driver. Fix some number $N$ which is much smaller than group size.
Question 1: How to ...
1
vote
0
answers
36
views
Interpolation in convex hull
I'm reading a paper, Learning in High Dimension Always Amounts to Extrapolation, that provides a result I don't understand.
It provides this theorem which I do understand:
Theorem 1: (Bárány and ...
1
vote
1
answer
76
views
Bayes classifiers with cost of misclassification
A minimum ECM classifier disciminate the features $\underline{x}$ to belong to class $t$ ($\delta(\underline{x}) = t$) if $\forall j \ne t$:
$$\sum_{k\ne t} c(t|k) f_k(\underline{x})p_k \le \sum_{k\ne ...
2
votes
1
answer
119
views
How to lower bound the absolute value of the difference of two Kullback-Leibler divergences given the constrains below?
Given that $\min (D[p_1||p_3],D[p_2||p_4])=a$, how to find a lower bound of the difference $\left \vert D[p_1\parallel p_2]-D[p_3\parallel p_4] \right\vert$ with respect to $a$? If the condition is ...
6
votes
0
answers
181
views
What are compact manifolds such that GROWTH (of spheres volumes) is well approximated by the Gaussian normal distribution?
Consider some compact Riemannian manifold $M$. Fix some point $p$.
Consider a "sub-sphere of radius $r$" - i.e. set of points on distance $r$ from $p$.
Consider growth function $g(r)$ to be ...
0
votes
0
answers
85
views
Some new questions on Rademacher complexity
For $A\subset R^n$,$A=(a_1,a_2,\dots, a_n)$, $\sigma_i$ are Rademacher random variable.
Is $|\mathbb{E}_\sigma \inf_{a\in A}\sum_{i=1}^n\sigma_ia_i| \le |\mathbb{E}_\sigma \sup_{a\in A}\sum_{i=1}^n\...
3
votes
1
answer
100
views
When does the optimal model exist in learning theory?
In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
2
votes
1
answer
110
views
expectation of the product of Gaussian kernels and their input
I was wondering if anybody knows how to solve: $$\mathbb{E}{\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[ (\mathbf{x}{i} - \mathbf{z})(\mathbf{x}{j} - \mathbf{z})^\top \exp\left( - (\...
4
votes
0
answers
140
views
Known relations between mutual information and covering number?
This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
1
vote
2
answers
216
views
Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class
Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
3
votes
1
answer
156
views
Why is the logistic regression model good? (and its relation with maximizing entropy)
Suppose we're trying to train a classifier $\pi$ for $k$ classes that takes as input a feature vector $x\in\mathbb{R}^n$ and outputs a probability vector $\pi(x)\in\mathbb{R}^k$ such that $\sum_{v=1}^...
2
votes
0
answers
109
views
Equivalence of score function expressions in SDE-based generative modeling
I am studying the paper "Score-Based Generative Modeling through Stochastic Differential Equations" (arXiv:2011.13456) by Yang et al. The authors use the following loss function (Equation 7 ...
5
votes
1
answer
1k
views
Mathematics research relating to machine learning
What branch/branches of math are most relevant in enhancing machine learning (mostly in terms of practical use as opposed to theoretical/possible use)? Specifically, I want to know about math research ...
3
votes
0
answers
164
views
What is the meaning of big-O of a random variable?
I encountered this problem in a book "Pattern Recognition and Machine Learning" by Christopher M. Bishop. I excerpt it below:
screenshot of the book
In the excerpt, the big-O notation $O(\xi^...
1
vote
0
answers
56
views
How to calculate the unifrom entropy or VC dimension of the following class of functions?
When dealing with U process I meet with such a uniform entropy to calculate.
For any $\eta>0$, function class $\mathcal{F}$ containing functions $f=\left(f_{i, j}\right)_{1 \leq i \neq j \leq n}: \...