Skip to main content

All Questions

2 votes
0 answers
158 views

How to choose N policemen positions to catch a drunk driver in the most effective way (on a Cayley graph of a finite group)?

Consider a Cayley graph of some big finite group. Consider random walk on such a graph - think of it as drunk driver. Fix some number $N$ which is much smaller than group size. Question 1: How to ...
Alexander Chervov's user avatar
1 vote
0 answers
36 views

Interpolation in convex hull

I'm reading a paper, Learning in High Dimension Always Amounts to Extrapolation, that provides a result I don't understand. It provides this theorem which I do understand: Theorem 1: (Bárány and ...
Christopher D'Arcy's user avatar
1 vote
1 answer
76 views

Bayes classifiers with cost of misclassification

A minimum ECM classifier disciminate the features $\underline{x}$ to belong to class $t$ ($\delta(\underline{x}) = t$) if $\forall j \ne t$: $$\sum_{k\ne t} c(t|k) f_k(\underline{x})p_k \le \sum_{k\ne ...
BiasedBayes's user avatar
2 votes
1 answer
119 views

How to lower bound the absolute value of the difference of two Kullback-Leibler divergences given the constrains below?

Given that $\min (D[p_1||p_3],D[p_2||p_4])=a$, how to find a lower bound of the difference $\left \vert D[p_1\parallel p_2]-D[p_3\parallel p_4] \right\vert$ with respect to $a$? If the condition is ...
Richard Ben's user avatar
6 votes
0 answers
181 views

What are compact manifolds such that GROWTH (of spheres volumes) is well approximated by the Gaussian normal distribution?

Consider some compact Riemannian manifold $M$. Fix some point $p$. Consider a "sub-sphere of radius $r$" - i.e. set of points on distance $r$ from $p$. Consider growth function $g(r)$ to be ...
Alexander Chervov's user avatar
0 votes
0 answers
85 views

Some new questions on Rademacher complexity

For $A\subset R^n$,$A=(a_1,a_2,\dots, a_n)$, $\sigma_i$ are Rademacher random variable. Is $|\mathbb{E}_\sigma \inf_{a\in A}\sum_{i=1}^n\sigma_ia_i| \le |\mathbb{E}_\sigma \sup_{a\in A}\sum_{i=1}^n\...
Hao Yu's user avatar
  • 185
3 votes
1 answer
100 views

When does the optimal model exist in learning theory?

In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...
rick's user avatar
  • 179
2 votes
1 answer
110 views

expectation of the product of Gaussian kernels and their input

I was wondering if anybody knows how to solve: $$\mathbb{E}{\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[ (\mathbf{x}{i} - \mathbf{z})(\mathbf{x}{j} - \mathbf{z})^\top \exp\left( - (\...
wsz_fantasy's user avatar
4 votes
0 answers
140 views

Known relations between mutual information and covering number?

This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...
Tanishq Kumar's user avatar
1 vote
2 answers
216 views

Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class

Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...
dohmatob's user avatar
  • 6,824
3 votes
1 answer
156 views

Why is the logistic regression model good? (and its relation with maximizing entropy)

Suppose we're trying to train a classifier $\pi$ for $k$ classes that takes as input a feature vector $x\in\mathbb{R}^n$ and outputs a probability vector $\pi(x)\in\mathbb{R}^k$ such that $\sum_{v=1}^...
stupid_question_bot's user avatar
2 votes
0 answers
109 views

Equivalence of score function expressions in SDE-based generative modeling

I am studying the paper "Score-Based Generative Modeling through Stochastic Differential Equations" (arXiv:2011.13456) by Yang et al. The authors use the following loss function (Equation 7 ...
Po-Hung Yeh's user avatar
5 votes
1 answer
1k views

Mathematics research relating to machine learning

What branch/branches of math are most relevant in enhancing machine learning (mostly in terms of practical use as opposed to theoretical/possible use)? Specifically, I want to know about math research ...
Artus's user avatar
  • 173
3 votes
0 answers
164 views

What is the meaning of big-O of a random variable?

I encountered this problem in a book "Pattern Recognition and Machine Learning" by Christopher M. Bishop. I excerpt it below: screenshot of the book In the excerpt, the big-O notation $O(\xi^...
zzzhhh's user avatar
  • 31
1 vote
0 answers
56 views

How to calculate the unifrom entropy or VC dimension of the following class of functions?

When dealing with U process I meet with such a uniform entropy to calculate. For any $\eta>0$, function class $\mathcal{F}$ containing functions $f=\left(f_{i, j}\right)_{1 \leq i \neq j \leq n}: \...
leslie zhang's user avatar

15 30 50 per page