Questions tagged [machine-learning]

The tag has no usage guidance.

3 votes
0 answers
164 views

What is the meaning of big-O of a random variable?

I encountered this problem in the book "Pattern Recognition and Machine Learning" by Christopher M. Bishop. I excerpt it below: [screenshot of the book] In the excerpt, the big-O notation $O(\xi^...
zzzhhh • 31
2 votes
0 answers
115 views

Training an energy-based model (EBM) using MCMC

I'm reading this paper about training energy-based models (EBMs) and don't understand which parameters we are training. The part relevant to the question is on pages 1-4. Here is the ...
Garfield • 201
2 votes
0 answers
86 views

Nuclear norm minimization of convolution matrix (circular matrix) with fast Fourier transform

I am reading the paper "Recovery of Future Data via Convolution Nuclear Norm Minimization", which gives a definition of the convolution matrix. Given any vector $\boldsymbol{x}=(x_1,x_2,\ldots,x_n)^...
Xinyu Chen
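The FFT route in this question rests on a standard fact: a circulant (circular convolution) matrix is diagonalized by the discrete Fourier basis, so it is normal, its singular values are the magnitudes of the DFT of its generating vector, and its nuclear norm equals $\sum_k |\hat{x}_k|$. The minimal pure-Python sketch below assumes the convention $C_{ij} = x_{(i-j) \bmod n}$ (the paper's own indexing may differ) and checks the eigenpairs numerically; the helper names are hypothetical, and the naive $O(n^2)$ DFT only stands in for an FFT.

```python
import cmath
import math


def circulant(x):
    """Circulant matrix with first column x: C[i][j] = x[(i - j) % n].
    Assumed convention; the paper may index the convolution matrix differently."""
    n = len(x)
    return [[x[(i - j) % n] for j in range(n)] for i in range(n)]


def dft(x):
    """Naive O(n^2) DFT, X[k] = sum_m x[m] exp(-2 pi i m k / n); an FFT computes the same values."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n)) for k in range(n)]


def nuclear_norm_circulant(x):
    """Circulant matrices are normal, so singular values = |eigenvalues| = |DFT(x)|."""
    return sum(abs(v) for v in dft(x))


def check_eigenpairs(x, tol=1e-9):
    """Verify C v_k = X[k] v_k for the Fourier vectors v_k[j] = exp(2 pi i j k / n)."""
    n = len(x)
    C = circulant(x)
    X = dft(x)
    for k in range(n):
        v = [cmath.exp(2j * math.pi * j * k / n) for j in range(n)]
        Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        if any(abs(Cv[i] - X[k] * v[i]) > tol for i in range(n)):
            return False
    return True


x = [1.0, 2.0, 0.0, -1.0]
print(check_eigenpairs(x))        # True
print(nuclear_norm_circulant(x))  # ≈ 8.3246, i.e. 2 + 2*sqrt(10)
```

Because only $|\hat{x}_k|$ enters, the nuclear norm costs one FFT instead of a full SVD of the $n \times n$ matrix.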
1 vote
0 answers
116 views

Distribution-free learning vs distribution-dependent learning

I came across some papers studying the problem of distribution-free learning, and I am interested in knowing the exact definition of distribution-free learning. I have searched some literature: In ...
yinan • 11
4 votes
0 answers
120 views

Progress on "Un-Alching" ML?

So, a couple of years ago I watched both Ali Rahimi's NIPS speech "Machine Learning is Alchemy" (where he talks about how the field lacks a solid, overarching theoretical foundation) and ...
dicaes • 41
2 votes
0 answers
44 views

Combining SVD subspaces for low dimensional representations

Suppose we have a matrix $A$ of size $N_t \times N_m$, containing $N_m$ measurements corrupted by some (e.g. Gaussian) noise. An SVD of this data, $A = U_A S_A V_A^T$, can reveal the singular vectors $U_A$...
user2600239
1 vote
0 answers
106 views

Can I minimize a mysterious function by running gradient descent on her neural net approximations? [closed]

A cross-post from AI StackExchange. So I have this function, let's call her $F:[0,1]^n \rightarrow \mathbb{R}$, and say $10 \le n \le 100$. I want to find some $x_0 \in [0,1]^n$ such that $F(x_0)$ is ...
Vladimir Zolotov
1 vote
0 answers
56 views

How to calculate the uniform entropy or VC dimension of the following class of functions?

When dealing with a U-process, I need to calculate the following uniform entropy. For any $\eta>0$, the function class $\mathcal{F}$ containing functions $f=\left(f_{i, j}\right)_{1 \leq i \neq j \leq n}: \...
leslie zhang
3 votes
1 answer
239 views

Independent input feature z can be removed: if y=f(x+z,z), then y=g(x)?

Let $y\in \mathbb{R}$ be a random variable and $\mathbf{x},\mathbf{z}\in\mathbb{R}^p$ be random vectors. Assume $y=f(\mathbf{x}+\mathbf{z},\mathbf{z})$ for some function $f$. Is the following statement ...
John • 193
1 vote
0 answers
60 views

Sample Complexity/PAC-Learning Notation

In PAC Learning, Sample Complexity is defined as: The function $m_\mathcal{H} : (0,1)^2 \rightarrow \mathbb{N}$ determines the sample complexity of learning $\mathcal{H}$: that is, how many examples ...
user490208
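For intuition about what $m_{\mathcal{H}}(\epsilon,\delta)$ measures, one standard special case is a finite hypothesis class under the realizability assumption, where ERM satisfies $m_{\mathcal{H}}(\epsilon,\delta) \le \lceil \ln(|\mathcal{H}|/\delta)/\epsilon \rceil$. The sketch below only evaluates that upper bound; it is not the general definition, and the function name is hypothetical.

```python
import math


def finite_class_sample_bound(h_size: int, eps: float, delta: float) -> int:
    """Upper bound on m_H(eps, delta) for a finite class H in the realizable
    PAC setting: ceil(ln(|H| / delta) / eps). Assumes eps, delta in (0, 1)."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(h_size / delta) / eps)


# |H| = 1000 hypotheses, accuracy eps = 0.1, confidence 1 - delta = 0.95
print(finite_class_sample_bound(1000, 0.1, 0.05))  # 100
```

The two arguments of $m_{\mathcal{H}}$ play different roles here: halving $\epsilon$ roughly doubles the bound, while shrinking $\delta$ only adds a logarithmic term.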
1 vote
0 answers
177 views

Stochastic Gradient Descent

I am not sure how to approach this question, as I am a beginner in optimisation. Consider the function $f : B_1 \to \mathbb{R}$ with $f(x) = \left\lVert x \right\rVert_2^2$ and $B_1$ := {$...
jzcici • 11
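Since the excerpt is cut off, the following is only a generic sketch of projected SGD for this $f$. It assumes $B_1$ is the closed Euclidean unit ball and that the stochastic gradient is the true gradient $2x$ plus zero-mean Gaussian noise (a hypothetical noise model). With step size $\eta_t = 1/(2t)$, i.e. $1/(\mu t)$ for the strong-convexity constant $\mu = 2$, the iterates approach the minimizer $x^\ast = 0$.

```python
import math
import random


def project_unit_ball(x):
    """Euclidean projection onto B_1 = {x : ||x||_2 <= 1}: rescale if outside."""
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= 1.0 else [v / norm for v in x]


def projected_sgd(x0, steps=10_000, noise_std=1.0, seed=0):
    """Projected SGD on f(x) = ||x||_2^2 over B_1.

    Hypothetical oracle: g_t = 2 x_t + xi_t with xi_t ~ N(0, noise_std^2 I).
    Step size eta_t = 1 / (2 t), i.e. 1 / (mu t) with mu = 2.
    """
    rng = random.Random(seed)
    x = project_unit_ball(list(x0))
    for t in range(1, steps + 1):
        grad = [2.0 * v + rng.gauss(0.0, noise_std) for v in x]
        eta = 1.0 / (2.0 * t)
        x = project_unit_ball([v - eta * g for v, g in zip(x, grad)])
    return x


x_final = projected_sgd([0.6, -0.8, 0.0])
print(math.sqrt(sum(v * v for v in x_final)))  # small; shrinks roughly like 1/sqrt(steps)
```

Without the projection, this step size turns each iterate into a scaled running average of the noise, which is why the distance to $x^\ast$ decays at about the $1/\sqrt{T}$ rate.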
5 votes
2 answers
325 views

Entropy & difference between max and min values of probability mass

Let $X$ be a random variable with probability mass function $p(x) = \mathbb{P}[X = x]$. I know entropy $H(X)$ of $X$ measures the uncertainty of $X$ and a large value of $H(X)$ means $p(x)$ is nearly ...
aest • 153
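To experiment with the two quantities in this question, the small sketch below computes $H(X)$ in bits and the gap $\max_x p(x) - \min_x p(x)$ for a few probability mass functions on four outcomes. The example pmfs are arbitrary. The uniform pmf attains the maximum entropy $\log_2 4 = 2$ bits with gap 0; the skewed examples have lower entropy and a larger gap, which illustrates the question rather than proving any general relation.

```python
import math


def entropy_bits(p):
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), using the convention 0 log 0 = 0."""
    if any(q < 0 for q in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability mass function")
    return -sum(q * math.log2(q) for q in p if q > 0)


def max_min_gap(p):
    """Difference between the largest and smallest probability masses."""
    return max(p) - min(p)


for pmf in ([0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]):
    print(pmf, round(entropy_bits(pmf), 4), round(max_min_gap(pmf), 4))
```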
1 vote
1 answer
221 views

Using Hoeffding inequality for risk / loss function

I have a question about Hoeffding's inequality, which states that for data points $X_1, \dots, X_n \in X$, i.i.d. according to a probability measure $P$ on $X$, we find an upper bound for: $...
Mathematiger
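For losses bounded in $[0,1]$ and one fixed hypothesis $h$, Hoeffding's inequality gives $P(|\hat{R}_n(h) - R(h)| \ge \epsilon) \le 2\exp(-2n\epsilon^2)$, where $\hat{R}_n$ is the empirical risk over the $n$ i.i.d. points. The sketch below evaluates this bound, inverts it for the sample size, and compares it with a simulation that assumes Bernoulli(0.3) 0-1 losses (an arbitrary, hypothetical choice); the function names are also hypothetical.

```python
import math
import random


def hoeffding_bound(n: int, eps: float) -> float:
    """Two-sided Hoeffding bound for the mean of n i.i.d. losses in [0, 1]."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)


def samples_needed(eps: float, delta: float) -> int:
    """Smallest n with 2 exp(-2 n eps^2) <= delta, i.e. n >= ln(2/delta) / (2 eps^2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def simulated_deviation_freq(n, eps, risk=0.3, trials=2000, seed=0):
    """Fraction of trials with |empirical risk - true risk| >= eps,
    assuming hypothetical 0-1 losses ~ Bernoulli(risk)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        emp = sum(rng.random() < risk for _ in range(n)) / n
        # small tolerance absorbs floating-point rounding exactly at the boundary
        hits += abs(emp - risk) >= eps - 1e-12
    return hits / trials


print(samples_needed(0.05, 0.05))  # 738
print(hoeffding_bound(200, 0.1), simulated_deviation_freq(200, 0.1))
```

The bound holds for a single $h$ fixed before seeing the data; a guarantee that holds uniformly over a hypothesis class needs an additional union bound or a complexity term.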
20 votes
3 answers
3k views

How can Machine Learning help “see” in higher dimensions?

The news that DeepMind had helped mathematicians in research (one project in representation theory and one in knot theory) certainly got many thinking about what other projects AI could help us with. See MO ...
liuyao • 485
2 votes
0 answers
264 views

Covering/Bracketing number of monotone functions on $\mathbb{R}$ with uniformly bounded derivatives

I am interested in the $\| \cdot \|_{\infty}$-norm bracketing number or covering number of some collection of distribution functions on $\mathbb{R}$. Let $\mathcal{F}$ consist of all distribution ...
masala • 93
