Newest 'machine-learning' Questions

2 votes

0 answers

127 views

How good is approximation of distance function on the Cayley graph by "Fourier" basis coming from the irreducible representations?

Consider finite group $G$ , symmetric set of its elements $S$, construct a Cayley graph. Consider $d(g)$ - word metric or distance on the Cayley graph from identity to $g$. As any function on a group ...

Alexander Chervov

24.4k

asked Jun 7 at 12:19

1 vote

0 answers

33 views

From a constraint satisfaction problem (CSP) to a sudoku grid [closed]

one of the existing methods of solvin a sudoku grid is via constraints satisfaction (CSP), but can we do the inverse ie convert a CSP problem into a sudoku grid and then solve it ?

youssef Lmourabite

19

asked Jun 4 at 20:52

2 votes

0 answers

158 views

How to choose N policemen positions to catch a drunk driver in the most effective way (on a Cayley graph of a finite group)?

Consider a Cayley graph of some big finite group. Consider random walk on such a graph - think of it as drunk driver. Fix some number $N$ which is much smaller than group size. Question 1: How to ...

Alexander Chervov

24.4k

asked Jun 1 at 21:04

2 votes

0 answers

189 views

Should a neural network architecture change after a pass in gradient descent?

I'm trying to understand neural networks formally a little better and it was always my understanding that the miracle that happens during backpropagation is that "performing a pass in gradient ...

Louis

81

asked May 18 at 22:41

1 vote

0 answers

36 views

Interpolation in convex hull

I'm reading a paper, Learning in High Dimension Always Amounts to Extrapolation, that provides a result I don't understand. It provides this theorem which I do understand: Theorem 1: (Bárány and ...

Christopher D'Arcy

11

asked May 14 at 18:44

1 vote

1 answer

76 views

Bayes classifiers with cost of misclassification

A minimum ECM classifier disciminate the features $\underline{x}$ to belong to class $t$ ($\delta(\underline{x}) = t$) if $\forall j \ne t$: $$\sum_{k\ne t} c(t|k) f_k(\underline{x})p_k \le \sum_{k\ne ...

BiasedBayes

11

asked May 1 at 14:10

2 votes

1 answer

119 views

How to lower bound the absolute value of the difference of two Kullback-Leibler divergences given the constrains below?

Given that $\min (D[p_1||p_3],D[p_2||p_4])=a$, how to find a lower bound of the difference $\left \vert D[p_1\parallel p_2]-D[p_3\parallel p_4] \right\vert$ with respect to $a$? If the condition is ...

Richard Ben

31

asked Apr 18 at 15:06

3 votes

0 answers

109 views

Short path problem on Cayley graphs as language translation task (from "Permutlandski" to "Cayleylandski"(s) :). Reference/suggestion request

Context: Algorithms to find short paths on Cayley graphs of (finite) groups are of some interest - see below. There can be several approaches to that task. One of ideas coming to my mind - in some ...

Alexander Chervov

24.4k

asked Apr 18 at 12:10

4 votes

1 answer

357 views

Difference between deep neural networks and expectation maximization algorithm

Having had a short encounter with deep neural networks, it seems to boil down to the task of determining the values of a vast amount of parameters. The expectation maximization algorithm, of which I ...

Manfred Weis

12.8k

asked Apr 13 at 13:54

6 votes

0 answers

181 views

What are compact manifolds such that GROWTH (of spheres volumes) is well approximated by the Gaussian normal distribution?

Consider some compact Riemannian manifold $M$. Fix some point $p$. Consider a "sub-sphere of radius $r$" - i.e. set of points on distance $r$ from $p$. Consider growth function $g(r)$ to be ...

Alexander Chervov

24.4k

asked Mar 29 at 18:04

5 votes

0 answers

118 views

Does the permutohedron satisfy any minimal distortion property for graph metric vs Euclidean distance?

We can look on the permutohedron as a kind of "embedding" of the Cayley graph of $S_n$ to the Euclidean space. (That Cayley graph is constructed by the standard generators, i.e. ...

Alexander Chervov

24.4k

asked Mar 26 at 12:45

3 votes

0 answers

87 views

What Cayley graphs arise as nodes+edges from "nice" polytopes and when are these polytopes convex?

The Permutohedron is a remarkable convex polytope in $R^n$, such that its nodes are indexed by permutations and edges correspond to the Cayley graph of $S_n$ with respect to the standard generators, i....

Alexander Chervov

24.4k

asked Mar 26 at 8:18

2 votes

0 answers

29 views

Continuity of Kernel Mean Embeddings

Given some kernel $k: X \times X \to \mathbb{R}$ with RKHS $H_k$ we say that $k$ is characteristic on the space of signed Radon measures over $X$, denoted by $\mathcal{M}(X)$, if the kernel mean ...

Gaspar

91

asked Mar 13 at 17:34

1 vote

0 answers

28 views

A network to transform/predict one probability distribution to another

I have a random variable of a particular density (e.g., normal), and a known probability distribution (e.g., mixture Gaussian). I used a simple KL measure to predict/transform one another. Now I need ...

user524691

11

asked Mar 13 at 17:23

0 votes

0 answers

85 views

Some new questions on Rademacher complexity

For $A\subset R^n$,$A=(a_1,a_2,\dots, a_n)$, $\sigma_i$ are Rademacher random variable. Is $|\mathbb{E}_\sigma \inf_{a\in A}\sum_{i=1}^n\sigma_ia_i| \le |\mathbb{E}_\sigma \sup_{a\in A}\sum_{i=1}^n\...

Hao Yu

185

asked Feb 28 at 8:36

12 votes

2 answers

2k views

Soft question: Deep learning and higher categories

Recently, I have stumbled upon certain articles and lecture videos that use category theory to explain certain aspects of machine learning or deep learning (e.g. Cats for AI and the paper An enriched ...

h3fr43nd

231

asked Feb 22 at 9:04

52 votes

1 answer

8k views

What mathematical problems can be attacked using DeepMind's recent mathematical breakthroughs?

I am a research mathematician at a university in the United States. My training is in pure mathematics (geometry). However, for the past couple of months, I have been supervising some computer science ...

Ryan Hendricks

755

asked Feb 11 at 21:23

1 vote

0 answers

161 views

Locally "unshortable" paths in graphs

Setup: Consider a connected graph G, with diameter "d". Informally: Trivially (by definition of diameter), taking any path $P$ any nodes $P(i) , P(i+k)$ for $k>d$ can be connected by a ...

Alexander Chervov

24.4k

asked Jan 28 at 10:10

22 votes

4 answers

2k views

Open problems which might benefit from computational experiments

Question: I wonder what are the open problems , where computational experiments might me helpful? (Setting some bounds, excluding some cases, shaping some expectations ). Grant program: The context of ...

Community wiki

Alexander Chervov

3 votes

1 answer

100 views

When does the optimal model exist in learning theory?

In the context of learning theory, we usually have: data $(x,y)\sim P(x,y)$, with $x\in\mathcal{X}\subseteq\mathbb{R}^d$ and $y\in\mathcal{Y}\subseteq\mathbb{R}^k$, a hypothesis class $\mathcal{F}\...

rick

179

asked Jan 23 at 22:07

9 votes

1 answer

785 views

Sufficient condition for linear separability of a boolean function on $n$ variables

This is a cross-post of two recent questions at math.stackexchange without answers: Q1 and Q2. A boolean function on an $n$-dimensional hypercube is linearly separable when the convex hulls of the ...

Fabius Wiesner

814

asked Jan 21 at 18:47

2 votes

1 answer

110 views

expectation of the product of Gaussian kernels and their input

I was wondering if anybody knows how to solve: $$\mathbb{E}{\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[ (\mathbf{x}{i} - \mathbf{z})(\mathbf{x}{j} - \mathbf{z})^\top \exp\left( - (\...

wsz_fantasy

245

asked Jan 16 at 3:09

7 votes

2 answers

448 views

Upper bound on VC-dimension of partitioned class

Fix $n,k\in \mathbb{N}_+$. Let $\mathcal{H}$ be a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ with finite VC-dimension $d\in \mathbb{N}$. Let $\mathcal{H}_k$ denote the set of maps of the ...

Math_Newbie

362

asked Jan 2 at 22:12

4 votes

0 answers

140 views

Known relations between mutual information and covering number?

This is a question about statistical learning theory. Consider a hypothesis class $\mathcal{F}$, parameterized by real vectors $w \in \mathbb{R}^p$. Suppose I have a data distribution $D \sim \mu$ and ...

Tanishq Kumar

141

asked Dec 31, 2023 at 7:45

2 votes

1 answer

57 views

Non-linear transforms of RKHS question

I was reading the paper Norm Inequalities in Nonlinear Transforms (referenced in this question) but ran into difficulties, so I was wondering if anyone could help? I think I follow the paper until I ...

Mat

41

asked Dec 15, 2023 at 12:54

56 votes

10 answers

8k views

A clear map of mathematical approaches to Artificial Intelligence

I have recently become interested in Machine Learning and AI as a student of theoretical physics and mathematics, and have gone through some of the recommended resources dealing with statistical ...

Community wiki

AI Bert

1 vote

0 answers

79 views

Approximation of continuous function by multilayer Relu neural network

For continuous/holder function $f$ defined on a compact set K， a fix $L$ and $m_1,m_2,\dots,m_L$, can we find a multilayer Relu fully connected network g with depth $L$ and each $i$-th layer has width ...

Hao Yu

781

asked Nov 22, 2023 at 15:31

1 vote

2 answers

216 views

Beating the $1/\sqrt n$ rate of uniform-convergence over a linear function class

Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in ...

dohmatob

6,824

asked Oct 13, 2023 at 10:09

1 vote

0 answers

127 views

Matrix valued word embeddings for natural language processing

In natural language processing, an area of machine learning, one would like to represent words as objects that can easily be understood and manipulated using machine learning. A word embedding is a ...

Joseph Van Name

28.1k

asked Sep 23, 2023 at 21:34

3 votes

1 answer

156 views

Why is the logistic regression model good? (and its relation with maximizing entropy)

Suppose we're trying to train a classifier $\pi$ for $k$ classes that takes as input a feature vector $x\in\mathbb{R}^n$ and outputs a probability vector $\pi(x)\in\mathbb{R}^k$ such that $\sum_{v=1}^...

stupid_question_bot

7,313

asked Sep 15, 2023 at 0:17

9 votes

1 answer

313 views

Who introduced the term hyperparameter?

I am trying to find the earliest use of the term hyperparameter. Currently, it is used in machine learning but it must have had earlier uses in statistics or optimization theory. Even the multivolume ...

ACR

790

asked Aug 13, 2023 at 18:14

2 votes

0 answers

109 views

Equivalence of score function expressions in SDE-based generative modeling

I am studying the paper "Score-Based Generative Modeling through Stochastic Differential Equations" (arXiv:2011.13456) by Yang et al. The authors use the following loss function (Equation 7 ...

Po-Hung Yeh

43

asked Aug 1, 2023 at 13:30

8 votes

1 answer

546 views

Geometric formulation of the subject of machine learning

Question: what is the geometric interpretation of the subject of machine learning and/or deep learning? Being "forced" to have a closer look at the subject, I have the impression that it ...

Manfred Weis

12.8k

asked Jul 16, 2023 at 9:40

1 vote

0 answers

98 views

Problems Correction of "Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning "' [closed]

Where I can find the problems correction of this book " Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning "

zdo0x0

11

asked Jun 13, 2023 at 16:52

3 votes

0 answers

61 views

Prove the convergence of the LASSO model in the presence of limited eigenvalues

I am researching the properties of the Lasso model $\hat \beta:= \operatorname{argmin} \{\|Y-X\beta\|_2^2/n+\lambda\|\beta\|_1\}$, specifically its convergence when the data satisfies restricted ...

GGbond

39

asked Jun 10, 2023 at 14:33

11 votes

0 answers

165 views

Worst margin when halving a hypercube with a hyperplane

Consider the $n$-cube $C_n=\lbrace-1,1\rbrace^n$ and the problem of partitioning it into halves with hyperplanes through the origin that avoid all its points. We can parameterize the hyperplanes by ...

Veit Elser

1,065

asked Jun 8, 2023 at 23:43

1 vote

0 answers

69 views

Curve fitting with "rough" loss functions

Many real-valued classification and regression problems can be framed as minimization in the following way. Setup: Let $\Theta \in \mathbb{R}^p$ be the parameter space that we are searching over. For ...

Simon Kuang

190

asked May 13, 2023 at 20:18

5 votes

1 answer

1k views

Mathematics research relating to machine learning

What branch/branches of math are most relevant in enhancing machine learning (mostly in terms of practical use as opposed to theoretical/possible use)? Specifically, I want to know about math research ...

Artus

173

asked Apr 28, 2023 at 18:32

1 vote

1 answer

121 views

Adjoint sensitivity analysis for a cost functional under an ODE constraint

I am trying to recover the result given by equation 10 in the article here. I am unable to get rid of the integral, any help would be much appreciated. To keep the description as self contained as ...

Abhi. A

55

asked Mar 23, 2023 at 17:04

2 votes

1 answer

60 views

Convergence of minimiser of empirical risk to minimiser of population risk

Let $X_1, \dots, X_n \sim \mu$ be some random elements of a space $\mathcal{X}$. Let $H$ be a Hilbert space of functions $f: S \to \Re$ with norm $\|\cdot\|_H$. Let $\|f^*\|_{L_2(\mu)} < \infty$ ...

user27182

327

asked Mar 18, 2023 at 12:20

2 votes

0 answers

42 views

can we get a family of classifiers $\left\{f_n\right\}_{n \in N}$such that $\lim_{n->∞} (E_{(X_1, Y_1), ...,(X_n, Y_n) \sim \rho}[R(f_n)]-R(f_B))=0 $

For a given classifier $f: \mathbb{R}^d \mapsto\{0,1,2\}$, let $$ R(f):=\mathbb{E}_{(X, Y) \sim \rho}\left[\mathbb{1}_{f(X) \neq Y}\right] $$ $f_B$ the Bayes classifier. can we get a family of ...

fantacy_crs

51

asked Mar 2, 2023 at 3:19

3 votes

0 answers

57 views

How to prove emprical risk converges to expectation risk as $n\to \infty$?

For example, for a classical binary classification: $x \in \mathbb{R}^d$ and $y \in\{0,1\}$ let empirical risk be $R_{\ell}^n(f):=\frac{1}{n} \sum_{i=1}^n \ell\left(f\left(X_i\right), Y_i\right)$ and ...

fantacy_crs

51

asked Mar 2, 2023 at 2:29

2 votes

1 answer

84 views

VC-based risk bounds for classifiers on finite set

Let $X$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{ 0,1 \}^{\mathcal{X}}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $X\times \{0,1\}$ with law $\mathbb{P}$. ...

Math_Newbie

362

asked Feb 15, 2023 at 21:46

4 votes

1 answer

333 views

Perceptron / logistic regression accuracy on the n-bit parity problem

$\DeclareMathOperator{\sgn}{sign}$The perceptron (similarly, logistic regression) of the form $y=\sgn(w^T \cdot x+b)$ is famously known for its inability to solve the XOR problem, meaning it can get ...

ido4848

141

asked Jan 29, 2023 at 21:55

1 vote

0 answers

32 views

Convergent gradient-type scheme for solving smooth nonconvex constrained optimization problem

Let $x_1,\ldots,x_n \in \mathbb R^d$ and $y_1,\ldots,y_n \in \{\pm 1\}$, and $\epsilon, h \gt 0$. Define $\theta(t) := Q((t-\epsilon)/h)$, where $Q(z) := \int_{z}^\infty \phi (z)\mathrm{d}z$ is the ...

dohmatob

6,824

asked Jan 4, 2023 at 11:07

3 votes

0 answers

164 views

What is the meaning of big-O of a random variable?

I encountered this problem in a book "Pattern Recognition and Machine Learning" by Christopher M. Bishop. I excerpt it below: screenshot of the book In the excerpt, the big-O notation $O(\xi^...

zzzhhh

31

asked Dec 21, 2022 at 7:31

2 votes

0 answers

115 views

Training an energy-based model (EBM) using MCMC

I'm reading this paper about training energy-based models (EBMs) and don't understand the parameters that we are training for? The part that is relevant to the question is in pages 1-4. Here is the ...

Garfield

201

asked Dec 15, 2022 at 5:09

2 votes

0 answers

86 views

Nuclear norm minimization of convolution matrix (circular matrix) with fast Fourier transform

I am reading a paper Recovery of Future Data via Convolution Nuclear Norm Minimization. Here, I know there is a definition for convolution matrix. Given any vector $\boldsymbol{x}=(x_1,x_2,\ldots,x_n)^...

Xinyu Chen

21

asked Oct 30, 2022 at 12:58

1 vote

0 answers

116 views

Distribution-free learning vs distribution-dependent learning

I came across some papers studying the problem of distribution-free learning, and I am interested in knowing the exact definition of distribution-free learning. I have searched some literature: In ...

yinan

11

asked Oct 29, 2022 at 6:37

4 votes

0 answers

120 views

Progress on "Un-Alching" ML?

So, a couple of years ago I watched both Ali Rahimi's NIPS speech "Machine Learning is Alchemy", (where he talks about how the field lacks a solid, overarching, theoretical foundation) and ...

dicaes

41

asked Oct 22, 2022 at 22:51

Questions tagged [machine-learning]

Related Tags