
Questions tagged [machine-learning]

The tag has no usage guidance.

9 votes
1 answer
313 views

Who introduced the term hyperparameter?

I am trying to find the earliest use of the term hyperparameter. Currently, it is used in machine learning but it must have had earlier uses in statistics or optimization theory. Even the multivolume ...
ACR • 790
2 votes
0 answers
109 views

Equivalence of score function expressions in SDE-based generative modeling

I am studying the paper "Score-Based Generative Modeling through Stochastic Differential Equations" (arXiv:2011.13456) by Song et al. The authors use the following loss function (Equation 7 ...
Po-Hung Yeh
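A minimal sketch of the denoising score-matching objective that this question refers to, written for a variance-exploding perturbation kernel $p(x_t \mid x_0) = \mathcal{N}(x_0, \sigma(t)^2 I)$. The noise schedule, the placeholder `score_fn`, and the weighting $\lambda(t) = \sigma(t)^2$ are illustrative assumptions, not the paper's code or its Equation 7.

```python
# Hedged sketch: Monte-Carlo estimate of the denoising score-matching loss
#   E_t E_{x0} E_{x_t|x0}  lambda(t) || s_theta(x_t, t) - grad log p(x_t|x0) ||^2
# under a VE-type kernel p(x_t|x0) = N(x0, sigma(t)^2 I).
import numpy as np

rng = np.random.default_rng(0)

def sigma(t, sigma_min=0.01, sigma_max=50.0):
    # Geometric noise schedule (an assumption for this sketch).
    return sigma_min * (sigma_max / sigma_min) ** t

def score_fn(x_t, t):
    # Stand-in for a learned score network s_theta(x_t, t).
    return -x_t / (1.0 + sigma(t) ** 2)

def dsm_loss(data, n_samples=1024):
    t = rng.uniform(1e-5, 1.0, size=(n_samples, 1))
    x0 = data[rng.integers(0, len(data), n_samples)]
    noise = rng.standard_normal(x0.shape)
    x_t = x0 + sigma(t) * noise
    target = -(x_t - x0) / sigma(t) ** 2     # score of the Gaussian kernel
    weight = sigma(t) ** 2                   # lambda(t)
    sq = np.sum((score_fn(x_t, t) - target) ** 2, axis=1, keepdims=True)
    return np.mean(weight * sq)

data = rng.standard_normal((256, 2))         # toy "dataset"
print(dsm_loss(data))
```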
8 votes
1 answer
546 views

Geometric formulation of the subject of machine learning

Question: what is the geometric interpretation of the subject of machine learning and/or deep learning? Being "forced" to have a closer look at the subject, I have the impression that it ...
Manfred Weis • 12.8k
1 vote
0 answers
98 views

Solutions to the problems in "Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning" [closed]

Where can I find the solutions to the problems in the book "Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning"?
zdo0x0 • 11
3 votes
0 answers
61 views

Prove the convergence of the LASSO model under restricted eigenvalue conditions

I am researching the properties of the Lasso model $\hat \beta:= \operatorname{argmin}_\beta \{\|Y-X\beta\|_2^2/n+\lambda\|\beta\|_1\}$, specifically its convergence when the data satisfies restricted ...
GGbond • 39
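A minimal sketch of the Lasso objective stated in this question, $\min_\beta \|Y-X\beta\|_2^2/n+\lambda\|\beta\\|_1$, solved with ISTA (proximal gradient descent). The synthetic data, step size, and iteration count are illustrative assumptions and say nothing about the restricted-eigenvalue analysis being asked about.

```python
# Hedged sketch: Lasso via ISTA (proximal gradient) on toy data.
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_ista(X, Y, lam, n_iter=500):
    n, p = X.shape
    beta = np.zeros(p)
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n       # Lipschitz const. of the smooth part
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - Y) / n
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:5] = 1.0                               # sparse ground truth
Y = X @ beta_true + 0.1 * rng.standard_normal(200)
print(lasso_ista(X, Y, lam=0.1)[:8].round(3))
```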
11 votes
0 answers
165 views

Worst margin when halving a hypercube with a hyperplane

Consider the $n$-cube $C_n=\lbrace-1,1\rbrace^n$ and the problem of partitioning it into halves with hyperplanes through the origin that avoid all its points. We can parameterize the hyperplanes by ...
Veit Elser • 1,065
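A minimal brute-force sketch of the setup in this question for small $n$: enumerate $C_n=\{-1,1\}^n$, check that a hyperplane through the origin misses every vertex and splits the cube into halves, and compute its margin $\min_x |\langle w,x\rangle|/\|w\|$. The random normals are purely illustrative; they do not answer the worst-case question.

```python
# Hedged sketch: margin of halving hyperplanes over the n-cube, small n only.
import itertools
import numpy as np

def margin_if_halving(w, n):
    pts = np.array(list(itertools.product([-1, 1], repeat=n)))
    dots = pts @ w
    if np.any(dots == 0):
        return None                      # hyperplane hits a vertex
    if np.sum(dots > 0) != len(pts) // 2:
        return None                      # does not split the cube into halves
    return np.min(np.abs(dots)) / np.linalg.norm(w)

n = 4
rng = np.random.default_rng(2)
margins = (margin_if_halving(rng.standard_normal(n), n) for _ in range(2000))
worst = min(m for m in margins if m is not None)
print(f"smallest margin seen over random halving hyperplanes (n={n}): {worst:.4f}")
```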
1 vote
0 answers
69 views

Curve fitting with "rough" loss functions

Many real-valued classification and regression problems can be framed as minimization in the following way. Setup: Let $\Theta \subseteq \mathbb{R}^p$ be the parameter space that we are searching over. For ...
Simon Kuang
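A minimal sketch of the framing in this question: minimize an empirical risk whose loss is "rough" (here the piecewise-constant 0-1 loss of a linear classifier) with a derivative-free optimizer. The data model, the choice of 0-1 loss, and the use of Nelder-Mead are illustrative assumptions.

```python
# Hedged sketch: curve fitting with a non-smooth loss via derivative-free search.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200))

def empirical_risk(theta):
    # 0-1 loss: piecewise constant in theta, so gradients are uninformative.
    preds = np.sign(X @ theta[:2] + theta[2])
    return np.mean(preds != y)

res = minimize(empirical_risk, x0=np.zeros(3), method="Nelder-Mead")
print(res.x.round(3), res.fun)
```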
5 votes
1 answer
1k views

Mathematics research relating to machine learning

What branch/branches of math are most relevant in enhancing machine learning (mostly in terms of practical use as opposed to theoretical/possible use)? Specifically, I want to know about math research ...
Artus • 173
1 vote
1 answer
121 views

Adjoint sensitivity analysis for a cost functional under an ODE constraint

I am trying to recover the result given by equation 10 in the article here. I am unable to get rid of the integral; any help would be much appreciated. To keep the description as self-contained as ...
Abhi. A • 55
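A minimal sketch of the adjoint-sensitivity pattern behind this question: for $\dot x = f(x,\theta)$ and $J(\theta)=\int_0^T g(x)\,dt$, backpropagate through a forward-Euler discretization (a discrete adjoint) and check the gradient against finite differences. The toy ODE $f=-\theta x$ and $g=x^2$ are illustrative assumptions, not the article's equations.

```python
# Hedged sketch: discrete adjoint of forward Euler for dJ/dtheta.
import numpy as np

def f(x, th):   return -th * x
def fx(x, th):  return -th           # df/dx
def fth(x, th): return -x            # df/dtheta
def g(x):  return x ** 2
def gx(x): return 2.0 * x            # dg/dx

def cost_and_grad(th, x0=1.0, T=2.0, N=2000):
    h = T / N
    xs = np.empty(N + 1)
    xs[0] = x0
    for k in range(N):                           # forward pass
        xs[k + 1] = xs[k] + h * f(xs[k], th)
    J = h * np.sum(g(xs[:N]))                    # left Riemann sum for the cost
    lam, dJ = 0.0, 0.0
    for k in range(N - 1, -1, -1):               # reverse (adjoint) pass
        dJ += lam * h * fth(xs[k], th)
        lam = h * gx(xs[k]) + lam * (1.0 + h * fx(xs[k], th))
    return J, dJ

th = 0.7
J, dJ = cost_and_grad(th)
eps = 1e-6
fd = (cost_and_grad(th + eps)[0] - cost_and_grad(th - eps)[0]) / (2 * eps)
print(f"adjoint dJ/dtheta = {dJ:.6f}, finite difference = {fd:.6f}")
```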
2 votes
1 answer
60 views

Convergence of minimiser of empirical risk to minimiser of population risk

Let $X_1, \dots, X_n \sim \mu$ be some random elements of a space $\mathcal{X}$. Let $H$ be a Hilbert space of functions $f: \mathcal{X} \to \mathbb{R}$ with norm $\|\cdot\|_H$. Let $\|f^*\|_{L_2(\mu)} < \infty$ ...
user27182 • 327
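A minimal numerical sketch of the phenomenon this question asks about: the empirical risk minimizer approaches the population risk minimizer as $n$ grows. For illustration the class is finite-dimensional (linear functions under squared loss), so both minimizers are available in closed form; this is a stand-in for the Hilbert-space setting, not a proof.

```python
# Hedged sketch: ||empirical minimizer - population minimizer|| shrinking with n.
import numpy as np

rng = np.random.default_rng(5)
beta_star = np.array([1.0, -2.0, 0.5])          # population minimizer (by construction)

def erm_distance(n):
    X = rng.standard_normal((n, 3))
    y = X @ beta_star + rng.standard_normal(n)  # additive noise keeps it non-trivial
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(beta_hat - beta_star)

for n in [50, 500, 5000, 50000]:
    print(n, f"{erm_distance(n):.4f}")
```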
2 votes
0 answers
42 views

Can we get a family of classifiers $\{f_n\}_{n \in \mathbb{N}}$ such that $\lim_{n\to\infty} \left(\mathbb{E}_{(X_1, Y_1), \dots,(X_n, Y_n) \sim \rho}[R(f_n)]-R(f_B)\right)=0$?

For a given classifier $f: \mathbb{R}^d \to\{0,1,2\}$, let $$ R(f):=\mathbb{E}_{(X, Y) \sim \rho}\left[\mathbb{1}_{f(X) \neq Y}\right] $$ and let $f_B$ be the Bayes classifier. Can we get a family of ...
fantacy_crs
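A minimal empirical sketch related to this question: $k$-nearest-neighbours with $k_n \to \infty$ and $k_n/n \to 0$ is a classical universally consistent rule (Stone's theorem), so its expected risk tends to the Bayes risk $R(f_B)$. Below, a toy 3-class Gaussian problem where the Bayes rule is the nearest class centre, with $k_n = \sqrt{n}$; the data model is an illustrative assumption.

```python
# Hedged sketch: k-NN risk approaching the Bayes risk on a toy 3-class problem.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(7)
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])

def sample(n):
    y = rng.integers(0, 3, n)
    X = centers[y] + rng.standard_normal((n, 2))
    return X, y

def bayes_predict(X):
    # Equal priors, unit-variance Gaussian classes: pick the nearest centre.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

X_test, y_test = sample(20000)
bayes_risk = np.mean(bayes_predict(X_test) != y_test)
for n in [100, 1000, 10000]:
    X, y = sample(n)
    knn = KNeighborsClassifier(n_neighbors=int(np.sqrt(n))).fit(X, y)
    print(n, f"risk={np.mean(knn.predict(X_test) != y_test):.3f}",
          f"bayes={bayes_risk:.3f}")
```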
3 votes
0 answers
57 views

How to prove the empirical risk converges to the expected risk as $n\to \infty$?

For example, for a classical binary classification problem with $x \in \mathbb{R}^d$ and $y \in\{0,1\}$, let the empirical risk be $R_{\ell}^n(f):=\frac{1}{n} \sum_{i=1}^n \ell\left(f\left(X_i\right), Y_i\right)$ and ...
fantacy_crs
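A minimal sketch of the basic case behind this question: for a *fixed* classifier $f$, the empirical risk $R^n_\ell(f)$ is an average of i.i.d. bounded variables, so it converges to the expected risk by the law of large numbers (Hoeffding gives the rate). Uniform convergence over a class of $f$'s is the harder part and is not shown here; the toy data model below is an assumption.

```python
# Hedged sketch: empirical risk of a fixed classifier converging to its expected risk.
import numpy as np

rng = np.random.default_rng(11)

def sample(n):
    X = rng.standard_normal(n)
    Y = (X + 0.5 * rng.standard_normal(n) > 0).astype(int)   # noisy labels
    return X, Y

f = lambda x: (x > 0).astype(int)                # a fixed classifier

X_big, Y_big = sample(2_000_000)
true_risk = np.mean(f(X_big) != Y_big)           # Monte-Carlo proxy for E[1{f(X)!=Y}]
for n in [100, 1000, 10000, 100000]:
    X, Y = sample(n)
    print(n, f"|R_n - R| ~ {abs(np.mean(f(X) != Y) - true_risk):.4f}")
```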
2 votes
1 answer
84 views

VC-based risk bounds for classifiers on a finite set

Let $\mathcal{X}$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{ 0,1 \}^{\mathcal{X}}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $\mathcal{X}\times \{0,1\}$ with law $\mathbb{P}$. ...
Math_Newbie
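A minimal sketch of the standard finite-class bound relevant here: when $\mathcal{X}$ is finite, $\mathcal{H}\subseteq\{0,1\}^{\mathcal{X}}$ is finite too, and Hoeffding plus a union bound give, with probability at least $1-\delta$, $R(h) \le \hat R_N(h) + \sqrt{(\ln|\mathcal{H}| + \ln(1/\delta))/(2N)}$ simultaneously for all $h$. The numbers below just evaluate that bound; the example size $|\mathcal{X}|=20$ is an assumption.

```python
# Hedged sketch: evaluating the finite-class generalization gap bound.
import math

def finite_class_gap(H_size, N, delta=0.05):
    return math.sqrt((math.log(H_size) + math.log(1.0 / delta)) / (2.0 * N))

# If |X| = 20, then |H| <= 2**20 for H a subset of {0,1}^X.
for N in [100, 1000, 10000]:
    print(N, round(finite_class_gap(2 ** 20, N), 3))
```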
4 votes
1 answer
333 views

Perceptron / logistic regression accuracy on the n-bit parity problem

$\DeclareMathOperator{\sgn}{sign}$The perceptron (similarly, logistic regression) of the form $y=\sgn(w^\top x+b)$ is famously known for its inability to solve the XOR problem, meaning it can get ...
ido4848 • 141
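A minimal sketch of the experiment implicit in this question: fit a linear classifier (logistic regression) on the full $n$-bit parity dataset and report its training accuracy. For $n=2$ this is XOR; the exact best achievable linear accuracy for general $n$ is what the question asks, so only empirical numbers are printed here, and the solver settings are assumptions.

```python
# Hedged sketch: logistic regression trained on the complete n-bit parity table.
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

for n in range(2, 7):
    X = np.array(list(itertools.product([0, 1], repeat=n)))
    y = X.sum(axis=1) % 2                            # parity labels
    clf = LogisticRegression(C=1e6, max_iter=10000).fit(X, y)
    print(n, f"train accuracy = {clf.score(X, y):.3f}")
```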
1 vote
0 answers
32 views

Convergent gradient-type scheme for solving smooth nonconvex constrained optimization problem

Let $x_1,\ldots,x_n \in \mathbb R^d$ and $y_1,\ldots,y_n \in \{\pm 1\}$, and $\epsilon, h > 0$. Define $\theta(t) := Q((t-\epsilon)/h)$, where $Q(z) := \int_{z}^\infty \phi(u)\,\mathrm{d}u$ is the ...
dohmatob • 6,824
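A minimal sketch of the objects defined in this question: $\theta(t)=Q((t-\epsilon)/h)$ with $Q$ the Gaussian tail, applied to per-sample margins and minimized by projected gradient descent. The data model, the choice of margins $t_i = y_i\langle w, x_i\rangle$, and the unit-ball constraint are illustrative assumptions about the omitted part of the problem; no convergence claim is made.

```python
# Hedged sketch: projected gradient descent on a Gaussian-tail-smoothed 0-1 loss.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(13)
n, d, eps, h = 200, 5, 0.1, 0.5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = np.sign(X @ w_true + 0.3 * rng.standard_normal(n))

def objective(w):
    t = y * (X @ w)                               # margins t_i
    return np.mean(norm.sf((t - eps) / h))        # Q(z) = norm.sf(z)

def gradient(w):
    t = y * (X @ w)
    # d/dw Q((t-eps)/h) = -phi((t-eps)/h)/h * y * x
    coef = -norm.pdf((t - eps) / h) / h
    return (coef * y) @ X / n

w = np.zeros(d)
for _ in range(500):
    w = w - 0.5 * gradient(w)
    w = w / max(1.0, np.linalg.norm(w))           # project onto the unit ball
print(f"objective after descent: {objective(w):.3f}")
```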
