
All Questions

2 votes
3 answers
396 views

Implementing multiclass logistic regression from scratch

This is a sequel to a previous question about implementing binary logistic regression from scratch. Background knowledge: To train a logistic regression model for a classification problem with $K$ ...
littleO
  • 52.5k
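As a concrete reference point, here is a minimal sketch of multiclass (softmax) logistic regression trained by full-batch gradient descent on the mean cross-entropy loss. It uses plain Python without NumPy. The function names (`softmax`, `train_softmax_regression`, `predict`), the learning rate, the epoch count and the toy data are illustrative assumptions, not the code from the question.

```python
import math

def softmax(scores):
    # Subtract the largest score before exponentiating to avoid overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_softmax_regression(X, y, K, lr=0.1, epochs=1000):
    """Full-batch gradient descent on the mean cross-entropy loss.

    X: list of feature vectors; y: integer labels in {0, ..., K-1}.
    Returns (W, b): a K x d weight matrix and a length-K bias vector.
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(K)]
    b = [0.0] * K
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(K)]
        gb = [0.0] * K
        for x, label in zip(X, y):
            scores = [sum(W[k][j] * x[j] for j in range(d)) + b[k] for k in range(K)]
            p = softmax(scores)
            for k in range(K):
                # d(loss)/d(score_k) = p_k - 1[k == label]
                err = p[k] - (1.0 if k == label else 0.0)
                gb[k] += err / n
                for j in range(d):
                    gW[k][j] += err * x[j] / n
        # Take one gradient step on every weight and bias.
        for k in range(K):
            b[k] -= lr * gb[k]
            for j in range(d):
                W[k][j] -= lr * gW[k][j]
    return W, b

def predict(W, b, x):
    # The predicted class is the one with the largest score (equivalently, largest probability).
    scores = [sum(wj * xj for wj, xj in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(len(scores)), key=lambda k: scores[k])

# Usage: three well-separated clusters in the plane.
X = [[3.0, 0.0], [2.8, 0.3], [-1.5, 2.6], [-1.3, 2.8], [-1.5, -2.6], [-1.7, -2.4]]
y = [0, 0, 1, 1, 2, 2]
W, b = train_softmax_regression(X, y, K=3)
print([predict(W, b, x) for x in X])
```

The per-class gradient $p_k - \mathbb{1}[k = y]$ is the multiclass counterpart of the $\sigma(z) - y$ term in the binary case.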
3 votes
1 answer
195 views

Implementing binary logistic regression from scratch

Background knowledge: To train a logistic regression model for a classification problem with two classes (called class $0$ and class $1$), we are given a training dataset consisting of feature vectors ...
littleO
  • 52.5k
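A minimal sketch of binary logistic regression with labels in $\{0, 1\}$, trained by full-batch gradient descent on the mean binary cross-entropy. Plain Python is used, and the names (`sigmoid`, `train_logistic_regression`, `predict_proba`), the hyperparameters and the data are hypothetical choices for illustration only.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: avoids exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Gradient descent on the mean binary cross-entropy; returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, label in zip(X, y):
            # d(loss)/d(score) = sigma(score) - label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - label
            gb += err / n
            for j in range(d):
                gw[j] += err * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict_proba(w, b, x):
    # Model's estimate of P(class 1 | x).
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Usage: two linearly separable groups of points.
X = [[2.0, 1.0], [3.0, 2.0], [2.5, 3.0], [-2.0, -1.0], [-3.0, -2.0], [-2.5, -1.5]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(X, y)
print([round(predict_proba(w, b, x), 3) for x in X])
```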
0 votes
1 answer
183 views

Maximum Entropy Continuous Distribution

In Pattern Recognition and Machine Learning, Section 1.6, the author derives the distribution that maximises the differential entropy $$H(\textbf{x}) = -\int p(\textbf{x}) \ln p(\textbf{x}) \, d\textbf{x}$$ ...
tail_recursion
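For orientation, here is a sketch of the standard univariate Lagrange-multiplier argument, assuming the constraints are normalisation, a fixed mean $\mu$ and a fixed variance $\sigma^2$ (the excerpt is truncated, so these constraints are an assumption). Maximise

$$\tilde H[p] = -\int p(x)\ln p(x)\,dx + \lambda_1\left(\int p(x)\,dx - 1\right) + \lambda_2\left(\int x\,p(x)\,dx - \mu\right) + \lambda_3\left(\int (x-\mu)^2 p(x)\,dx - \sigma^2\right).$$

Setting the functional derivative with respect to $p(x)$ to zero gives

$$-\ln p(x) - 1 + \lambda_1 + \lambda_2 x + \lambda_3 (x-\mu)^2 = 0 \quad\Longrightarrow\quad p(x) = \exp\left\{-1 + \lambda_1 + \lambda_2 x + \lambda_3 (x-\mu)^2\right\}.$$

Substituting back into the three constraints forces $\lambda_2 = 0$, $\lambda_3 = -\frac{1}{2\sigma^2}$ and $e^{\lambda_1 - 1} = (2\pi\sigma^2)^{-1/2}$, so

$$p(x) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \qquad H[x] = \frac{1}{2}\left\{1 + \ln(2\pi\sigma^2)\right\}.$$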
6 votes
3 answers
754 views

Application of the chain rule to a $3$-layer neural network

Consider the differentiable functions $L^1(x,\theta^1),L^2(x^2,\theta^2),L^3(x^3,\theta^3)$, where each $x^k,\theta^k$ is a real vector, for $k=1,2,3$. Also define $\theta=(\theta^1,\theta^2,\theta^3)...
Lilla
  • 2,109
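The chain rule for such a stack can be checked numerically. This sketch assumes the layers are composed as $x^2 = L^1(x,\theta^1)$, $x^3 = L^2(x^2,\theta^2)$ with output $L^3(x^3,\theta^3)$ (the excerpt is truncated, so this composition is an assumption), and uses scalar inputs with hypothetical layer functions chosen only for their simple derivatives. Under that assumption, $\partial L/\partial\theta^1 = \frac{\partial L^3}{\partial x^3}\frac{\partial L^2}{\partial x^2}\frac{\partial L^1}{\partial \theta^1}$, $\partial L/\partial\theta^2 = \frac{\partial L^3}{\partial x^3}\frac{\partial L^2}{\partial \theta^2}$ and $\partial L/\partial\theta^3 = \frac{\partial L^3}{\partial \theta^3}$.

```python
import math

# Hypothetical scalar layers; any differentiable functions would do.
def L1(x, t1): return math.tanh(t1 * x)
def L2(x2, t2): return math.sin(t2 * x2)
def L3(x3, t3): return t3 * x3 ** 2

def forward(x, theta):
    t1, t2, t3 = theta
    return L3(L2(L1(x, t1), t2), t3)

def grad_chain_rule(x, theta):
    t1, t2, t3 = theta
    # Forward pass, keeping the intermediate values x2 and x3.
    x2 = L1(x, t1)
    x3 = L2(x2, t2)
    # Local partial derivatives of each layer.
    dL3_dx3 = 2 * t3 * x3
    dL3_dt3 = x3 ** 2
    dL2_dx2 = t2 * math.cos(t2 * x2)
    dL2_dt2 = x2 * math.cos(t2 * x2)
    dL1_dt1 = x * (1 - math.tanh(t1 * x) ** 2)
    # Multiply local derivatives from the output back toward each parameter.
    return [dL3_dx3 * dL2_dx2 * dL1_dt1, dL3_dx3 * dL2_dt2, dL3_dt3]

def grad_finite_diff(x, theta, h=1e-6):
    # Central differences, one parameter at a time.
    grads = []
    for i in range(len(theta)):
        up = list(theta); up[i] += h
        dn = list(theta); dn[i] -= h
        grads.append((forward(x, up) - forward(x, dn)) / (2 * h))
    return grads

print(grad_chain_rule(0.7, [0.5, 1.3, 2.0]))
print(grad_finite_diff(0.7, [0.5, 1.3, 2.0]))
```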
0 votes
1 answer
836 views

Simplifying partial derivative of cross-entropy function

How do I simplify: $$\begin{eqnarray} \frac{\partial C}{\partial w_j} & = & -\frac{1}{n} \sum_x \left( \frac{y }{\sigma(z)} -\frac{(1-y)}{1-\sigma(z)} \right) \frac{\partial \sigma}{\...
devinbost
  • 157
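One way to carry out the simplification, assuming $z = \sum_j w_j x_j + b$ (so $\partial z/\partial w_j = x_j$) and using $\sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr)$:

$$\begin{aligned}
\frac{\partial C}{\partial w_j} &= -\frac{1}{n}\sum_x \left(\frac{y}{\sigma(z)} - \frac{1-y}{1-\sigma(z)}\right)\sigma(z)\bigl(1-\sigma(z)\bigr)\,x_j \\
&= -\frac{1}{n}\sum_x \Bigl(y\bigl(1-\sigma(z)\bigr) - (1-y)\,\sigma(z)\Bigr)\,x_j \\
&= \frac{1}{n}\sum_x x_j\bigl(\sigma(z) - y\bigr).
\end{aligned}$$

The middle bracket collapses because $y - y\sigma(z) - \sigma(z) + y\sigma(z) = y - \sigma(z)$, so the $\sigma'(z)$ factor cancels entirely.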
0 votes
1 answer
465 views

Why are terms flipped in the partial derivative of the logistic regression cost function?

When calculating the partial derivative: $$\frac{\partial}{\partial\theta_{j}}J(\theta) $$ from: $$ J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}(y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i})))$$ ...
devinbost
  • 157
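The "flip" comes from the leading minus sign in $J(\theta)$: differentiating gives $-\frac{1}{m}\sum_i (y^i - h_\theta(x^i))x^i_j$, which is the same as $\frac{1}{m}\sum_i (h_\theta(x^i) - y^i)x^i_j$. The sketch below checks that gradient against central finite differences; the data, the parameter values and the convention $x^i_0 = 1$ for the intercept are illustrative assumptions.

```python
import math

def h(theta, x):
    # Hypothesis h_theta(x) = sigmoid(theta . x); x[0] = 1 acts as the intercept feature.
    return 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [y^i log h + (1 - y^i) log(1 - h)]
    m = len(X)
    return -sum(yi * math.log(h(theta, xi)) + (1 - yi) * math.log(1 - h(theta, xi))
                for xi, yi in zip(X, y)) / m

def grad_analytic(theta, X, y):
    # dJ/dtheta_j = (1/m) * sum_i (h_theta(x^i) - y^i) * x^i_j
    m = len(X)
    return [sum((h(theta, xi) - yi) * xi[j] for xi, yi in zip(X, y)) / m
            for j in range(len(theta))]

def grad_numeric(theta, X, y, eps=1e-6):
    # Central-difference approximation of each partial derivative.
    out = []
    for j in range(len(theta)):
        up = list(theta); up[j] += eps
        dn = list(theta); dn[j] -= eps
        out.append((cost(up, X, y) - cost(dn, X, y)) / (2 * eps))
    return out

X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.2], [1.0, -1.0, -0.7]]
y = [1, 0, 1, 0]
theta = [0.1, -0.4, 0.25]
print(grad_analytic(theta, X, y))
print(grad_numeric(theta, X, y))
```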
1 vote
1 answer
138 views

Deriving the maximum likelihood estimate of a Gaussian covariance matrix

$\newcommand{\trace}{\operatorname{trace}}$I recently came across a deduction I couldn't follow. It concerns the maximum likelihood estimate of the covariance matrix for a multivariate Gaussian ...
user25470
  • 1,053
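For comparison, here is one standard route via the trace trick; it is not necessarily the deduction the question refers to. For $N$ i.i.d. samples $x_n \in \mathbb{R}^D$,

$$\ln p(X\mid\mu,\Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^T\Sigma^{-1}(x_n-\mu).$$

Each quadratic form is a scalar, hence equal to its own trace, and the trace is invariant under cyclic permutations:

$$(x_n-\mu)^T\Sigma^{-1}(x_n-\mu) = \operatorname{trace}\left(\Sigma^{-1}(x_n-\mu)(x_n-\mu)^T\right).$$

With $S = \sum_{n}(x_n-\mu)(x_n-\mu)^T$ and $\Lambda = \Sigma^{-1}$, this gives $\ln p = \text{const} + \frac{N}{2}\ln|\Lambda| - \frac{1}{2}\operatorname{trace}(\Lambda S)$. Using $\frac{\partial}{\partial\Lambda}\ln|\Lambda| = \Lambda^{-T} = \Sigma$ and $\frac{\partial}{\partial\Lambda}\operatorname{trace}(\Lambda S) = S^T = S$:

$$\frac{\partial \ln p}{\partial \Lambda} = \frac{N}{2}\Sigma - \frac{1}{2}S = 0 \quad\Longrightarrow\quad \Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{\mathrm{ML}})(x_n-\mu_{\mathrm{ML}})^T,$$

where $\mu$ has been replaced by its own maximiser, the sample mean $\mu_{\mathrm{ML}}$.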
1 vote
0 answers
70 views

General correlation between function with itself and other input data

I've collected data for a function $F = f(g(x))$, for different function shapes $g(x)$. The goal is to predict values of ...
Mate Matic
6 votes
1 answer
1k views

Multivariate Gaussian equivalent of a Gaussian integration identity

For a one-dimensional $x$, $$\int_{-\infty}^{\infty}x^{2}e^{-x^{2}}dx=\frac{1}{2}\int_{-\infty}^{\infty}e^{-x^{2}}dx$$ This can be shown through integration by parts. There is a good derivation of ...
BenB
  • 161
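A natural multivariate analogue, stated here as an assumption to be checked rather than a quoted result, is that for a symmetric positive-definite matrix $A$, $\int x x^T e^{-x^T A x}\,dx = \frac{1}{2}A^{-1}\int e^{-x^T A x}\,dx$. This is because $e^{-x^T A x}$ is an unnormalised Gaussian with covariance $\frac{1}{2}A^{-1}$. With $A = I$ it reduces to $\frac{1}{2}I$, matching the 1-D identity coordinate by coordinate. The sketch below checks this in 2-D with a midpoint-rule grid; the grid size, the cut-off and the example matrix are arbitrary choices.

```python
import math

def gaussian_moment_ratio(A, half_width=6.0, n=400):
    """Midpoint-rule estimate of (int x x^T e^{-x^T A x} dx) / (int e^{-x^T A x} dx) in 2-D."""
    (a, b), (c, d) = A
    step = 2 * half_width / n
    z = 0.0
    m = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n):
        x1 = -half_width + (i + 0.5) * step
        for j in range(n):
            x2 = -half_width + (j + 0.5) * step
            # x^T A x for a 2x2 matrix A.
            w = math.exp(-(a * x1 * x1 + (b + c) * x1 * x2 + d * x2 * x2))
            z += w
            m[0][0] += x1 * x1 * w
            m[0][1] += x1 * x2 * w
            m[1][0] += x2 * x1 * w
            m[1][1] += x2 * x2 * w
    # The cell area step**2 cancels in the ratio, so it is omitted.
    return [[v / z for v in row] for row in m]

def half_inverse(A):
    # Closed form of (1/2) A^{-1} for a 2x2 matrix.
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[0.5 * d / det, -0.5 * b / det], [-0.5 * c / det, 0.5 * a / det]]

A = [[1.0, 0.3], [0.3, 0.8]]
print(gaussian_moment_ratio(A))
print(half_inverse(A))
```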