Consider the feature space $\mathcal{X}=\mathbb R^{d}$ and label set $\mathcal{Y}=\{1,...,c\}$ with $c > 2$. We consider an activation function $\alpha: \mathbb R^{c} \to \mathbb R^{c}$, a weight matrix $W\in \mathbb R^{c\times d}$, and a bias vector $b \in \mathbb R ^{c}$. Given a particular loss function $L$ on $\mathbb R^{c}\times\mathbb R^{c}$ (e.g. the quadratic loss), we one-hot encode each label $y \in \mathcal{Y}$ via the operation $\tilde{\cdot}$, so that $\tilde{y}= \hat{e}_{y}$, i.e. the vector with a one in position $y\in \{1,...,c\}$ and zeros elsewhere, in order to allow it as an argument of the loss function.

Question (keeping the situation as general as possible, i.e. with no specific loss function):

How do we compute $\frac{\partial \hat{L}}{\partial b_{j}}(b,W)$ and $\frac{\partial \hat{L}}{\partial W_{jk}}(b,W)$ for $j\in \{1,...,c\}$ and $k\in \{1,...,d\}$, where

$ \hat{L}(b,W) =\frac{1}{N}\sum\limits_{i=1}^{N}L(\tilde{y}^{(i)},\alpha(W\cdot x^{(i)}+b))$
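For concreteness, here is a minimal NumPy sketch of how I am thinking of the forward pass and the empirical loss $\hat L$. The dimensions, the componentwise sigmoid used for $\alpha$, and the quadratic loss are placeholder choices for illustration only, not part of the general question:

```python
import numpy as np

# Placeholder dimensions (assumptions for illustration only)
d, c, N = 5, 3, 10                      # features, classes, samples

rng = np.random.default_rng(0)
X = rng.normal(size=(N, d))             # rows are the samples x^(i)
y = rng.integers(0, c, size=N)          # labels in {0, ..., c-1}
W = rng.normal(size=(c, d))             # weight matrix
b = rng.normal(size=c)                  # bias vector

Y_onehot = np.eye(c)[y]                 # rows are tilde{y}^(i), shape (N, c)

def alpha(a):
    """Placeholder activation R^c -> R^c (componentwise sigmoid)."""
    return 1.0 / (1.0 + np.exp(-a))

def L(y_tilde, z):
    """Placeholder loss on R^c x R^c (quadratic loss)."""
    return 0.5 * np.sum((z - y_tilde) ** 2)

def empirical_loss(b, W):
    """hat{L}(b, W) = (1/N) * sum_i L(tilde{y}^(i), alpha(W x^(i) + b))."""
    A = X @ W.T + b                     # pre-activations, shape (N, c)
    Z = alpha(A)                        # outputs, shape (N, c)
    return np.mean([L(Y_onehot[i], Z[i]) for i in range(N)])
```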

Initially I thought of doing something like:

$\frac{\partial \hat{L}}{\partial b_{j}}(b,W)=\frac{1}{N}\sum\limits_{i=1}^{N}\left.\frac{\partial L}{\partial z}(\tilde{y}^{(i)},z)\right\rvert_{z=\alpha(W\cdot x^{(i)}+b)}\,\alpha^{\prime}(W\cdot x^{(i)}+b)\cdot \hat{e}_{j}$

and

$\frac{\partial \hat{L}}{\partial W_{jk}}(b,W)=\frac{1}{N}\sum\limits_{i=1}^{N}\left.\frac{\partial L}{\partial z}(\tilde{y}^{(i)},z)\right\rvert_{z=\alpha(W\cdot x^{(i)}+b)}\,\alpha^{\prime}(W\cdot x^{(i)}+b)\cdot \hat{e}_{j}\,x^{(i)}$

But I can clearly see that the dimensions in my differentiation are not working out. Any ideas?
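To at least pin down what the shapes and values of the correct derivatives have to be, a finite-difference check against $\hat L$ directly seems useful. This is only a sketch reusing the placeholder `empirical_loss`, `b`, and `W` from the code above, and the step size `eps` is arbitrary:

```python
def numerical_grads(b, W, eps=1e-6):
    """Finite-difference approximations of d hat{L}/d b_j and d hat{L}/d W_jk."""
    grad_b = np.zeros_like(b)
    for j in range(b.size):
        e = np.zeros_like(b)
        e[j] = eps
        grad_b[j] = (empirical_loss(b + e, W) - empirical_loss(b - e, W)) / (2 * eps)

    grad_W = np.zeros_like(W)
    for j in range(W.shape[0]):
        for k in range(W.shape[1]):
            E = np.zeros_like(W)
            E[j, k] = eps
            grad_W[j, k] = (empirical_loss(b, W + E) - empirical_loss(b, W - E)) / (2 * eps)

    return grad_b, grad_W                # shapes (c,) and (c, d), matching b and W

grad_b, grad_W = numerical_grads(b, W)
print(grad_b.shape, grad_W.shape)        # any analytic formula has to match these
```

Whatever the correct analytic expressions are, they should agree with these numerical gradients, so this is the target my formulas above fail to match dimensionally.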
