
After a short encounter with deep neural networks, I have the impression that training them boils down to determining the values of a vast number of parameters.

The expectation-maximization (EM) algorithm, which I only learned about recently, deals with a similar problem, namely determining the values of a vast number of latent variables so as to maximize the expected log-likelihood.

Question:

What, if any, are the essential differences between deep neural networks and expectation maximization, with respect to

  • the problems they solve, and
  • the methods used for optimizing the values of a vast number of parameters or latent variables, respectively?

1 Answer


A high-level view of neural networks is that they are just heuristics for approximating multivariate functions: you pick a class of parametrizable functions and optimize their parameters, and whoever picks the best heuristic wins. Apart from the fact that these heuristics work surprisingly well, some are also accompanied by theorems guaranteeing that they are universal function approximators: essentially any function can be approximated with them, from PDE solutions to probability distributions on discrete sets.
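
As a toy illustration of "pick a parametrizable family and optimize its parameters" (my own sketch, not part of the original answer; the architecture and hyperparameters are arbitrary choices), here is a one-hidden-layer network fitted to $\sin(x)$ with plain gradient descent in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: noisy samples of the target function we want to approximate.
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x) + 0.05 * rng.normal(size=x.shape)

# One-hidden-layer network: f(x) = W2 @ tanh(W1 x + b1) + b2.
H = 16                                   # hidden width (arbitrary choice)
W1 = rng.normal(scale=0.5, size=(1, H))
b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1))
b2 = np.zeros(1)

lr = 0.05
for step in range(2000):
    # Forward pass.
    h = np.tanh(x @ W1 + b1)             # (N, H)
    pred = h @ W2 + b2                   # (N, 1)
    err = pred - y

    # Backward pass: gradients of the mean squared error.
    N = len(x)
    dpred = 2 * err / N
    dW2 = h.T @ dpred
    db2 = dpred.sum(axis=0)
    dh = dpred @ W2.T * (1 - h**2)       # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ dh
    db1 = dh.sum(axis=0)

    # Gradient-descent update of all parameters.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final MSE:", float(np.mean(err**2)))
```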

The EM algorithm, on the other hand, tries to approximate the (parametric) probability distribution of a very specific class of problems: hidden variable models. Suppose I have multivariate random variables $X, Y$, where $Y$ is observable and $X$ is unobservable, but I know the form of $p(X \mid Y, \theta)$ and this relationship is sufficiently constraining. Then the EM algorithm estimates $X$ and $\theta$ from the observed values of $Y$.
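
To make the hidden variable setup concrete, here is a minimal EM sketch (my example, not the answer's) for a two-component 1-D Gaussian mixture, where the component label plays the role of the unobserved $X$, the measurement is the observed $Y$, and $\theta$ collects the weights, means, and variances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data Y: a mixture of two Gaussians; the component labels X are hidden.
y = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

# Initial guess for theta = (mixing weights, means, variances).
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(y, mu, var):
    return np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for it in range(100):
    # E-step: posterior responsibilities p(X = k | Y = y_i, theta).
    r = w * gauss(y[:, None], mu, var)        # (N, 2), unnormalized
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate theta by maximizing the expected log-likelihood.
    nk = r.sum(axis=0)
    w = nk / len(y)
    mu = (r * y[:, None]).sum(axis=0) / nk
    var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", w, "means:", mu, "variances:", var)
```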

The optimization procedures for the two methods also differ a lot. Neural networks need large datasets to train (because of their size), and therefore the only method that can work reasonably fast is mini-batch stochastic gradient descent: essentially, break the dataset into pieces (batches) and at each step take a gradient direction computed just from the data in the current batch.
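
A bare-bones version of that loop, assuming a toy linear-regression model in NumPy (the model, learning rate, and batch size are placeholder choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data; in practice the dataset is far too large to use all at once.
X = rng.normal(size=(10_000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=len(X))

w = np.zeros(5)                              # parameters to learn
lr, batch_size = 0.05, 64

for epoch in range(5):
    idx = rng.permutation(len(X))            # reshuffle on each pass over the data
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]    # one small batch
        Xb, yb = X[b], y[b]
        # Gradient of the squared error, computed on the batch only.
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(b)
        w -= lr * grad                       # one SGD step per batch

print("estimated weights:", np.round(w, 2))
```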

The EM algorithm is a specific case of the MM (minorize-maximize) algorithm, and its optimization step works as follows. The target function to be maximized (a log-likelihood) is in general non-concave. However, at every point that the optimization visits, I approximate the target with a concave function that touches it at that point but lies below it everywhere else. I can maximize this lower-bound surrogate with any standard convex optimization algorithm, and the target function is guaranteed to increase by at least as much as the surrogate does.
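
Spelled out for EM itself (a standard derivation, not given explicitly above): for any distribution $q(X)$ over the hidden variables, Jensen's inequality yields a lower bound on the log-likelihood,

$$\log p(Y \mid \theta) \;=\; \log \sum_X q(X)\,\frac{p(X, Y \mid \theta)}{q(X)} \;\ge\; \sum_X q(X)\,\log \frac{p(X, Y \mid \theta)}{q(X)},$$

with equality at the current iterate $\theta_t$ when $q(X) = p(X \mid Y, \theta_t)$ (the E-step). The M-step maximizes the right-hand side over $\theta$, so the log-likelihood can only go up.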

Finally, both methods can converge to local maxima/minima, but the MM algorithm is extremely stable, whereas for many neural network heuristics stability is far from guaranteed.

The comparison of the EM algorithm with neural networks is best seen through the takeover of natural language processing by neural networks, a field that in the early days was dominated by hidden Markov models. If you want to pursue this, I recommend reading Bengio's 2003 paper "A Neural Probabilistic Language Model" and looking up some of the early methods mentioned there in Manning and Schütze's "Foundations of Statistical Natural Language Processing".

Bengio's paper is important for the following reason: it is an easy-to-understand example of what a good heuristic looks like. It consists of both a large, expressive function part and a "dimensionality reduction" part (the learned word representations), and it manages to dominate the traditional probabilistic methods.

Finally, for examples of the MM algorithm, almost any Kenneth Lange paper you pick will contain a bunch of worked cases.

  • a very clear and detailed explanation! – Commented Apr 14 at 4:43
