
I see this expectation in a lot of machine learning literature:

$$\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[f(\mathbf{x};\mathbf{\phi})] = \int p(\mathbf{x};\mathbf{\theta}) f(\mathbf{x};\mathbf{\phi}) d\mathbf{x}$$

For example, in the context of neural networks, a slightly different version of this expectation is used as a cost function that is computed using Monte Carlo integration.
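For concreteness, here is a minimal sketch of how such an expectation can be estimated by Monte Carlo; the Gaussian form of $p(\mathbf{x};\mathbf{\theta})$ and the particular $f(\mathbf{x};\mathbf{\phi})$ below are purely illustrative assumptions, not taken from any specific paper:

```python
import numpy as np

# Illustrative choices only:
# p(x; theta) is taken to be a Gaussian with mean theta and unit variance,
# and f(x; phi) = (x - phi)^2 is an arbitrary function with parameter phi.
rng = np.random.default_rng(0)
theta, phi = 1.0, 0.5

# Monte Carlo estimate: draw x_i ~ p(x; theta) and average f(x_i; phi).
x = rng.normal(loc=theta, scale=1.0, size=100_000)
mc_estimate = np.mean((x - phi) ** 2)

# For this choice the exact value is Var(x) + (theta - phi)^2 = 1 + 0.25.
print(mc_estimate)  # approximately 1.25
```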

However, I am a bit confused about the notation that is used, and would highly appreciate some clarity. In classical probability theory, the expectation:

$$\mathbb{E}[X] = \int_x x \cdot p(x) \ dx$$

indicates the "average" value of the random variable $X$. Taking it a step further, the expectation:

$$\mathbb{E}[g(X)]=\int_x g(x) \cdot p(x) \ dx$$

indicates the "average" value of the random variable $Y=g(X)$. From this, it seems that the expectation:

$$\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[f(\mathbf{x};\mathbf{\phi})]$$

is shorthand for, and the same as:

$$\mathbb{E}_{\mathbf{x}}[f(\mathbf{x};\mathbf{\phi})]$$

where

$$ \mathbf{x} \sim p(\mathbf{x};\mathbf{\theta})$$

and this indicates the "average" value of the random vector $\mathbf{y} = f(\mathbf{x};\mathbf{\phi})$. Is this correct?

By this logic, would this statement be correct too?

$$\mathbb{E}[X] = \mathbb{E}_{p(X)}[X]$$

  • Re "Is shorthand for and the same as": Not quite. Notice that the original expression explicitly mentions $\theta$ while the subsequent one does not.
    – whuber
    Commented Sep 11, 2020 at 18:35
  • You got it right! This is quite a confusing notation. I prefer to use the notation $$\mathbb{E}_{\mathbf{x} \sim p(\mathbf{x}|\theta)}[X].$$
    Commented Sep 11, 2020 at 19:59
  • I think you need to rely on the conventions and context established by the author. There is no universal notation.
    – whuber
    Commented Sep 11, 2020 at 20:12
  • $\mathbb E[\mathbf X]$ is ambiguous, while $$\mathbb{E}_{\mathbf{X} \sim p(\mathbf{x}|\theta)}[X]$$ and $$\mathbb{E}_{p(\cdot|\theta)}[X]$$ and $$\mathbb{E}_{p(\mathbf{x}|\theta)}[X]$$ are not. This is particularly true when considering varying values of a parameter $\theta$, such as $$\mathbb{E}_{p(\cdot;\mathbf{\theta})}[\log p(\mathbf{X};\mathbf{\phi})],$$ found, e.g., in the EM algorithm.
    – Xi'an
    Commented Sep 12, 2020 at 8:08
  • Hi @jbuddy_13, in a classical neural network architecture, the posterior probability of classes $\mathbf{y}=[y_1,y_2,...,y_K]$ given an input feature vector $\mathbf{x}$ is $p(\mathbf{y}|\mathbf{x};\mathbf{w})$, where $\mathbf{w}$ are the parameters of the network. Note that $\mathbf{y}$ is in one-hot encoding. This posterior probability is estimated using maximum likelihood estimation, and therefore the objective is to maximize $\mathbb{E}_{p(\mathbf{x},\mathbf{y})}[\log p(\mathbf{y}|\mathbf{x};\mathbf{w})]$.
    – mhdadk
    Commented Sep 13, 2020 at 12:09

1 Answer


The expression

$$\mathbb E[g(x;y;\theta;h(x,z),...)]$$

always means "the expected value with respect to the joint distribution of all things having a non-degenerate distribution inside the brackets."

Once you start putting subscripts on $\mathbb E$, you specify a (perhaps "narrower") joint distribution over which you want, for your own reasons, to average. For example, if you wrote $$\mathbb E_{\theta, z}[g(x;y;\theta;h(x,z),...)]$$ I would be inclined to believe that you mean only

$$\mathbb E_{\theta, z}\big[g(x;y;\theta;h(x,z),...)\big] = \int_{S_z}\int_{S_\theta}f_{\theta,z}(\theta, z)\,g(x;y;\theta;h(x,z),...)\, d\theta\, dz$$

and not $$\int_{S_z}\int_{S_\theta}\int_{S_x}\int_{S_y}f_{\theta,z,x,y}(\theta, z,x,y)g(x;y;\theta;h(x,z),...) d\theta\, dz\,dx \,dy$$

But it could also mean something else; on this matter, see also https://stats.stackexchange.com/a/72614/28746
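To make the distinction tangible, here is a small numerical sketch; the choice $g(x,\theta) = x\,\theta^2$ with independent $x$ and $\theta$ is purely hypothetical, chosen only to show that the subscripted expectation is still a function of the variables not averaged over:

```python
import numpy as np

# Hypothetical setup, only to illustrate the distinction made above:
# x ~ N(2, 1), theta ~ N(0, 1), independent, and g(x, theta) = x * theta^2.
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, size=200_000)

# E_theta[g(x, theta)] averages only over theta; it is still a function of x.
x_fixed = 3.0
partial_expectation = np.mean(x_fixed * theta**2)  # approx x_fixed * E[theta^2] = 3

# E[g(x, theta)] with no subscript averages over the joint distribution of (x, theta).
x = rng.normal(2.0, 1.0, size=200_000)
full_expectation = np.mean(x * theta**2)           # approx E[x] * E[theta^2] = 2

print(partial_expectation, full_expectation)
```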

