As Oscar suggested, the most common intuition for $\mathbb{E}[X | \mathcal{F}]$ is that it is the best guess of $X$ given the information in $\mathcal{F}$. However, I find that the alternative intuition that it is the orthogonal projection of $X$ onto a subspace makes it clearer why it is defined the way it is.
First, just for clarity's sake, let me set up the orthogonal projection. Let's say you have an inner product space $V$ and $v \in V$. Then for a subspace $W \subseteq V$, we can define the orthogonal projection of $v$ onto $W$ as the unique $p_{W}(v) \in W$ such that
$$ \left< v, w \right> = \left< p_{W}(v), w \right>$$
for all $w \in W$.
This is based on the idea that a vector is determined entirely by its inner products with other vectors. That is, the projection $p_{W}(v)$ is unique because if $z \in W$ were another candidate, we would have $\left< p_{W}(v) - z, w \right> = 0$ for all $w \in W$; taking $w = p_{W}(v) - z$ gives $\left\| p_{W}(v) - z \right\|^{2} = 0$, so $z = p_{W}(v)$.
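If a concrete check helps, here is a minimal numerical sketch in finite dimensions (the setup and numbers are mine, purely for illustration): we project a vector onto the column span of a matrix via least squares, then verify the defining property on a basis of the subspace.

```python
# Illustrative sketch: orthogonal projection onto a subspace of R^5.
import numpy as np

rng = np.random.default_rng(0)

B = rng.standard_normal((5, 2))   # columns span the subspace W of R^5
v = rng.standard_normal(5)        # vector to project

# Least squares finds coefficients c making B @ c closest to v,
# so p = B @ c is the orthogonal projection of v onto W.
c, *_ = np.linalg.lstsq(B, v, rcond=None)
p = B @ c

# The defining property: <v, w> = <p, w> for every w in W.
# By linearity, it suffices to check it on a basis of W (the columns of B).
print(np.allclose(B.T @ v, B.T @ p))  # True
```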
Before we move back to random variables, let's consider a space of functions, specifically the square-integrable functions $L^{2}(\mathbb{R})$. On this space, we have an inner product
$$ \left< f, g \right> = \int_{-\infty}^{\infty} f(x) g(x) \ dx. $$
Now if we consider specifically the indicator function $I_{A}(x) = I[x \in A]$ for a set $A$, we get that
$$ \left< f, I_{A} \right> = \int_{A} f(x) \ dx, $$
assuming that $A$ is measurable (e.g. a closed interval).
This in particular will be very convenient for re-interpreting the standard definition of the conditional expectation as a projection.
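As a quick sanity check of that identity (the particular $f$ and $A$ below are my own picks, purely illustrative):

```python
# Numerically check <f, I_A> = integral of f over A,
# with f(x) = exp(-x^2) and A = [0, 1].
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x ** 2)
i_A = lambda x: float(0.0 <= x <= 1.0)  # indicator I_A of A = [0, 1]

# <f, I_A>: integrate f * I_A over a truncation of the real line;
# `points` flags the jumps of the indicator for the quadrature routine.
lhs, _ = quad(lambda x: f(x) * i_A(x), -10, 10, points=[0.0, 1.0])
rhs, _ = quad(f, 0.0, 1.0)  # the direct integral of f over A

print(np.isclose(lhs, rhs))  # True
```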
Now, a function $f$ is not quite determined by its inner products with other functions $g$. Specifically, if $\tilde{f}$ is another function such that $f(x) = \tilde{f}(x)$ except on a very small set of points (a set of measure zero), then the inner product of $\tilde{f}$ with any $g$ will be the same as that of $f$. We skirt around this by considering two such functions to be equivalent, i.e. working with the square-integrable functions up to almost-everywhere equivalence.
Now, back to random variables. First, like with $L^{2}$, we consider two random variables $X$ and $Y$ the same if they are equal with probability one. In other words, we can have $X(\omega) \not = Y(\omega)$, but only on a small set of outcomes $\omega$ (small meaning probability zero).
The inner product space here is now the set of square-integrable random variables measurable with respect to our ambient $\sigma$-algebra, say $\Sigma$; this space is often written $L^{2}(\Omega, \Sigma, P)$. Notice that this is again a space of functions, with the constraint that they must be $\Sigma$-measurable. The inner product is
$$ \left< X, Y \right> = \mathbb{E}[XY] = \int XY dP. $$
The subspace $W_{\mathcal{F}}$ we project onto is the subset of $\mathcal{F}$-measurable (square-integrable) random variables; this is a closed subspace of $L^{2}$, which is what guarantees the projection exists. Then for any $X$, the projection $\mathbb{E}[X|\mathcal{F}]$ is defined by
- $\mathbb{E}[X|\mathcal{F}] \in W_{\mathcal{F}}$
- $\int XY dP = \int \mathbb{E}[X|\mathcal{F}] Y dP$ for all $Y \in W_{\mathcal{F}}$.
Now compare this to the traditional definition of $\mathbb{E}[X|\mathcal{F}]$.
- $\mathbb{E}[X|\mathcal{F}]$ is $\mathcal{F}$-measurable
- $\int_{A} \mathbb{E}[X|\mathcal{F}] dP = \int_{A} X dP$ for all $A \in \mathcal{F}$.
The first two conditions correspond to each other, since $W_{\mathcal{F}}$ is just the set of $\mathcal{F}$-measurable (square-integrable) random variables. The second two look slightly different: the definition of conditional expectation only checks $Y$ of the form
$$ Y(\omega) = I_{A}(\omega) = I[\omega \in A] $$
where $A \in \mathcal{F}$. However, by the construction of the Lebesgue integral, checking indicator functions is enough to get equality against all $\mathcal{F}$-measurable $Y$: any non-negative $\mathcal{F}$-measurable $Y$ is an increasing limit of simple functions (finite linear combinations of such indicators), so the equality extends by linearity and monotone convergence, and then to general $Y \in W_{\mathcal{F}}$ by splitting into positive and negative parts.
In other words, the two definitions are equivalent in this context.
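If you like, you can watch this equivalence play out on a toy finite probability space (the space, partition, and numbers below are my own illustration): the conditional expectation averages $X$ over each atom of the partition generating $\mathcal{F}$, and the defining integral identity then holds for every $A \in \mathcal{F}$.

```python
# Toy check: Omega = {0,...,5} with uniform P, F generated by the
# partition {0,1,2} vs {3,4,5}, X an arbitrary random variable.
import numpy as np
from itertools import combinations

P = np.full(6, 1 / 6)                 # uniform probability on Omega
X = np.array([1.0, 4.0, 2.0, 9.0, 3.0, 5.0])
atoms = [np.array([0, 1, 2]), np.array([3, 4, 5])]  # partition generating F

# E[X|F]: on each atom, replace X by its P-weighted average there.
cond_exp = np.empty(6)
for atom in atoms:
    cond_exp[atom] = np.dot(P[atom], X[atom]) / P[atom].sum()

# Every A in F is a union of atoms; check int_A X dP = int_A E[X|F] dP.
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        A = np.concatenate(combo) if combo else np.array([], dtype=int)
        assert np.isclose(np.dot(P[A], X[A]), np.dot(P[A], cond_exp[A]))
print("both definitions agree on this example")
```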
In short, this shows that $\mathbb{E}[X|\mathcal{F}]$ is the closest approximation of $X$ (in the mean-square sense) that is $\mathcal{F}$-measurable: if we know, for every event in $\mathcal{F}$, whether it occurred, but not the specific outcome $\omega$, then $\mathbb{E}[X|\mathcal{F}]$ is the best guess for the value of $X$ using all the information we have.
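Here is the "closest approximation" claim made concrete on the same kind of toy space (again, all numbers are my own illustration): among $\mathcal{F}$-measurable candidates $Y$, the conditional expectation attains the smallest mean squared error $\mathbb{E}[(X - Y)^{2}]$.

```python
# Toy least-squares check: E[X|F] minimizes E[(X - Y)^2] over
# F-measurable Y, i.e. over Y constant on each atom of the partition.
import numpy as np

rng = np.random.default_rng(1)
P = np.full(6, 1 / 6)
X = np.array([1.0, 4.0, 2.0, 9.0, 3.0, 5.0])
atoms = [np.array([0, 1, 2]), np.array([3, 4, 5])]

cond_exp = np.empty(6)
for atom in atoms:
    cond_exp[atom] = np.dot(P[atom], X[atom]) / P[atom].sum()

def mse(Y):
    """E[(X - Y)^2] under P."""
    return np.dot(P, (X - Y) ** 2)

# Any competitor Y that is F-measurable (constant on each atom) does no better.
for _ in range(1000):
    a, b = rng.standard_normal(2) * 10
    Y = np.where(np.arange(6) < 3, a, b)  # an arbitrary F-measurable Y
    assert mse(cond_exp) <= mse(Y) + 1e-12
print("E[X|F] attained the smallest mean squared error")
```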
There is a lot more to be said about this. One great resource for learning more is David Williams' "Probability with Martingales", which I think anyone even slightly interested in understanding the theory of statistics should have. It has a chapter on the conditional expectation that goes into detail. If this were not already a ridiculously long answer, I would also go into the idea that $\mathbb{E}[X|\mathcal{F}]$ can be thought of as a regression estimate of $X$ (least-squares, of course). But hopefully this gives a start.