There are several questions and answers about the inclusion-exclusion principle, e.g. here, here, or here. Similarly, I found many proofs, e.g. by induction or by comparing how often each element is counted on both sides. There is another approach, however, that I am grappling with at the moment:
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A_i \in \mathcal{F}, i \in I = \{1, \ldots, n\}$. For $J \subset I$ define $$S_J = \Bigl(\bigcap_{j \in J} A_j\Bigr) \cap \Bigl(\bigcap_{j \in I\setminus J} A_j^c\Bigr)$$
Apparently, one can now show that $\bigcap_{k \in K} A_k = \dot{\bigcup}_{K \subset J \subset I} S_J$ for all $K \subset I$. This relation, and especially the pairwise disjointness of the $S_J$, is not immediately clear to me formally.
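For what it's worth, I convinced myself numerically with a brute-force check on a toy example (a hypothetical eight-point $\Omega$ with three arbitrary events, chosen only for illustration), verifying both the pairwise disjointness of the $S_J$ and the identity $\bigcap_{k \in K} A_k = \dot{\bigcup}_{K \subset J \subset I} S_J$:

```python
from itertools import combinations

# Toy example: small finite sample space and three arbitrary events.
Omega = set(range(8))
A = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}, 3: {3, 5, 6}}
I = set(A)

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def S(J):
    """S_J = (intersection of A_j over j in J) ∩ (intersection of A_j^c over j not in J)."""
    out = set(Omega)
    for j in I:
        out &= A[j] if j in J else (Omega - A[j])
    return out

Js = subsets(I)

# Pairwise disjointness of the S_J:
for J1 in Js:
    for J2 in Js:
        if J1 != J2:
            assert S(J1) & S(J2) == set()

# For every K ⊂ I: the intersection of the A_k over K equals
# the (disjoint) union of the S_J over all J with K ⊂ J ⊂ I.
for K in Js:
    lhs = set(Omega)
    for k in K:
        lhs &= A[k]
    rhs = set().union(*(S(J) for J in Js if K <= J))
    assert lhs == rhs

print("both claims hold on this example")
```

The check also suggests the reason for disjointness: each $\omega$ determines exactly one index set, namely $J = \{j : \omega \in A_j\}$, so it lies in exactly one $S_J$.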
Building on this result, one can then show that for all $J \subset I$ it holds that
$$ P(S_J) = \sum\limits_{K: J \subset K \subset I} (-1)^{\vert K \setminus J \vert} P(\bigcap_{k \in K} A_k) $$
Then, setting $J = \emptyset$, we get $P\bigl(\bigcap_{i \in I} A_i^c\bigr) = \sum_{K \subset I} (-1)^{\vert K \vert} P\bigl(\bigcap_{k \in K} A_k\bigr)$; since $\bigcap_{i \in I} A_i^c = \bigl(\bigcup_{i \in I} A_i\bigr)^c$, rearranging recovers the usual inclusion-exclusion principle.
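The second identity also checks out numerically on the same kind of toy example (uniform measure on a hypothetical eight-point $\Omega$; I count elements instead of probabilities to stay in integers, which is equivalent after dividing by $|\Omega|$):

```python
from itertools import combinations

Omega = set(range(8))
A = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}, 3: {3, 5, 6}}
I = set(A)

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def cap(K):
    """Intersection of the A_k over k in K (equal to Omega for K = empty set)."""
    out = set(Omega)
    for k in K:
        out &= A[k]
    return out

def S(J):
    """S_J = (intersection of A_j over j in J) ∩ (intersection of A_j^c over j not in J)."""
    out = set(Omega)
    for j in I:
        out &= A[j] if j in J else (Omega - A[j])
    return out

# |S_J| = sum over J ⊂ K ⊂ I of (-1)^{|K \ J|} |∩_{k∈K} A_k|, for every J:
for J in subsets(I):
    rhs = sum((-1) ** len(K - J) * len(cap(K)) for K in subsets(I) if J <= K)
    assert len(S(J)) == rhs

# The J = empty-set case is inclusion-exclusion for the complement of the union:
# |Ω \ (A_1 ∪ A_2 ∪ A_3)| = Σ_K (-1)^{|K|} |∩_{k∈K} A_k|
print("identity verified for all J")
```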
Besides a clarification of the disjointness of the $S_J$, I would like to better grasp what is going on here in terms of intuition or a visual representation. The usual inclusion-exclusion principle is nicely illustrated with Venn diagrams, for example, by tracking how many times each element is counted on both sides of the equation. In the above approach, I don't yet see visually how the definition of the $S_J$ fits into this framework of intersections and unions.