Mergesort takes $Θ(n \log n)$ compares to sort n elements [in the worst case].
Just unfold the definitions. Step one:
Mergesort takes $O(n \log n)$ and $\Omega(n \log n)$ compares to sort $n$ elements [in the worst case].
Step two (and three, I guess, since I recombine the unfolded definitions of $O$ and $\Omega$):
For some $n_0 \in \mathbb{N}$ and $a,b \in \mathbb{R}_{>0}$, the number of comparisons $C_n$ Mergesort uses to sort $n$ elements [in the worst case] is bounded by $a \cdot n \log n \leq C_n \leq b \cdot n \log n$ for all $n \geq n_0$.
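For what it's worth, you can check concrete witnesses $a$, $b$, $n_0$ numerically. The sketch below assumes the standard worst-case recurrence for top-down Mergesort (merging runs of lengths $p$ and $q$ costs at most $p + q - 1$ comparisons); the constants $a = 0.5$, $b = 1$, $n_0 = 4$ are illustrative choices, not claimed to be tight:

```python
import math
from functools import lru_cache

# Worst-case comparison count of top-down Mergesort via its recurrence:
# merging two runs of lengths p and q needs at most p + q - 1 comparisons,
# so C(n) = C(floor(n/2)) + C(ceil(n/2)) + n - 1, with C(0) = C(1) = 0.
@lru_cache(maxsize=None)
def C(n: int) -> int:
    if n <= 1:
        return 0
    return C(n // 2) + C(n - n // 2) + n - 1

# Illustrative witnesses for the Theta bound (assumed, not optimal):
a, b, n0 = 0.5, 1.0, 4
assert all(a * n * math.log2(n) <= C(n) <= b * n * math.log2(n)
           for n in range(n0, 10_000))
```

Of course, passing this check for $n < 10{,}000$ proves nothing about all $n$; it merely shows the witnesses are plausible.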
That's it. Would I call that insight? Hardly.
Adding the justification for why we ignore constant factors in the first place, and the observation that comparisons are a dominant operation in Mergesort, I guess the intended statement is something like this:
If you implemented this Mergesort algorithm on a machine [that fits the RAM model and can compare all input numbers in constant time], its running-time cost would be roughly proportional to $n \log n$ for large enough $n$.
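To make "roughly proportional" concrete, here is a sketch (my own instrumentation, not taken from any particular source) that counts comparisons in a plain top-down Mergesort on random inputs; the ratio $C_n / (n \log_2 n)$ settles near a constant as $n$ grows:

```python
import math
import random

def mergesort(xs, count):
    # Standard top-down Mergesort, instrumented to count comparisons
    # in count[0] (a one-element list used as a mutable counter).
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = mergesort(xs[:mid], count)
    right = mergesort(xs[mid:], count)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        count[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged

random.seed(0)
ratios = []
for n in (1_000, 10_000, 100_000):
    count = [0]
    assert mergesort(random.sample(range(n), n), count) == list(range(n))
    ratios.append(count[0] / (n * math.log2(n)))
print([round(r, 3) for r in ratios])  # ratios hover below, but near, 1
```

Note that this measures comparisons on *random* inputs, not worst-case ones, and says nothing about cache effects, allocation cost, or anything else the RAM model abstracts away.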
Note that neither the proportionality constant nor the meaning of "roughly" and "large enough" is specified by the formal definitions. It may very well be that the algorithm is utterly unusable for all instances relevant to you. Hence, statements about algorithm performance using Landau notation are, for practical purposes, mostly vacuous -- formally speaking.
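A toy cost comparison illustrates the point (the constants here are hypothetical, chosen purely for illustration): a $\Theta(n \log n)$ algorithm with a large hidden constant loses to a $\Theta(n^2)$ one for every input size below a sizable crossover point.

```python
import math

# Hypothetical cost models: an asymptotically fast algorithm with a
# huge constant factor vs. an asymptotically slow one with constant 1.
def fast_asymptotically(n):
    return 1000 * n * math.log2(n)

def slow_asymptotically(n):
    return n * n

# Find the first power of two where n^2 finally overtakes 1000 n log n.
n = 2
while slow_asymptotically(n) < fast_asymptotically(n):
    n *= 2
print(n)  # → 16384
```

So if all your inputs have, say, $n \leq 10^4$, the "asymptotically worse" algorithm is the better choice under these (made-up) cost models -- and nothing in the $\Theta$ bounds alone could tell you that.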
Implicitly, "we know" that "usually" constants are small and lower-order terms behave "nicely" (see e.g. here for a more precise result on Mergesort) so comparing Landau bounds is a somewhat meaningful criterion for choosing algorithms in practice. It has serious limitations, though, and it is important to keep those in mind.