Why is convergence in distribution defined in terms of "weak" convergence in the law?
Intuitively (at least at a literal level), convergence in distribution of a sequence $(X_n)_n$ of Borel random variables should imply "convergence" of the laws of these random variables. If I had to guess the definition of convergence in distribution before studying it, I would probably say:
$X_n \stackrel{d}{\to} X$ if $\mu_{X_n}(S) \to \mu_X(S)$ for all Borel $S$.
This is obviously a lot stronger than the actual definition of Convergence in Distribution. But what would be the problem if we defined it this way?
Here is the definition of Convergence in Distribution introduced in my Probability class:
Definition of Convergence in Distribution:
Fix a sequence $(X_m)_{m = 1}^\infty$ of random vectors in $\mathbb{R}^n$ (they may live on different probability spaces) and a random vector $X$ in $\mathbb{R}^n$. Then $X_m \stackrel{d}{\to} X$ (convergence in distribution) if $\mu_{X_m} \stackrel{w}{\to} \mu_X$.
Definition of Weak Convergence of probability measures:
$\mu_{X_m} \stackrel{w}{\to} \mu_X$ means $\int_{\mathbb{R}^n} h \,d\mu_{X_m} \to \int_{\mathbb{R}^n} h \,d\mu_X$ as $m \to \infty$ for all bounded continuous $h: \mathbb{R}^n \to \mathbb{R}$.
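To compare the two candidate definitions on a concrete case, here is a minimal sketch using the deterministic variables $X_m = 1/m$ and $X = 0$ in dimension one. These variables are my own illustrative choice and are not from my class. The law of a constant is a point mass, so integrating a function against it just evaluates the function at that point. The helper names (`integrate_against_point_mass`, `gaps`) and the test function $h = \cos$ are hypothetical. Checking a single $h$ only illustrates the weak definition and does not prove convergence for every bounded continuous $h$.

```python
import math


def integrate_against_point_mass(h, x):
    """Integral of h with respect to the point mass at x, which is just h(x)."""
    return h(x)


def h(t):
    # One bounded continuous test function (an arbitrary illustrative choice).
    return math.cos(t)


def indicator_of_zero(t):
    # Indicator of the Borel set S = {0}; bounded but not continuous.
    return 1.0 if t == 0 else 0.0


def gaps(m):
    """Return (weak_gap, setwise_gap) between the laws of X_m = 1/m and X = 0."""
    x_m, x = 1.0 / m, 0.0
    # |integral of h d(mu_{X_m}) - integral of h d(mu_X)| for the continuous h.
    weak_gap = abs(
        integrate_against_point_mass(h, x_m) - integrate_against_point_mass(h, x)
    )
    # |mu_{X_m}(S) - mu_X(S)| for S = {0}, written as an integral of the indicator.
    setwise_gap = abs(
        integrate_against_point_mass(indicator_of_zero, x_m)
        - integrate_against_point_mass(indicator_of_zero, x)
    )
    return weak_gap, setwise_gap


for m in (1, 10, 100, 1000):
    weak_gap, setwise_gap = gaps(m)
    print(m, weak_gap, setwise_gap)
```

In this sketch the first gap shrinks as $m$ grows, while the second stays at 1 for every $m$, because $\mu_{X_m}(\{0\}) = 0$ and $\mu_X(\{0\}) = 1$. So under the set-by-set definition I guessed above, even $1/m$ would not converge to $0$.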