I only understood this after taking a class in general relativity (GR), differential geometry and quantum field theory (QFT). The essence is quite trivial, actually:

You have a theory that is invariant under some symmetry group. So in quantum electrodynamics you have a Lagrangian density for the fermions (no photons yet)
$$ \mathcal L = \bar\psi(x) [\mathrm i \gamma^\mu \partial_\mu - m] \psi(x) \,.$$
This $\bar\psi$ is just $\psi^\dagger \gamma^0$; the important point is that it involves the complex conjugate. The fact that it is a four-component spinor is of no concern here. What one can do now is transform $\psi \to \exp(\mathrm i \alpha) \psi$. Then $\bar\psi \to \bar\psi \exp(-\mathrm i \alpha)$ and the Lagrangian will be invariant. There you have a global symmetry.

Now promote the symmetry to a local one, why not? Instead of a global $\alpha$ one now has $\alpha(x)$. The problem is that the derivative now also acts on $\alpha(x)$, so the product and chain rules of differentiation produce an extra term proportional to $\partial_\mu \alpha(x)$. That seems like a technical complication at first.
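
To make the extra term explicit, here is the short calculation, as a sketch using the same conventions as the Lagrangian density above:
$$ \partial_\mu \big[ \exp(\mathrm i \alpha(x)) \, \psi(x) \big] = \exp(\mathrm i \alpha(x)) \big[ \partial_\mu \psi(x) + \mathrm i \, (\partial_\mu \alpha(x)) \, \psi(x) \big] \,,$$
so that
$$ \mathcal L \to \mathcal L - \bar\psi(x) \gamma^\mu (\partial_\mu \alpha(x)) \psi(x) \,.$$
The mass term is untouched; only the derivative term spoils the invariance.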

There is a more telling way to see this:  
You take a derivative of a field $\psi(x)$. This means taking a difference quotient like
$$ \partial_\mu \psi(x) = \lim_{\epsilon \to 0} \frac{\psi(x + \epsilon \vec e_\mu) - \psi(x)}{\epsilon} \,.$$
This works just fine with a global transformation. But with the local transformation, you basically subtract two values that are *gauged* differently. In differential geometry the tangent spaces at different points of the manifold are different spaces, and therefore one cannot just compare vectors by their components. One needs a *connection* with *connection coefficients* to provide *parallel transport*. It is similar here. We have now promoted $\psi$ from living on $\mathbb R^4$ to living in the bundle $\mathbb R^4 \times S^1$, as we have a U(1) gauge group. Therefore we need some sort of connection in order to transport the transformed $\psi$ from $x + \epsilon \vec e_\mu$ to $x$. This is where one introduces a connection, which replaces the derivative by
$$ \partial_\mu \to \mathrm D_\mu := \partial_\mu + \mathrm i A_\mu \,.$$
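
One way to make the parallel-transport picture concrete is to put a comparator into the difference quotient. The symbol $U$ is not part of the argument above; it is introduced here only for illustration, and it is assumed to transport the phase from $x + \epsilon \vec e_\mu$ back to $x$:
$$ \mathrm D_\mu \psi(x) = \lim_{\epsilon \to 0} \frac{U(x, x + \epsilon \vec e_\mu) \, \psi(x + \epsilon \vec e_\mu) - \psi(x)}{\epsilon} \,, \qquad U(x, x + \epsilon \vec e_\mu) = 1 + \mathrm i \epsilon A_\mu(x) + \mathcal O(\epsilon^2) \,.$$
Expanding the numerator to first order in $\epsilon$ gives $\partial_\mu \psi(x) + \mathrm i A_\mu(x) \psi(x)$, which matches the definition of $\mathrm D_\mu$.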

If you plug that into the Lagrange density to make it
$$ \mathcal L = \bar\psi(x) [\mathrm i \gamma^\mu \mathrm D_\mu - m] \psi(x)$$
and let $A_\mu$ transform as $A_\mu \to A_\mu - \partial_\mu \alpha$ together with $\psi$, you will see that the Lagrangian density does stay invariant even under local transformations, as the shift of the connection coefficient just cancels the unwanted term from the product/chain rule.
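
The cancellation can be checked in one line. As a sketch, assume the gauge field is shifted as $A_\mu \to A_\mu - \partial_\mu \alpha$ whenever $\psi \to \exp(\mathrm i \alpha(x)) \psi$. Then
$$ \big[ \partial_\mu + \mathrm i (A_\mu - \partial_\mu \alpha) \big] \exp(\mathrm i \alpha) \psi = \exp(\mathrm i \alpha) \big[ \partial_\mu + \mathrm i \, \partial_\mu \alpha + \mathrm i A_\mu - \mathrm i \, \partial_\mu \alpha \big] \psi = \exp(\mathrm i \alpha) \, \mathrm D_\mu \psi \,.$$
So $\mathrm D_\mu \psi$ picks up the same phase as $\psi$ itself, and the opposite phase from $\bar\psi$ removes it.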

In general relativity you have the symmetry under arbitrary diffeomorphisms; the price is that you have to change the partial derivative to a covariant derivative,
$$ \partial \to \nabla = \partial + \Gamma + \cdots \,.$$
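
As a concrete instance, here is the standard expression for a vector field $V^\nu$ (a symbol introduced here only for illustration), with $\Gamma^\nu_{\mu\lambda}$ the connection coefficients:
$$ \nabla_\mu V^\nu = \partial_\mu V^\nu + \Gamma^\nu_{\mu\lambda} V^\lambda \,.$$
The $\Gamma$ term plays the same role as $\mathrm i A_\mu$ above: it compensates for the fact that the components at neighbouring points refer to different bases.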