Consider the free Maxwell Lagrangian:
$$L= -\frac{1}{4}F_{\mu\nu}F^{\mu\nu}. $$
As we know, the gauge symmetry $A_{\mu} \rightarrow A_{\mu}+\partial_\mu \lambda$ must be fixed when quantizing the theory. Consider the case where we do this covariantly, using the Lorenz gauge. This is done by adding a gauge-fixing term to the Lagrangian, as follows:
$$L_{gf} = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu}+\frac{\alpha}{2}(\partial_\mu A^\mu)^2.$$
Now, I understand how this extra term helps us impose $\partial_\mu A^\mu = 0$. Indeed, we can view $\alpha$ as a Lagrange multiplier, for example, and varying the non-dynamical $\alpha$ yields the desired condition.
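Explicitly, assuming $\alpha$ is treated as an independent, non-dynamical field $\alpha(x)$ (my reading of the multiplier argument), its equation of motion contains no derivatives of $\alpha$ and reads
$$\frac{\partial L_{gf}}{\partial \alpha} = \frac{1}{2}(\partial_\mu A^\mu)^2 = 0 \quad\Longrightarrow\quad \partial_\mu A^\mu = 0.$$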
In other derivations, for example using the path integral, the term is obtained roughly by integrating the gauge-fixing Dirac $\delta$ over all possible gauge conditions with a "weighted integral" (see Peskin and Schroeder, p.296):
$$\int \mathcal{D}\omega \exp{\left(-i\int d^4x\,\alpha\frac{\omega^2}{2}\right)}\delta(\partial_{\mu} A^\mu-\omega(x)) = \exp{\left(-i\int d^4x\,\frac{\alpha}{2}(\partial_\mu A^\mu)^2\right)},$$
which has the effect of adding the same gauge-fixing term to the effective Lagrangian.
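Spelled out, as I understand the step: the $\delta$-functional sets $\omega = \partial_\mu A^\mu$ inside the weight, and multiplying the result by the original factor $e^{iS}$ gives
$$\exp{\left(i\int d^4x\,\left[-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}\right]\right)}\exp{\left(-i\int d^4x\,\frac{\alpha}{2}(\partial_\mu A^\mu)^2\right)} = \exp{\left(i\int d^4x\,\left[-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}-\frac{\alpha}{2}(\partial_\mu A^\mu)^2\right]\right)},$$
so the exponent is $L_{gf}$ as written above, up to the sign convention chosen for $\alpha$.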
My problem in both cases (and in every other derivation of this gauge-fixing term that I have seen) is that we do something that looks like an arbitrary choice. To be more precise, if we view this extra term as a Lagrange multiplier term, why not choose something like $\frac{\alpha}{2}\partial_\mu A^\mu$? Someone might argue that with this term the Lagrangian would be unbounded from below, giving an ill-behaved theory.
What about a term like $\frac{\alpha}{2}(\partial_\mu A^\mu)^4$, then? Again, we might say that this term is "forbidden" because it is of fourth order in the derivatives, which might also cause problems in the theory.
Are these really the only reasons that we choose this specific form for the gauge fixing term? I was under the impression that there must be something simpler that determines the form of the gauge-fixing term.
Similarly, in the path integral formulation, we have the seemingly arbitrary choice of the weight with which we perform the $\mathcal{D}\omega$ integral. There are arguments such as "we weight with a Gaussian to ensure convergence", but to me this isn't very convincing.
So is there a profound way to explain the form of this term, or is it just an educated guess which turns out to work?