Magnetic work in equation ($1$) is correct. It can be derived from analyzing the work required to vary the magnetic $B$ field. To be precise, because, in general, the fields are not uniform, it should be written as an integral over the volume (see, for instance, Landau & Lifshitz Electrodynamics of continuous media ):
$$
dF = -S dT+\int dV {\bf H}\cdot \delta {\bf B}.\tag{1}
$$
Thus, in general, $F$ is a function of the temperature $T$ and a functional of the magnetic field ${\bf B}({\bf r})$. The field ${\bf H}({\bf r})$ should be related to ${\bf B}$, in general, by a non-linear functional relation:
$$
{\bf H}({\bf r})= \tilde {\bf H}({\bf r},[{\bf B}],T).\tag{2}
$$
Equation $(2)$ depends on the materials' presence, geometry, and characteristics in the region where fields are present. It allows us to close the Maxwell equations for the magnetic fields. Notice that in general, even if the fields in the absence of material, ${\bf B_0}$ and ${\bf H_0}$ are uniform and aligned (${\bf B_0}=\mu_0 {\bf H_0}$), the fields in the presence of a material sample (${\bf B}$ and ${\bf H}$) are point dependent and not proportional.
If we want to introduce the magnetization density ${\bf M}({\bf r})$, defined via
$$
{\bf B}= \mu_0\left( {\bf H}+{\bf M} \right),
$$
we can do it, but the correct expression for the work is
$$
\mu_0 \int dV {\bf H}\cdot \delta {\bf H} + \mu_0 \int dV {\bf H}\cdot \delta {\bf M}. \tag{3}
$$
This last equation differs from the expression of OP's equation ($3$) for the presence of the first integral. Such a term is not the work for creating the field in the absence of the material because, as noted above, the ${\bf H}$ field does depend on the presence of the sample. Therefore, it cannot be ignored when we need the thermodynamic work.
A more convenient reformulation of eq. $(1)$, is by rewriting the term $\int dV {\bf H}\cdot \delta {\bf B}$ in terms of ${\bf M}$ and ${\bf H_0}$, the field in the absence of the sample.
It can be shown (see Stratton's textbook on Electromagnetism or the cited Landau & Lifshitz) that we get
$$
\mu_0 \int dV {\bf H_0}\cdot \delta {\bf H_0} + \mu_0 \int dV {\bf H_0}\cdot \delta {\bf M}. \tag{3'}
$$
The huge advantage of equation $(3')$ over equation ($3$) is that now the field ${\bf H_0}$ is entirely under our control, whatever sample we introduce in it. It can be chosen as uniform, and the first term in equation $(3')$ is the work to build the field without the sample. Therefore, most thermodynamic manipulations can ignore its contribution to free energy.
Summarizing, the differential of the free energy without the work to build the ${\bf H_0}$ field, $F'(T,[{\bf M}])$ can be written as
$$
dF' = -S dT+ \mu_0 \int dV {\bf H_0}\cdot \delta {\bf M}.\tag{4}
$$
and, by choosing a uniform field ${\bf H_0}$ and an ellipsoidal sample (in such a case the magnetization is uniform) aligned with the field, we can finally write
$$
dF' = -SdT + \mu_0 H_0 d\tilde M \tag{5}
$$
where $\tilde M$ is the total dipole moment of the sample.
Finally, if instead of $F'(T,\tilde M)$ we have better working with a function of the external field, we can introduce the Legendre transform of $F'$, a new function.
$
\tilde F(T,H_0) = F' - \mu_0 H_0 \tilde M
$ whose differential is
$$
d \tilde F = -S dT - \mu_0 \tilde M dH_0. \tag{6}
$$
Summarizing, OP's equations $(2)$ and $(3)$ do not follow directly, in general from equation $(1)$, because the contribution to the free energy from the term
$\frac{\mu_0}{2} \int dV H^2$ is sample-dependent.
There are only a few cases where equations $(5)$ and $(6)$ coincide with OP's equations $(2)$ and $(3)$: the cases where the demagnetizing field, i.e. the difference ${\bf H}-{\bf H_0}$ vanishes. Such cases require a special shape of the sample.