Firstly there is the usual intuitive physical argument, which are probably aware of and is also explained in the post by @Ratman. Another version of this same argument is that, because of the asymptotic freedom of QCD, and the high energy scale of the process, the partons inside the hadron are approximately free particle moving collinear to the hadron, since the QCD coupling decreases going to higher energy scales. Then the partons have approximate momenta $\xi P$, where $\xi \in (0,1)$ and $P$ is the hadron momentum. Hence if two hadrons collide it is as if the partons just scatter of each other and for the total cross section you just have to take into account the distribution of the partons inside the hadrons, given by the PDFs.
Now this is a useful but of course very simplified picture. Proving QCD factorization from first principles is quite complicated. But it can be proven for many processes, with rigor at the level of physics, and there are a number of different ways to access factorization theorems. The usual way is the operator product expansion. Then there is also soft collinear effective theory. But as I think the most intuitive way is the perturbative QCD approach which operates at the level of Feynman graphs and I will try to summarize this briefly. For a really in depth treatment, Collins book "Foundations of perturbative QCD" is definitely the right source.
Explaining this in a few words is not easy, so you should take what I write as a crude approximation of what is really going on. The argument is, very rougly, that there are certain regions of the loop integration of the Feynman graphs of the given process, which gives the leading power in $m/Q$, where $Q$ is the characteristic large scale of the process and $m$ is a small mass scale, of order $\Lambda_{\text{QCD}}$. And the cool thing is that these leading regions are precisely those regions, where the lines of the Feynman graph close to the hadron external lines are collinear to the hadron momentum $P$ and the lines close to the interaction vertex carry lines of large virtuality, of order $Q^2$. Then, in those leading integration regions, we can factorise any graph into a convolution of a factor, contributing to the hard function, corresponding to a subgraph close to the interaction vertex and a function, the PDF, corresponding to the subgraph close to the hadron external lines. The convolution integration comes from the loop momentum connecting these two subgraphs. See the following picture for DIS
So indeed the Feynman diagrams give an intuitive understanding which goes beyond the level of perturbation theory. The leading contributions correspond to collinear "parton" lines on Feynman diagrams, reproducing the classical picture. The coefficient function (or hard subgraph) is just the amplitude for the partonic scattering, corresponding to the $\hat{\sigma}$ in your equation.
The PDFs can be defined as hadronic matrix elements of so-called light-ray operators, as you know they can not be calculated in perturbation theory. The number density interpretation of the PDFs can be justified using light-front quantization, also explained in Collins book.