Why can't we take the output at the drains of M1 and M2 (the main transistors of the differential pair)?
Above M1 and M2 are M3 and M4, which are diode-connected (drain tied to gate) so that they form current mirrors with M5 and M6 respectively. Because of that, the voltage gain seen at the drains of M1 and M2 is much lower than it would be if each drain were loaded by a current source: a diode-connected load looks like roughly 1/gm, whereas a current source presents its much larger output resistance.
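To put rough numbers on that difference, here is a minimal small-signal sketch comparing the two load types for one half of the pair. The transconductance and output-resistance values are made-up illustrative assumptions, not values from the circuit in question, and the helper names are hypothetical.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def gain_diode_load(gm_in, gm_load, ro_in, ro_load):
    """Gain magnitude at the drain with a diode-connected load.

    The diode-connected device looks like 1/gm_load, in parallel with
    both output resistances.
    """
    return gm_in * parallel(1.0 / gm_load, ro_in, ro_load)


def gain_current_source_load(gm_in, ro_in, ro_load):
    """Gain magnitude at the drain with an ideal-ish current-source load."""
    return gm_in * parallel(ro_in, ro_load)


# Assumed illustrative values (not taken from the original circuit).
GM_IN = 1e-3    # transconductance of M1/M2, in siemens
GM_LOAD = 1e-3  # transconductance of M3/M4, in siemens
RO = 100e3      # output resistance of each device, in ohms

a_diode = gain_diode_load(GM_IN, GM_LOAD, RO, RO)
a_cs = gain_current_source_load(GM_IN, RO, RO)

print(f"diode-connected load: |Av| ~ {a_diode:.2f}")
print(f"current-source load:  |Av| ~ {a_cs:.2f}")
```

With these assumed values the diode-loaded drain has a gain of about 1 (essentially gm1/gm3), while the current-source-loaded drain would reach about 50.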
I suspect that the reason might be to avoid Miller multiplication of the drain-to-gate capacitance back onto the input gate, and thus to get better high-frequency performance (much as a cascode amplifier does, for exactly the same reason).
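As a rough illustration of that suspicion, the standard Miller approximation says an inverting stage with gain magnitude |Av| makes the gate-drain capacitance appear at the input as Cgd(1 + |Av|). The capacitance and gain figures below are assumptions chosen only to show the trend, and the function name is hypothetical.

```python
def miller_input_capacitance(c_gd, gain_magnitude):
    """Input-referred capacitance of c_gd across an inverting stage.

    Uses the Miller approximation: C_in = C_gd * (1 + |Av|).
    """
    return c_gd * (1.0 + gain_magnitude)


C_GD = 10e-15  # assumed gate-drain capacitance, 10 fF

# Assumed gain magnitudes: low gain at a diode-loaded drain versus
# high gain at a current-source-loaded drain.
c_in_low_gain = miller_input_capacitance(C_GD, 1.0)
c_in_high_gain = miller_input_capacitance(C_GD, 50.0)

print(f"low-gain drain:  C_in ~ {c_in_low_gain * 1e15:.0f} fF")
print(f"high-gain drain: C_in ~ {c_in_high_gain * 1e15:.0f} fF")
```

Under these assumptions the input sees about 20 fF when the drain gain is near 1, against about 510 fF when the drain gain is 50, which is the kind of loading a low-gain first node (or a cascode) avoids.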