You have to be careful about what you're taking the derivative of. $\mathrm Df(a):x\mapsto \mathrm Df(a)(x)$ is a linear map for fixed $a$. However, the map $\mathrm Df:a\mapsto\mathrm Df(a)$ is a (usually) nonlinear map assigning to each $a$ a linear map.
Differentiating the first one will return the same map, since it's linear, and the derivative of a linear map at any point is exactly that linear map. However, differentiating the second one will give you something different, and it's what we usually care about (it is the second derivative of $f$).
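To make the distinction concrete, here is a small worked example. The function $f(x)=x^3$ on $\mathbb R$ and the increment $h$ are my own illustrative choices, not something from your question. Here $\mathrm Df(a)(x)=3a^2x$, so

$$
\begin{aligned}
\mathrm D\big(\mathrm Df(a)\big)(b)(x) &= 3a^2x && \text{(first case: $x\mapsto 3a^2x$ is linear, so its derivative at any $b$ is itself)},\\
\mathrm D\big(\mathrm Df\big)(a)(h)(x) &= 6a\,h\,x && \text{(second case: differentiate $a\mapsto 3a^2$ in the direction $h$)}.
\end{aligned}
$$

The first line never sees how $\mathrm Df(a)$ changes with $a$; the second line is exactly that change, and it recovers the familiar $f''(a)=6a$.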
In the first case, if $f$ is linear, then $\mathrm Df(a)=f$ and you will indeed get $\mathrm D(\mathrm Df(a))(b)=f$ for all $a$ and $b$. In the second case, since $\mathrm Df$ is then the constant map $a\mapsto f$, you will get $\mathrm D(\mathrm Df)(a)=0$. Notice the meaningful difference in notation. The way you wrote it, the notation actually means the first case. Your friend is right in that this is not a case you'll often see discussed, though.
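As a quick check of the linear case, take $f(x)=Ax$ for some fixed matrix $A$ (again just an illustrative choice). Then $\mathrm Df(a)(x)=Ax$ for every $a$, and the two derivatives come out as

$$
\begin{aligned}
\mathrm D\big(\mathrm Df(a)\big)(b)(x) &= Ax && \text{for all $a$, $b$, i.e. } \mathrm D\big(\mathrm Df(a)\big)(b)=f,\\
\mathrm D\big(\mathrm Df\big)(a)(h) &= 0 && \text{for all $a$, $h$, since $a\mapsto \mathrm Df(a)=f$ does not depend on $a$}.
\end{aligned}
$$

Here the $0$ in the second line is the zero linear map $x\mapsto 0$, not the number zero.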