The proof of your second statement is actually pretty easy using induction. The base case $k=0$ is true based on elementary arguments. Now, suppose inductively that the result is true for any $k \geq 0$. We'll show it's true for $k+1$. Note that by the chain rule,
\begin{align}
D(g \circ f)_x &= (Dg)_{f(x)} \circ Df_x.
\end{align}
Now, the following three maps:
\begin{align}
\begin{cases}
K:\mathcal{B}(Y;Z) \times \mathcal{B}(X;Y) \to \mathcal{B}(X;Z) \qquad &(T,S) \mapsto T \circ S \\\\
\iota_1:\mathcal{B}(Y;Z) \to \mathcal{B}(Y;Z) \times \mathcal{B}(X;Y) \qquad &T \mapsto(T,0) \\\\
\iota_2:\mathcal{B}(X;Y) \to \mathcal{B}(Y;Z) \times \mathcal{B}(X;Y) \qquad &T \mapsto(0,T)
\end{cases}
\end{align}
$K$ is the "composition map", and $\iota_1, \iota_2$ are the "canonical inclusions." Note that $K$ is a continuous bilinear map, and hence $C^{\infty}$ (the third derivative vanishes identically), and $\iota_1, \iota_2$ are continuous linear maps, and hence $C^{\infty}$ (their second derivatives vanish). With this, we can write:
\begin{align}
D(g \circ f)_x &= K\left( Dg_{f(x)}, Df_x\right) \\
&= K\bigg( [\iota_1 \circ (Dg) \circ f](x) + [\iota_2 \circ Df](x)\bigg) \\
&= \bigg[K \circ \left(\iota_1 \circ (Dg) \circ f + \iota_2 \circ Df \right) \bigg](x)
\end{align}
Or as an equality of functions, we can write:
\begin{align}
D(g \circ f) &= K \circ \bigg( \iota_1 \circ (Dg) \circ f + \iota_2 \circ Df\bigg) \tag{$*$}
\end{align}
By the induction hypothesis, $f$ and $g$ are $C^{k+1}$, so $Df, Dg$ are $C^k$. As explained above, the maps $K, \iota_1, \iota_2$ are all $C^{\infty}$. Thus, in $(*)$, we have expressed $D(g \circ f)$ as a sum and composition of functions all of which are atleast $C^k$. By induction hypothesis, it follows that $D(g \circ f)$ is $C^k$, but this means exactly that $g \circ f$ is $C^{k+1}$. Hence, the induction is completed.
Very often, to show smoothness of maps between Banach spaces the quickest way is to define such auxillary maps, defined on a larger space, which we already know to be smooth. Then, after some practice, it becomes unnecessary to explicitly introduce them, and you can just "see" for example directly from the equation $D(g \circ f)_x = Dg_{f(x)} \circ Df_x$ that the RHS is smooth "as a function of $x$".
For example, in a Banach algebra $A$ (such as $\mathcal{B}(X,Y)$ with "multiplication" being composition of linear maps), let $U$ be the open set of all invertible elements of the algebra $A$ (the fact that this set is open shouldn't be too hard to prove). Consider the inversion mapping $\psi: U \to U$, $\psi(a) = a^{-1}$. By a direct "difference-estimate", one can show that $\psi$ is differentiable on $U$, with derivative given by
\begin{align}
D \psi_a(h) &= -a^{-1}\cdot h \cdot a^{-1} \\
&= - \psi(a) \cdot h \cdot \psi(a).
\end{align}
Notice that this is a type of "differential equation" for the function $\psi$ (we have the derivative on the LHS and the function on the RHS). By similar trickery in the induction process, one can prove that $\psi$ is actually $C^{\infty}$.