The symptoms clearly indicate that there is a client-side receive processing bottleneck.
The trace shows the receive buffer in the client's network stack (apparently LwIP) filling up. Specifically, the window size advertised in the client's acknowledgements shrinks as a series of packets arrives from the server, until the receive buffer is completely full and the client sends a "ZeroWindow" to tell the server to stop sending data.
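To make the shrinking window concrete, here is a toy model of a receive buffer. It is only a sketch: `rx_model_t` and the `rx_*` functions are hypothetical names invented for illustration, not LwIP's actual data structures, and the model ignores details such as window scaling and how a real stack handles data beyond the window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a TCP receive buffer (not LwIP internals). */
typedef struct {
    uint32_t capacity; /* total receive buffer size in bytes */
    uint32_t unread;   /* bytes received but not yet read by the app */
} rx_model_t;

/* Window the receiver would advertise in its next ACK: the free space. */
static uint32_t rx_advertised_window(const rx_model_t *rx)
{
    assert(rx->unread <= rx->capacity);
    return rx->capacity - rx->unread;
}

/* A segment arrives; accept at most the free space. Returns bytes accepted. */
static uint32_t rx_on_segment(rx_model_t *rx, uint32_t len)
{
    uint32_t free_space = rx_advertised_window(rx);
    uint32_t accepted = (len < free_space) ? len : free_space;
    rx->unread += accepted;
    return accepted;
}

/* The application reads from the socket, freeing buffer space.
 * Returns bytes actually removed from the buffer. */
static uint32_t rx_on_app_read(rx_model_t *rx, uint32_t len)
{
    uint32_t taken = (len < rx->unread) ? len : rx->unread;
    rx->unread -= taken;
    return taken;
}
```

With an assumed 5840-byte buffer, 1460-byte segments, and an app that only manages to read 500 bytes per arriving segment, the advertised window in this model goes 4880, 3920, 2960, 2000, 1040, and then hits 0 when the sixth segment arrives. That is the same pattern as the shrinking window followed by a ZeroWindow in your trace.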
This typically indicates that the application reading from the socket isn't draining the receive buffer fast enough to let the network stack keep up with the arriving packets.
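One way to keep the buffer drained is a tight read loop that does nothing slow between reads. The sketch below uses the POSIX `read()` call so that it can be tried on a desktop; on your device the equivalent call would be `lwip_read()` (or `read()` if your LwIP build maps the POSIX names onto it, which I am assuming rather than know about your configuration). `drain_fd`, `chunk_sink_fn`, and `count_bytes_sink` are hypothetical names for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Callback that receives each chunk read from the socket. It should return
 * quickly (e.g. queue the data) so the read loop is not stalled. */
typedef void (*chunk_sink_fn)(const unsigned char *data, size_t len, void *ctx);

/* Example sink: just counts bytes. ctx must point to a size_t counter. */
static void count_bytes_sink(const unsigned char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Read from fd until the peer closes the connection, passing each chunk to
 * sink. Returns the total number of bytes read, or -1 on a read error. */
static long drain_fd(int fd, unsigned char *buf, size_t buf_len,
                     chunk_sink_fn sink, void *ctx)
{
    long total = 0;

    assert(buf != NULL && buf_len > 0 && sink != NULL);

    for (;;) {
        ssize_t n = read(fd, buf, buf_len);
        if (n > 0) {
            sink(buf, (size_t)n, ctx); /* hand off; keep slow work elsewhere */
            total += (long)n;
        } else if (n == 0) {
            return total;              /* end of stream: peer closed */
        } else if (errno != EINTR) {
            return -1;                 /* real error; EINTR is retried */
        }
    }
}
```

The point of the callback is that anything slow, such as writing to flash or parsing, happens outside the loop that empties the socket. If the sink itself blocks for long periods, the loop stalls and the receive buffer fills again.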
The frequent retransmissions from the server, combined with delayed ACKs from the client, suggest that conditions on the client are also degrading the lower-level receive packet processing in the network stack.
You mentioned that the client host is an embedded device, and also commented that your app appears to be blocked at lwip_read(). That wait suggests another I/O, CPU, or scheduling bottleneck could be preventing your read thread from getting enough CPU time to keep up with the file transfer. However, the delayed ACKs suggest that there may be a broader problem. Without knowing more about the embedded device, it's difficult to troubleshoot further.
LwIP also has special constraints on how network calls are serviced that could apply to your implementation; see this LwIP pitfalls page.
I hope this information is helpful for resolving your issue.