
Let's say I have a file that's 10 GB and I want to transfer it over the Internet. Will it be best and fastest if I split the file into many smaller files, send them, and then reassemble them after transfer, or just send the one large file without splitting and reassembly?

I was thinking of using ftp, but is there a better solution to this?

  • That depends on many factors, making your question too broad to answer here, as is the question about other solutions/resources, sorry.
    – Zac67
    Commented Nov 8, 2019 at 10:09
  • Unfortunately, questions about applications and protocols above OSI layer-4 are off-topic here. You could try to ask this question on Server Fault for a business network.
    – Ron Maupin
    Commented Nov 8, 2019 at 13:59
  • @RonMaupin this isn't really about applications as much as network theory, IMO. The use of FTP is just an example.
    – Ron Trunk
    Commented Nov 8, 2019 at 20:04
  • 1
    It wasn't specifically FTP that I was concerned about; I realize that was an example. It was files and splitting and reassembly of files that is an application-layer process, whether FTP or some other application-layer protocol/process. The answer is really about the hosts (hardware, CPU load, other software, etc. as you pointed out). The network is basically a constant in this scenario. I thought Server Fault would be the place to get a more comprehensive answer about the hosts, and I think there are actually several there already on this exact scenario.
    – Ron Maupin
    Commented Nov 8, 2019 at 20:13
  • Did any answer help you? If so, you should accept the answer so that the question doesn't keep popping up forever, looking for an answer. Alternatively, you can provide and accept your own answer.
    – Ron Maupin
    Commented Dec 15, 2019 at 20:13

3 Answers


Point 4 in Ron Trunk's answer is vitally important and dominant for long-distance data copies over TCP. You just can't fill a large-bandwidth pipe with a single TCP socket over a long (or even medium) distance. If you are copying a large file over a long distance, one approach is to split it and use multiple TCP sockets.
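The split-and-parallelize idea can be sketched in a few lines. Everything below is a hypothetical illustration (the sockets and the reassembly step are left out), not any particular tool's API:

```python
# Split a file into byte ranges, one per planned TCP connection.
# Each worker would then copy its own (offset, length) range concurrently.
def chunk_ranges(file_size: int, streams: int):
    """Yield (offset, length) pairs covering the file without gaps or overlap."""
    base, extra = divmod(file_size, streams)
    offset = 0
    for i in range(streams):
        # Spread any remainder bytes across the first `extra` chunks.
        length = base + (1 if i < extra else 0)
        yield offset, length
        offset += length

ranges = list(chunk_ranges(10_000_000_000, 4))
print(ranges[0], ranges[-1])   # (0, 2500000000) (7500000000, 2500000000)
```

The ranges always sum to the file size, so reassembly is just writing each chunk back at its offset.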

Another approach is to use file transfer software designed for that task. Such software either optimizes the TCP window size or uses UDP. There are both public-domain and commercial options available (but specific recommendations are off-topic).

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway. — Andrew S. Tanenbaum

  • Applications and product or resource recommendations are explicitly off-topic, so I removed that. The reason I put it on-hold in the first place is that something like that really belongs on Server Fault not here. We can discuss the TCP/UDP protocol theory, but above layer-4 of the OSI model is off-topic.
    – Ron Maupin
    Commented Nov 9, 2019 at 4:42


In theory, it makes no difference. If the slowest link between the source and destination is 100 Mb/s (for example), then the transfer time is 1E10 × 8 / 1E8, or 800 seconds (ignoring overhead).
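As a quick sanity check of that arithmetic:

```python
# Ideal transfer time for a 10 GB file over a 100 Mb/s bottleneck link
# (overhead ignored, as in the text). Splitting doesn't change this number.
file_size_bytes = 10e9        # 10 GB file
link_rate_bps = 100e6         # 100 Mb/s slowest link

transfer_time_s = file_size_bytes * 8 / link_rate_bps
print(transfer_time_s)        # 800.0 seconds
```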

In practice, there are a few factors that can make a difference.

  1. Disk transfer speeds of the source and destination computers.
  2. Other processes that may be running on the computers.
  3. Error rates. If there are significant errors, it may be faster to send in smaller chunks.
  4. If the bandwidth-delay product (BDP) is large enough so that the slowest link is not fully utilized, multiple concurrent flows can be faster.
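Point 4 can be made concrete with a small bandwidth-delay product calculation; the bandwidth and RTT figures below are assumptions for illustration:

```python
# Bandwidth-delay product: how much data must be "in flight" to keep
# the slowest link busy.
bandwidth_bps = 100e6        # 100 Mb/s path (assumed)
rtt_s = 0.080                # 80 ms round-trip time (assumed)

bdp_bytes = bandwidth_bps / 8 * rtt_s
print(bdp_bytes)             # 1000000.0 bytes must be in flight to fill the pipe

default_window = 64 * 1024   # TCP's classic 64 KiB maximum window
print(bdp_bytes / default_window)  # ~15 concurrent flows needed without window scaling
```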
  • RTT can make a single stream slower than the end-to-end bandwidth allows. In that case, parallel streams would be faster.
    – Ricky
    Commented Nov 8, 2019 at 17:23
    @ricky ... or using a larger window, i.e. with the window scale option.
    – Zac67
    Commented Nov 8, 2019 at 19:45
  • Most applications have no knob to tweak those settings. "scp" is "scp".
    – Ricky
    Commented Nov 8, 2019 at 21:37

If you just look at the network, leaving out all other factors (source and destination hardware and software limitations, overhead on intermediate devices/routers, ...), there are essentially two factors: bandwidth and round-trip time. For a larger network, the bandwidth of the slowest link in the path is the relevant one - obviously, your throughput cannot ever get higher than that.

If the transmission protocol sends one packet at a time, waits for acknowledgment, and then sends the next, the round-trip time totally dominates the achievable throughput: sending and acknowledging take a full round trip (RTT) and a single packet is transported. The throughput is packet size / RTT, regardless of available bandwidth (unless that is actually lower).
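A quick illustration of that stop-and-wait ceiling; the packet size and RTT are assumed values:

```python
# Stop-and-wait: one packet delivered per round trip, so throughput is
# packet size / RTT no matter how fast the links are.
packet_size_bytes = 1500     # a typical Ethernet-sized packet (assumed)
rtt_s = 0.100                # 100 ms round trip (assumed)

throughput_bps = packet_size_bytes * 8 / rtt_s
print(throughput_bps)        # 120000.0 b/s, i.e. 120 kb/s, even on a 10 Gb/s path
```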

To get around this limitation, many protocols send a certain number of packets before the first acknowledgment is due - most prominently the TCP transport protocol, where this is the send window. With the same logic as above, the achievable throughput increases to window size / RTT, becoming independent of the physical packet size.
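The same arithmetic with a send window; again, the RTT is an assumed value:

```python
# With a send window, window size / RTT bytes per second are achievable.
window_bytes = 64 * 1024     # TCP's classic 64 KiB maximum window
rtt_s = 0.100                # 100 ms round trip (assumed)

throughput_Bps = window_bytes / rtt_s
print(throughput_Bps * 8 / 1e6)   # ~5.24 Mb/s ceiling on this path, regardless of link speed
```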

Now, a large, high-bandwidth network may still be limited by the RTT - when the bandwidth-delay product is greater than the maximum possible window size. It may be necessary to increase the window size beyond TCP's standard 64 KiB. This is where the window scale option comes in, increasing the potential window size to approximately 1 GiB.
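For example, checking whether window scaling is enough for an assumed 1 Gb/s path with a 100 ms RTT:

```python
# Window needed to fill a fast, long path = bandwidth-delay product.
bandwidth_bps = 1e9          # 1 Gb/s path (assumed)
rtt_s = 0.100                # 100 ms round trip (assumed)

needed_window = bandwidth_bps / 8 * rtt_s      # 12500000.0 bytes
print(needed_window > 64 * 1024)               # True: the classic 64 KiB window is far too small
print(needed_window < 2**30)                   # True: window scaling (up to ~1 GiB) suffices here
```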

Summing up, if the possible window size is too small to utilize the network's full bandwidth, it's faster to split the data into multiple streams and transport them concurrently.

However, we've previously assumed that the network has enough free bandwidth to accommodate our stream. The situation changes when the network becomes congested and the different streams contend with one another. Since contention is normally arbitrated between connections, using multiple connections in parallel may secure a larger portion of the congested network: when there are four competitors in addition to your single stream, each connection gets 1/5 of the bandwidth. If you then split your stream into four, each stream gets 1/8 of the bandwidth, but your combined streams get 4/8 = 1/2 of the bandwidth.
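The sharing arithmetic spelled out, assuming the bottleneck splits its bandwidth evenly per connection:

```python
# Fair sharing under congestion: bandwidth is arbitrated per connection.
competitors = 4                          # other streams on the bottleneck (assumed)

single = 1 / (competitors + 1)           # your one stream among five connections
split = 4 / (competitors + 4)            # your four streams among eight connections
print(single, split)                     # 0.2 0.5 -- splitting more than doubles your share
```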

In a nutshell, splitting up a stream into multiple, concurrent ones is faster when a) the window size is insufficient for the available bandwidth or b) there is congestion on the path and contention is arbitrated per connection.

Whether you use FTP or HTTP doesn't matter much from the network perspective - both use TCP as the underlying transport-layer protocol and should behave very similarly. In practice, they may differ because of differences in the applications and their implementations.
