39

Just wondering, since right now I'm importing all my pictures from a CD my dad burned for me. I was curious if 5 GB of pictures took the same exact amount of time as 5 GB of text when doing these kinds of transfers. Since there might be 'overhead' associated with the different file formats even if they are cumulatively the same size...

edit: it's actually not a CD-ROM but a DVD-R

5
  • 11
    5 gig is 5 gig unless it's not.
    – Xavierjazz
    Commented Sep 30, 2011 at 4:24
  • 2
    Can't argue with that... Commented Sep 30, 2011 at 5:41
  • 35
    Which is heavier: a ton of bricks, or a ton of feathers? Commented Sep 30, 2011 at 9:29
  • 1
    See my answer (and the other good ones that highlight different factors) before dismissing this as an obviously bad question. 5GB may be 5GB, but the efficiency of the pipe the data travels down makes a difference. Commented Sep 30, 2011 at 13:24
  • 1
    @Graham: Which is heavier, a pound of feathers or a pound of gold? (answer) Commented Oct 2, 2011 at 10:59

7 Answers

75

The answer is "it depends". Depends on what you mean by "download".

If you're downloading from a web site, then some sites automatically compress files "on the fly", and text compresses very well, while JPEG is already compressed, so it won't compress at all. In this case, there will be a big difference.
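
Here's a quick illustration in Python (just a sketch; the exact ratios depend on the data, and the random bytes below merely stand in for already-compressed JPEG data):

    import os
    import zlib

    # ~1 MB of repetitive English-like text: compresses extremely well.
    text = (b"The quick brown fox jumps over the lazy dog. " * 24000)[:1_000_000]

    # ~1 MB of random bytes, standing in for already-compressed JPEG data.
    jpeg_like = os.urandom(1_000_000)

    print(len(zlib.compress(text)) / len(text))            # a tiny fraction of 1.0
    print(len(zlib.compress(jpeg_like)) / len(jpeg_like))  # ~1.0: no gain at all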

If you're just using a copy command to copy files from one computer to another, then there will be no difference. However, if you're employing some kind of specialized tool, then again, it depends on whether that tool uses automatic compression or not. The only difference between JPEG and text is the possibility of compressing the files.

There is no difference in the 'overhead' associated with file transfer, no matter what the file is.

11
  • 29
    In the case of a copy, if the overall size is the same then the number of files is more likely to have an impact as there is overhead in transferring the file/folder metadata.
    – Chris Nava
    Commented Sep 30, 2011 at 4:27
  • 2
    @chris-nava: Yes, this is very true. I've only considered files of the same size, but you're correct to point to this nuance.
    – haimg
    Commented Sep 30, 2011 at 4:34
  • 2
    @DarkTemplar: it includes the metadata. Almost always. Usually the amount of metadata stored "outside" the file is pretty limited: file name, permissions and some access times. Many file systems have an option to store arbitrary (even large) meta-data "outside" the file, but that is rarely used. Commented Sep 30, 2011 at 6:16
  • 4
    The transfer mechanism could also be a source of delay. For example SMB (Windows File Sharing) is BAD at transferring large numbers of small files while NFS or FTP are much faster for the same file set.
    – Chris Nava
    Commented Sep 30, 2011 at 14:56
  • 4
    I'm surprised no one has mentioned the possibility of an anti-virus adding in some significant overhead. Many anti-virus applications scan JPEG files for viruses, and ignore text documents. This could definitely contribute to the it depends factor. Commented Oct 1, 2011 at 0:02
17

With 5GB of pictures you are likely to be talking about a few thousand reasonably sized files, say 3MB+ each. If you were downloading 5GB of text files, you'd typically expect each file to be a lot smaller. So you'd likely be dealing with an order of magnitude or two more files (hundreds of thousands or millions of files).

Copying lots of small files takes longer than copying the same amount of data in bigger files. There is a reasonable overhead in creating each individual file.

Not enough to make a massive difference probably, but still a difference.
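
A rough way to see this for yourself is the Python sketch below (absolute numbers depend heavily on the filesystem, OS cache and hardware; the ~30 MB sizes are just to keep the test quick):

    import os
    import shutil
    import tempfile
    import time

    def timed_copy(src, dst):
        start = time.perf_counter()
        shutil.copytree(src, dst)      # dst must not exist yet
        return time.perf_counter() - start

    with tempfile.TemporaryDirectory() as root:
        many = os.path.join(root, "many")
        one = os.path.join(root, "one")
        os.mkdir(many)
        os.mkdir(one)

        chunk = b"x" * 30_000          # 30 KB per small file
        for i in range(1000):          # 1000 small files, ~30 MB in total
            with open(os.path.join(many, f"f{i}.txt"), "wb") as f:
                f.write(chunk)
        with open(os.path.join(one, "big.bin"), "wb") as f:
            f.write(chunk * 1000)      # one ~30 MB file

        print("many small files:", timed_copy(many, os.path.join(root, "many2")))
        print("one big file:    ", timed_copy(one, os.path.join(root, "one2")))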

2
  • 3
    I think this can make a large difference. Copying a hundred 30K text files can definitely take longer than copying one 3MB file, depending on where you are copying to and from. Commented Sep 30, 2011 at 14:38
  • +1 For addressing the real issue here. By far the best answer.
    – wnrph
    Commented Nov 12, 2011 at 18:51
12

The "It Depends" in ftp is in the fine details.

FTP binary mode is just a straight transfer and will take the time it takes for 5GB.

If you're going from Windows to Linux as an FTP text transfer (for, surprisingly enough, plain text), FTP actually changes the line endings from \r\n to \n and vice versa. There's probably a little overhead in the streaming replace, but with 5GB of text you'll have less to write to disk going from Windows to Linux, as you drop one character per line, and more going from Linux to Windows, as you add one character per line.

So, is it 5GB on Linux? or Windows?
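
You can see the size difference with a couple of lines of Python (a sketch of what ASCII mode effectively does to the byte stream):

    # Text with Windows (CRLF) line endings, as it would leave a Windows box.
    windows_text = b"some line of text\r\n" * 1_000_000

    # What FTP ASCII mode effectively does on the way to a Unix host.
    unix_text = windows_text.replace(b"\r\n", b"\n")

    print(len(windows_text))   # 19,000,000 bytes
    print(len(unix_text))      # 18,000,000 bytes -- one byte saved per line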

Enough pedantry for one night, going to bed!

2
  • How did we get to FTP? The OP appears to be copying from the DVD drive to a local drive? Commented Oct 1, 2011 at 10:12
  • From the title. 'Twas late at night and I answered the question, not the paragraph below it. As did the highest voted poster in his initial paragraphs. Now for copying from one media to another... Commented Oct 1, 2011 at 20:19
3

There is no overhead associated with files themselves, but some storage/transfer facilities support automatic compression, and that may introduce a difference.

When copying from DVD to an uncompressed drive, there is no difference. When copying to a compressed NTFS drive, text will take less space than JPEGs.

When downloading from an HTTP server that uses compression, text will take less time to download. But if the server does not use compression, there will be no difference.
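
You can watch that negotiation happen with Python's urllib; the URL below is just a placeholder, so substitute any text-heavy page on a server that supports gzip:

    import urllib.request

    url = "https://example.com/"   # placeholder; use any gzip-capable server

    for encoding in ("identity", "gzip"):
        req = urllib.request.Request(url, headers={"Accept-Encoding": encoding})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()     # urllib does not auto-decompress, so
            # len(body) is the size that actually crossed the wire
            print(encoding, resp.headers.get("Content-Encoding"), len(body))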

Also, talking about overhead: a million small files totaling 5GB will take more [actual] space, and usually more time to copy, than a single 5GB file, because that 5GB figure does not include the space needed to store file names, dates and other metadata.
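
Here is a sketch of how to measure that gap on a Unix-like system (st_blocks is POSIX-only, and this still ignores the space the directories and inodes themselves consume):

    import os
    import sys

    def logical_bytes(path):
        """Sum of file lengths -- what a 'total size' readout usually reports."""
        return sum(
            os.path.getsize(os.path.join(d, name))
            for d, _dirs, files in os.walk(path)
            for name in files
        )

    def on_disk_bytes(path):
        """Space actually allocated, via st_blocks (POSIX, 512-byte units)."""
        total = 0
        for d, _dirs, files in os.walk(path):
            for name in files:
                total += os.stat(os.path.join(d, name)).st_blocks * 512
        return total

    if __name__ == "__main__":
        root = sys.argv[1]
        print("sum of file sizes:", logical_bytes(root))
        print("allocated on disk:", on_disk_bytes(root))

On a directory full of small files, the second number will usually come out noticeably larger than the first.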

3

This is meant to be an addition to the other answers that address compression, etc., as factors that affect efficiency and download time.

One point that hadn't been mentioned yet is packet efficiency. I doubt most people have even come across this, so here's a brief bit of background.

Before venturing into using web services, we wanted to know the difference in efficiency between using them and using a more "standard" database connection (such as OleDb, System.Data.SqlClient, JDBC, etc.). We had our guru put packet sniffers in place to track the data streams across the network to see the difference.

We expected that using web services would be less efficient because of the binary format of the other types of connections, and the added overhead of the XML tags used to describe the data.

What we found was that the web services were, in many cases, MORE efficient, at least on our network. The difference was that when we were transferring binary data, some of the bytes within the packets were empty, but when sending text data, the packets were used more efficiently.

We found this interesting, and tried it while transferring different sorts of files, and found that as a rule, plain text going over the network always used 100% of the bits available in each packet, whereas binary transfers often had unused bits. Why this is, I couldn't tell you, but several experiments bore this out.

Several comments on the question seemed to dismiss this as an obviously flawed question, but it's really not. Even though the amount of data remains the same, the efficiency of the pipe matters as well.

Because I can't resist making analogies that a non-IT person would understand:

A single shelf in a freezer in a grocery store has x amount of space, yet you can fit more gallons of ice cream on a shelf if the containers are square than you can if they are round, because of the wasted space created by using round containers. Our results, although counter-intuitive at first, told us what any grocery store stocker could have told us.

1
  • 2
    What was the database involved? Different RDBMS are more or less "network efficient" than others. You measured from connection establishment or just the dataset data? I'm really curious. Commented Sep 30, 2011 at 22:42
1

Traditional wisdom says that 5GB is 5GB. However, there are some scenarios where these two are not alike; it has to do with a difference in how the files' data is structured.

First off, JPEGs are compressed. To view the image, the file must first be uncompressed, and for the overwhelming majority of such images you must have the whole file to do this. There are progressive JPEGs that provide an iteratively sharper picture as they load, but they're rarely used anymore in an age where DSL and other high-speed connections are very common. Text, on the other hand, is more or less streamable; as soon as you have a byte (or two or four, depending on the UTF encoding used), you can show that character. Even the oldest data transfer mechanisms can load text faster than you can read it. So, a 5GB JPEG would take longer to display anything than a 5GB text file.

Second, also because JPEGs are compressed, they don't benefit from browsers or file-transfer programs/protocols that compress large amounts of data before transmission. You can see this by ZIPping a ZIP file; unless the second ZIP process was configured to do more compacting (slowing it down), you won't see much difference in size. That means that when using one of these tools, 5GB is not 5GB; the JPEGs are still going to be about 5GB, but the text can be compressed, maybe down to 1GB or less. If you were comparing 5GB of bitmap files to 5GB of plain text, the comparison would be much closer.
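
A quick demonstration with zlib, which is the same DEFLATE family that ZIP uses (a sketch, not a claim about any particular archiver):

    import zlib

    text = b"All work and no play makes Jack a dull boy.\n" * 25000

    once = zlib.compress(text)
    twice = zlib.compress(once)    # compressing already-compressed data

    print(len(text), len(once), len(twice))
    # ~1.1 MB -> a few KB -> slightly *larger* the second time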

However, simply moving 5GB of files from one computer to another using NFS, FTP or HTTP, without any compression or "download booster" mechanism, will take about the same time overall; any difference would be a result of differing network traffic levels at any given second during each transfer.

5
  • I've never heard of interleaved JPG. Are you conflating progressive JPG with interleaved GIF/PNG?
    – fluffy
    Commented Sep 30, 2011 at 17:41
  • The "Progressive JPEG" variant is an interlaced format much like interlaced GIF/PNG. The term "progressive" for JPEGs is confusing IMO, because of well-known terms like "progressive scan", "720p(rogressive)" and "1080p". Those terms all indicate that an entire frame is drawn in full res in one pass instead of in two interlaced passes, the exact opposite of "progressive" JPEG display behavior.
    – KeithS
    Commented Sep 30, 2011 at 19:09
  • 1
    But that's not how progressive JPEG works. It's not an interlaced/interleaved format like GIF or PNG (or DVD video, for that matter), it's an iterative refinement of DCT blocks. An in-progress progressive JPEG has full pixel coverage - it's just at a lower bitrate. JPEG doesn't deal with things in scanlines like GIF or PNG either, it deals with them as a collection of square groups of pixels.
    – fluffy
    Commented Sep 30, 2011 at 20:55
  • Tomato, tomahto. The image is originally displayed using a subset of the full image data which comes in early, then refined with the rest of it. That was my point. Whether it's lines or blocks it's a multi-pass loading style as opposed to a one-pass.
    – KeithS
    Commented Sep 30, 2011 at 23:50
  • It's not just a minor terminology difference as you imply, but this is turning into a brick wall argument for no good reason. I was only trying to suggest a minor edit for you to make to your answer, not trying to get into a pissing match.
    – fluffy
    Commented Oct 1, 2011 at 0:41
0

5 GB from an optical drive should take the same time whether it's JPG or text. Transferred over the net it can differ: I remember the days of modems, which, depending on the hardware, had built-in compression, so an already-compressed 5 GB of JPGs would not be compressed further, but 5 GB of text would normally have much potential for compression.

So why isn't this used for hard drives? Maybe you would need too much logic on the drive itself, maybe the compression logic would heat the drive too much, and maybe it's easy enough to compress data explicitly if wanted. (At the filesystem level it does exist, e.g. the compressed NTFS drives mentioned in another answer.)
