
A programmer at my work, who has used Linux all his life, was berating Windows for having a case-insensitive file system (among other things), which he said is the worst idea possible and can never be beneficial. I said that was just because he was used to case-sensitive filesystems, and that it makes a lot more sense to have a case-insensitive file system (part of my reasoning being that my name is David, but if you referred to me as david I would still know you meant me, and the same should apply to files). He then explained his position, stating that a case-insensitive filesystem must incur a performance hit.

So now I'm wondering... how does a case-insensitive filesystem access files? Let me try to explain what I'm thinking:

Say you have a case-sensitive filesystem (and OS kernel, etc.) such that, in practical terms, if a directory called exampleDir exists, I must type exactly cd exampleDir to enter it. If I type cd exampledir, I should get an error saying the directory does not exist. This seems like a simple case to me. When I type the command, the filesystem can simply take the exact characters I typed (ignoring whatever the kernel might do to prepend the current working directory and so on), run through the list of available filenames, and do a direct comparison on each name; for example:

function exists(filename, files) {
    for (let i = 0; i < files.length; i++) {
        if (filename === files[i]) return true; // exact, case-sensitive match
    }
    return false;
}

Now the interesting part: the case-insensitive filesystem (assuming it is case-preserving, as Windows is). In practical terms, if a directory called exampleDir exists, I could type cd exampleDir or cd eXamPleDIr and still get into the folder. What I really want to know is what the code looks like to achieve this. To preserve case, the directory name must be stored with its case. So does that mean you have to do two conversions to lower or upper case every time you want to access a file by its filename? How much of a performance hit does that translate into? Are there any tricks used to reduce the performance cost of a case-insensitive filesystem? This is how I imagine the filesystem code would have to look:

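function existsIgnoreCase(filename, files) {
    for (let i = 0; i < files.length; i++) {
        if (filename.toLowerCase() === files[i].toLowerCase()) return true; // two conversions per comparison
    }
    return false;
}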

Please note: since this apparently wasn't clear from my question, I'm absolutely not asking which type is better, nor am I asking what the advantages and disadvantages are. I am only asking how, in technical terms, a case-insensitive filesystem deals with the fact that humans can type a filename with arbitrary case.

  • Your wording is sloppy. You're not referring to a "case-insensitive file system", but simply to case-insensitive filenames. Huge difference.
    – sawdust
    Commented May 12, 2017 at 0:47
  • HFS+ also has a case-insensitive version
    – phuclv
    Commented May 12, 2017 at 1:06
  • @sawdust Sorry? In what way? I can't think of any other meaning it could imply.
    – Clonkex
    Commented May 12, 2017 at 1:54
  • @LưuVĩnhPhúc I know. Not sure why you said that.
    – Clonkex
    Commented May 12, 2017 at 1:54
  • @sawdust a) I AM a programmer, and I absolutely prefer case-sensitive languages over case-insensitive ones (such as BASIC). b) You still haven't explained what this "huge difference" actually is. I think it's pretty obvious what I meant. c) I actually choose my words very carefully in nearly everything I write. d) Excuse me? What I consider a "performance hit" is a decrease in performance due to an increase in workload. I never said anything about magnitude. e) Wtf? In what way is that "ludicrous"? Care to elaborate rather than just insulting me?
    – Clonkex
    Commented May 12, 2017 at 3:45

2 Answers


Operating systems generally work with handles. An "open" function is called, which specifies the filename, and a handle is returned. Further I/O calls take a handle, not a filename.
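A minimal Node.js sketch of the open-once, use-the-handle pattern, under the assumption that Node's synchronous fs calls are a fair stand-in for the underlying OS calls (handle-demo.txt is a made-up scratch file). The filename is resolved once, by openSync; the read that follows refers to the file only through the returned descriptor.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical scratch file, created only for this demonstration.
const filePath = path.join(os.tmpdir(), "handle-demo.txt");
fs.writeFileSync(filePath, "hello, handles");

// The name is looked up once, here; any case-insensitive matching
// done by the filesystem driver would happen during this call.
const fd = fs.openSync(filePath, "r");

// Subsequent I/O refers to the open file by descriptor, not by name.
const buf = Buffer.alloc(5);
const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
const firstWord = buf.toString("utf8", 0, bytesRead);

fs.closeSync(fd);
fs.unlinkSync(filePath);

console.log(firstWord); // hello
```

Only the calls that take a path (writeFileSync, openSync, unlinkSync) involve a name lookup; readSync and closeSync work purely on the descriptor.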

Other operations that take a filename include creating files, listing a directory, and deleting files.

So any performance hit from case insensitivity is not really going to affect much actual I/O (reads and writes through a handle), just file management.

Some programs use lock files to indicate resources are in use. This could translate to a lot of creates and deletes.

However, the overhead of doing two comparisons instead of one is likely a matter of a few additional machine instructions, meaning fewer than 50 or so cycles, or perhaps 500 to 5000 if cache misses come into play.
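One trick for avoiding two full lowercase conversions per comparison is to fold characters on the fly through a precomputed table and stop at the first mismatch. NTFS, for instance, keeps a table of uppercase equivalents on the volume (the $UpCase metadata file). The JavaScript below is only a sketch of that idea under stated assumptions: it builds its own table from toUpperCase, maps only one-to-one UTF-16 code units, and the names UPCASE and namesEqualIgnoreCase are hypothetical.

```javascript
// Hypothetical 64K-entry table: UPCASE[c] is the uppercase form of UTF-16 code unit c.
// Built once at startup, so each later comparison is just array lookups.
const UPCASE = new Uint16Array(65536);
for (let c = 0; c < 65536; c++) {
  const upper = String.fromCharCode(c).toUpperCase();
  // Keep only one-to-one mappings; e.g. "ß" uppercases to "SS" and is left as-is.
  UPCASE[c] = upper.length === 1 ? upper.charCodeAt(0) : c;
}

// Compare two names without allocating folded copies of either string.
function namesEqualIgnoreCase(a, b) {
  if (a.length !== b.length) return false; // cheap early exit
  for (let i = 0; i < a.length; i++) {
    const x = a.charCodeAt(i);
    const y = b.charCodeAt(i);
    // Identical code units need no lookup; otherwise compare their folded forms.
    if (x !== y && UPCASE[x] !== UPCASE[y]) return false;
  }
  return true;
}

console.log(namesEqualIgnoreCase("exampleDir", "eXamPleDIr"));  // true
console.log(namesEqualIgnoreCase("exampleDir", "exampleDirs")); // false
```

Per character this is one or two extra array reads on top of the plain comparison, which is in line with the "few additional instructions" estimate rather than two full string conversions.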

It's really, really not worth worrying about unless you are literally concerned with the performance of creating or deleting billions of files in a short time. High disk I/O applications include things like databases, and databases typically open a few very large files and keep them open while the database is in use. So the applications that need all the disk I/O they can get do not make many calls where a filename has to be parsed.

The speed of the medium is going to be a bottleneck far before the time in dealing with filenames even approaches it.

  • Now we're getting somewhere. That last sentence is the reason I've chosen this as the correct answer, and not something I had considered. It puts the extra CPU time in perspective and makes it not seem so unlikely. However I will point out: Yes, I'm aware once you've opened a file you refer to it by the handle (imagine if every operation was by filename!), and also it's not the difference between 1 or 2 comparisons, it's the difference between a comparison or a comparison & 2 function calls. But a good answer nonetheless.
    – Clonkex
    Commented May 13, 2017 at 0:45
    @Clonkex: I do recall it causing a problem once and having to stomp something so it reverted to case-sensitive files. Something about linear scanning the directory vs h-tree lookup.
    – Joshua
    Commented Nov 12, 2020 at 18:40
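The linear-scan versus indexed-lookup distinction can be sketched as follows. A case-preserving, case-insensitive directory can store each entry under a folded key while keeping the original spelling for display, so a lookup folds the requested name once and then does a single indexed search rather than comparing against every entry. In this sketch a JavaScript Map stands in for an on-disk index such as a B-tree or hashed tree; CaseInsensitiveDir and fold (plain toUpperCase) are hypothetical simplifications, not any real filesystem's code.

```javascript
// Hypothetical folding rule; real filesystems use their own tables and rules.
function fold(name) {
  return name.toUpperCase();
}

class CaseInsensitiveDir {
  constructor() {
    // Key: folded name. Value: the name exactly as it was created.
    this.entries = new Map();
  }

  // Create an entry; returns false if a name differing only in case exists.
  create(name) {
    const key = fold(name);
    if (this.entries.has(key)) return false;
    this.entries.set(key, name); // case is preserved for display
    return true;
  }

  // Fold the requested name once, then do a single indexed lookup.
  // Returns the stored (case-preserved) name, or null if absent.
  lookup(name) {
    const stored = this.entries.get(fold(name));
    return stored === undefined ? null : stored;
  }

  // Listing returns the original spellings, not the folded keys.
  list() {
    return [...this.entries.values()];
  }
}

const dir = new CaseInsensitiveDir();
dir.create("exampleDir");
console.log(dir.lookup("eXamPleDIr")); // exampleDir
console.log(dir.create("EXAMPLEDIR")); // false (differs only in case)
console.log(dir.list());               // [ 'exampleDir' ]
```

The folding cost is paid once per lookup on the requested name; the stored names were folded when they were created.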

If you assume that the filesystem itself is case sensitive, insofar as it allows you to store a filename using upper and lower case characters without restriction, then for certain operations there must be some kind of performance penalty.

For example, say you have a file foobar.txt and then you tell your program to save it as fooBar.txt without checking yourself whether it already exists.

For every file you create on a case-sensitive system, it needs to do only one search: the exact filename you specified. Save, done.

For every file you create on a case-insensitive system, it has to either search for every combination ("foobar", "Foobar", "fOobar", ...) or buffer the list of files, convert the entire list of filenames to lower or upper case, and search that for duplicates. The same goes for reading files: if an exact match didn't work, it must check all the possibilities.
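To put numbers on the "every combination" option: each letter can appear in two cases, so a name with n letters has 2^n spellings. The hypothetical helper caseVariants below enumerates them; "foobar" alone has 2^6 = 64, which is why converting the names to a single case and comparing those is the practical route.

```javascript
// Hypothetical helper: enumerate every upper/lower-case spelling of a name.
// Characters without case (digits, ".", etc.) have one form and are kept as-is.
function caseVariants(name) {
  let variants = [""];
  for (const ch of name) {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    const forms = lower === upper ? [ch] : [lower, upper];
    const next = [];
    for (const prefix of variants) {
      for (const form of forms) next.push(prefix + form);
    }
    variants = next;
  }
  return variants;
}

console.log(caseVariants("foobar").length);     // 64  (2^6)
console.log(caseVariants("foobar.txt").length); // 512 (2^9; "." has no case)
```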

There is a massive difference in the amount of work that the filesystem driver has to go through to check for the existence of a file.

For reading filenames there is much less of a penalty to the system; in almost all cases the filesystem driver just passes the list of files up to the program that requested it. I'm sure I've seen people mention that you can create "duplicate" filenames (differing only in case) on an NTFS filesystem from a case-sensitive system like Linux, and Windows just deals with it.

Case-insensitive systems do involve somewhat more work on the programmer's side, but they slightly simplify things from a user's perspective. There are pros and cons to both approaches.

For one, I can see a problem in case-sensitive systems with case-dependent programming errors when reading files. If your program hard-codes a request for /etc/fish and someone renames it to /etc/Fish (or you forgot to hold Shift for the "f"), you will get an error you would not have had on a case-insensitive system.
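The /etc/fish example can be sketched as two lookup policies over the same hypothetical directory listing: an exact-match lookup, as on a case-sensitive system, and a fold-both-sides lookup that still returns the stored spelling, as on a case-preserving, case-insensitive system. The names etc, findSensitive, and findInsensitive are made up for illustration.

```javascript
// Hypothetical /etc listing after someone renamed "fish" to "Fish".
const etc = ["passwd", "hosts", "Fish"];

// Case-sensitive lookup: exact string match only.
function findSensitive(entries, name) {
  const hit = entries.find((e) => e === name);
  return hit === undefined ? null : hit;
}

// Case-insensitive, case-preserving lookup: compare lowercased forms,
// but return the entry spelled the way it is actually stored.
function findInsensitive(entries, name) {
  const wanted = name.toLowerCase();
  const hit = entries.find((e) => e.toLowerCase() === wanted);
  return hit === undefined ? null : hit;
}

console.log(findSensitive(etc, "fish"));   // null (the hard-coded name no longer matches)
console.log(findInsensitive(etc, "fish")); // Fish
```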

It's all about where you are putting your effort and there are tradeoffs in both ways of doing things.

  • On a side note: if he's arguing about such trivialities then he's either trying to get a rise out of you to defend your favourite system, which he apparently managed to, or he is insecure about his favourite system. In either case the decisions were made for both systems for a reason and both have their benefits. Posturing that "mine is better than yours" helps no one in the long run. Just accept that there are differences, see how they do affect things, and consider how both ways can work better in the future.
    – Mokubai
    Commented May 12, 2017 at 7:38
    Dangit, I had a really long comment written and accidentally clicked off the page :facepalm: Well, anyway, the gist is that, re: your answer (which I greatly appreciate; I can see you've put a lot of effort into it), the first paragraph seems totally opposite to what I would expect (see my edited question), and re: your comment, I was definitely not saying "mine is better than yours". I absolutely agree that both have advantages. Gee whiz, why does everyone seem to think I was trying to say one was better than another...
    – Clonkex
    Commented May 12, 2017 at 12:58
    @Clonkex in order for you to see any upper- and lower-case filenames, the actual filesystem itself (as stored on the disk) must be case-sensitive; the filesystem driver, on the other hand, can be programmed to ignore or work around that sensitivity, and it is there that the actual work is done. The problem as I see it is that you are conflating the filesystem and its driver when they are in fact two distinct, albeit closely related, things.
    – Mokubai
    Commented May 12, 2017 at 13:18
  • That's what I meant by a case-preserving case-insensitive filesystem. It doesn't really matter what part of the chain (from the cmd prompt window right down to the filesystem itself) actually does the work to support case-insensitivity, I just wanted to know how that case-insensitivity is achieved. It seemed incredible that it might actually have to do the equivalent of a toLowercase() on every filename to compare, but LawrenceC put that in perspective and I now realise the extra CPU time is insignificant compared to the IO wait of the disk (even for an SSD).
    – Clonkex
    Commented May 13, 2017 at 0:56
