
I have a C++ application test that creates 10,000 files in an NFS-mounted directory, but my test recently failed once because a single file appeared twice under the same name in that directory, alongside all the other 10,000 files. This can be seen on Linux CentOS 4 or 5 wherever the directory is NFS mounted, but not on the host machine where the disk resides.

How is it even possible to have two files with the same name in the same directory?

[centos4x32 destination] ls -al ./testfile03373
-rwx------  1 user root 3373 Sep  3 03:23 ./testfile03373*
[centos4x32 destination] ls -al ./testfile03373*
-rwx------  1 user root 3373 Sep  3 03:23 ./testfile03373*
-rwx------  1 user root 3373 Sep  3 03:23 ./testfile03373*
[centos4x32 destination] ls -al *testfile03373
-rwx------  1 user root 3373 Sep  3 03:23 testfile03373*
-rwx------  1 user root 3373 Sep  3 03:23 testfile03373*
[centos4x32 destination] ls -alb test*file03373
-rwx------  1 user root 3373 Sep  3 03:23 testfile03373*
-rwx------  1 user root 3373 Sep  3 03:23 testfile03373*

Running the Perl script suggested in one of the answers below:

ls -la *03373* | perl -e 'while(<>){chomp();while(/(.)/g){$c=$1;if($c=~/[!-~]/){print("$c");}else{printf("\\x%.2x",ord($c));}}print("\n");}'

gives:

-rwx------\x20\x201\x20user\x20root\x203373\x20Sep\x20\x203\x2003:23\x20testfile03373*
-rwx------\x20\x201\x20user\x20root\x203373\x20Sep\x20\x203\x2003:23\x20testfile03373*

Printing with the inode (-i) values shows the two copies have the same inode entry (36733444):

[h3-centos4x32 destination] ls -alib te*stfile03373
36733444 -rwx------  1 user root 3373 Sep  3 03:23 testfile03373*
36733444 -rwx------  1 user root 3373 Sep  3 03:23 testfile03373*

It would seem the directory entry is corrupted somehow.

Could my application have legitimately created this situation or is this a bug in the operating system? Is there anything I can do to protect against this in my program that creates the files?

I'm thinking there is some kind of bug in the NFS mounting software. Also, running 'umount' and then 'mount' on the affected NFS drive does not resolve the issue; the repeated entry remains after remount.


Update 1: I've now hit this issue a second time, a few hours later. The really strange thing is that it happened on the exact same file, testfile03373, although this time the doubled files got a different inode, 213352984. I'll also add that the file is being created on the CentOS 5 machine that hosts the disk, so it is created locally and shows up correctly locally, but all the other machines that NFS mount the directory see the doubled entry.


Update 2: I mounted the drive on a CentOS 6 machine and found the following in /var/log/messages after listing the directory and seeing the double entry there:

[root@c6x64 double3373file]# ls -laiB testfile03373* ; tail -3 /var/log/messages
36733444 -rwx------. 1 user root 3373 Sep  3 03:23 testfile03373
36733444 -rwx------. 1 user root 3373 Sep  3 03:23 testfile03373
...
Sep  4 14:59:46 c6x64 kernel: NFS: directory user/double3373file contains a readdir loop.Please contact your server vendor.  The file: testfile03373 has duplicate cookie 7675190874049154909
Sep  4 14:59:46 c6x64 kernel: NFS: directory user/double3373file contains a readdir loop.Please contact your server vendor.  The file: testfile03373 has duplicate cookie 7675190874049154909

Additionally, I found that renaming the file causes the double entry to disappear, but renaming it back causes it to reappear doubled. Alternatively, just touching a new file with the name testfile03373 causes a double entry to appear, but only in the two directories where this double entry has already been seen.

  • AFAIK, it is impossible for two files with the same name and extension to coexist in the same directory in any filesystem. You could use some exception mechanism in your program to prevent failure; other than this... Commented Sep 3, 2013 at 14:16
  • What filesystem are you using? Commented Sep 3, 2013 at 14:25
  • Are they precisely the same? E.g. no leading or trailing whitespace? no UTF-16 chars, ...
    – Hennes
    Commented Sep 3, 2013 at 14:25
  • What other tests can I perform to confirm they are exactly the same?
    – WilliamKF
    Commented Sep 3, 2013 at 14:41
  • Sounds like you learned how to do an end run around a vital OS sanity check. Commented Sep 3, 2013 at 15:03

3 Answers


A friend helped me track this down, and it turns out to be a bug recorded in Linux kernel Bugzilla entry 38572. The bug is supposedly fixed in kernel version 3.0.0, but present at least in version 2.6.38.

The issue is that the server's READDIR() RPC call returns incorrect results. This occurs because of the following:

When the client reads a directory, it specifies a maximum buffer size and a cookie, initially zero. If the directory is too large to fit, the reply indicates that it is partial and updates the cookie. The client can then re-issue the RPC with the updated cookie to get the next chunk of data. (The data is a set of file handle and name pairs; in the case of READDIRPLUS(), there is also stat/inode/vnode data.) The bug report does not say whether READDIRPLUS() is affected, but it probably is as well.

The actual problem is that the last file in each chunk (a name/handle pair) is sometimes returned again as the first file in the next chunk.

There is a bad interaction with the underlying filesystem: ext4 exhibits this bug, XFS does not.

This is why the problem appears in some situations but not others, and rarely occurs on small directories. As seen in the question, the duplicated entries show the same inode number and identical names (nothing is corrupted). Since the Linux kernel dispatches underlying operations such as open() through the vnode operations, the filesystem's own routines decide what happens; in this case, the NFSv3 client simply translates the vnode operation into an RPC when the required information isn't in its attribute cache. This leads to confusion, because the client assumes the server cannot return duplicate entries.

  • It's happening to me too, with kernel 3.18.17-13.el6.x86_64 (CentOS 6). I'm pretty sure it's a bug in the underlying NFS server of the QNAP TS-212 NAS on which the directory is mounted; can anyone confirm? Commented Sep 24, 2015 at 9:13

The disk is an NFS mounted disk. When I go to the host computer that publishes the drive, the file is only listed once.

Probably a bug, issue, or race condition with NFS.

It's possible to have two files of the same name if you directly edit the filesystem structures with a hex editor, though I'm not sure what would happen if you tried to delete or open them. Accessing a file by inode number (which can't be duplicated) may work around the ambiguity.
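For example, find(1) can look up a path by its inode number. This is a sketch using a scratch directory; on the affected mount you would substitute the real inode reported by ls -i (e.g. 36733444):

```shell
# Locate a file by inode number -- useful when two directory entries show
# the same name, since the inode itself is unambiguous.
tmpdir=$(mktemp -d)
touch "$tmpdir/testfile03373"
ino=$(stat -c %i "$tmpdir/testfile03373")   # print the file's inode number
find "$tmpdir" -inum "$ino"                 # list every path with that inode
rm -rf "$tmpdir"
```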

Duplicate file names are something fsck would likely catch and try to fix.

Make sure none of the files have differing trailing spaces though.

  • I was going to suggest that the amount of writing on the filesystem ultimately broke something and allowed the existence of two identical files. Commented Sep 3, 2013 at 17:55
  • Running fsck found no issues. Rebooted both host and client machines, issue still shows.
    – WilliamKF
    Commented Sep 3, 2013 at 18:43
  • I should have been more clear - fsck is probably only going to work on the local file system, not an NFS mounted one. You probably need to upgrade/patch your nfs packages and possibly your kernel. As @somequixotic mentions, your CentOS is old and the issues you are having may have been resolved in future updates.
    – LawrenceC
    Commented Sep 3, 2013 at 18:46

There is a chance that you have a hidden non-printable character or whitespace in one of the filenames. You can check by passing the -b option to ls, e.g.:

user@server:~/test$ ls -lab
total 8
drwxr-xr-x 2 user user 4096 Sep  3 12:20 .
drwx------ 8 user user 4096 Sep  3 12:20 ..
-rw-r--r-- 1 user user    0 Sep  3 12:19 hello
-rw-r--r-- 1 user user    0 Sep  3 12:19 hello\

Note the \ signifying the space at the end of that filename.

   -b, --escape
          print C-style escapes for nongraphic characters

As an alternative (though the above should work), you can pipe the output through this perl script to replace anything that isn't a printable ASCII character with its hex code. For example, a space becomes \x20.

while (<>) {
    chomp();
    while (/(.)/g) {
        $c = $1;
        if ($c=~/[!-~]/) {
            print("$c");
        } else {
            printf("\\x%.2x", ord($c));
        }
    }
    print("\n");
}

Usage:

ls -la | perl -e 'while(<>){chomp();while(/(.)/g){$c=$1;if($c=~/[!-~]/){print("$c");}else{printf("\\x%.2x",ord($c));}}print("\n");}'
