
I'm using AWS SageMaker, and today when I went to commit some files I got this error:

Bad owner or permissions on /home/sagemaker-user/.ssh/config
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

I investigated a bit more and found that my home folder, /home/sagemaker-user/, has the wrong owner:

sagemaker-user@studio$ ls -la /home
total 4
drwxr-xr-x  1 root             root               30 Feb 27 14:18 .
drwxr-xr-x  1 root             root               73 Feb 27 14:18 ..
drwx------  2 sagemaker-studio sagemaker-studio   62 Feb 27 14:18 sagemaker-studio
drwx------ 22 sagemaker-studio            65534 6144 Feb 27 14:47 sagemaker-user

I tried changing the ownership with chown, but I can't, not even as root:

sagemaker-user@studio$ sudo su root
root@studio$ sudo chown -R sagemaker-user /home/sagemaker-user/
...
chown: changing ownership of XXXX Operation not permitted

I also tried as the "new owner", sagemaker-studio, but running sudo as that user prompts for a password, which I don't have.

Also, when logged in as the default user, sagemaker-user, new files are created with the wrong owner:

sagemaker-user@studio$ whoami
sagemaker-user
sagemaker-user@studio$ touch test
sagemaker-user@studio$ ls -la test 
-rw-r--r-- 1 sagemaker-studio users 0 Feb 27 15:05 test

This started happening suddenly in this SageMaker domain. Other domains are working as expected (everything is created with sagemaker-user as the owner).

SageMaker runs on CentOS, if that helps.

  • Check mount to see whether /home or /home/sagemaker-user is a remotely mounted filesystem. 65534 is usually a placeholder user ID like 'nobody' or 'anonymous' that gets assigned when you try to set root as the owner of a folder over something like an NFS share.
    – Cpt.Whale
    Commented Mar 9, 2023 at 16:33
  • Yes, /home/sagemaker-user is a remotely mounted FS, because SageMaker data is stored in a separate AWS EFS volume: 127.0.0.1:/ on /home/sagemaker-user type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,port=20209,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1). I compared that output with the result of running mount in a different SageMaker domain that is working as expected, and the output was similar except for the address: 127.0.0.1:/200005 on /home/sagemaker-user type nfs4 ...
    Commented Mar 9, 2023 at 17:06
  • So that volume is shared between sagemaker domains? You might be able to get away with using the same uid, but you should consider sagemaker-user@domain1 and sagemaker-user@domain2 as totally separate accounts (because they are) with different home folders. If you can authenticate between domains, then you should specify which domain's user you are when you log in. If you don't want that, then both devices should be in the same domain.
    – Cpt.Whale
    Commented Mar 9, 2023 at 17:26
  • No, that volume is only used in one SageMaker domain with a single SageMaker user. I was just comparing between two different AWS accounts to investigate why one of them is not working as expected.
    Commented Mar 9, 2023 at 17:29
  • Ah ok, I'm not really familiar with SageMaker. Double-check which UID/GID got set with ls -n /home, and which ID belongs to each user. Make sure that /etc/idmapd.conf is the same on server and client, and that the idmapd service is in the same state when comparing between domains. You can disable ID mapping if it's misbehaving (which seems likely if new files are owned by the wrong UID), or maybe adjust sudoers temporarily to allow sagemaker-studio to run chown without a password.
    – Cpt.Whale
    Commented Mar 9, 2023 at 18:04
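
The numeric-ID check suggested in the last comment could be sketched roughly as below. This is only an illustration, not something taken from the thread: check_owner is a hypothetical helper name, and it assumes GNU stat (for the -c option) and a POSIX id command are available in the Studio image.

```shell
#!/usr/bin/env bash
# check_owner is a hypothetical helper (not from the thread).
# It prints the numeric owner of a path next to the current user's
# numeric IDs, so user-name mapping (e.g. idmapd) cannot hide a mismatch.
# Return codes: 0 = owner UID matches the current UID, 1 = mismatch,
#               2 = the path could not be inspected.
check_owner() {
    local path="$1"
    local owner_uid owner_gid
    # stat -c is GNU coreutils syntax; %u / %g are the numeric UID / GID.
    owner_uid=$(stat -c '%u' "$path") || return 2
    owner_gid=$(stat -c '%g' "$path") || return 2
    echo "path=$path owner_uid=$owner_uid owner_gid=$owner_gid my_uid=$(id -u) my_gid=$(id -g)"
    # The function's exit status is the result of this comparison.
    [ "$owner_uid" = "$(id -u)" ]
}
```

Running check_owner /home/sagemaker-user in both the broken and the working domain, together with ls -n /home and id, should show whether the two domains disagree on the numeric UID itself or only on the name it maps to.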
