While going through the Amazon documentation was helpful (as per @Ouroborus's answer), I finally figured out how to repair the mess I got myself into.
Let's see if I can recall all the steps...
Prepare a New Instance
The easiest way to match your existing instance as closely as possible is to go under My AMIs and select the same AMI image used by your problem instance.
Since this is just an instance for recovery purposes, pick a Free tier eligible instance type (e.g. t2.micro).
This step is Very important! Make sure this recovery instance uses the same subnet (and therefore the same Availability Zone) as your problem instance. EBS volumes can only be attached to instances in the same Availability Zone, so otherwise you won't be able to attach the problem EBS volume to your recovery instance! I found that out the hard way...
You can click "Next: Add Storage", leave that page as is, and click "Next: Tag Instance".
In the Value field, type something like "Recovery" (any name will do, in my case it's just to mark the purpose of this instance).
There should be one last step / popup where it prompts you for the inbound / outbound security group and the key pair. Make sure to pick the same ones you already have set up for your problem instance, meaning you can reuse the same SSH key file to log in to this instance (via PuTTY or whichever program you prefer).
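If you prefer the command line, here's a rough sketch of the same launch using the AWS CLI. All the IDs and the key name below are placeholders; substitute your own AMI, subnet, security group, and key pair:
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t2.micro \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --key-name my-existing-key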
Once that Recovery instance is created...
- Make sure to keep track of your EBS volume names (as you will need to detach your problem volume, attach it to this temporary instance, and later move it back to your original instance).
- Mark down which device path your problem instance uses to access the volume (ex: /dev/xvda).
- Ok, now Stop (not Terminate!!!) your problem instance (there's a CLI sketch right after this list if you prefer the command line).
- You may need to refresh the browser to confirm it is stopped (this might take a few seconds / minutes).
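For reference, here's roughly the same stop-and-confirm step via the AWS CLI (the instance ID is a placeholder):
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# poll until this prints "stopped"
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[0].Instances[0].State.Name' --output text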
Now navigate to the EBS Volumes section:
- Detach the volume that is currently attached to the problem instance (you will see your instance's status marked as stopped in one of the far-right columns).
- Refresh to confirm the volume is "available".
- Attach the volume to your new Recovery instance (if you can't see your recovery instance in the list, you probably missed the subnet / Availability Zone step I mentioned above, and you will need to redo your Recovery instance all over again to match that setting).
- Refresh and confirm it's now "in-use" on your recovery instance.
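The same detach / attach dance via the AWS CLI looks roughly like this (the volume and instance IDs are placeholders; note that a volume attached as /dev/sdf typically shows up as /dev/xvdf inside an Amazon Linux instance):
# detach from the (stopped) problem instance
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
# poll until this prints "available"
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
    --query 'Volumes[0].State' --output text
# attach to the recovery instance
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0fedcba9876543210 --device /dev/sdf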
Now, on to the fun command line steps!
Log in / SSH into your Recovery box (you can look up your Recovery instance's IP / host address in the Instances section in AWS).
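For example (the key file path is a placeholder, and the default user name depends on your AMI: ec2-user for Amazon Linux, ubuntu for Ubuntu):
ssh -i ~/.ssh/my-existing-key.pem ec2-user@<recovery-instance-public-ip>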
Set your current working directory to the root:
cd /
Create a directory to serve as the mount point for your problem EBS volume:
sudo mkdir /bad
Mount it:
sudo mount /dev/xvdf /bad
NOTE: If this doesn't work, you may have encountered the same issue I had, where the volume contains a partition table and you need to mount the first partition instead: sudo mount /dev/xvdf1 /bad (thanks to this answer: https://serverfault.com/a/632906/356372). The lsblk sketch below shows how to tell which case you're in.
If that goes well, you should now be able to cd into that /bad directory and see the same file structure you would normally see when it's mounted on your original (currently problematic) instance.
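If you're not sure whether to mount the whole device or a partition, lsblk shows what the kernel sees; output along these lines (illustrative only) means the volume has a partition table and you want /dev/xvdf1:
lsblk
# NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
# xvda    202:0    0   8G  0 disk
# └─xvda1 202:1    0   8G  0 part /
# xvdf    202:80   0   8G  0 disk
# └─xvdf1 202:81   0   8G  0 part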
VERY IMPORTANT: Note in the following couple of steps how I'm using ./etc and not /etc, so that the permission / ownership changes apply to the /bad/etc/sudoers file on the problem volume, NOT to this Recovery instance's own volume! One broken volume is enough, right?
Try:
cd /bad
ls -l ./etc/sudoers
... then follow that by:
stat --format %a ./etc/sudoers
Confirm that this file's ownership and/or permission (chmod) value is in fact incorrect.
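For comparison, on a healthy system /etc/sudoers is owned by root:root with mode 0440, so the output should look something like this (size and date will differ, of course):
ls -l ./etc/sudoers
# -r--r----- 1 root root <size> <date> ./etc/sudoers
stat --format %a ./etc/sudoers
# 440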
To fix its ownership, run:
sudo chown root:root ./etc/sudoers
To fix its permission value, run this (note: sudo requires /etc/sudoers to be mode 0440; a more permissive mode like 0755 will itself cause sudo to refuse to run):
sudo chmod 0440 ./etc/sudoers
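While you're in here, it's worth a syntax check too; visudo's -c (check-only) and -f (alternate file) flags let you validate the mounted copy without touching the Recovery instance's own sudoers:
sudo visudo -c -f /bad/etc/sudoers
# should print: /bad/etc/sudoers: parsed OK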
Now it's just a matter of reversing the steps!
Once that's done, time to unmount:
cd /
sudo umount /bad
Back in the AWS console, go to the EBS Volumes section.
- Detach the fixed volume from the Recovery instance.
- Refresh and confirm it's "available".
- Attach it back to the original instance (DO NOT FORGET: use the same /dev/whatever device path your original instance was using prior to all these steps).
- Refresh and confirm it's "in-use".
- Now, navigate to the Instances section and Start your original instance again (it might take a few seconds / minutes to restart; a CLI sketch for these last volume / instance steps follows this list).
- If all is well, you should now be able to log in / SSH into your EC2 instance and use sudo once again!
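For completeness, here's the command-line equivalent of those last console steps, again with placeholder IDs (the --device value must be whatever device path your original instance was using, e.g. /dev/xvda):
# detach from the recovery instance
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
# wait for "available", then attach back to the original instance
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/xvda
# start the original instance again
aws ec2 start-instances --instance-ids i-0123456789abcdef0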
Congrats if it worked for you too!
If not.... I'm so.... so very sorry this is happening to you :(