To make a long story short(er), I suspect I had a bad SATA card. I have an 8-drive storage pool in Windows 11 Storage Spaces (single parity) that had been running great for a few months. It suddenly started going offline; restarting the computer and double-checking that the cables were secure fixed it for a while. Then it stopped working entirely, and Storage Spaces would freeze whenever I tried to open it. Today I swapped in a new SATA card and all the drives popped back up with no more freezing. Great. CrystalDiskInfo, Hard Disk Sentinel, and Windows all report the drives as OK/PERFECT. The pool showed as offline but otherwise healthy. The Event Viewer had a few of these:
I figured, no problem, and ran those commands, but the volume didn't come back up. After a bunch of fiddling, I realized running this command:
Get-VirtualDisk | ?{ $_.ObjectId -Match "{bb97ba58-8273-4e7d-95c1-9eb0fa705f15}" } | Get-Disk | Set-Disk -IsOffline $false
brings the drive back up just fine in a read-only state. Big relief: if all else fails, I can at least get the data off and try again.
But when I run this command:
Get-VirtualDisk | ?{ $_.ObjectId -Match "{bb97ba58-8273-4e7d-95c1-9eb0fa705f15}" } | Get-Disk | Set-Disk -IsReadOnly $false
or otherwise try to bring the drive online in read/write mode via Disk Management or Storage Spaces, I get an error and the volume disappears. The pool still reports as healthy, but it won't go into write mode. I can't find any other errors that point to a specific bad drive. Here are some additional diagnostics:
get-storagepool -isprimordial 0
FriendlyName  OperationalStatus  HealthStatus  IsPrimordial  IsReadOnly  Size       AllocatedSize
------------  -----------------  ------------  ------------  ----------  ----       -------------
Storage pool  OK                 Healthy       False         False       160.07 TB  16.48 TB
get-storagepool -isprimordial 0 | get-physicaldisk
Number  FriendlyName          SerialNumber  MediaType  CanPool  OperationalStatus  HealthStatus  Usage        Size
------  ------------          ------------  ---------  -------  -----------------  ------------  -----        ----
2       ST22000NM001E-3HM103  ZX20E51C      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
4       ST22000NM001E-3HM103  ZX201Y4F      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
6       ST22000NM001E-3HM103  ZX207NJP      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
9       ST22000NM001E-3HM103  ZX204FW5      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
8       ST22000NM001E-3HM103  ZX205GC4      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
7       ST22000NM001E-3HM103  ZX203GLT      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
3       ST22000NM001E-3HM103  ZX214L6H      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
5       ST22000NM001E-3HM103  ZX2078LS      HDD        False    OK                 Healthy       Auto-Select  20.01 TB
get-storagepool -isprimordial 0 | get-virtualdisk
FriendlyName   ResiliencySettingName  FaultDomainRedundancy  OperationalStatus  HealthStatus  Size       FootprintOnPool  StorageEfficiency
------------   ---------------------  ---------------------  -----------------  ------------  ----       ---------------  -----------------
Storage space  Parity                 1                      OK                 Healthy       121.18 TB  16.48 TB         66.65%
get-storagepool -isprimordial 0 | get-volume
DriveLetter  FriendlyName   FileSystemType  DriveType  HealthStatus  OperationalStatus   SizeRemaining  Size
-----------  ------------   --------------  ---------  ------------  -----------------   -------------  ----
X            Storage space  NTFS            Fixed      Warning       Full Repair Needed  110.23 TB      121.18 TB
The "Full Repair Needed" status is replaced by "Healthy" when I take the pool offline, but I figured I'd try repairing anyway. When I run:
Repair-Volume -DriveLetter X -Scan
NoErrorsFound
I get the same result with all the other troubleshooting recommendations here: https://learn.microsoft.com/en-us/powershell/module/storage/repair-volume?view=windowsserver2022-ps
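A few further read-only checks might help narrow this down. This is only a sketch. It assumes that per-drive error counters, or the attach state of the virtual disk, could reveal something the health summaries hide. It uses standard Storage module cmdlets in an elevated PowerShell session, and nothing here is confirmed to identify the cause:

```powershell
# Read-only diagnostics; nothing below modifies the pool or the virtual disk.
$pool = Get-StoragePool -IsPrimordial $false

# Per-drive error counters and worst-case latency. A drive can report
# "Healthy" while still accumulating uncorrected errors or long latencies.
$pool | Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, ReadErrorsTotal, ReadErrorsUncorrected,
                  WriteErrorsTotal, WriteErrorsUncorrected,
                  ReadLatencyMax, WriteLatencyMax |
    Format-Table -AutoSize

# Attach state of the virtual disk, including why it was detached (if it was).
$pool | Get-VirtualDisk |
    Select-Object FriendlyName, OperationalStatus, HealthStatus,
                  DetachedReason, IsManualAttach

# Offline/read-only flags on the disk object the virtual disk exposes.
$pool | Get-VirtualDisk | Get-Disk |
    Select-Object Number, IsOffline, OfflineReason, IsReadOnly, OperationalStatus

# Any repair or regeneration jobs still running in the background.
Get-StorageJob
```

If one drive shows uncorrected errors or a much higher maximum latency than the other seven, that would be the first one to suspect. If the counters are all clean, the DetachedReason and OfflineReason values may at least say why the space refuses to come up in read/write mode.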
At this point, I'm at a loss. If there is a bad drive, I don't see any way to identify it. If there isn't, I don't see how I can repair the space. The only option I can think of is to buy an extra drive, move all the data off, reformat/rebuild the pool, and hope this doesn't happen again. Does anyone have any ideas on things I can check before going down that route?