I have had problems with my RAM for a long time, and I would like to find answers to two questions.

My PC is a B450m Steel Legend, Ryzen 7 2700, 2x8GB Asgard Loki 3200MHz. The same problems happen at stock and with XMP, so I'm testing at 3200MHz 1.35V.

The problems I have: BSODs, tabs crashing, games closing, apps closing (on both W7 and W10).

Explanation 1, I ran a lot of tests in memtest86 and had problems in Test 8 [Random number sequence]. A few months ago I ran a few tests and got 1 or 2 errors in Test 8. In the last few days I ran several tests on the same days that I had problems in Windows. In one run I did 25 passes of Test 8 and had errors in 13 of them; then I turned off the PC and did 30 more passes of Test 8 with 0 errors. On other days I ran Test 8 alone without any errors (17 passes, 17 passes, 6 passes, 41 passes), and finally I ran 10 passes of all tests with no errors.

Question 1, how is it possible to fail 13 passes out of 25 and nothing happens in the others? If the RAM is bad and giving problems for several days in a row on Windows, shouldn't the error frequency in test 8 be higher?

Explanation 2 (more out of curiosity, as what happened doesn't make sense), a long time ago the problems were more constant. I reinstalled the video drivers every few days and that helped. I stopped messing with the drivers, and for about 3 months I only had 1 or 2 errors per month. After that I needed to install an audio driver and the problem came back more often. It seemed increasingly clear that it was something related to Windows, as I had not found any errors with memtest86, aida64 and prime95.

Question 2, if the problem is in memory, as we saw in explanation 1, why would changing the drivers influence it so much? It doesn't make sense.

Here are the tests that I had errors in. The first two are the old ones, the last one is from the last few days. (They all show "bits in error 63", what is that?)

https://i.imgur.com/7w3ll5z.png

1 Answer

The program has so many different types of tests because there are many subtle ways that memory can fail.

Memory might have a single bit that is stuck or floats high over time, or it might have a bit that only floats high with a particular pattern in the other bits. Your problem does not seem to be that kind of problem though, as the address seems to shift a lot.

What is common though is where the error occurs in the bytes.

Expected: 526BC02A
Actual:   5263C02A

Expected: 00DFF6AF
Actual:   00D7F6AF

Expected: 0AC343BB
Actual:   0ACB43BB

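Your problem is always in the same bit of the third byte from the right: the 0x08 bit of that byte, which is bit 19 of the 32-bit value (counting from 0 at the least significant bit). It flipped a "B" to a "3" and an "F" to a "7", and there is even a "3" going the other way to a "B". It looks to be the same bit error at different addresses.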
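As a quick check, the three expected/actual pairs listed above can be XORed to see exactly which bits differ. This is only a minimal sketch: the helper name `flipped_bits` is made up for illustration, and the values are copied by hand from the error reports in the question.

```python
# Expected/actual pairs copied from the Memtest error reports quoted above.
pairs = [
    (0x526BC02A, 0x5263C02A),
    (0x00DFF6AF, 0x00D7F6AF),
    (0x0AC343BB, 0x0ACB43BB),
]


def flipped_bits(expected, actual):
    """Return the positions (0 = least significant bit) where the values differ."""
    diff = expected ^ actual  # XOR leaves a 1 only where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# One XOR mask and one list of differing bit positions per error report.
masks = [expected ^ actual for expected, actual in pairs]
positions = [flipped_bits(expected, actual) for expected, actual in pairs]

for (expected, actual), mask, bits in zip(pairs, masks, positions):
    print(f"{expected:08X} ^ {actual:08X} = {mask:08X}  differing bits: {bits}")
```

All three pairs give the mask 00080000, so each error is a single flipped bit and it is the same one in every report, even though the flip goes in different directions (1 to 0 in the first two, 0 to 1 in the third).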

Those errors are always at the same bit, and that's what the "Bits in Error" is telling you. There is one particular bit in a 64-bit memory transfer "word" that is throwing an error.

To me that suggests a dry solder joint under one of your memory chips on one of your modules. It mostly makes contact and works a good amount of the time but temperature and vibrations can cause it to make marginal contact and sometimes it is not being set or read right.

Alternative options are a slightly dodgy connection at the DIMM slot or under the CPU.

The problem is that, because your system is likely running dual-channel RAM, Memtest has no visibility of which module has the issue.

It is odd that it only shows up in random tests, but it could be that more ordered bit patterns are causing less noise and interference at that dry joint and so hiding it somewhat.

It could be that it is a problem with power to a specific area of the DDR chip (each chip will have multiple power pins around the data and address bus), and ordered patterns make for better power supply use that masks the problem.

If it were a problem within the memory chip I would expect the same address every time, but the same bit position at different addresses suggests a problem slightly outside the silicon.

Changing drivers was likely a red herring. It may have caused different temperature conditions that masked the problem, or it coincided with cooler or warmer weather which allowed the joint to make better contact.

You should figure out which module specifically is faulty, for example by testing with only one module installed at a time, and replace it. For performance you might want to replace both.

  • Thank you very much for the great answer and explanation, I hope it's just a RAM problem which is most likely, if I solve this problem in the future I will comment here if it is still open
    – kunotty
    Commented Jan 19 at 12:19
