
Does anyone know if MTBF (Mean Time Between Failures) can be used to determine when a piece of computer hardware (CPU cooler, power supply, case fans, etc.) will fail, thus requiring replacement? Or is it just an unreliable statistic that can be ignored even after 3 or 4 years have passed?

My case fans and everything else that has a fan are working fine and have been for about a year. I'm just worried about their life expectancy after seeing what the MTBF was.

For example: I went on the vendor's website, and when I looked up the case fans in my case, it said the MTBF was greater than 30,000 hours. Does that mean they will last for more than 3 years, and that once those 3 or 4 years have elapsed, they will fail and I will have to replace them?

Also, with any type of hardware, does higher MTBF mean longer life expectancy?

  • You seem to misunderstand what MTBF is. It's the mean time (average) between failures. As a number, it only indicates the average life expectancy of the device, but in reality said device could last 1 hour, or many hours longer than the MTBF. Commented Aug 17, 2019 at 0:38
  • Close voters: this has attracted a number of close votes based on the question being unclear or opinion based. It is neither. This is completely and factually answerable. It is asking about the nature of MTBF and whether it can reliably be used to predict life expectancy. It isn't asking people to predict how long the OP's parts will last. Please read the question carefully and if you think it has issues, identify them and we can fix them.
    – fixer1234
    Commented Aug 17, 2019 at 8:43

2 Answers


The advice that MTBF isn't useful for prediction is all true. But it's even less useful than people who already know its limits tend to think.

Life Expectancy

Different kinds of failures at different stages of life

The graph of the "bathtub curve" in wrecclesham's answer is a conceptual diagram. In reality, the two ends can be proportioned somewhat differently. For example, if a manufacturer uses good-quality parts and good manufacturing quality control, infant mortality can be very low.

Failures at different stages in a product's life happen largely for different reasons. During the infant mortality period, products fail because of defective parts or manufacturing defects. Units that don't fail early go through a period of "attrition", in which some small percentage fail each year for random reasons. By the end of life, parts are wearing out, and failures happen because the components have reached the amount of use they were built for.
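The three phases can be sketched as a hazard (instantaneous failure rate) made of three added terms: a falling Weibull term for infant mortality, a constant term for random attrition, and a rising Weibull term for wear-out. This is a common textbook way to model a bathtub curve, not anything a fan vendor publishes; every shape and scale value below is an illustrative assumption.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate: falls over time if shape < 1, rises if shape > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_hours: float) -> float:
    """Failures per hour at age t_hours (> 0), using made-up parameters."""
    infant = weibull_hazard(t_hours, shape=0.5, scale=1e6)        # defects show up early
    attrition = 1 / 200_000                                       # constant random failures
    wear_out = weibull_hazard(t_hours, shape=5.0, scale=50_000)   # parts wearing out
    return infant + attrition + wear_out


for age in (10, 1_000, 10_000, 30_000, 60_000):
    print(f"{age:>6} h: {bathtub_hazard(age):.2e} failures/hour")
```

The printed rate falls, bottoms out, then climbs again, which is the bathtub shape. Changing the infant-mortality scale (better parts and quality control) changes how tall the left end is without touching the right end.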

Big differences in build quality can affect failures at all stages to some degree. A very cheaply made product may use cheap parts, have poor manufacturing precision, and receive little attention to quality control. A high-end product would likely be the opposite on all counts.

Products in the same class may not differ much in build quality. Even so, a manufacturer could, for example, receive a batch of a component part with the normal life expectancy but wider tolerances. Products built with that batch might have a higher percentage of infant mortalities, but units that don't fail for that reason will have the same service life.

Life expectancy of old parts

Life expectancy can actually work the reverse of what you might think. The life expectancy of an average unit includes infant mortality failures and random attrition failures. An old device hasn't failed from either of those kinds of causes. So a pool of only old devices will have a longer average total life than a pool of all new devices.

The longer a device lasts, the more likely it is that it's lasted that long because of its quality. It's one of the lucky few units that got the most perfect components, was manufactured with the greatest precision, and got the best handling and care. Given that you have an old part that is still working, it isn't expected to fail at any moment just because it has reached the age at which the average unit fails. Your old part has a longer life expectancy than the average unit (although that still tells you nothing about how much longer your specific unit will last).
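The survivor effect can be checked with a quick simulation. The sketch below assumes a made-up population: 5% of units carry a latent defect and fail early, and the rest fail at whichever comes first, a random attrition failure or wear-out around 50,000 hours. It compares the average total life of all units with that of the units still running at 20,000 hours.

```python
import random
import statistics


def simulate_lifetime(rng: random.Random) -> float:
    """One unit's total lifetime in hours under an ASSUMED three-mode model."""
    if rng.random() < 0.05:                          # 5% have a latent defect
        return rng.expovariate(1 / 500)              # infant mortality
    random_failure = rng.expovariate(1 / 200_000)    # attrition
    wear_out = rng.gauss(50_000, 5_000)              # end of life
    return min(random_failure, wear_out)


rng = random.Random(42)
lives = [simulate_lifetime(rng) for _ in range(100_000)]
age = 20_000
survivors = [life for life in lives if life > age]

print(f"mean life, all units:             {statistics.mean(lives):,.0f} h")
print(f"mean total life, units > {age:,} h: {statistics.mean(survivors):,.0f} h")
```

Filtering out the units that already failed can only raise the average, so the second number never comes out smaller than the first, whatever parameters you plug in.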

MTBF

What it is

The term "MTBF" is used in several ways. "Mean Time Between Failures" is used as a measure of expected system uptime or availability for repairable systems. "Mean Time Before Failure" is applied to devices that may or may not be repairable, but it is often associated with end-of-life events. That's the applicable meaning here.
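For the repairable-system sense, the usual way to turn MTBF into an availability figure is availability = MTBF / (MTBF + MTTR), where MTTR (mean time to repair) is a standard companion term not discussed above. A minimal sketch with hypothetical figures:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time a repairable system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical: fails every 30,000 h on average, takes 4 h to repair.
print(f"{availability(30_000, 4):.5f}")  # 0.99987
```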

The name of the measure, "Mean Time Before Failure", is very misleading, because it usually isn't really that. To measure it as defined, you would run a quantity of units until they failed and then take the average of those times. For items with an expected lifespan of many years, that would be impractical; the products would never make it into the marketplace because they would forever be in testing.

The number is developed another way. When it isn't simply extrapolated from the bogus number for a similar model, the method often used to measure "MTBF" is to test a large number of units for a relatively short time. The total test hours (test duration × number of test units) are divided by the number of failures during that time (and the clock typically isn't stopped for an individual unit when it fails).
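The calculation described above is just total unit-hours divided by the failure count. A minimal sketch with hypothetical test figures (the function name and numbers are illustrative):

```python
def naive_mtbf(units: int, test_hours: float, failures: int) -> float:
    """Total unit-hours / failures; the clock is not stopped for failed units."""
    if failures == 0:
        raise ValueError("no failures observed; the estimate is unbounded")
    return units * test_hours / failures


# Hypothetical test: 1,000 fans run for 1,000 hours (about six weeks), 20 fail.
print(naive_mtbf(1_000, 1_000, 20))  # 50000.0
```

The resulting 50,000 hours is more than five years of continuous running, yet no unit in this hypothetical test ran longer than 1,000 hours.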

What it really measures

The failures that happen during this timeframe are infant mortalities and random failures early in the units' lives. The test never reaches the end-of-life failures, which are what you want to know about. Only a small percentage of units fail early in their life. You can't extrapolate or derive time to end-of-life from those statistics; they really tell you nothing about life expectancy. At best, they're a crude relative measure for comparing one item to another.

Looking at the events MTBF is based on, the attrition failures are random events that happen with any product, regardless of quality. A product of substantially higher quality will last longer because the parts take longer to wear out, but the random failures before that may not be much different.
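A simulation makes the point concrete. It assumes two hypothetical products with identical random-failure behavior (exponential, 200,000-hour mean) but different wear-out ages (about 40,000 vs. 80,000 hours), runs each through the same 1,000-hour test, and computes MTBF the way described earlier. Both products reuse the same random seed so the comparison isolates the wear-out difference.

```python
import random
import statistics


def simulate_lives(wear_out_mean: float, units: int, seed: int) -> list[float]:
    """Lifetimes (h): first of a random failure or wear-out (assumed distributions)."""
    rng = random.Random(seed)
    return [
        min(rng.expovariate(1 / 200_000), rng.gauss(wear_out_mean, 5_000))
        for _ in range(units)
    ]


def short_test_mtbf(lives: list[float], test_hours: float) -> float:
    """Unit-hours / failures for a test stopped at test_hours."""
    failures = sum(1 for life in lives if life <= test_hours)
    return len(lives) * test_hours / failures


results = {}
for name, wear_out_mean in (("A", 40_000), ("B", 80_000)):
    lives = simulate_lives(wear_out_mean, units=10_000, seed=7)
    results[name] = (short_test_mtbf(lives, 1_000), statistics.mean(lives))
    print(f"{name}: test MTBF {results[name][0]:,.0f} h, "
          f"true mean life {results[name][1]:,.0f} h")
```

Both products get the same test MTBF (on the order of 200,000 hours, far beyond either one's actual life), even though B lasts much longer on average. The short test only ever sees the random failures.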

So differences in MTBF tend to mostly reflect differences in infant mortality, not end-of-life. As described earlier, that could potentially reflect a difference in build quality, which might translate to some difference in life expectancy. But it could also be a case of high-quality devices where the manufacturer received a bad batch of one component. If your device isn't one with the bad component, it could have a much longer life expectancy than the average unit.

Does MTBF have any practical value?

If you have a choice between two products, where MTBF was estimated the same way for both (say two different models from the same manufacturer), and one product has a substantially better MTBF, that might be a sign of generally better quality, and you might expect it to last some amount longer. All else being equal, the safer bet would be to go with the one with the better MTBF.

If the MTBF numbers are close, the small differences are just statistical noise; they tell you nothing at all.
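How big is that noise? The sketch below runs the same hypothetical test (1,000 units for 1,000 hours) twenty times on a product whose true failure rate never changes, assuming constant-rate random failures, and computes the MTBF each time. Any spread between the results is pure sampling noise.

```python
import math
import random


def simulated_test_mtbf(rng: random.Random, units: int, test_hours: float,
                        true_mtbf: float) -> float:
    """One test run: each unit fails during the test with constant-rate probability."""
    p_fail = 1 - math.exp(-test_hours / true_mtbf)
    failures = sum(1 for _ in range(units) if rng.random() < p_fail)
    return units * test_hours / failures


rng = random.Random(1)
estimates = [simulated_test_mtbf(rng, 1_000, 1_000, 50_000) for _ in range(20)]
print(f"lowest {min(estimates):,.0f} h, highest {max(estimates):,.0f} h")
```

With only about 20 failures per run, identical products can easily come out tens of percent apart.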

  • This is an excellent explanation. Commented Aug 17, 2019 at 9:02

No. Unfortunately not.

A component's MTBF (mean time between failures) rating cannot be used to determine exactly when a product will fail, as it is simply a measure of the mean (i.e. average) time it can be used before a failure occurs.

MTBF ratings take into consideration the fact that some individual components will vastly exceed the average, while others may fail extremely quickly (also known as "infant mortality"), distilling all this information into a simple average.
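To put the question's 30,000-hour figure in perspective: the sketch below converts run hours to calendar years and, assuming a constant failure rate (the exponential model, which covers only random failures and ignores wear-out), computes the chance that a single unit is still running when it reaches the MTBF. The function names are illustrative.

```python
import math


def hours_to_years(run_hours: float, hours_per_day: float = 24.0) -> float:
    """Calendar years needed to accumulate run_hours (365-day years)."""
    return run_hours / (hours_per_day * 365)


def survival_probability(t_hours: float, mtbf_hours: float) -> float:
    """Chance one unit is still working at t_hours, ASSUMING a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


mtbf = 30_000
print(f"{hours_to_years(mtbf):.2f} years running 24 h/day")    # 3.42
print(f"{hours_to_years(mtbf, 8):.2f} years running 8 h/day")  # 10.27
print(f"{survival_probability(mtbf, mtbf):.3f} chance of still running at the MTBF")  # 0.368
```

Under that assumption, roughly 63% of units would fail before reaching the rated MTBF, so the number is not a point at which most units give out.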

In reality, depending on the product type in question, failure rates can vary widely:

[Image: "bathtub curve" graph of failure rate over a product's life]

  • I've been debating about your answer. What it covers, it does well. The only issue I had was the common misconception that MTBF actually represents average life or average time to failure. For items with a long service life, that's not the case because of the impracticality of testing. But for items with a short life (sort of limited to consumables), it actually could be. For example, that works for the service life of incandescent light bulbs, and there are probably computer-related parts in the same category. Things in development for a long time can also collect service life stats. OK, close enough. :-)
    – fixer1234
    Commented Aug 17, 2019 at 9:23
  • That makes sense. You wouldn't really be able to test a new power supply model for 50,000+ hours before starting to sell them! Commented Aug 17, 2019 at 10:22

