22
\$\begingroup\$

We work with some electronic devices in our projects. We are at the stage where we need to create some test strategy to validate the lifespan (e.g. 1 to 5 years) of our devices for which we provide warranty.

We need to standardize this test strategy to provide documentation to our clients that we have our devices tested in our facility to support the given warranty period. This test strategy will focus on testing the electrical components (e.g., LCD, crystal oscillator, thermistor, LED, accelerometer, light sensor, etc.) we use in our devices, for their performance over the warranty period/lifespan.

We are thinking of creating a test strategy where we degrade the performance of the electrical components we use by putting them in an extreme environment (temperature, high voltage, low humidity, etc.) for a certain period. The time the components spend in the extreme environment will represent an actual duration of use. For example, one week will represent one year. This way, we are aiming for a test duration of about one month to reflect the wear-out of the components over 5 to 6 years.

Now my questions are:

  1. Is our testing process correct? Is it going to be accurate to some extent (e.g., for some specific components)?
  2. If we are not thinking in the right direction, can anyone suggest other standard test methods to validate the lifespan of electrical components, given that we cannot run the test for 5 to 6 years?
\$\endgroup\$
14
  • 2
    \$\begingroup\$ You will need physical models, as a start. And you will need to test them in order to validate the models. For example, some things (such as metal migration) vary with respect to \$\propto T^4\$. For those, you need to use that kind of power function to make estimates of lifetime. But there are many different scaling functions. Use dimensional analysis to help you out. It's a huge help. \$\endgroup\$
    – jonk
    Commented Mar 1, 2021 at 8:48
  • 7
    \$\begingroup\$ There is no need to re-invent the wheel. Tests like this already exist and are used and there are specifications for them. Just follow those! These tests are used for product development and are called HAST: en.wikipedia.org/wiki/Highly_accelerated_stress_test and test-navi.com/eng/research/handbook/pdf/… \$\endgroup\$ Commented Mar 1, 2021 at 9:04
  • 3
    \$\begingroup\$ Have you calculated the product lifespan based on MTBF figures for the components and the type of environment it is used in? This is usually the first place to start. After all, why waste time on expensive testing when you can estimate the average lifespan based on component information and stress levels? \$\endgroup\$
    – Andy aka
    Commented Mar 1, 2021 at 9:13
  • 4
    \$\begingroup\$ It's a big learn but, MIL-HDBK-217F is usually quite a good source for evaluating MTBF for a product and people still use it to theoretically get a number. It has its critics of course. \$\endgroup\$
    – Andy aka
    Commented Mar 1, 2021 at 10:10
  • 2
    \$\begingroup\$ More important than testing is design. An electronic product that is well designed, in a benign environment, will usually last a very long time. Well designed: Derating, especially on power, to keep parts from getting hot. Voltage derating on caps so they aren't stressed. \$\endgroup\$
    – Mattman944
    Commented Mar 1, 2021 at 13:08

5 Answers

22
\$\begingroup\$

It is excellent to test your product as you describe, but it is not the easiest way to provide a rationale for the lifespan. The following method is the one I follow in the medical device industry:

Step 1 :

The most important thing to have first is a good definition of the use cases your electronic components will face. You mention a target lifespan of 1 to 5 years, but will the components be powered the whole time (for example, an LCD working only 10 hours a day)? From this you can extract the minimum ON time you must guarantee for each electronic part of your design.
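
As a rough sketch of that bookkeeping, the required ON time per part can be tabulated from a usage profile. The part names and the 24 h/day figure are assumptions for illustration; only the LCD at 10 hours a day comes from the example above.

```python
def required_on_hours(hours_per_day: float, warranty_years: float) -> float:
    """Total powered-on hours a part must survive over the warranty period."""
    return hours_per_day * 365 * warranty_years


# Hypothetical usage profile: hours per day each part is powered.
usage_hours_per_day = {"LCD": 10, "accelerometer": 24}
warranty_years = 5

required = {
    part: required_on_hours(hours, warranty_years)
    for part, hours in usage_hours_per_day.items()
}

for part, hours in required.items():
    print(f"{part}: at least {hours:.0f} h of ON time")
```

These per-part figures are what you would then compare against the datasheet numbers in the next step.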

Step 2 :

For each component, check the datasheet for the mean time between failures (MTBF) or the mean time to failure (MTTF) and compare it with your use cases. In this example, if you look at page 11 you will find a lot of useful information. The manufacturer did it well, so don't waste time and money redoing the tests:

lifespan information
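
When comparing an MTTF with a required ON time, keep in mind that an MTTF is not a guaranteed lifetime. A minimal sketch, assuming a constant failure rate (exponential model, which ignores infant mortality and wear-out) and a made-up MTTF of 100,000 h:

```python
import math


def survival_probability(on_hours: float, mttf_hours: float) -> float:
    """Fraction of units expected to still work after on_hours,
    assuming a constant failure rate of 1 / MTTF (exponential model)."""
    return math.exp(-on_hours / mttf_hours)


mttf_hours = 100_000        # hypothetical datasheet value
required_on_hours = 18_250  # e.g. 10 h/day for 5 years

p = survival_probability(required_on_hours, mttf_hours)
print(f"Expected survivors after {required_on_hours} h: {p:.1%}")

# Under this model only about 37% of units survive until t = MTTF.
print(f"Expected survivors at t = MTTF: {survival_probability(mttf_hours, mttf_hours):.1%}")
```

So a part whose MTTF merely equals the required ON time is not good enough under this model; you want a comfortable margin.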

Step 3

Well, the previous example is a nice one but you will face laconic datasheets where you will not find the information you are looking for. In this case, before doing tests you should contact the manufacturer and ask for the information. Sometimes you will be able to obtain useful information that was not made public in the datasheet (I still don't know why but it can happen).

Step 4

If in the end you were not able to find any information but you definitely need it (required for a certification or, as you mention, to know where it is more likely to fail), you will have to do the test. In my company there is a whole department dedicated to this, so you should not underestimate the work and money involved. We have a lot of tools: programmable power supplies, programmable loads, and expensive probing tools to continuously monitor changes. Merging all the data requires a very good knowledge of statistical methods. Also, the derating of components subjected to high temperature or high humidity (among others) is a complex domain and should be handled with care. If you reach this step, I strongly recommend looking for experts in this domain. They will advise you according to your needs, your deadlines and your budget. Most engineering consulting companies provide this service.

\$\endgroup\$
2
  • 2
    \$\begingroup\$ Many thanks for your very well constructed guideline. You are right, we should not consider doing the MTBF/MTTF for the individual components when we can collect it from the manufacturer. So we are also evaluating that approach. At this point we are thinking of generating the MTBF/MTTF for our entire product as a whole. But still we need to move cautiously so that we do not reinvent the wheel. The idea of consulting a subject matter expert was extremely helpful. \$\endgroup\$
    – MKSJ
    Commented Mar 2, 2021 at 11:09
  • \$\begingroup\$ @SJMK Notably, manufacturers don't always write MTTF in the datasheet, you often have to poke the silicon vendor with a support ticket to get access to such reports. \$\endgroup\$
    – Lundin
    Commented Mar 3, 2021 at 11:54
8
\$\begingroup\$

We are thinking of creating some test strategy where we’ll degrade the performance of the electrical components we use by putting those in some extreme environment (temperature, high voltage, low humidity etc…) for a certain period. The “time” for which the components will be inside the extreme environment will be a representation of an actual time duration. For example: 1 week will represent 1 year. This way, we are trying to achieve a testing duration of 1 month to reflect the wearout of the components over a 5/6 years of time.

Is our testing process correct? Is it going to be accurate to some extent (e.g. for some specific components)?

Yes, I believe you're on the right track. We run the same kind of processes (Highly Accelerated Life Testing) at the manufacturing company I work at, such as:

  1. Subjecting PCBAs to 125°C @ 85% RH for 336 hours, among other temperature/RH/duration combinations
  2. Thermal cycling (-60°C ←→ +150°C, or a subrange within this range)
  3. Multi-axis vibrations
  4. Salt-spray tests
  5. UV exposure/ ageing
  6. ESD tests

From these, the lifetime of the product or PCBA is determined or extrapolated, in addition to the MTTF and MTBF calculations already discussed above.
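
To give a feel for how such an extrapolation can work for temperature alone, here is a minimal sketch using the Arrhenius model. This is one common model, not necessarily the one used for the tests listed above. It only applies to thermally activated failure mechanisms, and the activation energy of 0.7 eV and the 25°C use temperature are assumptions; real values depend on the failure mechanism and your product.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(t_use_c: float, t_stress_c: float, ea_ev: float) -> float:
    """Acceleration factor between use and stress temperature (Arrhenius)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1 / t_use_k - 1 / t_stress_k))


def stress_temp_for_af(t_use_c: float, target_af: float, ea_ev: float) -> float:
    """Stress temperature (deg C) needed to reach a target acceleration factor."""
    t_use_k = t_use_c + 273.15
    inv_t_stress = 1 / t_use_k - math.log(target_af) * BOLTZMANN_EV_PER_K / ea_ev
    return 1 / inv_t_stress - 273.15


EA_EV = 0.7       # assumed activation energy, mechanism dependent
T_USE_C = 25.0    # assumed field temperature

# "One week represents one year" corresponds to an acceleration factor of 52.
t_needed = stress_temp_for_af(T_USE_C, 52, EA_EV)
print(f"Stress temperature for AF = 52: {t_needed:.1f} deg C")

# Field hours represented by 336 h at 125 deg C under the same assumptions.
af_125 = arrhenius_af(T_USE_C, 125.0, EA_EV)
print(f"336 h at 125 deg C represents about {336 * af_125:.0f} field hours")
```

Humidity, voltage, vibration and thermal cycling each need their own model, so a single "one week equals one year" factor rarely holds for a whole product.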

Edit: I'd like to add that...

  1. Most of these tests require that the product/PCBA/DUT is active, i.e. powered on.
  2. These tests are destructive and require multiple fresh DUTs for each test.
  3. The DUT/product should, as much as possible, be set up the way the customer will use it, i.e. in its final enclosure, with the final matching components for RF, etc.
\$\endgroup\$
6
\$\begingroup\$

Ever heard of Predictive Maintenance techniques? They are useful for detecting anomalies, diagnosing the root cause of faults, and estimating RUL (Remaining Useful Life) using machine learning and time-series models.

MATLAB has a toolbox, ebook, videos, documentation and reference examples for it.

Example with real-time RUL estimation: Wind Turbine High-Speed Bearing Prognosis, https://www.mathworks.com/help/predmaint/ug/wind-turbine-high-speed-bearing-prognosis.html

\$\endgroup\$
4
\$\begingroup\$

HALT and several parameters to test against were already mentioned in the other answers. Statistical considerations were described well and all relevant environmental parameters were accounted for.

However it is crucial to clearly define test goals and not to mix up different test methods. That's what I want to add here.

HALT (highly accelerated life testing)

The original purpose of HALT is not to prove the fitness of your product for a certain lifetime under a certain environment. In fact, the purpose is in a sense to disprove it. (Take that sentence with a grain of salt.)

At the end of a successful HALT assessment the DUT will be dead. And if an iterative process has been chosen, a lot of DUTs will be dead.

The iterative HALT process is a method to find weaknesses. Environmental parameters and working conditions of a device are worsened step by step until failure. The failure is investigated in detail. In most cases this failure is then fixed by ad hoc or preliminary measures just to make a continuation of the test possible. The next failure (hopefully at a different point of the device) is recorded and investigated. This is repeated until the device can't be further fixed for a next stress level. During this process a lot of potential weaknesses arise. These weaknesses can then be assessed and decisions can be made whether the weaknesses are problematic for normal use cases or not.

Type tests

Different from HALT tests are type tests. These focus on single parts or even raw materials like insulators, e.g. prepregs, cores, or conformal coatings. The type test tries to prove the real lifetime. With adequate reasoning, a tradeoff between stress level and test time is calculated to prove that a material or part will survive the desired lifetime at expected conditions. For dielectrics, a common formula relates time and voltage like this:

$${U_{op}}^6 * T_{lifetime}={U_{test}}^6 * T_{test}$$

This way, the test time for the dielectrics can be reduced drastically. Be aware that this test effectively inflicts the wear of the whole lifetime on the part. Hence it will also damage the product. If the test completes with no dielectric breakdown, the lifetime is proven. Nevertheless, a microscopic analysis of the DUT is recommended.
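
A small sketch of how the voltage-time relation above turns a lifetime into a test time. The exponent 6 is taken from the formula; the voltages and the 5-year lifetime are made-up example values.

```python
def dielectric_test_hours(u_op: float, u_test: float,
                          lifetime_hours: float, exponent: float = 6) -> float:
    """Solve U_op^n * T_lifetime = U_test^n * T_test for T_test."""
    return lifetime_hours * (u_op / u_test) ** exponent


lifetime_hours = 5 * 365 * 24  # 5 years of continuous operation = 43800 h
u_op = 100.0                   # hypothetical operating voltage in V
u_test = 200.0                 # hypothetical test voltage in V

t_test = dielectric_test_hours(u_op, u_test, lifetime_hours)
print(f"Test time at {u_test:.0f} V: {t_test:.1f} h")  # 43800 / 2**6 = 684.4 h
```

Doubling the voltage shortens the test by a factor of 64, which is why the test voltage must stay within the range where the same failure mechanism still applies.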

Tests on products

Hi-Pot tests are often used to find defective parts during the production process. IPC recommends voltage levels of 250V up to 500V. This is typically done to prove the electrical safety (to prevent harm to humans) of a product certified for certain insulation classes or other codes. For reliability testing, however, this is of limited use, as the tests typically last only milliseconds and the extended behaviour under stress can't be monitored.

Real lifetime tests

These are recommended. They may bring up things not related to mean environmental conditions, or even the other way round. Having a device operate for years under expected conditions may capture outside events you hadn't thought of in advance. Even if you are only a few months ahead of the sold devices operating in the field, you still gain the opportunity to react to unforeseen failures in some cases.

Updates I really find noteworthy

MTBF

Other answers mention the MTBF and give a proper explanation. However, it is not easy to get a reasonable MTBF for your product. If your product consists of subcomponents (which is to be assumed for an electronic device), each of which may or may not have a given MTBF, deriving a total MTBF is a fairly difficult task. That's because failure rates change during the lifetime and follow different distributions. This, too, was explained well in other answers.

However if you are interested in the calculation methods used for gaining a prediction of lifetime and failure rates you should read something about

FTA

This is an acronym for fault tree analysis, which breaks your device down hierarchically to the single components. Each component or feature is then assigned a FIT (failures in time) value. The FIT values are then entered into a calculation scheme derived from the hierarchical view. An experienced engineer can then take into account the cumulative effects of different failure mechanisms by setting parameters accordingly. Single points of failure become more visible and you will see which components are most likely to cause the device to fail. You can then reselect components with better FIT values. E.g. for capacitors you can select types with a higher voltage or temperature rating.
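
As a very simplified sketch of the bottom level of such a calculation: if every component is a single point of failure (a pure series system) and failure rates are constant, the FIT values simply add up. One FIT is one failure per 10^9 device-hours. The component names and FIT numbers below are invented for illustration; a real fault tree also handles redundancy and common-cause effects.

```python
FIT_HOURS = 1e9  # 1 FIT = 1 failure per 1e9 device-hours

# Hypothetical FIT values per component (not from any datasheet).
fit_values = {
    "LCD": 50,
    "crystal oscillator": 20,
    "accelerometer": 15,
    "LED": 10,
    "thermistor": 5,
}


def total_fit(fits: dict) -> float:
    """FIT of a series system: any single component failure fails the device."""
    return sum(fits.values())


def mtbf_hours(fit: float) -> float:
    """Convert a FIT value into an MTBF in hours (constant failure rate)."""
    return FIT_HOURS / fit


system_fit = total_fit(fit_values)
worst_part = max(fit_values, key=fit_values.get)

print(f"System FIT: {system_fit}, MTBF: {mtbf_hours(system_fit):.2e} h")
print(f"Largest contributor: {worst_part}")
```

Sorting the contributions like this is what makes the weakest components visible before you reselect parts.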

FMEA

If you think you have set up a test strategy and have calculated some statistics, you still may want to assess risks for certain scenarios. The way to go is the Failure Mode and Effects Analysis. This is a method which comes in a lot of different flavours. It can be used during the design phase, but also for examining potential problems during manufacturing.

Last but not least, it is very unlikely that you will rule out every failure in advance. Whatever you try, the most obscure error will strike when you don't expect it. In most cases you will have limited time to track the error down. When you are in that pinch, have a look at

8D

which is an overarching methodology to deal with real failures. 8D, properly executed through the 8th step, can help you to

  • quickly get to the real cause of the problem (the so-called root cause)
  • find a way to not let the error terminate your business immediately
  • implement a solution which is not duct tape
  • learn a lot, learn a lot and learn a lot.

I have another very personal message in terms of failure:

Failures are opportunities

Failures give us the chance to learn a lot. Not only about the mistakes made and the pitfalls of some tech thingies, but also about the broader links between different fields of work. Therefore I recommend a change of mindset. Most people live in fear of failures. I tend to live in fear, too. But what does fear do? In many cases it makes us unconsciously avoid the perimeters of hardship and hazard. I have been and still am working on devices of utmost complexity, and our team tries to welcome every problem or failure we encounter as an opportunity to get better. This way we manage to reduce the uneasy feeling when thinking about the million ways our products may break.

Try to build a culture with your team on how to deal with and think of failures.

\$\endgroup\$
1
  • \$\begingroup\$ Here, I’d like to mention that our products are powered by 3V batteries with an approximate peak current of around 850µA. And we have the capacity to sacrifice enough devices for the HALT test (if we choose to go that route) to reach the point where we can measure the life expectancy of our product as a whole after complete assembly. I really appreciate your suggestions. Your “Type Test” concept was very helpful. We’ll look into this. The “Real lifetime tests” are what we are trying to bypass for the moment by coming up with alternative approaches. Many thanks! \$\endgroup\$
    – MKSJ
    Commented Mar 2, 2021 at 18:30
2
\$\begingroup\$

MTBF is based on very broad statistical models. Depending on the method and parameters, you will get results all over the place. I find it unsuitable for real lifetime estimation. At best it can be useful for redundancy calculations and for identifying the weakest part in an assembly (assuming all parts are calculated with the exact same method). However, excluding statistical analysis of in-field instruments, MTBF calculation is the only method that can provide what you wanted: a number for how long your device (might) last. If you are interested in getting that number, you can purchase MTBF calculators where you input your BOM and some environmental parameters, and they will give a statistical approximation of your lifetime. These calculators are not expensive; there are lots of hits for "MTBF calculator" on Google (including some free ones).

However: The MTBF calculator has no idea if your LDO is loaded to 105% of its rated current. The calculator assumes that your component fails at the same rate as the component in the statistical database.

If you want a robust product you need to apply appropriate derating considerations and other reliability methods during design, run HALT tests during prototyping (a test that would immediately identify that overloaded LDO) and finally ESS testing during manufacturing. You will get very robust products and you can expect a couple of decades of lifetime. 30 years is not unheard of.
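
A derating review can start as simply as comparing each part's actual stress with its rating. The sketch below is only an illustration: the 80% limit, the part names and all values are assumptions, and real derating guidelines differ per component type and per company.

```python
def find_overstressed(parts, limit=0.8):
    """Return (name, stress_ratio) for every part loaded above limit * rating."""
    flagged = []
    for name, applied, rated in parts:
        ratio = applied / rated
        if ratio > limit:
            flagged.append((name, ratio))
    return flagged


# Hypothetical design data: (parameter, applied stress, rated value).
parts = [
    ("LDO output current (A)", 0.315, 0.300),  # loaded to 105% of rating
    ("C5 voltage (V)", 3.0, 6.3),              # well below its rating
]

flagged = find_overstressed(parts)
for name, ratio in flagged:
    print(f"{name}: {ratio:.0%} of rating, exceeds derating limit")
```

This is exactly the kind of check an MTBF calculator does not do for you.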

If your goal is 1-5 years you don't need any of the high-reliability methods in this thread. I might even say that you don't need to design for reliability at all.

\$\endgroup\$
