How do you design-in reliability?

While there is an expectation that a product will be reliable over its (long) life, the market still demands value for money. Design engineers, claims Jean-Louis Evans, must therefore use test techniques that are fast and cost-effective while also producing worthwhile results that assure product reliability.

The term ‘reliability’ is internationally defined as the ability of an item to perform a required function under stated conditions for a stated period of time. However, this definition requires some further explanation in order to be a useful guide to the meaning of reliability.

The ‘required function’ includes the specification of satisfactory operation as well as unsatisfactory operation. For a complex system, unsatisfactory operation may not be the same as failure. The ‘stated conditions’ are the total physical environment including mechanical, thermal and electrical conditions. The ‘stated period’ of time is the time during which satisfactory operation is desired and is often called the service life of a product.

There are also different measures of reliability, depending on the application of the end product.

‘Survivability’ is the probability that an item will perform a required function under stated conditions for a specified period of time without failure. It applies only to applications in which failures are not routinely repaired, whereas the generic definition of reliability allows for the possibility of repair.

‘Availability’ applies where there is the possibility of both repair and failure, and it is a measure of the degree to which an item is in an operable state when called upon to perform.
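As a rough illustration of availability as a number, the sketch below uses the common steady-state approximation: mean time between failures (MTBF) divided by the sum of MTBF and mean time to repair (MTTR). The formula, the function name and the figures are assumptions added here for illustration and are not taken from the article.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Approximate fraction of time a repairable item is in an operable state.

    Assumes the simple steady-state model MTBF / (MTBF + MTTR); real systems
    may need a more detailed model (logistics delays, planned maintenance).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative figures only: 1,000 h between failures, 10 h to repair.
availability = steady_state_availability(1000.0, 10.0)
print(f"Availability: {availability:.4f}")
```

The same model shows why maintainability matters: halving the repair time raises availability without any change to the failure rate.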

And ‘Maintainability’ refers to the maintenance process associated with system reliability and is the degree to which an item can be retained in, or restored to, a specified operating condition.

Time to market speed

In a fast-moving marketplace, competition for new products means that time to market is a vital factor of success. However, the traditional approach to reliability evaluation has been life cycle testing, which involves tests carried out within the product’s ‘expected environment’ or using actual operational conditions.

This is an unrealistic approach for design engineers who need to assess quickly, at each stage of the product development lifecycle, whether both prototypes and the final product are going to deliver on reliability.

Accelerated life testing and environmental stress screening have become increasingly accepted as methods of assessing product reliability. Not only do they give a level of confidence that a product will not develop faults after delivery, they also provide a process for the design engineer to identify any design defects or component problems.

Accelerated life testing is based on using real-life operational data, trying to accelerate fault conditions by applying key operational failure-causing stresses at levels above those that the product would experience in its application environment. This accelerated approach allows a distribution of failure times to be obtained, albeit at more stressful conditions than ordinary operating conditions. It also requires the distribution of failure times to be related to the distribution of failure times that would be anticipated under normal operational conditions. This would call for an accelerated life model to be created which is typically characterised by a linear relationship between failure times at different sets of conditions.
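The linear relationship described above can be sketched as a single acceleration factor that scales failure times observed under stress back to normal operating conditions. The sketch below assumes the Arrhenius model for temperature-driven failure mechanisms, which the article does not name; the activation energy, temperatures and failure times are purely illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev: float,
                                  use_temp_c: float,
                                  stress_temp_c: float) -> float:
    """Acceleration factor between use and stress temperatures (Arrhenius model)."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def to_use_conditions(stress_failure_hours: list[float],
                      acceleration_factor: float) -> list[float]:
    """Scale failure times seen under stress to equivalent times at normal use.

    This is the linear relationship: t_use = acceleration_factor * t_stress.
    """
    return [t * acceleration_factor for t in stress_failure_hours]


# Illustrative values only: 0.7 eV activation energy, 55 °C use, 125 °C stress.
af = arrhenius_acceleration_factor(0.7, 55.0, 125.0)
equivalent_hours = to_use_conditions([120.0, 340.0, 610.0], af)
print(f"Acceleration factor: {af:.1f}")
print("Equivalent use-condition failure times (h):",
      [round(t) for t in equivalent_hours])
```

The scaling only holds if the stress test triggers the same failure mechanism that would occur in normal use, which is why the choice of model and stress level needs to be justified for each product.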

The key operational failure-causing stresses that contribute most commonly to the impairment of a product's reliability are:

  • Temperature cycling – widening the temperature range (both high and low) to which a product is exposed accelerates the stresses caused by differential expansion of components and materials. The more extreme the temperature cycle, the higher the acceleration factor.
  • Vibration – this promotes mechanical failures. The deterioration of material strength caused by such cyclic stressing is known as fatigue. If a product’s normal operational vibration environment is known, then it can be accelerated too.
  • Power cycling – this is the act of repeatedly turning a piece of equipment off and then on again to check that an electronic device reinitialises its configuration and continues operating normally.
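To illustrate how a more extreme temperature cycle raises the acceleration factor, the sketch below uses a simplified Coffin-Manson relationship, in which the factor is the ratio of the test and field temperature swings raised to an exponent. The model is an assumption brought in for illustration and is not named in the article; the exponent depends on the material and failure mechanism, and the value used here is hypothetical.

```python
def coffin_manson_acceleration_factor(test_delta_t: float,
                                      field_delta_t: float,
                                      exponent: float) -> float:
    """Simplified Coffin-Manson factor: (test swing / field swing) ** exponent.

    Ignores cycle frequency and peak temperature, which fuller models include.
    """
    if test_delta_t <= 0 or field_delta_t <= 0:
        raise ValueError("Temperature swings must be positive")
    return (test_delta_t / field_delta_t) ** exponent


def equivalent_field_cycles(test_cycles: int, acceleration_factor: float) -> float:
    """Number of field cycles represented by a given number of test cycles."""
    return test_cycles * acceleration_factor


# Illustrative values only: 100 °C swing on test, 50 °C swing in the field,
# and a hypothetical fatigue exponent of 2.
af = coffin_manson_acceleration_factor(100.0, 50.0, 2.0)
print(f"Acceleration factor: {af:.1f}")
print(f"500 test cycles represent about {equivalent_field_cycles(500, af):.0f} field cycles")
```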

The principal benefit of accelerated life testing is that it helps detect the design flaws which are most likely to give rise to a product’s ‘infant mortalities’. The disadvantage is that this method may precipitate some unrepresentative failures, and Highly Accelerated Life Testing may provide the answer here.

HALT testing in action

Highly Accelerated Life Testing

A key difference between highly accelerated life testing (HALT) and traditional accelerated life testing is that stress factors, such as high temperatures, are applied directly to the component or sub-assembly under test and not to the system as a whole. This can make a great difference in accelerating failure rates. Thermal and mechanical stimuli are also applied separately, and then together, in order to determine the operating and destruct limits of the item under test.

Defect analysis is a key stage in the HALT process and is conducted once the operating and destruct limits have been identified. The operating limit is the highest stress level at which the unit remains operational; any further increase in stress causes a recoverable failure. The destruct limit is the level at which the product stops functioning and remains inoperable. This test method has been shown to expose design flaws within hours, where conventional test methods might take many days or weeks.
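The search for the two limits can be sketched as a step-stress loop: raise the stress in fixed increments, record the last level at which the unit still worked, and stop at the first level that leaves it permanently inoperable. Everything in this sketch is hypothetical, including the `Outcome` states, the `find_limits` function and the simulated unit with its thresholds; a real HALT run drives a test chamber and monitors a physical unit.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    OPERATIONAL = "operational"
    RECOVERABLE_FAILURE = "recoverable failure"  # works again once stress is reduced
    PERMANENT_FAILURE = "permanent failure"      # remains inoperable


def find_limits(apply_stress: Callable[[float], Outcome],
                start: float,
                step: float,
                max_level: float) -> Tuple[Optional[float], Optional[float]]:
    """Step the stress upwards and return (operating_limit, destruct_limit).

    Either value is None if that limit was not reached within max_level.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    operating_limit = None
    destruct_limit = None
    seen_failure = False
    n = 0
    while start + n * step <= max_level:
        level = start + n * step
        outcome = apply_stress(level)
        if outcome is Outcome.OPERATIONAL:
            if not seen_failure:
                operating_limit = level  # highest level passed so far
        else:
            seen_failure = True
            if outcome is Outcome.PERMANENT_FAILURE:
                destruct_limit = level
                break  # the unit is destroyed, so the test ends here
        n += 1
    return operating_limit, destruct_limit


def simulated_unit(temp_c: float) -> Outcome:
    """Hypothetical unit: fine up to 90 °C, recoverable up to 120 °C."""
    if temp_c <= 90.0:
        return Outcome.OPERATIONAL
    if temp_c <= 120.0:
        return Outcome.RECOVERABLE_FAILURE
    return Outcome.PERMANENT_FAILURE


operating, destruct = find_limits(simulated_unit, start=50.0, step=10.0, max_level=150.0)
print(f"Operating limit: {operating} °C, destruct limit: {destruct} °C")
```

The size of the step sets the resolution of both limits, so smaller steps give tighter estimates at the cost of a longer test.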

History lesson

When designing new versions of an existing product, data from previous reliability tests, as well as in-service failure information from warranty returns, will be available. However, it is possible to rely too heavily on a product’s reliability history when assessing the next generation. This is because even what is perceived as the slightest alteration, such as the use of different plastics, more up-to-date electronic components, or a change in the manufacturing process, can have a significant impact on the product’s reliability.

It is therefore imperative that some form of gap analysis is performed between the known product currently available on the market and the new version under development. The gap analysis data should be mapped onto previous reliability information to gain a clearer understanding of the upgraded product’s reliability.

Human error

As all design engineers know, there is no accounting for end-user behaviour. This is where reliability testing can unravel: it focuses purely on product performance and not on what people might do to the product, and products are often used in ways that the designers never envisaged.

In an ideal world, reliability tests should therefore be taken a step further to include user tests in the field. Observations can then be made of how the product might be used and how maintenance will be managed. In the rush to release a product onto the market, this is a key part of reliability testing that is often overlooked, but we are increasingly seeing standards that try to compensate for this element of unintended use by the end-user.

As time-to-market constraints require accelerated testing that cannot guarantee 100 per cent reliability, and end-user behaviour cannot be predicted, reliability is difficult to guarantee. However, it is often brand reputation, of which product reliability is a key element, that sets one product apart from another. To remain competitive, companies must select reliability techniques that are fast and cost-effective and that produce worthwhile results. Without the ability to gauge reliability throughout the design lifecycle, there is no assurance that the final product will meet market expectations.

Author profile:
Jean-Louis Evans is managing director at TÜV SÜD Product Service.