Failure rate is the frequency with which any system or component fails, expressed in failures per unit of time. It thus depends on the system conditions, time interval, and total number of systems under study.[1] It can describe electronic, mechanical, or biological systems, in fields such as systems and reliability engineering, medicine and biology, or insurance and finance. It is usually denoted by the Greek letter λ (lambda).
In real-world applications, the failure probability of a system usually varies over time: failures occur more frequently early in a system's life ("burn-in") or as the system ages ("wear-out"). This is known as the bathtub curve, where the middle region is called the "useful life period".
The mean time between failures (MTBF, 1/λ) is often reported instead of the failure rate, as numbers such as "2,000 hours" are more intuitive than numbers such as "0.0005 per hour".
However, this is only valid if the failure rate is actually constant over time, such as within the flat region of the bathtub curve. In many cases where MTBF is quoted, it refers only to this region; thus it cannot be used to give an accurate calculation of the average lifetime of a system, as it ignores the "burn-in" and "wear-out" regions.
MTBF appears frequently in engineering design requirements, and governs the frequency of required system maintenance and inspections. A similar ratio used in the transport industries, especially in railways and trucking, is "mean distance between failures" - allowing maintenance to be scheduled based on distance travelled, rather than at regular time intervals.
The simplest definition of failure rate λ is simply the number of failures k per time interval Δt:
$$\lambda = \frac{k}{\Delta t},$$
which would depend on the number of systems under study, and the conditions over the time period.
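As a minimal sketch (not from the article; the counts and interval below are illustrative only), this definition amounts to dividing an observed failure count by the corresponding operating time:

```python
# Minimal sketch of the simplest failure-rate definition: observed failures
# divided by the observation interval. The numbers below are illustrative only.

def failure_rate(num_failures: int, interval_hours: float) -> float:
    """Average failure rate in failures per hour over the interval."""
    return num_failures / interval_hours

print(failure_rate(3, 6000))   # 0.0005 failures per hour, i.e. an MTBF of 2,000 hours
```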
To accurately model failures over time, a cumulative failure distribution F(t) must be defined, which can be any cumulative distribution function (CDF) that gradually increases from 0 to 1. In the case of many identical systems, this may be thought of as the fraction of systems failing over time t, after all starting operation at time t = 0; or in the case of a single system, as the probability of the system having its failure time T before time t:
$$F(t) = P(T \le t).$$
As CDFs are defined by integrating a probability density function, the failure probability density f(t) is defined such that:
$$F(t) = \int_0^t f(\tau)\, d\tau,$$
where τ is a dummy integration variable. Here f(t) can be thought of as the instantaneous failure rate, i.e. the fraction of failures per unit time, as the size of the time interval tends towards 0.
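As a rough numerical sketch (not part of the article; the exponential density and the rate used here are assumed purely for illustration), the CDF can be recovered from a chosen density by integration:

```python
# Sketch: recovering the cumulative failure distribution F(t) from a failure
# probability density f(t) by numerical integration. The exponential density
# and the rate 0.001 per hour are assumed purely for illustration.
import numpy as np

lam = 0.001                       # assumed failure rate, failures per hour
t = np.linspace(0, 5000, 5001)    # time grid, hours
f = lam * np.exp(-lam * t)        # failure probability density f(t)

# F(t) = integral of f from 0 to t, via the cumulative trapezoidal rule
F = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(t))))

print(F[1000])                    # fraction failed by t = 1000 h, about 1 - e^(-1) ≈ 0.632
```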
A concept closely related to, but different from,[2] the instantaneous failure rate is the hazard rate (or hazard function), h(t).
In the many-system case, this is defined as the proportional failure rate of the systems still functioning at time t (as opposed to f(t), which is expressed as a proportion of the initial number of systems).
For convenience we first define the reliability (or survival function) as:
$$R(t) = 1 - F(t),$$
then the hazard rate is simply the instantaneous failure rate, scaled by the fraction of surviving systems at time t:
$$h(t) = \frac{f(t)}{R(t)}.$$
In the probabilistic sense, for a single system this can be interpreted as the conditional probability of the failure time T lying within the time interval t to t + Δt, given that the system or component has already survived to time t:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t}.$$
To convert between h(t) and F(t), we can solve the differential equation
$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{F'(t)}{1 - F(t)}$$
with initial condition F(0) = 0, which yields[2]
$$F(t) = 1 - e^{-\int_0^t h(\tau)\, d\tau}.$$
Thus for a collection of identical systems, only one of the hazard rate h(t), failure probability density f(t), or cumulative failure distribution F(t) need be defined.
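As a numerical sketch of this conversion (not from the article; the linearly increasing, wear-out-style hazard is an assumed example), F(t), R(t), and f(t) can all be obtained from a chosen h(t):

```python
# Sketch: converting an assumed hazard rate h(t) into F(t), R(t) and f(t)
# via F(t) = 1 - exp(-integral of h from 0 to t). The linearly increasing
# hazard (a wear-out-like case) is chosen purely for illustration.
import numpy as np

t = np.linspace(0, 1000, 1001)   # hours
h = 1e-6 * t                     # assumed hazard rate, rising with age

# cumulative hazard H(t) = integral of h from 0 to t (trapezoidal rule)
H = np.concatenate(([0.0], np.cumsum((h[1:] + h[:-1]) / 2 * np.diff(t))))
F = 1 - np.exp(-H)               # cumulative failure distribution
R = 1 - F                        # reliability (survival function)
f = h * R                        # failure probability density, f(t) = h(t) R(t)

print(F[-1])                     # probability of failure by 1000 h: 1 - e^(-0.5) ≈ 0.39
```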
Confusion can occur, as the notation λ for "failure rate" often refers to the function h(t) rather than f(t).[3]
There are many possible functions that could be chosen to represent the failure probability density f(t) or hazard rate h(t), based on empirical or theoretical evidence, but the most common and easily understandable choice is to set
$$f(t) = \lambda e^{-\lambda t},$$
an exponential function with scaling constant λ. This represents a gradually decreasing failure probability density.
The CDF is then calculated as:
$$F(t) = \int_0^t \lambda e^{-\lambda \tau}\, d\tau = 1 - e^{-\lambda t},$$
which can be seen to gradually approach 1 as t → ∞, representing the fact that eventually all systems under study will fail.
The hazard rate function is then:
$$h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.$$
In other words, in this particular case only, the hazard rate is constant over time.
This illustrates the difference between the hazard rate and the failure probability density: as the number of systems surviving at time t gradually falls, the total failure rate f(t) also falls, but the hazard rate remains constant. In other words, the probabilities of each individual system failing do not change over time as the systems age; they are "memory-less".
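A small sketch (not from the article; the rate is an assumed value) makes the distinction concrete: for an exponential failure time, f(t) and R(t) both fall with t, but their ratio, the hazard rate, stays fixed at λ:

```python
# Sketch: for an exponential failure time, the failure probability density
# f(t) and the surviving fraction R(t) both fall with t, but the hazard rate
# h(t) = f(t) / R(t) stays constant at lambda. The rate is an assumed value.
import numpy as np

lam = 0.0005                     # assumed constant failure rate, per hour
t = np.array([10.0, 100.0, 1000.0, 10000.0])

f = lam * np.exp(-lam * t)       # failure probability density, decreasing in t
R = np.exp(-lam * t)             # fraction of systems still surviving
print(f / R)                     # hazard rate: 0.0005 at every time point
```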
For many systems, a constant hazard function may not be a realistic approximation; the chance of failure of an individual component may depend on its age. Therefore, other distributions are often used.
For example, the deterministic distribution has a hazard rate that increases over time (for systems where wear-out is the most important factor), while the Pareto distribution has one that decreases (for systems where early-life failures are more common). The commonly used Weibull distribution can model both of these effects, as can the log-normal and hypertabastic distributions.
After modelling a given distribution and parameters for h(t), the failure probability density f(t) and cumulative failure distribution F(t) can be predicted using the equations given above.
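As an illustrative sketch (not from the article; the shape and scale values are assumed), the Weibull hazard rate shows all three regimes depending on its shape parameter:

```python
# Sketch: the Weibull hazard rate h(t) = (k/s) * (t/s)**(k - 1) is decreasing
# for shape k < 1 (early-life failures), constant for k = 1 (the exponential
# case), and increasing for k > 1 (wear-out). Shape/scale values are assumed.
import numpy as np

def weibull_hazard(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1)

t = np.array([100.0, 1000.0, 5000.0])     # hours
for k in (0.5, 1.0, 2.0):
    print(f"shape {k}: h(t) = {weibull_hazard(t, shape=k, scale=1000.0)}")
```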
Failure rate data can be obtained in several ways. For example, given a component database calibrated with field failure data that is reasonably accurate,[4] a prediction method based on that database can produce product-level failure rate and failure mode data for a given application. Such predictions have been shown to be more accurate[5] than field warranty return analysis or even typical field failure analysis, given that these methods depend on reports that typically do not have sufficiently detailed information in failure records.[6]
A decreasing failure rate describes cases where early-life failures are common[7] and corresponds to the situation where h(t) is a decreasing function.
This can describe, for example, the period of infant mortality in humans, or the early failure of transistors due to manufacturing defects.
Decreasing failure rates have been found in the lifetimes of spacecraft, with Baker and Baker commenting that "those spacecraft that last, last on and on."[8][9]
The hazard rate of aircraft air conditioning systems was found to have an exponentially decreasing distribution.[10]
In special processes called renewal processes, where the time to recover from failure can be neglected, the likelihood of failure remains constant with respect to time.
For a renewal process with DFR renewal function, inter-renewal times are concave.[clarification needed][11][12] Brown conjectured the converse, that DFR is also necessary for the inter-renewal times to be concave,[13] however it has been shown that this conjecture holds neither in the discrete case[12] nor in the continuous case.[14]
When the failure rate is decreasing the coefficient of variation is ⩾ 1, and when the failure rate is increasing the coefficient of variation is ⩽ 1.[clarification needed][15] Note that this result only holds when the failure rate is defined for all t ⩾ 0[16] and that the converse result (coefficient of variation determining nature of failure rate) does not hold.
Failure rates can be expressed using any measure of time, but hours is the most common unit in practice. Other units, such as miles, revolutions, etc., can also be used in place of "time" units.
Failure rates are often expressed in engineering notation as failures per million, or 10⁻⁶, especially for individual components, since their failure rates are often very low.
The Failures In Time (FIT) rate of a device is the number of failures that can be expected in one billion (10⁹) device-hours of operation[17] (e.g. 1,000 devices for 1,000,000 hours, or 1,000,000 devices for 1,000 hours each, or some other combination). This term is used particularly by the semiconductor industry.
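A small sketch (not from the article; the 500 FIT figure is a hypothetical example) of converting between FIT, failures per hour, and MTBF under a constant failure rate:

```python
# Sketch: converting between FIT (failures per 10^9 device-hours), failures
# per hour, and MTBF, assuming a constant failure rate. The 500 FIT component
# is a hypothetical example.
DEVICE_HOURS_PER_FIT = 1e9

def fit_to_failures_per_hour(fit: float) -> float:
    return fit / DEVICE_HOURS_PER_FIT

lam = fit_to_failures_per_hour(500.0)   # 5e-7 failures per hour
print(lam)
print(1.0 / lam)                        # corresponding MTBF: 2,000,000 hours
```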
If a complex system consists of many parts, and the failure of any single part means the failure of the entire system, then the total failure rate is simply the sum of the individual failure rates of its parts:
$$\lambda_{\text{system}} = \sum_i \lambda_i;$$
however, this assumes that the failure rate is constant, and that the units are consistent (e.g. failures per million hours), and not expressed as a ratio or as probability densities. This is useful for estimating the failure rate of a system when its individual components or subsystems have already been tested.[18][19]
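As a minimal sketch (not from the article; the part rates are hypothetical), summing constant, consistently-expressed part failure rates gives the series-system rate:

```python
# Sketch: total failure rate of a series system (any part failing fails the
# whole system), as the sum of constant part failure rates expressed in the
# same units (failures per million hours). The part values are hypothetical.
part_rates_per_million_hours = [2.5, 0.8, 10.0, 4.7]

system_rate = sum(part_rates_per_million_hours)       # 18.0 per million hours
print(system_rate)
print(1e6 / system_rate)                               # system MTBF: about 55,556 hours
```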
Adding "redundant" components to eliminate a single point of failure may thus actually increase the failure rate, however reduces the "mission failure" rate, or the "mean time between critical failures" (MTBCF).[20]
Combining failure or hazard rates that are time-dependent is more complicated. For example, mixtures of Decreasing Failure Rate (DFR) variables are also DFR.[11] Mixtures of exponentially distributed lifetimes are hyperexponentially distributed.
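A brief sketch (not from the article; the mixing weight and rates are assumed values) showing that a mixture of two constant-hazard exponential populations has a hazard rate that decreases over time:

```python
# Sketch: a 50/50 mixture of two exponential lifetimes (a hyperexponential
# distribution) has a decreasing hazard rate, even though each component has
# a constant hazard. The mixing weight and rates are assumed values.
import numpy as np

p, lam1, lam2 = 0.5, 0.01, 0.001      # mixing weight and the two rates, per hour
t = np.array([0.0, 100.0, 500.0, 2000.0])

f = p * lam1 * np.exp(-lam1 * t) + (1 - p) * lam2 * np.exp(-lam2 * t)
R = p * np.exp(-lam1 * t) + (1 - p) * np.exp(-lam2 * t)
print(f / R)   # hazard rate: starts near 0.0055 and falls toward 0.001
```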
Suppose it is desired to estimate the failure rate of a certain component. Ten identical components are each tested until they either fail or reach 1,000 hours, at which time the test is terminated. A total of 7,502 component-hours of testing is performed, and 6 failures are recorded.
The estimated failure rate is:
$$\hat{\lambda} = \frac{6\ \text{failures}}{7{,}502\ \text{component-hours}} \approx 0.0008\ \text{failures per hour},$$
which could also be expressed as an MTBF of approximately 1,250 hours, or approximately 800 failures for every million hours of operation.
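The same arithmetic, as a short sketch using only the numbers given in the example:

```python
# Sketch checking the worked example: 6 failures over 7,502 component-hours.
failures = 6
component_hours = 7502

lam = failures / component_hours          # ~0.0008 failures per hour
mtbf = component_hours / failures         # ~1,250 hours
per_million = lam * 1e6                   # ~800 failures per million hours

print(f"{lam:.6f} per hour, MTBF {mtbf:.0f} h, {per_million:.0f} per million hours")
```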