7
Prognostic Enabling: Selection, Evaluation, and Other Considerations

7.1 Introduction to Prognostic Enabling

As you have already learned, a key objective of deploying a prognostic‐enabled system is to monitor prognostic targets to provide advanced warning of failures in support of condition‐based maintenance (CBM). There are criteria associated with, for example, equipment availability and other metrics, test coverage, and confidence levels. To meet the criteria, the various sensing, signal‐processing, and computational (algorithms) routines in a prognostics and health management/monitoring (PHM)1 system need to be factored into the entire design. CBM methods and approaches – especially those using condition‐based data (CBD) signatures that are ultimately transformed into functional failure signature (FFS) data that is processed by a very good prognostic information program – provide significant advantages over (i) a system based on statistical or other methods applicable to populations rather than a specific instantiation of a population and (ii) a system based on using CBD to detect damage without prognosing when such damage will result in the system no longer operating within specifications.

7.1.1 Review of Chapter 6

Chapter 6 presented a design of an exemplary prototype of a PHM system that prognostic‐enabled multiple instantiations of systems and prognostic targets with excellent results (see Table 7.1).

Table 7.1 Performance measurements and metrics.

Prognostic target | PITTFF0 | FFS NM | BD @ time (estimated) | EOL @ time (estimated; FOM) | Maximum PD | PH within 25% @ time (χ [%]–pts [#]) | PH within 10% @ time (χ [%]–pts [#]) | Initial PH error (PITTFF0/PDMAX)
SMPS | 4800 h | 3% | 1368 h (1261 h) | 4200 h (4176 h; 96.4%) | 2939 h | 1560 h (93.5%–9 pts) | 2760 h (52.6%–23 pts) | 63%
EMA load | 4800 h | 2% | 504 h (441 h) | 3168 h (3164 h; 97.7%) | 2727 h | 576 h (97.3%–3 pts) | 1368 h (68.3%–19 pts) | 76%
EMA winding | 4800 h | 2% | 1104 h (1067 h) | 2760 h (2745 h; 97.8%) | 1693 h | 1320 h (87.2%–10 pts) | 1680 h (66.0%–25 pts) | 184%
EMA power transistor | 4800 h | 1% | 960 h (958 h) | 4440 h (4434 h; 99.9%) | 3476 h | 984 h (99.3%–2 pts) | 1512 h (84.1%) | 38%

The design supported multiple subsystems (two), each comprising prognostic targets: a power supply and two electro‐mechanical actuators (EMAs). The design included monitoring each of the two power supplies for a single mode of failure and monitoring each of the four EMAs for three failure modes. The monitoring, conditioning, and processing were all based on CBD signatures, and the prognostic approaches and methods provided excellent, if not superior, results:

  • Degradation was detected at least 96% of the distance in time before functional failure occurred.
  • Estimate accuracy was within 25% at least 93% of the distance in time before functional failure occurred.
  • Estimate accuracy was within 10% at least 52% of the distance in time before functional failure occurred.
  • A time to failure (TTF) value of 9600 hours was used to specify a prognostic TTF (using PITTFF0) value of 4800 hours:
    • All four of the prognostic targets that failed did so prior to that time: at estimated times of 2745 hours, 3164 hours, 4176 hours, and 4434 hours.
    • The actual TTF was 3630 hours.

7.1.2 Electronic Health Solutions

Electronic health solutions, such as those described in this book, become part of a PHM system, which is sometimes referred to as a prognostic ecosystem (see Figure 7.1). Within such an ecosystem, health solutions can be categorized at the levels shown in Figure 7.2: die, component, board, module, and system (Ridgetop Group 2018). Other levels could be added, such as an assembly of boards or a collection of modules into a replaceable unit.


Figure 7.1 Example of a broad view of an ecosystem.

Step diagram illustrating the five-level model for health solutions with curve arrows from die level to component level, to board level, to module level, and to system level (in ascending order).

Figure 7.2 Example of a five‐level model for health solutions.

Ecosystems

An ecosystem can be described as prognostic models within a system that includes descriptions of data, quantification of uncertainty, justification and validation of model selection, and limitations of application (Astfalck et al. 2016). The locations in the broader view of an ecosystem shown in Figure 7.1 are the following:

  • Location 1 is a system or subsystem comprising one or more line‐replaceable units (LRUs) that are prognostic enabled (monitored for damage and/or degradation).
  • Location 2 is a PHM system that acquires, manipulates, manages, and processes data to produce the prognostic information used to initiate service and maintenance actions.
  • Location 3 is where failures are analyzed and products are improved by a supplier.
  • Location 4 comprises repositories of LRUs, assemblies, components, and devices used for service and repair.
  • Location 5 comprises the maintenance personnel who perform service and maintenance.

Levels of Solutions

A complex PHM system contains devices, components, boards, subassemblies, and so on. A sensor can be attached to any node within a system, and therefore health solutions that process sensor data can be categorized according to the node to which the sensor is attached, as exemplified by the five‐level model of health solutions shown in Figure 7.2 (Ridgetop Group 2018).

7.1.3 Critical Systems and Advance Warning

Critical systems are considered vital to the ongoing operation of everyday life, and criticality is a key consideration when evaluating and selecting a node for prognostic enabling (a prognostic target). For example, a power system in an aircraft, or a gear box in a wind turbine, would be considered critical, since their operation is essential to meet design objectives of the overall system. Another example of criticality is the safety of life and health: prevention and avoidance of loss of life and injury is a primary objective of a system. Fault severity and fault propagation also play a role in the definition of systemwide criticality.

Advance warning, such as an alert, of any impending failure of a mission‐critical or safety‐critical prognostic target is vital: a properly designed PHM system will provide detection of anomalies that affect the ability of the system to operate within specifications and issue appropriate alerts. For example, Chapter 6 included examples of messages and alerts issued by an exemplary prototype PHM system. A PHM system will issue alerts (health monitoring) and/or initiate appropriate actions (health management) such as soft shutdowns, load shedding, and scheduling maintenance. There might also be various levels of alerting where threshold levels and fault models can be used to prioritize what information is available to an operator of an aircraft, or a seagoing ship, or a machine tool on the manufacturing floor. This brings in the notion of fault severity, access to the information, and what is done to mitigate an issue that results in an alert.

7.1.4 Reduction in Maintenance

To save money and resources, maintenance intervals can be optimized based on actual evidence of degradation. For example, a system might have components that fail after 250 hours of operation, some that fail after about 600 hours, and others that fail at about 500 hours. A usage‐based PHM system might be designed to do one of the following:

  1. Schedule repair and/or replace maintenance for all units on or before 200 hours of operation – a 20% safety margin with respect to the earliest (250‐hour) failures.
  2. Schedule repair and/or replace maintenance for all units on or before 400 hours of operation – a 20% safety margin with respect to the 500‐hour failures, accepting that the 250‐hour components will fail unexpectedly.

In a first design of a PHM system, an objective might be to avoid all unexpected failures, but at increased maintenance costs: for example, an average 300 hours of lost usage for each instance of avoidance, and cost increases due to increased maintenance actions (more frequent replacement). A second design might focus on reducing sustainment costs by increasing the time between maintenance actions – but unexpected failures would increase. Typically, even disregarding mission and safety issues, the cost of an unexpected failure is higher than an early repair‐and‐replace action.

So, instead of a usage‐based PHM system, we advocate CBM using a PHM system that is CBD‐based; uses signature‐based detection and prognostic approaches and methods; and employs a fast, highly accurate set of data‐conditioning, prediction, and computational routines. The advocated approach of using CBD signatures is an effective method for handling variability introduced by the operational environment: operating in the desert of Arizona is very different than operating equipment in a rainy, cold environment such as that in Puget Sound, Washington.

7.1.5 Health Management, Maintenance, and Logistics

This book is focused on prognostic enabling to monitor the health of a system of nodes: the prognostic targets chosen because those nodes have signals that change in response to degradation of devices, components, and so on that, when they fail, have a critical effect on the operation of the system. They may cause mission‐critical functions to cease or otherwise operate out of specifications, or they may create a hazardous threat to the safety of the system or life.

But monitoring, per se, does not avoid unexpected outages, does not repair anything, and does not prevent loss of life. Refer back to Figure 7.1 : a PHM system needs to provide services for health management, maintenance, and logistic support to schedule maintenance, locate and deliver parts and equipment, and dispatch a maintenance team.

Management

Given the accuracy of the prognostic information produced by the PHM system in Chapter 6, it would not be unreasonable to defer maintenance until a detected state of health (SoH) value is at or below a specified level, such as 25%. PHM management support might be designed to act on alerts, such as those shown in Figure 7.3, which are excerpted from Chapter 6. In contrast, PHM management support might be designed to act on damage‐detected alerts, such as those shown in Figure 7.4.


Figure 7.3 Example of alerts for SoH at or below 25%.


Figure 7.4 Example of alerts for a damage‐detection approach.

Maintenance

A PHM system needs to alert users when maintenance is required – based on physical evidence of degradation, and not on an arbitrary number of elapsed hours. In addition to alerts, a PHM system needs to provide for and support maintenance‐related services to avoid unnecessary replacements, increase usage of systems, decrease downtime, and reduce sustainment costs. The approaches and methods used for maintenance are application specific, need to be integrated with health management and logistic services, and are beyond the scope of this book.

Logistics

A critical function of a PHM system is logistics support. Parts and equipment must be located and delivered to the service and repair site, and a service and maintenance team needs to be dispatched to arrive on or after, but not before, the arrival of needed parts and equipment. Additionally, maintenance and inventory records need to be updated; and suppliers, vendors, and manufacturers must be notified per contractual obligations. The latter is especially true when dealing with a government agency such as the Department of Defense. Logistic support might also be required to arrange for and record the outcome of ancillary activities such as cause‐and‐effect review of repairs.

7.1.6 Chapter Objectives

The previous chapters were focused on PHM aspects deemed critical to the design of a PHM system, including approaches and methods not suitable for CBD‐based prognostics. The overall objective of this book is to provide you with the knowledge to understand, evaluate, design (at least at a high level), and verify health monitoring. This introduction has briefly acquainted you with ecosystems, critical systems and warnings, reduction in maintenance, and health management. The remainder of this chapter is devoted to the evaluation, selection, and specification of prognostic targets: nodes to be monitored to detect damage and provide prognostic information, to avoid unscheduled outages of critical functions and loss of safety in a system.

7.1.7 Chapter Organization

The remainder of this chapter is organized to present and discuss topics related to prognostic enabling:

  • 7.2 Prognostic Targets: Evaluation, Selection, and Specifications

    This section includes descriptions of the meaning and relationship of TTF, time before/between failure (TBF), prognostic distance (PD), and prognostic horizon (PH); distributions of the onset of degradation and functional failure; mean time to failure (MTTF); and mean time before/between failure (MTBF).

  • 7.3 Example: Cost‐Benefit of Prognostic Approaches

    This section is devoted to the cost‐benefit analysis of prognostic approaches and includes example comparisons of no PHM, two usage‐based approaches, CBD‐based detection, and CBD‐based prognostics.

  • 7.4 Reliability: Bathtub Curve

    This section is devoted to the bathtub curve, prognostic triggers, and the relationship of the bathtub curve to failure rate and MTBF.

  • 7.5 Chapter Summary and Book Conclusion

    This section summarizes and ends both the chapter and the book.

7.2 Prognostic Targets: Evaluation, Selection, and Specifications

Selecting a target to be prognostic enabled probably seems pretty straightforward: collect and analyze historical records pertaining to maintenance and repair to identify those targets having high rates of failure and/or failure of mission‐critical and/or safety‐critical parts regardless of failure rate. Prepare a cost‐benefit business case: cost to replace or repair, cost associated with unplanned failure, savings due to prolonged time in use, savings due to reduction in sustainment costs and unplanned downtime, and so on. But you also need to factor in a hard‐to‐quantify cost related to criticality of mission and/or safety (refer back to Section 1.8).

7.2.1 Criteria for Evaluation, Selection, and Winnowing

Because you are designing a PHM system to prognostic enable an operational system, your team will not perform any traditional failure mode and effect analysis (FMEA) or failure mode effect and criticality analysis (FMECA) (DAU 2018). Instead, your team will review the existing FMEA and FMECA data; the historical failure, service, and repair data; and any other data related to failure. The focus is on identifying, selecting, and winnowing prognostic targets. To select and winnow a list of prognostic candidates, including those identified as candidates by FMEA/FMECA, you need to know the following:

  • TTF or TTFF. An estimate of the time to failure (alternatively, time to functional failure). This is a primary focus of this chapter.
  • PD (estimated). An initial estimate of the distance in time between the onset of degradation and functional failure from which a prediction algorithm converges upward or downward to a true PH.
  • Failure mode to be detected. The failure mechanism that causes the characteristic shape of the signature captured by one or more sensors.
  • Severity classification. A qualitative assessment of the consequences of functional failure.
  • Cost of failure. An estimate of the cost of an unplanned failure versus the cost of service and repair before failure.
  • Cost of prognostic enabling. An estimate of, for example, the cost of the sensor, PHM support, design, development, testing, qualification, and fielding.
  • Cost‐benefit analysis. An estimate of, for example, the change in sustainment cost plus time in use due to prognostic enabling, the effective savings achieved, and the change in service actions because of prognostic enabling.

7.2.2 Meaning of MTBF and MTTF

PHM systems are typically referenced to either MTBF or MTTF, and therefore you should know and understand the difference between them (Speaks 2005):

  • MTBF. Calculated mean time between (or before – context dependent) failures of a repairable system.
  • MTTF. Expected TTF of a nonrepairable system.

As you can see, the meaning of (and therefore the use of) these terms is dependent upon the definition of failure and the definition of repairable. To achieve an understanding of those terms, recollect that Chapter 2 introduced failure in time (FIT) in Eq. (2.15):

\[ \mathrm{FIT} = \frac{\text{number of failures} \times 10^{9}}{\text{number of units tested} \times \text{test hours} \times AF} \tag{2.15} \]

where AF is the value of an acceleration factor for a specified test. Refer to Tables 2.3 and 2.4 for examples.

But you are not given a failure rate: instead, you are told that the FIT number is 50. Now you need to know the following to relate that FIT number to a failure rate (Ellerman 2012; NIST 2018 ):

\[ \lambda = \mathrm{FIT} \times 10^{-9}\ \text{failures per hour} \tag{7.1} \]

where 1 FIT corresponds to 1 failure per 10⁹ hours.

Even though your research confirms that your calculation is correct, a failure rate expressed in failures per hour is an awkwardly small number to work with, so you decide to calculate an MTBF (mean time before failure) value using Eq. (7.2) (Abernethy 2006; RAC 2005; Speaks 2005; Weibull 2008):

\[ \mathrm{MTBF} = \frac{\text{total operating time}}{\text{number of failures}} \tag{7.2} \]

But this does not help, because you do not know the total time, which you know is calculated using Eq. (7.3):

\[ \text{total operating time} = \text{number of units tested} \times \text{test time per unit} \tag{7.3} \]

You also don't know the number of tested units or the test time. So, you find another expression for MTBF:

\[ \mathrm{MTBF} = \frac{1}{\lambda} = \frac{10^{9}}{\mathrm{FIT}} \tag{7.4} \]

which, for FIT = 50, equals the reciprocal of the failure rate λ calculated from Eq. (7.1): 20 000 000 hours.
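
As a quick check, combining Eqs. (7.1) and (7.4) for the quoted FIT value of 50:

\[ \lambda = 50 \times 10^{-9} = 5 \times 10^{-8}\ \text{failures per hour}, \qquad \mathrm{MTBF} = \frac{1}{\lambda} = \frac{10^{9}}{50} = 20\,000\,000\ \text{hours} \]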

Literature research reveals the following:

  • In 1993, MTTF was defined as “mean time to first” failure (Seymour 1993), and the expression to calculate that value is given as

    \[ \mathrm{MTTF} = \frac{\text{total operating time}}{\text{number of failures}} \]
  • So, MTTF (circa 1993) is calculated the same way as MTBF in 2018, when B means before.
  • You discover that MTBF is also defined as “mean time between failures” for a repairable product.
  • MTTF is defined as “mean time to failure” (no reference to first) for a nonrepairable product.
  • A relationship of MTBF to MTTF is defined as follows (Ellerman 2012):

    \[ \mathrm{MTBF} = \mathrm{MTTF} + \mathrm{MTTR} \tag{7.5} \]

    where MTTR is defined as “mean time to repair.”

  • Ergo, you have a multiplicity of definitions and acronyms that are ambiguous and cause uncertainty regarding use and meaning.

Regardless of meaning and/or definition, neither MTTF nor either of the two definitions of MTBF is useful for prognostic enabling. MTTF and MTBF should be limited to a classical definition of reliability (Section 1.6): without intervention, there is a 63% probability that the system will fail before the time given by MTTF or MTBF.
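
The 63% figure follows from the exponential failure model that underlies these metrics; assuming a constant failure rate λ and MTTF = 1/λ:

\[ P(T \le \mathrm{MTTF}) = 1 - e^{-\lambda\,\mathrm{MTTF}} = 1 - e^{-1} \approx 0.632 \]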

7.2.3 MTTF and MTBF Uncertainty

Figure 7.5 illustrates the relationship of a failure distribution (density of failures/time), the MTBF (failure rate between failures), the MTBF (mean time before a first failure), and MTTF. MTTF and MTBF were originally defined for an exponential distribution having a constant, low failure rate: for example, solid‐state (integrated circuit) devices. Those devices are subjected to one or more accelerated tests, such as a HALT, with test results extrapolated to normal life using an AF (Ellerman 2012; O'Connor and Kleyner 2012; NIST 2018; RAC 2005; Speaks 2005; Wilkins 2002).


Figure 7.5 MTTF, TTF, and PITTFF0: CBD signature and failure distribution.

Typical commercial FIT values for solid‐state devices are in the 50–1000 range, with FIT values in the range of 1–10 for space applications (Johnston 2010). There are simulators that calculate values called MTTF and MTBF using simulated failure times (Weibull 2008 ), which adds even more uncertainty as to the meaning of and/or calculation of a particular MTTF and/or MTBF value.

Even worse, different failure distributions, different CBD signatures, and so on can result in identical (or nearly identical) reliability metrics such as MTTF: compare Figures 7.5 and 7.6. Finally, this book asserts that attempting to use a TTF value of hundreds of thousands of hours (or more) for CBD‐based prognostics is nonsensical: an MTTF value of 100 000 hours is equivalent to more than 11 years.


Figure 7.6 Same MTTF for different failure distributions and signatures.

7.2.4 TTF and PITTFF

We need to know how to determine, calculate, and/or estimate a TTF value that begins when degradation begins and ends when functional failure occurs (see TTF 1 and TTF 2 in Figure 7.5 ). But at the time when degradation is first detected, there is no a priori knowledge of that future time of failure; yet our prediction program needs to converge from an initial estimate of that time to a very accurate estimate of the time of failure. Research into reliability metrics such as MTTF, MTBF, and FIT indicates that they are not really close in value to what we need for TTF.

The prediction program we are using, ARULEAV, provides a parameter called PITTFF0 (introduced in Chapter 6) as a means to specify an initial value for TTF. TTF is not a value that is examined or specified by manufacturers and/or vendors of products; failures in the field are usually due either to an anomalous event, such as a lightning strike, or to degradation. Degradation typically is not caused by a part entering what is referred to as the wear‐out region of a bathtub curve; rather, degradation is typically due to an accumulation of fatigue damage caused by cyclic stresses and strains (such as thermal and mechanical) during operation (Hofmeister et al. 2006).

We can estimate an initial value for TTF using a number of methods: a service‐life determination, an end‐use test method, or an MTTF‐based method. But be aware that the supplier of the prediction program advises that, in general, that program converges to within 25% accuracy in less time when the initial estimate is higher, rather than lower, in comparison to the true time of functional failure.

TTF: Service Life Determination

Instead of using MTTF or MTBF values for an extremely low failure rate, you might use end‐use values based on service‐life values from vendors. For example:

  • A tire is warranted to last 5 years or 40 000 miles, and on or about that amount of usage, the tire is deemed “worn out” and is replaced: the tire is not repairable. The new tire (same brand and type) is also warranted to last 40 000 miles. The tire can be said to have a TTF of 40 000 miles.
  • A fuel pump is warranted to last 5 years or 40 000 miles, whichever occurs first. When the fuel pump fails, it is replaced with a remanufactured pump that is warranted to last 2 years or 25 000 miles, whichever occurs first: the fuel pump is deemed repairable. Since the repaired pump has a smaller expected lifetime, the fuel pump can be said to have a time before failure (TBF) of 2 years or 25 000 miles: less than the original expected lifetime. In this situation, choose the larger value.

Set the PITTFF0 parameter to twice the service‐life‐determined value for TTF, for three reasons:

  • The supplier advises that the prediction program generally converges to a solution from a high initial error.
  • Reliability theory states that, for a mean lifetime such as TTF, about 63% of failures are likely to occur before that mean.
  • Human factors: an initial estimate that is too low means the prediction program is likely to produce increasing values of remaining useful life (RUL) for decreasing values of SoH. For example, Figure 7.9 plots the prediction results for the same data using a high value for PITTFF0 (left‐hand plot) and a low value for PITTFF0 (right‐hand plot): at best, the results are perplexing to an operator; at worst, they may cause distrust and loss of confidence in the reliability of the PHM system.

TTF: End‐Use Test Method

You can calculate TBF and TTF values in the same way as MTBF and MTTF values, respectively, using the following (Weibull 2008 ):

\[ \mathrm{TBF} = \frac{\text{total operating time}}{\text{number of failures}} \tag{7.6} \]
\[ \mathrm{TTF} = \frac{\text{total operating time to failure}}{\text{number of units that failed}} \tag{7.7} \]
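
As a worked illustration, applying Eq. (7.7) in the total‐hours‐over‐failures form shown above to the six power‐supply failure times listed later in Table 7.6 yields the 3592‐hour value used as MTTF in Section 7.3:

\[ \mathrm{TTF} = \frac{4200 + 3624 + 3240 + 3048 + 4584 + 2856}{6} = \frac{21\,552}{6} = 3592\ \text{hours} \]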

TTF: MTTF‐Based Method

Referring to Figure 7.5, a simplistic MTTF‐based method would be to set TTF equal to MTTF; that method works well when the spread of the majority of the failures is wide compared to the value of MTTF. In such cases, simply setting PITTFF0 to twice the value of MTTF suffices. But if you are fairly confident that the situation is more like that illustrated in Figure 7.6, you need to specify a lower value for PITTFF0. However, it might be the case that your PHM system supports the same type of LRUs in two distinct operating environments: one that induces earlier‐than‐expected failures (akin to the situation shown in Figure 7.5) and a second that is less variable and causes failures to be more closely bunched together (akin to the situation shown in Figure 7.6). Further, to avoid misunderstanding and/or for procedural reasons, suppose you must always set PITTFF0 to a value (such as MTTF) specified by a manufacturer, vendor, or governmental agency. In such situations, you need a method to cause the prediction program to adjust the specified PITTFF0 value. The supplier of your prediction program agrees, and changes are made to provide a node‐definition parameter, PITTFADJ, that allows you to adjust how the value of PITTFF0 is handled:

  • When PITTFADJ is not specified or is less than 1, the value of PITTFF0 is used as the initial value for calculating the next RUL and PH values.
  • When PITTFADJ ≥ 1, the initial value is calculated as [1 − exp(−PITTFADJ)] × PITTFF0, as illustrated in the sketch following this list.
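
The following is a minimal sketch of that adjustment; the function name and signature are illustrative only and are not part of ARULEAV's actual interface:

```python
import math
from typing import Optional

def initial_ttf(pittff0: float, pittfadj: Optional[float] = None) -> float:
    """Return the initial TTF estimate handed to the prediction routine.

    Illustrative only: mirrors the PITTFADJ handling described in the text,
    not ARULEAV's actual interface.
    """
    if pittfadj is None or pittfadj < 1:
        # PITTFF0 is used as specified
        return pittff0
    # PITTFADJ >= 1: scale PITTFF0 downward by the factor [1 - exp(-PITTFADJ)]
    return (1.0 - math.exp(-pittfadj)) * pittff0

# Example: a mandated PITTFF0 of 4800 h, with and without an adjustment of 2
print(initial_ttf(4800.0))       # 4800.0
print(initial_ttf(4800.0, 2.0))  # about 4150 (0.865 x 4800)
```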

7.3 Example: Cost‐Benefit of Prognostic Approaches

We will use the SMPS‐EMA examples from Chapter 6 as a base platform to construct an example situation to illustrate the cost‐benefit for various approaches. You and your customer agree that cost‐benefit analyses are not to include unavoidable catastrophic failures (such as, for example, being hit by another vehicle) and that, because of criticality considerations, repair‐and‐removal activity due to unexpected functional failures will be held to less than 5% of the total number of repairs and removals.

7.3.1 Cost‐Benefit Situations

The situations are the following: (i) none; (ii) usage‐based, MTTF; (iii) usage‐based, 2/3 MTTF; (iv) CBD‐based, replace when damage is detected; and (v) CBD‐based, replace within 720 hours after estimated SoH becomes 75% or less. Although the first situation fails to meet the requirement that unexpected failures will be less than 5%, it establishes a baseline estimate of cost.

Your customer arranged for special delivery of 6 power supplies and 12 EMAs. Those 18 units were subjected to end‐use tests similar to HALTs (refer back to Section 2.2). Your team assisted in the design of the experiment and building of the test beds; the tested units (power supplies and EMAs) needed to be prognostic enabled, as described in Chapter 6. For analysis purposes, test failures are to be evaluated as though all of the tested units were installed at the same time and failed in the sequences and times indicated by the test.

The numbers to be used in a cost‐benefit analysis are provided by the customer and listed in Tables 7.1 and 7.5. After examining the test results (see Figure 7.15 and Table 7.6), your customer concludes that the cost‐benefit analysis for the power supply scenario is sufficient for evaluation of the five approaches.

Table 7.5 Cost estimates for benefits evaluation of prognostic enabling.

LRU name | Acquisition cost | Scheduled R&R cost | Unplanned‐failure cost | Expected life (h)/LRU | Sustainment period (h)/LRU
Power supply | $10 000 | $2 000 | $4 000 | 3 500 | 14 400
EMA | $25 000 | $3 000 | $6 000 | 3 500 | 14 400

Plots illustrating the test results for six power supplies (top) and 12 EMAs (bottom): a vertical line and coinciding ascending lines, each with markers at both ends for TTF (triangle), LOAD (circle), etc.

Figure 7.15 Plots: test results for six power supplies (top) and 12 EMAs (bottom).

Table 7.6 Summarized list of test results.

Power supply | Detect degradation (h) | Detect failure (h)
Supply #1 | 1368 | 4200
Supply #2 | 1320 | 3624
Supply #3 | 1296 | 3240
Supply #4 | 1248 | 3048
Supply #5 | 1224 | 4584
Supply #6 | 1416 | 2856

EMA | Detect degradation (h) | Detect failure (h)
EMA #1 | 540 | 3168
EMA #2 | 516 | 3024
EMA #3 | 588 | 3600
EMA #4 | 636 | 4032
EMA #5 | 1104 | 2760
EMA #6 | 1080 | 2472
EMA #7 | 1152 | 3192
EMA #8 | 1200 | 3624
EMA #9 | 960 | 4440
EMA #10 | 936 | 4008
EMA #11 | 1008 | 4872
EMA #12 | 1056 | 5304

7.3.2 Cost Analyses

No PHM Approach

In this scenario, the system runs until a unit fails – a “do nothing until failure occurs” approach. When failure occurs, the failed unit is removed and replaced. The system is restarted and runs until the next unit fails, and so on, until the end of a sustainment period of 14 400 hours (600 days of operation):

  • On average, power supply #1 would fail and be replaced 14 400/4200 = 3.4 times:
    • Cost = 3.4 × $10 000 + 0 × $2 000 + 3.4 × $4 000 = $47 600

Repeating this for the remaining five power supplies results in 24 removals and replacements because of unexpected outages due to degradation failure, at a baseline cost of $336 000.
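
The baseline figures can be reproduced from Tables 7.5 and 7.6 with a few lines of code. This is a minimal sketch; rounding each supply position's replacement count to the nearest whole number is an assumption made to match the quoted total of 24 (the text itself quotes the fractional value 3.4 for supply #1):

```python
# Baseline ("no PHM") cost estimate for the six power supplies.
# Data from Table 7.5 (costs) and Table 7.6 ("detect failure" times).
SUSTAINMENT_HOURS = 14_400      # sustainment period per LRU
ACQUISITION = 10_000            # power-supply acquisition cost ($)
UNPLANNED_FAILURE = 4_000       # cost of an unplanned failure ($)

ttf_hours = [4200, 3624, 3240, 3048, 4584, 2856]   # Table 7.6, detect failure

# Assumed rounding: whole replacements per supply position over the period
replacements = [round(SUSTAINMENT_HOURS / ttf) for ttf in ttf_hours]
total = sum(replacements)                                   # 24
baseline_cost = total * (ACQUISITION + UNPLANNED_FAILURE)   # $336 000

print(total, baseline_cost)     # 24 336000
```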

Usage‐Based MTTF Approach

In this scenario, all power supplies are replaced when usage equals 3592 hours (MTTF). Examination of the data in Table 7.6 shows that power supplies #3, #4, and #6 would functionally fail before they are replaced. In the sustainment period, a total of 27 power supplies would be removed and replaced, at a total cost of $354 000 – an increase of $18 000 per system during the sustainment period to reduce the number of unexpected outages from 24 to 14. However, 52% of all repair and removal actions would be attributable to degradation failures. This approach fails to meet requirements.

Usage‐Based 2/3 MTTF Approach

In this scenario, all power supplies are replaced when usage equals 2395 hours. There would be no unexpected outages, but 36 power supplies would be replaced at a cost of $432 000 – an increase in cost of $96 000 and a reduction of unexpected failures to zero. That $96 000 becomes $96.0 million for a population of 1000 systems.
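
As a quick check of the totals quoted above, assuming replacements are scheduled every 2395 hours for each of the six supply positions within the 14 400‐hour sustainment period:

\[ 6 \times \left\lfloor \frac{14\,400}{2395} \right\rfloor = 36\ \text{replacements}, \qquad 36 \times (\$10\,000 + \$2\,000) = \$432\,000 = \$336\,000 + \$96\,000 \]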

Damage‐Detection Approach

In this scenario, whenever damage is detected in a power supply, it is removed and replaced within 720 hours, which would also, for these supplies, result in zero unexpected outages. This approach is seemingly a good one because there would be no unplanned outages due to degradation leading to failure. But there would be a large increase in removal and replacement activity: a total of 43 power supplies in the 14 400‐hour sustainment period at an estimated cost of $516 000 per system – an increase of $180 000 more than the baseline cost per system, which becomes $180.0 million for a population of 1000 systems.

SoH at 75% or Less Approach

In this scenario, whenever a prognostic SoH estimate is 75% or less for a power supply, it is removed and replaced within 120 hours: again, for this approach, there would be zero unexpected outages. This approach would result in the removal and replacement of 28 power supplies during the sustainment period at an estimated cost of $336 000 per system – neither a savings nor an increase in cost compared to the baseline cost of a “do nothing until failure” approach.

Yes, there is the cost of the sensors and PHM systems, but the SoH approach has the following advantages:

  • It reduces unexpected outages due to degradation failure to zero.
  • It is less expensive than the least costly alternative that also eliminates unexpected outages (usage‐based 2/3 MTTF).
  • It has the fewest repair and replace activities (28) compared to the usage‐based 2/3 MTTF approach (36) or the damage‐detection approach (43).
  • It requires the smallest number of repair hours and thereby increases mission‐availability time.
  • It does not have the drawback of unexpected early onset of degradation and/or faster‐than‐expected time from onset to failure – which are drawbacks of the usage‐based 2/3 MTTF approach.
  • It saves $180 million compared to the damage‐detection approach for a 1000‐system network over a 14 400‐hour sustainment period.

Sustainment costs can be further reduced when your PHM system is sufficiently accurate and reliable to let your customer defer maintenance until SoH estimates fall below 50% or even lower.

CBD‐Based Approaches to CBM

Of the five approaches in the cost analyses, two are based on CBD: the damage‐detection approach and the SoH approach. The damage‐detection approach is diagnostic in nature: it processes CBD and detects damage, and maintenance is scheduled. The SoH approach is prognostic in nature: it processes CBD, detects damage, and provides estimates of SoH that are used to trigger scheduling of maintenance. The prediction program, ARULEAV, also provides RUL and PH estimates for use in health management.

7.4 Reliability: Bathtub Curve

A bathtub curve, shown in Figure 7.16, is a statistical depiction of failure rate versus time over the lifetime of a population of electronic products. The curve comprises three distinct regions. Beginning on the left and moving to the right:

  • The first region is the infant mortality region, where burn‐in cycles can be used to weed out those products susceptible to early failure due to material or manufacturing flaws.
  • The constant‐failure region is where the failure rate is very low. Even so, fielded products are susceptible to fatigue damage.
  • The wear‐out or end‐of‐life region is marked by an increasingly high failure rate as the products wear out. However, electronic products rarely fail due to wear‐out; instead, failure is most often due to cyclic stresses and strains that cause fatigue damage that accumulates (Hofmeister et al. 2006). As fatigue damage accumulates, one or more measurable signals change, which can be captured as signature data.

Figure 7.16 Bathtub curve showing three regions, MTBF, and a prognostic trigger point.

So, there is really nothing about a bathtub curve that can be used to enable or to support CBD‐based prognostics.

7.4.1 Bathtub Curve: MTBF and MTTF

As you can see in Figure 7.16 , MTBF (between failures) is not a time‐axis value: it is a failure‐rate value. Neither MTTF nor MTBF is seen in a typical view of a bathtub curve, perhaps because of the relationship suggested in Figure 7.17 (Seastrunk 2016).

Graph of failure rate vs. time illustrating the possible relationship of bathtub curve to failure distribution and MTTF.

Figure 7.17 Possible relationship of bathtub curve to failure distribution and MTTF.

7.4.2 Trigger Point and Prognostic Distance

Also shown in Figure 7.16 is a conceptual diagram intended to convey two notions: that a prognostic trigger can be employed to provide advance warning of a probable failure within time PD, and that useful life does not extend into the wear‐out region. Figure 7.18 conveys a more practical view of prognostic trigger points:

  • Envision multiple sensors attached to a complicated product such as an EMA with a built‐in power supply, as already described in this book.
  • When a population of those EMAs are deployed and put into use, they become damaged due to cyclic stress, weak/defective components, and random events.
  • Continued use increases the level of damage to the point that the attached sensors detect the damage, trigger alerts, then trigger maintenance activity, and then trigger alerts that failure is imminent or has occurred.
Top: graph of the number of failures versus time displaying a bathtub curve and two vertical lines. Bottom: amplitude versus time displaying five ascending curves, each with three circles on it and three horizontal lines labeled failure, perform maintenance, etc.

Figure 7.18 Multiple instances of CBD signatures and trigger points.

7.5 Chapter Summary and Book Conclusion

This is the final chapter in this book. We discussed topics related to the selection, evaluation, and other considerations of prognostic enabling. The introduction briefly touched on critical systems, advance warning, and health management. The bulk of this chapter presented a rationale for not using reliability metrics such as MTTF and MTBF; instead, a rationale was presented for using the time between the onset of degradation and the time when such degradation results in functional failure: TTF. Methods to determine or calculate a value for TTF include service‐life, end‐use testing, and MTTF‐based methods. A section on cost‐benefit analysis included example comparisons of the following approaches: (i) no PHM; (ii) usage‐based MTTF; (iii) usage‐based 2/3 MTTF; (iv) damage detection; and (v) SoH at 75% or less. The section after that focused on the bathtub curve and how it relates to failure distributions, MTBF, MTTF, trigger points, and CBD signatures.

By no means does this book cover the entirety of information related to PHM and CBD – conditioning, modeling, and processing for CBM. On the other hand, this book contains a wealth of information dealing with basic approaches and, importantly, CBD signatures and how to process and linearize those signatures, which lessens the burden on prediction programs and improves the accuracy of prognostic information. Chapter 6 presented the design of a hypothetical prototype PHM system to illustrate the challenges a designer might face and to demonstrate the application of the approaches discussed in this book.

References

  1. Abernethy, R.B. (2006). The New Weibull Handbook, 5e, 2006. Berringer and Associates.
  2. Astfalck, L., Hodkiewicz, M., Medjaher et al. (2016). A modelling ecosystem for prognostics. Annual Conference of the Prognostics and Health Management Society.
  3. DAU (2018). Failure modes & effects analysis (FMEA) and failure modes, effects & criticality analysis (FMECA). Acquipedia, Defense Acquisition University, 9820 Belvoir Road, Fort Belvoir, VA 22060. https://www.dau.mil/acquipedia.
  4. Ellerman, P. (2012). Calculating Reliability Using FIT & MTTF: Arrhenius HTOL Model, MicroNote™ 1002, Rev 0. MicroSemi Corp. https://www.microsemi.com/document‐portal/doc_view/124041‐calculating‐reliability‐using‐fit‐mttf‐arrhenius‐htol‐model (accessed 2018).
  5. Hofmeister, J.P., Lall, P., and Graves, R. (2006). In‐situ, real‐time detector for faults in solder joint networks belonging to operational, fully programmed field programmable gate arrays (FPGAs). In: Proceedings of the IEEE AUTOTESTCON, Anaheim, CA, 18–21 Sept. 2006, 237–243. IEEE.
  6. Johnston, A. (2010). Reliability and Radiation Effects in Compound Semiconductors, 117–132. Singapore: World Scientific Publishing Co. Pte. Ltd.
  7. O'Connor, P. and Kleyner, A. (2012). Practical Reliability Engineering. Chichester, UK: Wiley.
  8. RAC (2005). Reliability Toolkit: Commercial Practices Editions. Reliability Analysis Center, Rome Laboratory. https://www.dsiac.org/sites/default/files/journals/2Q2005.pdf (accessed 2018).
  9. Ridgetop Group. (2018). Sentinel Network, view of the graphical user interface (GUI) for an electronic power supply (EPS). Courtesy and permission of Ridgetop Group, Inc., 3580 West Ina Road, Tucson, AZ, 85741.
  10. Seastrunk, B. (2016). Reliability, warranty, and why nothing lasts as long as it used to. http://opinionbypen.com/reliability‐warranty‐and‐why‐nothing‐lasts‐as‐long‐as‐it‐used‐to (accessed 2018).
  11. Seymour, B. (1993). MTTF, failrate, reliability and life testing. Application Bulletin, AB‐059, Burr‐Brown Corporation. Tucson, AZ.
  12. Speaks, S. (2005). Reliability and MTBF overview. Vicor Reliability Engineering. http://www.vicorpower.com/documents/quality/Rel_MTBF.pdf (accessed August 2015).
  13. Tobias, P. (2003). Assessing product reliability. In: Engineering Statistics Handbook. National Institute of Standards and Technology. https://www.itl.nist.gov/div898/handbook/apr/apr.htm.
  14. Weibull. (2008). MTTF, MTBF, mean time between replacements and MTBF with scheduled replacements. HotWire 94. https://www.weibull.com/hotwire/issue94.
  15. Wilkins, D.J. (2002). The bathtub curve and product failure behavior, part two – normal life and wear‐out. HotWire 22. https://www.weibull.com/hotwire/issue22/hottopics22.htm (accessed 2018).

Further Reading

  1. Filliben, J. and Heckert, A. (2003). Probability distributions. In: Engineering Statistics Handbook. National Institute of Standards and Technology. http://www.itl.nist.gov/div898/handbook/eda/section3/eda36.htm.
  2. Tobias, P. (2003a). Extreme value distributions. In: Engineering Statistics Handbook. National Institute of Standards and Technology. https://www.itl.nist.gov/div898/handbook/apr/section1/apr163.htm.
  3. Tobias, P. (2003b). How do you project reliability at use conditions? In: Engineering Statistics Handbook. National Institute of Standards and Technology. https://www.itl.nist.gov/div898/handbook/apr/section4/apr43.htm.

Note
