Maintainability Theory

In reliability, one is concerned with designing an item to last as long as possible without failure; in maintainability, the emphasis is on designing an item so that a failure can be corrected as quickly as possible. The combination of high reliability and high maintainability results in high system availability. Maintainability, then, is a measure of the ease and rapidity with which a system or equipment can be restored to operational status following a failure. It is a function of the equipment design and installation, personnel availability in the required skill levels, adequacy of maintenance procedures and test equipment, and the physical environment under which maintenance is performed. As with reliability, maintainability parameters are also probabilistic and are analyzed by the use of continuous and discrete random variables, probabilistic parameters, and statistical distributions. An example of a discrete maintainability parameter is the number of maintenance actions completed in some time t, whereas an example of a continuous maintainability parameter is the time to complete a maintenance action.

A good way to look at basic maintainability concepts is in terms of functions that are analogous to those in reliability. They can be derived in the same way as the reliability functions by substituting t (time-to-restore) for t (time-to-failure), μ (repair rate) for λ (failure rate), and M(t), the probability of successfully completing a repair action in time t, or P(T ≤ t), for F(t), the probability of failing by age t. In other words, the following correspondences hold between maintainability and reliability engineering functions.

  1. To the time-to-failure probability density function (pdf) in reliability corresponds the time-to-maintain pdf in maintainability.
  2. To the failure rate function in reliability corresponds the repair rate function in maintainability. Repair rate is the rate at which repair actions are performed, expressed as the number of repair actions successfully completed per hour.
  3. To the probability of system failure, or system unreliability, corresponds the probability of successful system maintenance, or system maintainability.

These and other analogous functions are summarized in the following table.

Comparison of Reliability and Maintainability Functions
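
The correspondences listed above can be written compactly as follows. This is a sketch of the standard relations in the f, F, λ and g, M, μ notation used in this section, and is not a reproduction of the original table; the integration variable x is introduced here only for clarity.

```latex
\begin{array}{lll}
\text{Quantity} & \text{Reliability} & \text{Maintainability} \\
\hline
\text{pdf} & f(t) & g(t) \\
\text{Cumulative function} & F(t) = \int_0^t f(x)\,dx & M(t) = \int_0^t g(x)\,dx \\
\text{Rate function} & \lambda(t) = \dfrac{f(t)}{1 - F(t)} & \mu(t) = \dfrac{g(t)}{1 - M(t)} \\
\text{Mean time} & \text{MTTF} = \int_0^\infty t\,f(t)\,dt & \text{MTTR} = \int_0^\infty t\,g(t)\,dt
\end{array}
```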

As illustrated in the figure below, maintainability can be expressed either as a measure of the time (T) required to repair a given percentage (P%) of all system failures, or as a probability (P) of restoring the system to operational status within a period of time (T) following a failure.

Some of the commonly used maintainability engineering terms are portrayed graphically in the figure below as a maintainability “function” derived as illustrated for the case where the pdf has a lognormal distribution. Points 1, 2, and 3 shown in the figure identify the mean, median, and maximum corrective time-to-repair, respectively.

Points 1, 2, and 3 are defined as follows:

1. Mean Time to Repair, Mct, the mean time required to complete a maintenance action, i.e., total maintenance downtime divided by total maintenance actions for a given period of time, given as:

$$\bar{M}_{ct} = \frac{\sum \lambda_i \bar{M}_{ct_i}}{\sum \lambda_i}$$

where

λi = failure rate for the ith repairable element of the item for which maintainability is to be determined, adjusted for duty cycle, catastrophic failures, tolerance and interaction failures, etc., which will result in deterioration of item performance to the point that a maintenance action will be initiated

Mcti = average corrective time required to repair the ith repairable element in the event of its failure

2. Median Time to Repair, M̃ct, the downtime within which 50% of all maintenance actions can be completed

3. Maximum Time to Repair, Mmax ct, the maximum time required to complete a specified percentage of all maintenance actions. For example, if a system specification indicates Mmax ct(95%) = 1 hour, then 95% of all maintenance actions must be completed within one hour.
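
The failure-rate-weighted mean in definition 1 can be illustrated with a short Python sketch. The three elements and their numbers below are hypothetical and chosen only to show the arithmetic; the function name `weighted_mttr` is likewise an assumption of this example.

```python
# Hypothetical repairable elements: (failure rate per 10^6 hours, mean corrective time in hours).
# These values are illustrative only and are not taken from the text.
elements = [
    (120.0, 0.5),
    (45.0, 1.2),
    (10.0, 3.0),
]


def weighted_mttr(items):
    """Failure-rate-weighted mean corrective time: sum(lambda_i * Mct_i) / sum(lambda_i)."""
    total_rate = sum(rate for rate, _ in items)
    if total_rate <= 0:
        raise ValueError("total failure rate must be positive")
    return sum(rate * mct for rate, mct in items) / total_rate


mttr = weighted_mttr(elements)
print(f"MTTR = {mttr:.3f} hours")
```

Elements that fail more often pull the result toward their own repair time, which is why the frequently failing 0.5-hour element dominates in this sketch.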

Fewer statistical distributions are used for maintainability analysis than for reliability analysis. This may be because maintainability theory has traditionally lagged reliability theory in development. The most commonly used distributions for maintainability analysis have been the normal, lognormal, and exponential. Just as the exponential distribution has been the one most widely used in reliability analysis of equipment/systems, the lognormal distribution is the one most commonly used for equipment/system maintainability analysis. This section concentrates on the normal, exponential, and lognormal distributions and gives examples of their use in maintainability analysis.

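Lognormal Distribution

The lognormal distribution applies to most maintenance tasks and repair actions that consist of several subsidiary tasks of unequal frequency and duration. The probability density function is given by

$$g(t = M_{ct_i}) = \frac{1}{M_{ct_i}\, S_{\ln M_{ct}} \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln M_{ct_i} - \overline{\ln M_{ct}}}{S_{\ln M_{ct}}}\right)^2\right]$$

where

t = Mcti = repair time from each failure

$$\overline{\ln M_{ct}} = \frac{\sum \ln M_{ct_i}}{N}$$ = mean of the natural logarithms of the repair times

SlnMct = standard deviation of the natural logarithm of the repair times

$$S_{\ln M_{ct}} = \sqrt{\frac{\sum (\ln M_{ct_i})^2 - \left(\sum \ln M_{ct_i}\right)^2 / N}{N - 1}}$$

t’ = ln(Mcti) = ln(t)

N = number of repair actions
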

Mean Time to Repair (MTTR)

The mean time to repair is given by

$$MTTR = \bar{M}_{ct} = \exp\left(\overline{\ln M_{ct}} + \frac{1}{2} S_{\ln M_{ct}}^2\right)$$


Median Time to Repair

The median time to repair is given by

$$\tilde{M}_{ct} = \exp\left(\overline{\ln M_{ct}}\right)$$


Maximum Time to Repair

The maximum time to repair is given by

$$M_{max_{ct}} = \exp\left(\overline{\ln M_{ct}} + \phi\, S_{\ln M_{ct}}\right)$$

where φ is the value from the normal distribution function corresponding to the percentage point (1-α) on the maintainability function for which Mmax ct is defined. The most commonly used values of φ or Z(t1-α) are:

1-α        φ
95%       1.645
90%       1.282
85%       1.036
80%       0.842


Reliability Analytics Toolkit Example

Given the following system repair times and frequencies of repair time observations, what is the mean time to repair and the maximum time to repair at the 95% level, Mmax95%?

Time Freq.
0.2    1
0.3    1
0.5    4
0.6    2
0.7    3
0.8    2
1       4
1.1    1
1.3    1
1.5    4
2       2
2.2    1
2.5    1
2.7    1
3       2
3.3    2
4       2
4.5    1
4.7    1
5       1
5.4    1
5.5    1
7       1
7.5    1
8.8    1
9       1
10.3  1
22     1
24.5  1

Solution:

Paste the above input data into box 1 of the maintainability analysis tool and select 95% for the MMax percentile in the item 2 pull-down, as shown below. Note that the instructions below box 1 give the input format as a repair time followed by a single space and then the number of observations; the tool will nevertheless correctly parse data with multiple spaces separating the two values.

Maintainability analysis inputs

These inputs produce the following results table, which shows the time-to-restore along with the associated probability density function and cumulative distribution function. Looking down the fourth column, we see 0.9501 at a time of 12.17 hours, indicating that 95% of the repairs will take 12.17 hours or less. (Note that the exact time for 0.9500, as calculated by the tool, is 12.08 hours.)

The tool calculates the following system maintainability parameters:

Mode Time to Repair = 0.5586 hours
g(t = 0.5586) = 0.3447

Median Time to Repair = 1.9325 hours
M(t = 1.9325) = 0.5000

Mean Time to Repair (MTTR) = 3.5955 hours

The time within which 95% of the maintenance actions are completed, MMax 95% = 12.0841 hours

The tool also generates the following plots of the lognormal PDF and CDF based on the repair times.

PDF of repair times

CDF of repair times

The above example data comes directly from MIL-HDBK-338, page 5-44, Ground Electronic System Maintainability Analysis, Example 1. The tool duplicates the probability density values for the time-to-repair data shown on page 5-49, the plot of the lognormal pdf shown on page 5-50, and the plot of the maintainability function, M(t), shown on page 5-54. The MTTR is calculated on page 5-52 as 3.595 hours, the median time to repair is calculated on page 5-52 as 1.932 hours, and the 95% value for Mmax is shown on page 5-55 as 12.08 hours.
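
As a cross-check that does not depend on the tool, the same parameters can be estimated directly from the data. The following Python sketch assumes the sample (N - 1) standard deviation of the log repair times and the lognormal expressions for median, MTTR, and Mmax; the function name `lognormal_summary` and the dictionary layout are assumptions of this example, not part of the tool.

```python
import math
from statistics import NormalDist

# Repair time in hours -> number of observations, copied from the example data.
observations = {
    0.2: 1, 0.3: 1, 0.5: 4, 0.6: 2, 0.7: 3, 0.8: 2, 1.0: 4, 1.1: 1,
    1.3: 1, 1.5: 4, 2.0: 2, 2.2: 1, 2.5: 1, 2.7: 1, 3.0: 2, 3.3: 2,
    4.0: 2, 4.5: 1, 4.7: 1, 5.0: 1, 5.4: 1, 5.5: 1, 7.0: 1, 7.5: 1,
    8.8: 1, 9.0: 1, 10.3: 1, 22.0: 1, 24.5: 1,
}


def lognormal_summary(obs, percentile=0.95):
    """Return N, mean and sample std dev of ln(t), plus median, MTTR and Mmax."""
    n = sum(obs.values())
    mean_ln = sum(f * math.log(t) for t, f in obs.items()) / n
    # Sample (N - 1) variance of the log repair times.
    var_ln = sum(f * (math.log(t) - mean_ln) ** 2 for t, f in obs.items()) / (n - 1)
    sd_ln = math.sqrt(var_ln)
    z = NormalDist().inv_cdf(percentile)  # standard normal quantile, about 1.645 at 95%
    return {
        "n": n,
        "mean_ln": mean_ln,
        "sd_ln": sd_ln,
        "median": math.exp(mean_ln),
        "mttr": math.exp(mean_ln + 0.5 * var_ln),
        "mmax": math.exp(mean_ln + z * sd_ln),
    }


summary = lognormal_summary(observations)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The results should agree with the tool output above to within rounding; small differences in Mmax can arise from how many digits of the normal quantile are used.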

 

Normal Distribution

In maintainability, the normal distribution applies to relatively straightforward maintenance tasks and repair actions (e.g., simple removal and replacement tasks) that consistently require a fixed amount of time to complete. Maintenance task times of this nature are usually normally distributed, producing a probability density function given by:

$$g(t = M_{ct_i}) = \frac{1}{S_{M_{ct}} \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{M_{ct_i} - \bar{M}_{ct}}{S_{M_{ct}}}\right)^2\right]$$

where

Mcti = repair time for an individual maintenance action

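and the average repair time for N observations is:

$$\bar{M}_{ct} = \frac{\sum M_{ct_i}}{N}$$

The standard deviation of the distribution of repair times, based on N observations, is

$$S_{M_{ct}} = \sqrt{\frac{\sum \left(M_{ct_i} - \bar{M}_{ct}\right)^2}{N - 1}}$$

The mean time to repair is given by

$$MTTR = \bar{M}_{ct} = \frac{\sum M_{ct_i}}{N}$$

The median time to repair is given by

$$\tilde{M}_{ct} = \frac{\sum M_{ct_i}}{N}$$

which is equal to the mean time to repair because of the symmetry of the normal distribution.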

The maximum time to repair is given by

$$M_{max_{ct}} = \bar{M}_{ct} + \phi\, S_{M_{ct}}$$

where φ is the value from the normal distribution function corresponding to the percentage point (1-α) on the maintainability function for which Mmax ct is defined. The most commonly used values of φ or Z(t1-α) are:

1-α        φ
95%       1.645
90%       1.282
85%       1.036
80%       0.842


Example

A piece of equipment whose repair times are assumed to be normally distributed was monitored, and the following repair times were observed (in minutes): 6.5, 13.25, 17.25, 17.25, 19.75, 23, 23, 24.75, 27.5, 27.5, 27.5, 32, 34.75, 34.75, 37.5, 37.5, 40.25, 42.5, 44.75, 52.

Find the following parameters:

1. The pdf, g(t), and its value at 30 minutes

2. The MTTR and median times to repair

3. The maintainability for 30 minutes

4. The time within which 90% of the maintenance actions are completed

5. The repair rate, μ(t), at 30 minutes

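Solution

1. The pdf, g(t), and its value at 30 minutes

$$\bar{M}_{ct} = \frac{\sum M_{ct_i}}{N} = \frac{583.25}{20} = 29.16 \text{ minutes}$$

$$S_{M_{ct}} = \sqrt{\frac{\sum \left(M_{ct_i} - \bar{M}_{ct}\right)^2}{N - 1}} = 11.5 \text{ minutes}$$

$$g(t) = \frac{1}{11.5\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{t - 29.16}{11.5}\right)^2\right]$$

g(t = 30) = 0.035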

2. The MTTR and median times to repair

These are the same for the normal distribution because of its symmetry, and are 29.16 minutes as calculated in step 1 above.

3. The maintainability for 30 minutes

$$M(30) = \Phi\left(\frac{30 - 29.16}{11.5}\right) = \Phi(0.07)$$

From a standard normal table (z table):

Φ(0.07) = 1 – 0.4721 = 0.53

This means that there is a 53% chance of completing a repair within 30 minutes.

Another way of making this calculation is to use the normal distribution tool in the Reliability Analytics Toolkit, with the following inputs (ignore the units of hours mentioned in the tool):

The tool generates the following curves, with the M(30) represented by the pink shaded area. The second curve shows the reliability at 30, which is represented by the green shaded area in the pdf plot.  We are interested in the pink shaded area, which is one minus this value, or 0.53.

4. The time within which 90% of the maintenance actions are completed

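This can be read from the cumulative R(t) curve by moving the mouse along the curve until a value of 0.1 is found. Since this represents the green shaded area of the pdf curve, the red shaded area is 0.90. The pop-up in the figure below shows this at about 43 minutes (ignore the units of hours shown on the plot). For comparison, the normal-distribution expression for maximum time to repair (the mean plus φ times the standard deviation, with φ = 1.28 for 90%) gives 29.16 + (1.28)(11.5) ≈ 43.9 minutes.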

5. The repair rate, μ(t), at 30 minutes

μ(30) = g(30)/(1 – M(30)) = 0.035/(1 – 0.53) = 0.074 repairs/minute
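
The five results in this example can also be checked with Python's standard library. This is a minimal sketch that assumes the sample (N - 1) standard deviation, as in the hand calculation; the variable names are chosen for this example only. Because it carries full precision rather than the rounded 29.16 and 11.5, the last digits differ slightly from the values above.

```python
from statistics import NormalDist, mean, stdev

# Observed repair times in minutes, copied from the example.
repair_times_min = [
    6.5, 13.25, 17.25, 17.25, 19.75, 23, 23, 24.75, 27.5, 27.5,
    27.5, 32, 34.75, 34.75, 37.5, 37.5, 40.25, 42.5, 44.75, 52,
]

mct = mean(repair_times_min)      # MTTR, also the median for a normal distribution
s_mct = stdev(repair_times_min)   # sample standard deviation (N - 1 divisor)
dist = NormalDist(mu=mct, sigma=s_mct)

g_30 = dist.pdf(30)               # pdf value at 30 minutes
m_30 = dist.cdf(30)               # maintainability M(30)
t_90 = dist.inv_cdf(0.90)         # time within which 90% of repairs are completed
mu_30 = g_30 / (1 - m_30)         # repair rate at 30 minutes

print(f"MTTR = {mct:.2f} min, S = {s_mct:.2f} min")
print(f"g(30) = {g_30:.4f}, M(30) = {m_30:.3f}")
print(f"t(90%) = {t_90:.1f} min, mu(30) = {mu_30:.4f} repairs/min")
```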

 

Exponential Distribution

In maintainability analysis, the exponential distribution applies to maintenance tasks and maintenance actions whose completion times are independent of previous maintenance experience. An example is failure isolation by substitution, where several equally likely alternatives are available and each is tried, one at a time, until the one that caused the failure is isolated. This produces a probability density function given by:

$$g(t = M_{ct_i}) = \frac{1}{\bar{M}_{ct}} \exp\left(-\frac{M_{ct_i}}{\bar{M}_{ct}}\right)$$

The fundamental maintainability parameter is the repair rate, μ(t), which is the reciprocal of Mct, the mean time to repair (MTTR). Thus, another expression for g(t) in terms of the repair rate is

$$g(t) = \mu e^{-\mu t}$$

where μ is the repair rate (which is constant for the exponential case).

The maintainability function, or probability of repair in time t, is given by:

$$M(t) = \int_0^t g(t)\,dt = 1 - e^{-\mu t}$$

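The MTTR is given by

$$MTTR = \bar{M}_{ct} = \frac{1}{\mu} = \frac{\sum M_{ct_i}}{N}$$

If the maintainability function, M(t), is known, the MTTR can also be obtained from

$$MTTR = \bar{M}_{ct} = \frac{-t}{\ln\left[1 - M(t)\right]}$$

The median time to repair is given by:

$$\tilde{M}_{ct} = 0.69\, \bar{M}_{ct}$$

The maximum time to repair is given by

$$M_{max_{ct}} = k_e\, \bar{M}_{ct}$$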

where ke = value of Mcti/MTTR at the specified percentage point α on the exponential function at which Mmax ct is defined. Values of ke are:

α            ke
95%       3.00
90%       2.31
85%       1.90
80%       1.61
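
The multipliers used for the median and for ke follow from setting the exponential maintainability function equal to the required percentage and solving for t. The short derivation below is a sketch written with MTTR in place of the mean corrective time:

```latex
M(t) = 1 - e^{-t/\mathrm{MTTR}} = \alpha
\;\Longrightarrow\;
t = -\mathrm{MTTR}\,\ln(1 - \alpha)
\;\Longrightarrow\;
k_e = -\ln(1 - \alpha)

\alpha = 0.50:\; k_e = \ln 2 \approx 0.69
\qquad
\alpha = 0.95:\; k_e = -\ln 0.05 \approx 3.00
```

Evaluated this way, the 90% multiplier is -ln(0.10) ≈ 2.30, marginally below the tabulated 2.31; the 85% and 80% entries match to two decimals.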

 

Example

For a large computer installation, the maintenance crew logbook shows that over a period of a month there were 15 unscheduled maintenance actions or downtimes, and 1200 minutes in emergency maintenance status. Based on prior data for this equipment, the maintainability analyst knew that the repair times were exponentially distributed. A warranty contract between the computer company and the customer calls for a penalty payment for any downtime exceeding 100 minutes. Find the following:

1. The MTTR and repair rate

2. The maintainability function M(t) for 100 minutes, or the probability that the warranty requirement is being met

3. The median time to repair

4. The time within which 95% of the maintenance actions can be completed

Solution

1. The MTTR and repair rate

MTTR = 1200/15 = 80 minutes

The repair rate, μ, is 1/80 = 0.0125 repairs/minute

2. The maintainability function M(t) for 100 minutes

$$M(100) = 1 - e^{-(0.0125)(100)} = 1 - e^{-1.25} = 1 - 0.286 = 0.714$$

There is a 71.4% chance of meeting the warranty requirement.

3. The median time to repair

The median time to repair is (0.69)(80) = 55.2 minutes

4. The time within which 95% of the maintenance actions can be completed

ke is 3 for 95%, as shown in the list above.

Mmax ct is (3)(80) = 240 minutes
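
The same four answers can be reproduced with a few lines of Python. This sketch uses the exact logarithms rather than the rounded multipliers 0.69 and 3, so the median and 95% time come out near 55.5 and 239.7 minutes instead of 55.2 and 240; the helper names `maintainability` and `time_for_percentile` are assumptions of this example.

```python
import math

total_downtime_min = 1200.0   # emergency maintenance time logged over the month
n_actions = 15                # number of unscheduled maintenance actions

mttr = total_downtime_min / n_actions   # mean time to repair, minutes
repair_rate = 1.0 / mttr                # repairs per minute


def maintainability(t, mttr):
    """Probability that an exponentially distributed repair is completed within t."""
    return 1.0 - math.exp(-t / mttr)


def time_for_percentile(p, mttr):
    """Time within which a fraction p of repairs is completed (inverse of M(t))."""
    return -mttr * math.log(1.0 - p)


m_100 = maintainability(100.0, mttr)
median = time_for_percentile(0.50, mttr)
t_95 = time_for_percentile(0.95, mttr)

print(f"MTTR = {mttr:.1f} min, repair rate = {repair_rate:.4f} repairs/min")
print(f"M(100) = {m_100:.4f}")
print(f"median = {median:.2f} min, 95% time = {t_95:.2f} min")
```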

 

References:

1. MIL-HDBK-338, Electronic Reliability Design Handbook, 15 Oct 84
2. Bazovsky, Igor, Reliability Theory and Practice
3. O’Connor, Patrick D. T., Practical Reliability Engineering
4. Birolini, Alessandro, Reliability Engineering: Theory and Practice