In God we trust, all others must bring data - W. Edwards Deming

I recently posted a survey on LinkedIn regarding the replacement of two identical pumps. The question is below with more details, but the intent of the exercise was to get everyone to look at problem solving through a strategic lens.

Question: Which pump would you replace (LinkedIn survey)?

  • Pump #1 - Six failures in two years, $6 million of downtime

  • Pump #2 - Eight failures in two years, $8 million of downtime

Without further ado, the answer is Neither.

Whomp, Whomp. Sorry, it was a trick question ... but I won't leave everyone hanging.

Pump #2 has a higher financial impact than the other pump, so the simple solution would be to replace that pump. 74% of the survey participants selected this answer. But how would you know whether the return on your investment is justified?

The second pump has more failures and the highest lost revenue, but if you approach this challenge with a reliability mindset, then your intuition would lead you towards asking for more data.

At its core, equipment reliability is R(t) = e^(-λt), where λ is the failure rate and t is the period of time over which reliability is measured. The key variables are failure rate and time.
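To make the formula concrete, here is a minimal Python sketch. It assumes a constant failure rate taken from the simple two-year averages in the survey (3 and 4 failures per year), which is exactly the simplification the rest of this post questions. The function name and the one-month window are only illustrative choices.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-failure_rate * t), valid for a constant failure rate."""
    return math.exp(-failure_rate * t)


# Average failure rates implied by the survey (failures per year).
# Assumption: the rate is constant over the two-year window.
average_rates = {"Pump #1": 6 / 2, "Pump #2": 8 / 2}

# Probability of running one month (1/12 of a year) without a failure.
for pump, rate in average_rates.items():
    print(f"{pump}: R(1 month) = {reliability(rate, 1 / 12):.3f}")
```

With averages alone, Pump #2 looks like the less reliable pump. The averages say nothing about whether either rate is rising or falling.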

Averaging failures over time can be deceiving because it does not tell the whole story. A statistical method is needed to capture the true reliability profile. Methods like cumulative failure plots and Crow-AMSAA reliability growth models show how the failure rate is trending over time, not just its average.
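As a rough illustration of how a Crow-AMSAA model picks up a trend, the sketch below fits the power-law form N(t) = λ·t^β using the standard maximum likelihood estimates for a time-terminated test. The failure times are hypothetical and are not the data behind the figures in this post. A fitted β above 1 indicates failures arriving faster over time, and a β below 1 indicates improvement.

```python
import math


def crow_amsaa_fit(failure_times, end_time):
    """Time-terminated MLE for the power-law model N(t) = lam * t**beta."""
    n = len(failure_times)
    beta = n / sum(math.log(end_time / t) for t in failure_times)
    lam = n / end_time ** beta
    return lam, beta


def expected_failures(lam, beta, t):
    """Expected cumulative number of failures by time t."""
    return lam * t ** beta


# Hypothetical failure times in years over a two-year study (NOT the post's data).
worsening = [0.9, 1.4, 1.6, 1.8, 1.9, 1.97]              # failures cluster late
improving = [0.05, 0.1, 0.2, 0.3, 0.45, 0.7, 1.0, 1.6]   # failures cluster early

for label, times in [("worsening", worsening), ("improving", improving)]:
    lam, beta = crow_amsaa_fit(times, end_time=2.0)
    projected = expected_failures(lam, beta, 4.0)  # extrapolate two more years
    print(f"{label}: beta = {beta:.2f}, projected failures by year 4 = {projected:.1f}")
```

The pump with fewer failures so far can still project far more failures once the trend is taken into account, which is the point of the example that follows.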

Here's the other side of the story that was missing from the problem statement:

Pump #1 has sustained more failures recently than at the beginning of the study, so the initial Mean Time Between Failures (MTBF) is larger than the final MTBF (figure A).

If you extrapolate this failure trend out for two years, the projected number of failures is 59 (figure B).

Figure A: Pump #1 Reliability Growth Analysis
Figure B: Pump #1 Cumulative Failures Plot

Pump #2 has a lower MTBF initially, but a change within this system has driven an increase in reliability performance (figure C). Maybe the team started applying preventive and predictive maintenance (PPM), or they improved how they operate the pump, or the craftsmen received more training. We don't know exactly what changed, but when you extrapolate the data out for two years, the projected number of failures is 10 (figure D) vs. 59 for Pump #1 (figure B).

That's $10 million vs. $59 million of lost revenue!!
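The arithmetic behind that comparison can be sketched in a few lines of Python. It assumes the cost per failure stays at the roughly $1 million implied by the survey numbers ($6 million over six failures, $8 million over eight), and it uses the projected failure counts quoted above. The variable names are only illustrative.

```python
# Cost per failure in $ millions, implied by the survey numbers.
# Assumption: this cost stays constant over the projection window.
cost_per_failure = {"Pump #1": 6 / 6, "Pump #2": 8 / 8}

# Projected failure counts quoted in the post.
projected_failures = {"Pump #1": 59, "Pump #2": 10}

projected_cost = {
    pump: projected_failures[pump] * cost_per_failure[pump]
    for pump in projected_failures
}

# The pump with the largest projected loss is the stronger replacement candidate.
replace_first = max(projected_cost, key=projected_cost.get)

for pump, cost in projected_cost.items():
    print(f"{pump}: ${cost:.0f} million projected lost revenue")
print(f"Replace first: {replace_first}")
```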

Now I ask the question again, with more data: which pump would you replace if you only had funding for one? Pump #1!!!

Figure C: Pump #2 Reliability Growth Analysis
Figure D: Pump #2 Cumulative Failures Plot

In conclusion, reliability analytics is one of the fundamentals of improving our equipment strategies, which ultimately improves our system performance. It enables us to shift from basic to precision decision making, and it accentuates the voice of the equipment.

As reliability practitioners, all we have to do is LISTEN.

*all data used in this exercise is fictitious and was only assembled to be used as a basis for the explanation of reliability principles.
