FEATURED BLOG

Root Cause Analysis in Healthcare

Posted by Lou Conheady on Thu, Mar 26, 2015 @ 13:03 PM

Author: Gary Tyne CMRP

A report by economic consultants Frontier Economics (Oct 2014) highlighted that errors in patient safety, including the cost of extra treatment, bed space and nursing care as well as huge compensation pay-outs, cost the NHS between £1 billion and £2.5 billion a year.

In a speech to staff at Birmingham Children's Hospital (Oct 2014), Jeremy Hunt (Health Secretary) said:

“World class care is not just better for patients it reduces costs for the NHS as well. More resources should be invested in improving patient care rather than wasted on picking up the pieces when things go wrong.”

As far back as 2010, Dame Christine Beasley, Chief Nursing Officer for England, said that using Root Cause Analysis (RCA) tools to understand adverse events is “critical” to improving safety across the NHS.

The National Patient Safety Agency (NPSA) developed a set of root cause analysis guidelines and instruction documents which were taken over by the NHS Commissioning Board Special Health Authority in 2012.

Although the NPSA did not prescribe a specific RCA process, its toolkit advocates the Fishbone (Ishikawa) diagram as a key tool for identifying contributory factors and root causes. Another method used within the NHS is ‘5 Whys’.

Although Fishbone and 5 Whys can both be used for basic problem solving, both methods have drawn criticism in other industries for being too simplistic to analyze root causes to the depth needed to ensure that solutions are identified and the problem is fixed.

There are several reasons for this criticism:

  • Tendency for investigators to stop at symptoms rather than going on to lower-level root causes
  • Inability to go beyond the investigator’s current knowledge – cannot find causes that they do not already know
  • Lack of support to help the investigator ask the right “why” questions
  • Results are not repeatable – different people using Fishbone and 5 Whys come up with different causes for the same problem
  • Tendency to isolate a single root cause, whereas each question could elicit many different root causes
  • Considered a linear method of communication for what is often a non-linear event

Many companies we work with successfully use the 5 Whys technique or Fishbone for very basic incidents or failures. By placing triggers correctly, organizations can use 5 Whys or Fishbone for basic problem solving and move to a cause-and-effect analysis, such as the Apollo Root Cause Analysis methodology, for more complex problems.
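To make the idea of trigger placement concrete, here is a minimal Python sketch of a trigger rule that routes simple events to 5 Whys or Fishbone and escalates the rest to a cause-and-effect analysis such as the Apollo method. The `Incident` fields, the `choose_method` function and both threshold values are hypothetical, chosen only for illustration; each organization would define its own triggers.

```python
from dataclasses import dataclass

# Hypothetical trigger thresholds: illustrative values, not a recommendation.
COST_TRIGGER = 10_000   # estimated cost of the event (any currency)
REPEAT_TRIGGER = 2      # previous occurrences of the same problem


@dataclass
class Incident:
    description: str
    cost: float                # estimated cost of this occurrence
    safety_related: bool       # involved an injury or a near miss
    previous_occurrences: int  # how often this problem has happened before


def choose_method(incident: Incident) -> str:
    """Return the analysis method this (hypothetical) trigger rule selects."""
    escalate = (
        incident.safety_related
        or incident.cost >= COST_TRIGGER
        or incident.previous_occurrences >= REPEAT_TRIGGER
    )
    if escalate:
        return "Apollo Root Cause Analysis"
    return "5 Whys / Fishbone"


if __name__ == "__main__":
    events = [
        Incident("Jammed printer tray", 50, False, 0),
        Incident("Pump seal failure", 25_000, False, 1),
        Incident("Slip on wet floor", 0, True, 0),
    ]
    for event in events:
        print(f"{event.description}: {choose_method(event)}")
```

Keeping the thresholds as named constants makes it easy to tune the triggers as the program matures.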

A disciplined problem-solving approach should push teams to think outside the box, identifying root causes and solutions that prevent recurrence of the problem instead of just treating the symptoms.

Apollo Root Cause Analysis methodology – A New Way of Thinking


The Apollo Root Cause Analysis methodology provides a simple, structured approach that can be applied by anyone, at any time, to any event. One of its most powerful attributes is its ability to create a common understanding of contributing causes and provide a platform to explore a range of creative solutions. Through a simple charting process, everyone involved in an investigation can contribute, which generates enthusiasm for the process and leads to positive problem-solving outcomes and experiences.

The key factor for successful problem solving is the inclusion of cause and effect as part of the analytical process.

Root Cause Analysis identifies causes so that solutions are based on controlling those causes rather than treating the symptoms.

There are many features of the Apollo Root Cause Analysis methodology which naturally fit within any Problem Solving Excellence program.

The Apollo Root Cause Analysis methodology was developed in 1987 by Dean Gano and is used worldwide in industries including petrochemical, aerospace, utilities, manufacturing and healthcare.

The Apollo Root Cause Analysis process is a 4-step method for facilitating a thorough incident investigation. The steps are:

  • Define the Problem
  • Analyze Cause and Effect Relationships
  • Identify Solutions
  • Implement the Best Solutions

The Apollo Root Cause Analysis methodology is supported by software called RealityCharting™, which is available as a full version (standalone or enterprise) or as RealityCharting™ Simplified. RealityCharting™ Simplified is intended for smaller issues and lets the user build a cause-and-effect chart no more than 4 causes high and 5 causes deep. This gives someone used to a 5 Whys approach the ability to create a chart using the same thought process as the Apollo Root Cause Analysis methodology. It also produces a non-linear output for what was originally treated as a linear problem.

Training in the NHS

The study ‘Training health care professionals in Root Cause Analysis: a cross-sectional study of post-training experiences, benefits and attitudes’ by Bowie, Skinner and de Wet reveals some interesting statistics about the RCA training its respondents received.

When asked ‘What type of training did you receive?’, 81.1% of respondents said they had received in-house training, compared with 6.6% who had received external training.

When asked ‘How long was the training?’, 89% of respondents said they had received less than one day of training, compared with 1.3% who had received more than two days.

From an industry perspective, these statistics are quite surprising, and this level of training can only contribute to poor-quality investigations with little success in preventing recurrence.

Within industry, facilitators of the Apollo Root Cause Analysis methodology are required to complete a minimum two-day in-class training course followed by an exam, supported by a pathway to accreditation. RCA participants receive awareness training in the methodology, but only trained facilitators can lead investigations.

Case Study


A National Health Service Trust hospital was receiving patient complaints, and waiting times in its antenatal clinic were exceeding targets. Several solutions had previously been implemented, but the problem continued, so it was decided to run a thorough investigation using the Apollo Root Cause Analysis methodology.

The investigation identified the root causes of the problem along with effective solutions, which were implemented over a period of time. Once the solutions were in place, an immediate improvement was seen and waiting-time targets were met.

“We had tried to solve this problem on a number of occasions and stress levels were increasing within the antenatal team. We had previously only dealt with the symptoms and not the root causes. Only after applying the Apollo Root Cause Analysis methodology were we able to see the evidence based causal relationships. I found the tool simple but effective and one that should be utilized in other areas across the NHS” – Midwife/Deputy Manager, Antenatal Clinic, NHS Trust Hospital

Conclusion

The study ‘The challenges of undertaking root cause analysis in health care’ by Nicolini, Waring and Mengis (2011) concluded that:

“Health services leaders need to provide open endorsement of root cause analysis and of the staff carrying it out; enhance staff participation within learning activities and new analytic tools; and develop capabilities in change management”

The Apollo Root Cause Analysis methodology has been taught to well over 100,000 people worldwide over the last 22 years. It has become known as the preeminent RCA methodology and is used in many Fortune 500 companies and US government agencies such as the Federal Aviation Administration and NASA.

If you would like further information on what the Apollo Root Cause Analysis methodology can do for you, please visit the website: http://www.apollorootcause.com

 


 

 

Topics: Root Cause Analysis in Healthcare

Calculating the ROI of Root Cause Analysis in Terms of Safety

Posted by Susan Rantall on Thu, Feb 05, 2015 @ 09:02 AM

Author: Kevin Stewart

At some point, most companies will want to see quantifiable metrics showing that their Root Cause Analysis (RCA) program has resulted in a positive return on investment (ROI).

ROI is relatively easy to calculate as a dollar value when it comes to tangibles such as equipment or production time. Things can seem trickier when trying to assign a dollar value to safety improvements resulting from an RCA program. Try to keep it simple.

The following formula is a straightforward way to begin quantifying the ROI of your RCA program, including its effects on safety:

ROI = (Cost of the Problem x Likely Recurrence) / Cost of the Fix
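As a minimal Python sketch of the formula above, the function below simply multiplies and divides the three quantities. Two points are assumptions rather than statements from the article: "Likely Recurrence" is read as the expected number of recurrences over the period you are evaluating, and the result is the article's ratio (above 1 means the fix costs less than the problem it prevents), not the conventional (gain - cost) / cost definition of ROI. The function name and figures are hypothetical.

```python
def rca_roi(cost_of_problem: float, likely_recurrence: float, cost_of_fix: float) -> float:
    """ROI = (Cost of the Problem x Likely Recurrence) / Cost of the Fix.

    likely_recurrence is assumed to be the expected number of recurrences
    over the evaluation period. A result above 1 means the fix is expected
    to cost less than the recurrences it prevents.
    """
    if cost_of_fix <= 0:
        raise ValueError("cost_of_fix must be positive")
    return cost_of_problem * likely_recurrence / cost_of_fix


# Hypothetical figures: a 50,000 failure expected to recur 3 times,
# eliminated by an investigation and fix costing 25,000 in total.
print(rca_roi(cost_of_problem=50_000, likely_recurrence=3, cost_of_fix=25_000))  # 6.0
```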

Let’s look at how we might calculate these costs. 

Cost of the Fix

  • Cost of an RCA investigation (you may need to include the initial training, though this cost should drop off as it is amortized over the program, as well as whatever time, resources and people are required to conduct the investigation itself).
  • Cost of whatever resources are needed to implement a solution. Don’t forget to include new equipment, parts, additional training, and anything else that is directly attributable to the implementation.
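The two bullets above can be added up per investigation. The Python sketch below assumes straight-line amortization, spreading the one-off training cost evenly over the number of investigations expected to benefit from it; that scheme, the `FixCost` class and all figures are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class FixCost:
    """Cost of the Fix for one RCA (hypothetical structure).

    Training is spread evenly (straight-line) over the investigations
    expected to benefit from it; this amortization scheme is an assumption.
    """
    training_cost: float         # initial RCA training for the program
    investigations_covered: int  # investigations the training is spread over
    investigation_cost: float    # time, resources and people for this RCA
    implementation_cost: float   # equipment, parts, extra training, etc.

    def amortized_training(self) -> float:
        if self.investigations_covered <= 0:
            raise ValueError("investigations_covered must be positive")
        return self.training_cost / self.investigations_covered

    def total(self) -> float:
        return self.amortized_training() + self.investigation_cost + self.implementation_cost


# Hypothetical figures for illustration only.
fix = FixCost(training_cost=20_000, investigations_covered=10,
              investigation_cost=5_000, implementation_cost=8_000)
print(fix.total())  # 15000.0
```

As more investigations share the training, its amortized share falls, which is the "drop off" the first bullet describes.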

When you eliminate a problem, calculating what you have saved depends a lot on the problem itself and its rate of recurrence. For instance, if you find out what was causing a particular machine to fail once a year, you won’t see the benefits of your solution for another year. It can take several years, and many solved problems, to see the total value of an RCA program.

Improved safety isn’t as impossible to quantify as it might seem. While most companies don’t discuss this type of equation publicly because it can seem insensitive, chances are your company does calculate the monetary cost of an injury or death on the job. As one example, the Mine Safety and Health Administration at the US Department of Labor offers an online calculator that takes into account both direct costs (like workers’ comp claims) and indirect costs (like training a new worker and lower morale), although its figures may be a bit outdated.

Cost of the Problem Recurring

This includes the cost of the initial problem in equipment, production delays, man-hours, workers’ comp claims, medical costs, absenteeism, turnover, training new employees, lower productivity, decreased morale, legal fees and increased insurance costs.

At first glance, the equation doesn’t quite make sense for a safety “near miss.” If it missed, then what did it cost? Nothing? Then the ROI would be 0 x Likely Recurrence / Cost of the Fix = 0. The answer must therefore include the potential cost: the cost to the business if the incident had been on target and hadn’t missed. That makes it subjective. How do you put a cost on maybes?

It might help to look at the statistics of how incidents occur. Take the cost to the business of a single major accident (every business has this unspoken figure locked away somewhere) and then do some simple math: one near miss is worth 0.003 of that cost. Tally up your near misses and go back to the formula.

[Image: accident pyramid]

As an example, say your data indicates you had 3000 near misses in two years, or 4.1 per day. You then put a program in place, and now you have 3000 near misses in four years, or 2.1 per day. At the old rate you would have expected 6000 near misses over those four years, so this translates to 3000 fewer near misses. Per the above calculation, this equates to 3000 x 0.003, or nine fewer major incidents, at whatever cost your company assigns to that type of incident. This becomes the savings for your ROI (the Cost of the Problem in our equation) and can be attributed to the safety program of which the RCA process is a part.
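The worked near-miss example can be reproduced in a few lines of Python. The 0.003 weighting is the article's figure (close to 1/300); the 365-day year, the function names and the cost of a major incident are assumptions, the last one purely hypothetical, so substitute your company's own figure.

```python
NEAR_MISS_WEIGHT = 0.003    # article's figure: one near miss = 0.003 of a major accident
DAYS_PER_YEAR = 365         # simplifying assumption
COST_OF_MAJOR = 2_000_000   # hypothetical: use your company's own figure


def per_day(near_misses: int, years: float) -> float:
    """Average near misses per day over a period."""
    return near_misses / (years * DAYS_PER_YEAR)


def near_misses_avoided(before: int, before_years: float,
                        after: int, after_years: float) -> float:
    """Near misses avoided during the 'after' period, compared with the old rate."""
    expected_at_old_rate = before / before_years * after_years
    return expected_at_old_rate - after


print(round(per_day(3000, 2), 1))    # 4.1 per day before the program
print(round(per_day(3000, 4), 1))    # 2.1 per day after the program
avoided = near_misses_avoided(3000, 2, 3000, 4)
print(avoided)                       # 3000.0 fewer over the four-year period
majors_avoided = avoided * NEAR_MISS_WEIGHT
print(round(majors_avoided, 6))      # 9.0 fewer major incidents
savings = majors_avoided * COST_OF_MAJOR
print(f"{savings:,.0f}")             # 18,000,000 with the hypothetical cost
```

The savings figure then takes the place of the Cost of the Problem in the ROI formula.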

This formula will help you calculate the ROI of an individual RCA, which is necessary to show that the process is working and providing value, so that you can justify the program. However, since most safety programs track TRIR (Total Recordable Injury Rate) or a similar metric, you will also need to show that the RCA program affects it. This will be difficult, because the wider safety program is already doing other things to prevent safety incidents before they happen. How do you attribute a reduction in near misses to preventive programs rather than to items put in place as a result of an RCA?

You may never be able to separate these effects. Even with detailed records, it is not always clear why people do what they do. The best you can do is to record when the RCA program was introduced and then show the subsequent improvement in your safety metrics, such as TRIR or near misses.

You can use this information to justify the program, arguing that the RCA process is part of the overall safety program and that it doesn’t really matter which gets the credit as long as we continue to drive safety improvements. The RCA program should be a small part of the overall safety program costs, since there are usually several full-time safety people involved, committee meetings, safety initiatives, programs, etc.

It doesn’t matter how you slice and dice it, the return on investment for your RCA program boils down to: What will it cost me to fix the problem now? – versus – What is the cost if this problem happens again?

 


Top 6 Sure-Fire Ways to Kill off a Root Cause Analysis Program

Posted by Lou Conheady on Wed, Sep 24, 2014 @ 15:09 PM

Author: Jack Jager

An effective root cause analysis process can improve business outcomes significantly. Why, then, do so few organisations have a functioning root cause analysis process in place?

Here are the top 6 sure-fire ways to kill off a Root Cause Analysis program:

1. Don’t use it.


The company commits to the training, creates an expectation of use and then doesn’t follow through with commitment, process and resources! Now come on, how easy is it to devalue the training and send the message that it was just to tick someone’s KPI box and that the process doesn't really need to be used?

2. Don’t support it.

Success in Root Cause Analysis should be the ultimate goal of every defect elimination program. Achieving it, however, requires more than just training people in how to do it. It requires structures that support the training at first and provide mentoring and feedback on the journey towards excellent application; and, thereafter, structures that define exactly when an investigation needs to take place and that deliver clear support in terms of time and people to achieve the desired outcome. Without support for the chosen process, the expected outcomes are rarely delivered.

3. Don’t implement solutions.

Doing all of the work involved in an investigation, only to notice that no corrective actions have been implemented and that the problem has recurred because nothing has changed, has to be one of the easiest ways to kill off a Root Cause Analysis process. What happens when people are asked to take part in or facilitate RCAs when history shows that nothing comes of the effort? “I’m too busy to waste my time on that stuff!”

 

4. Take the easy option and implement soft solutions.

Why are soft controls implemented instead of hard controls? Because they are easy, they don’t cost much and we are seen to be doing something about the problem. We have ticked all the boxes. But will this prevent recurrence of the problem? There is certainly no guarantee of that if soft controls are all we implement. We aren’t really serious about problem solving, are we, if this is what we continue to do?

5. Continue to blame people.

The easy way out! Find a scapegoat for any problem that you don’t have time to investigate, or simply can’t be bothered to investigate properly. But will knowing who did it actually prevent recurrence of the problem?

Ask a different question! How do you control what people do? You control them, or more correctly their actions, by training them, by putting the right procedures and protocols in place, by providing clear guidelines on what they can or can’t do, by creating standard work instructions for everyone to follow and by clearly establishing the workplace rules that must be adhered to.

What sort of controls are these if we measure them against the hierarchy of controls? They are all administrative controls, deemed to be soft controls that will give you no certainty that the problem will not happen again. We know this! So why do we implement these so readily? Because it is the easy way out! It ticks all the boxes, except the one that says “will these corrective actions prevent recurrence of the problem?”

We all understand the hierarchy of controls but do we actually use it to the extent that we should?  

6. We don’t know if we are succeeding because we don’t measure anything.

You get what you measure! When management don't implement or audit a process for completed RCAs, it sends a strong message that there is little or no interest in the work being done to complete the analysis.

Consider tracking KPIs such as: How many RCAs have been raised against the triggers set? How many actions have been raised in the month as a result, and of those, how many have been completed? If management is not interested in reviewing these regularly, along with the number of RCAs closed off in the relevant period, it won't be long before people notice that no one is interested in the good work being done.
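As a sketch of how those monthly KPIs might be computed from simple RCA records, the Python below counts RCAs raised, the share of trigger events that led to an RCA, actions raised, the action completion rate and RCAs closed. The `RcaRecord` fields, the KPI names and the sample month are hypothetical, and the sketch assumes every record passed in is an RCA raised against a trigger that month.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RcaRecord:
    closed: bool            # RCA closed off in the period
    actions_raised: int     # corrective actions raised from this RCA
    actions_completed: int  # of those, how many are completed


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Safe division: None when there is nothing to measure."""
    return numerator / denominator if denominator else None


def monthly_kpis(trigger_events: int, rcas: List[RcaRecord]) -> dict:
    """KPIs for one month.

    trigger_events: events that met an RCA trigger this month.
    rcas: RCAs raised against those triggers this month.
    """
    actions_raised = sum(r.actions_raised for r in rcas)
    actions_completed = sum(r.actions_completed for r in rcas)
    return {
        "rcas_raised": len(rcas),
        "trigger_coverage": ratio(len(rcas), trigger_events),
        "actions_raised": actions_raised,
        "action_completion": ratio(actions_completed, actions_raised),
        "rcas_closed": sum(1 for r in rcas if r.closed),
    }


# Hypothetical month: 4 trigger events, 3 RCAs raised.
month = [RcaRecord(True, 3, 3), RcaRecord(False, 2, 0), RcaRecord(False, 1, 1)]
print(monthly_kpis(trigger_events=4, rcas=month))
```

Returning `None` rather than zero when nothing was raised keeps an empty month from looking like a month of failures.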

The additional work needed to complete RCAs will then not be seen as necessary, since it isn't important enough to review, and the effort will drop away until it is no longer done at all.


Another interesting point is that if only the number of investigations is reported, with no check on the quality of the analysis, then anything can be whipped up, because no one is looking! Randomly auditing even one of the analyses completed each month signals that the quality of the analysis is important to the organisation.

What message do we send if we don’t measure anything?

 

 

In closing, the first step on the road to an effective and sustainable Root Cause Analysis program is to pinpoint what's holding it back. These top 6 sure-fire ways to kill off a Root Cause Analysis program will help you identify your obstacles and develop a plan to overcome them.

 


Topics: root cause analysis, rca success, rca skills, root cause analysis skills, root cause investigation, root cause of success, root cause analysis tips, success definition, root cause analysis program

How to Judge the Quality of an RCA Investigation

Posted by Lou Conheady on Tue, Sep 02, 2014 @ 14:09 PM

Author: Kevin Stewart

 
This question was posed to a discussion group, and it got me thinking: how do you grade an investigation?

The overall measure of success is whether the solution actually prevents recurrence of the problem. One definition of Root Cause Analysis is: “A structured process used to understand the causes of past events for the purpose of preventing recurrence.” So a reasonable assessment of the quality of the analysis would be to determine whether the RCA fixed the problem it set out to fix by ensuring that it never happens again. This may take a long time to prove if the MTBF (Mean Time Between Failure) of the problem is 5 years, or if the problem has only happened once.


Are there other tangible measures that can help you assess the quality of an RCA? Every RCA follows some sort of process, so it stands to reason that there are things you can look for to gauge the quality of the process followed. While this is no guarantee of a correct analysis, confirming that due diligence was applied in the process lends more credibility to the solutions.


What are some of these criteria by which you can judge an analysis?


  • Are the cause statements ‘binary’, that is, unambiguous and explicit? They should be only a few words long and use precise language, avoiding vague adjectives such as “poor”, which can be very subjective.

  • Are the causes free of conjunctions such as and, if, or, but and because? A statement containing a conjunction may hold more than one cause.

  • Is there valid evidence for each cause? Causes without evidence may not belong in the analysis; worse, solutions may be tied to them and prove ineffective.

  • Does each cause path have a valid, sensible reason for stopping? It is easy to stop too soon, and this is sometimes obvious. For example, if a cause of “no PM” has no cause of its own, so the branch stops there, most analysts would want to know why there was no PM.

  • Does the structure of the chart meet the requirements of the process being used? If it is a principle-based process, it should be easy to check that the causal elements satisfy those principles, for example through causal logic checks, space-time logic checks or other checks associated with that process.

  • Is the chart or analysis complete? Does it have many unfinished branches, open questions or outstanding action items?

  • Are the solutions SMART (Specific, Measurable, Actionable, Relevant and Timely)? Or do they include words like investigate, review, analyze, gather, contact, observe or verify?

  • Do the solutions meet a set of criteria against which they can be judged?

  • Do the solutions address specific causes, or are they general in nature? Even if they are identified against specific causes, solutions that don’t directly address those causes may still be a guess.

  • If there is a report, is it well written, short and specific, covering just the basics that an executive would be interested in? The requisites are cost, time to implement, completion date, a brief causal description and solutions that will solve the identified problem.
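Some of these checks are mechanical enough to run as a first pass before a human review. The Python sketch below screens cause statements for the conjunctions and the vague adjective named in the checklist, and solutions for the non-SMART verbs listed there. The function names and the plain whole-word matching are assumptions for illustration, not a feature of RealityCharting™; it cannot judge evidence, stopping points or chart logic, which still need a reviewer.

```python
import re
from typing import List

# Word lists taken from the checklist; extend them to suit your own standards.
CONJUNCTIONS = {"and", "if", "or", "but", "because"}
VAGUE_ADJECTIVES = {"poor"}
NON_SMART_WORDS = {"investigate", "review", "analyze", "gather",
                   "contact", "observe", "verify"}


def words(text: str) -> List[str]:
    """Lower-case words in a statement (simple whole-word matching only)."""
    return re.findall(r"[a-z]+", text.lower())


def check_cause(statement: str) -> List[str]:
    """Flag cause statements that may be ambiguous or hold several causes."""
    found = words(statement)
    issues = []
    conj = sorted(CONJUNCTIONS.intersection(found))
    if conj:
        issues.append("contains conjunction(s): " + ", ".join(conj))
    vague = sorted(VAGUE_ADJECTIVES.intersection(found))
    if vague:
        issues.append("vague adjective(s): " + ", ".join(vague))
    return issues


def check_solution(solution: str) -> List[str]:
    """Flag solutions worded as further study rather than a concrete action."""
    found = sorted(NON_SMART_WORDS.intersection(words(solution)))
    return ["may not be SMART: " + ", ".join(found)] if found else []


print(check_cause("Poor maintenance and no PM"))
print(check_cause("Bearing not lubricated"))
print(check_solution("Review lubrication schedule"))
```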

 

These are some of the things that I currently look at when I review the projects submitted by clients. I’d be interested to know about other things that may be added to the list.


Topics: root cause analysis, rca facilitator, rca success, rca skills, root cause analysis skills, rca facilitation, root cause investigation, critical rca skills, root cause of success, root cause analysis tips, facilitation skills

Honing your Facilitation Skills: Part 2

Posted by Jo Quinn on Tue, Aug 12, 2014 @ 16:08 PM

By Kevin Stewart

With all the preparation work (Honing your Facilitation Skills: Part 1) behind you, you’re now ready to start facilitating an Apollo Root Cause Analysis. Follow the steps below to ensure a smooth process and successful outcome.


  
Step 1. Introductions  

First, do some simple introductions and housekeeping. Cover things like:  

  • Introductions all around
  • The meeting guidelines: when to take breaks, phone and email policy, and so on
  • The objective: we’re here to fix the problem, not to apportion blame
  • A review of the Apollo Root Cause Analysis methodology for those who may not be familiar with it (spend 15 – 45 minutes depending on the audience)
  • Your role as facilitator: you may need to ‘direct traffic’ or change the direction of discussions to help them discover more causes or to reach effective solutions

Step 2. Timeline


It’s now time to capture the ‘story’. What has happened that brought you all here? Get several people to provide a narrative, and develop a timeline of events as you go.  

This timeline will prove very useful. It should reveal the event or issue that becomes your Primary Effect, or starting point, and help ensure that the causes charted from this starting point capture all of the group’s issues.

In the example below, if I start from T1, I’ll discover why I left my iPad in the bathroom. However, if I start at T7, I will also discover why my check process didn’t function as desired.

Time | Event
T1 | Leave iPad in department restroom stall
T2 | Meet wife
T3 | Have lunch
T4 | Return to car to leave
T5 | Wife asks if we have everything before we leave
T6 | Pat pocket and look, run through checklist
T7 | Head home without iPad
T8 | Get call halfway home asking if I have iPad

While the time that each event occurs is important, it might not always be known. In these instances, you can represent the time sequence as simply T1, T2 and so on.

Step 3. Define the problem

You’re now ready to define the problem. Often, the problem definition comes out easily and everyone agrees. However, sometimes you’ll find that the group can’t arrive at a Primary Effect. In this case, as facilitator, it’s your job to regroup and ask some questions about why everyone is interested. Often, it’s about money.

One thing you don’t want to do is get stuck trying to find the perfect starting point. I’m reminded of a saying I heard once:

Dear Optimist and Pessimist,

While you were trying to decide if the glass was half empty or half full, I drank it!

Sincerely,

The Realist

The Apollo Root Cause Analysis methodology is robust enough to handle an imperfect starting point. If the problem changes or evolves as you go, just put it down as the new starting point, adjust the chart and go on!

Now that you have a defined problem, with its significance well understood, you’re ready to start the charting process. The team should also know by now why they’re here, and how much time and money can be spent on the investigation.

If you missed Part 1 of this article, you can read it here.

Would you like to learn more about the Apollo Root Cause Analysis methodology? Our 2 Day Root Cause Analysis Facilitators course is perfect for anyone needing to understand fundamental problem solving processes and how to facilitate an effective investigation.

Topics: root cause analysis, rca facilitator, rca skills, root cause analysis skills, rca facilitation, root cause investigation, facilitation skills, root cause analysis program, root cause facilitation, rca facilitators, root cause analysis reporting

Honing your Facilitation Skills: Part 1

Posted by Melanie Bennett on Mon, May 26, 2014 @ 08:05 AM

By Kevin Stewart

A facilitator conducting a Root Cause Analysis using the Apollo method plays a crucial role throughout an investigation. Here are some tips and steps to keep in mind when facilitating.

Over many years, I have repeatedly heard that the ‘Apollo Root Cause Analysis methodology is only used for big, serious investigations.’ This statement always makes me smile, because it is completely untrue.

An RCA using the Apollo Root Cause Analysis methodology can be performed on any problem, large or small, as long as the right facilitator is on board. This article, part 1 of 2, explores the strategies and processes a facilitator should keep in mind when an investigation proceeds.

ANYONE CAN FACILITATE

In my Apollo Root Cause Analysis methodology training classes, I always ask whether anyone is a certified facilitator. I’ve received only one ‘yes’ from the 2,000 or so students who have attended my courses. That person would have been trained in how to manage a group of different personalities, how to progress a group towards its goal, how to be firm and fair, and so on.

Yes, these are valuable skills to learn. And, in an ideal world, every facilitator would have the time and resources to complete the training. But you can facilitate a Root Cause Analysis using the Apollo Root Cause Analysis methodology without this certification.

Facilitating RCAs requires flexibility – yet it also requires that you follow a standard outline. While every RCA has its own path, it will generally adhere to these main steps:

  1. Gather information
  2. Define the problem
  3. Create a RealityChart™
       a. Phase one: create the draft RealityChart™
       b. Phase two: finish and formalise the RealityChart™
  4. Identify solutions
  5. Finalise the report

The process, as laid out above in its basic format, may look a little daunting to someone who has never facilitated an RCA before, particularly if you are also contending with other feelings, like being anxious in front of a crowd or feeling responsible for the outcome. You will need to deal with these latter issues in your own way.

What you can take charge of is finding a way to shape a group of disparate people into a highly functioning team, who share the common goal of reaching a solution. By following the steps below, you can prepare for a smooth facilitation process.

PREPARING FOR A FACILITATION

Step 1. Familiarise yourself with the Apollo Root Cause Analysis methodology.

First, ensure you are familiar with the Apollo Root Cause Analysis methodology; after all, it’s what you’re trying to facilitate. If you need a review, the RealityCharting™ learning centre is a great place to recap the basics. There, you can complete a simulated scenario to fine-tune your understanding of the process. It would also be a good idea to review the facilitation guidelines in the manual you received with your original training, which give an excellent overview of the entire process.

Step 2. Gather your supplies.

Stock up on post-it notes – and get the good, super-sticky ones that will stay on the wall.

We suggest that you use post-it notes instead of a computer to perform the analysis, as these help to enhance the common reality.  With post-it notes, all participants can see what’s happening.

If you think the analysis will take a few days, get multiple colours of post-it notes so you can easily distinguish between the changes to the chart created on different days or at different times.

Ensure the room you’re working in has plenty of wall space. If the walls are unsuitable for post-it notes, tape poster paper to the wall first and then stick your post-its to it. Paper has the extra advantage of making the chart easy to remove: if the subject matter is sensitive, you can roll it up and take it with you at the end of the day.

Step 3. Prepare the participants.

Ensure that all participants know what to expect before beginning an RCA. An RCA can require a significant time commitment, so make it clear from the outset how much time is needed from them. 

Step 4. Gather information.

The more information you have at the outset, the smoother the journey.

You may already have information at hand in the form of pictures, emails, reports, write-ups, witness statements and so on. There may also be some useful physical evidence. Request evidence from the right people, collect it and store it in one file.

You may also choose to take the entire team to see the area under investigation, so that everyone has a clear picture in mind about what you’re discussing.

Be aware that, no matter how hard you try, there will always be some missing information.  This is not a problem. You can call someone, look it up at the time, or make an action item for someone to gather the evidence later. 

 


The 'Problem is Fixed' Syndrome

Posted by Melanie Bennett on Wed, May 21, 2014 @ 11:05 AM

By Kevin Stewart

One definition of Root Cause Analysis  is:
Root Cause Analysis is any structured process used to understand the causes of past events for the purpose of preventing recurrence.

This basic premise is the reason an RCA is done.

On the surface, it always appears to be a simple matter; however, there are always pitfalls and nuances.

One such pitfall that RCA investigators or facilitators face is something I call the “problem is fixed” syndrome. In my work at plants I would run across situations where a problem occurred and a solution was implemented. The particular solution may or may not have been arrived at by using RCA. In either case, the solution is implemented and the “problem is fixed.”

How is this statement validated as being true? Those involved will justify the solution by the simple fact that the problem hasn’t recurred, at least not in the immediate future, which unfortunately is sometimes the focus of plant management because of pressures, career goals or other reasons. On the surface this may seem difficult to argue with – after all, the problem is fixed – or is it?

In the cases I have been involved with, what had really happened was that the MTBF (Mean Time Between Failures) of the problem was long, say 5 years or more. I was involved in two investigations where the incident hadn’t happened in the previous 5 years and most likely wouldn’t happen for another 5 years. Investigations had been performed, and solutions were offered and implemented.

When asked about the effectiveness of the solutions, those involved offered as evidence the fact that the incident hadn’t recurred, so the solution must have been effective. On the surface this may appear difficult to argue against, since it is true that the problem hasn’t recurred. However, by looking at the MTBF of the incident, you can point out that because the MTBF is long, the effectiveness of the solution will not be known until enough time has passed for the problem to have recurred. So, for now, doing nothing, or implementing any other proposed solution, would look just as effective, since the problem would most likely not have recurred in that time anyway. You can easily see how a facility that is not careful could be “fixing” problems with long MTBFs and claiming success without actually having provided effective solutions. This argument supports a thorough and complete RCA that is based on the cause-and-effect principle and supported by evidence, to ensure that an effective solution is implemented.
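To put a rough number on this argument, the sketch below estimates how likely a stretch with no recurrence is even if nothing at all had been fixed. It assumes the incident recurs at a constant rate (an exponential failure model); that model, the function name `prob_no_recurrence` and the example windows are illustrative assumptions, not something taken from the investigations described here.

```python
import math


def prob_no_recurrence(window_years: float, mtbf_years: float) -> float:
    """Chance of zero recurrences within the window, ASSUMING a constant
    failure rate (exponential model) with the given MTBF and no fix at all."""
    if window_years < 0 or mtbf_years <= 0:
        raise ValueError("window must be >= 0 and MTBF must be > 0")
    return math.exp(-window_years / mtbf_years)


# Hypothetical incident with a 5-year MTBF, watched for 1, 2 and 5 years
for window in (1, 2, 5):
    p = prob_no_recurrence(window, mtbf_years=5)
    print(f"{window} yr with no recurrence: {p:.0%} likely even with no fix")
```

Under these assumptions, an incident with a 5-year MTBF would go a full year without recurring about 82% of the time with no corrective action whatsoever, which is why “it hasn’t happened again” is weak evidence that a solution works.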

In one of the cases above, the solution was to do more frequent maintenance to ensure the problem was identified. While this would have worked for any failure with an MTBF longer than the chosen maintenance interval, it would not have worked for one with an MTBF shorter than that interval. Besides not working in all cases, this solution would have increased the cost of maintenance significantly. In this particular case, a little more investigation, and adding a few more causes to the chart, revealed that some external damage had been done and not reported, and that this damage caused the issue. Fixing the unreported-damage issue would provide an effective solution covering the situation that brought this incident on, and it would most likely also prevent other incidents that hadn’t even happened yet.

In this case you can see that the offered solution would have appeared to work just fine, and because they did “something”, everyone would have felt good about the work and the “effective” solution.

The other incident was caused by someone who had recently returned to work after an extended leave. During an operating situation, this employee correctly followed the incorrect procedure that was posted at the unit. The solution was to replace the incorrect procedure posted at that operating unit. While replacing the procedure was necessary, they would not know whether it was effective for quite a while. Again, a little more investigation and a few more causes revealed that there was no process for replacing modified procedures posted around the plant. Fixing that would provide an effective solution. You can see that here, too, plant management would be thrilled because an investigation was done, something was put in place and the problem hasn’t happened again. I’m sure you can see that this situation could very well happen again, either at this unit or at other similar pieces of equipment.

Both of these examples also show that a good RCA must be based on valid principles and on evidence for the causes, and that you must not stop too soon! Stopping too soon is another common mistake in RCA – but that is another tip.

In the meantime, be wary of incidents with long MTBFs, and of offered solutions that are not based on good analysis or that address the wrong causes.

 

RCA DISCUSSION

What are your thoughts on conducting an RCA facilitation or investigation, and how much time have you spent preparing the analysis and implementing solutions? Do you have a successful tip worth sharing or discussing? We look forward to reading your feedback and perspective via the comments below, or let’s connect on our LinkedIn Group – ARMS Reliability - Reliability & RCA for further discussion.

 

Root Cause Analysis and 'The Blame Game'

Posted by Melanie Bennett on Mon, Apr 14, 2014 @ 09:04 AM

By Jack Jager

How often have you looked at corrective actions and thought that they would have little, if any, impact in preventing the problem from recurring? It wasn’t just once... and it continues to happen.

The Question is Why?

Yet the answer is not a simple or straightforward one. Do we believe that the person(s) creating these corrective actions aren’t trying to do their best? No, I don’t think so. I firmly believe that almost all people are trying to do their best. So where does that leave us?

I think that we are caught up in a system where reactive, quick fixes are the goal: the way of dealing with incidents on a day-to-day basis. If you have a downtime incident and you bring the power back on quickly after an outage, or get the machine back in operation after a short space of time, then the reaction from management and from all of your peers is typically “Well done! Great job!” A pat on the back for those who have performed the job well. In other words, we give respect and accolades to those who can fix it quickly. Conversely, there is often little reward or acknowledgement for hours of diligent work in pursuit of actions that will resolve the issue once and for all. We reinforce the quick fixes.

Now don’t get me wrong: the ability to do the quick fix is, and always will be, a valuable skill, but the real challenge is to understand whether we have prevented the problem from recurring.

What happens after the initial fix is put into place? Where do you go from there? In the completely reactive, fire-fighting model, where breakdown maintenance often takes precedence over planned maintenance (which then sets you up for the next round of failures), there is always a fire that needs tending, so we typically jump to that fire, the next problem on the list. “I have dealt with that one, what’s next?”

The Blame Game

From my conversations with people who attend the courses I present on the Apollo Root Cause Analysis methodology, something else becomes blatantly clear. We still seem, on many different levels, to be playing the “blame game”. The question of “who” still seems to be of paramount importance to some, perhaps many, people. The question I would put to these people is “Will knowing who did it stop it from happening again?” To my way of thinking, by far the most common answer to this question will be “No” (although there are exceptions). So why do we feel that we need to focus on the “who”? If the goal of Root Cause Analysis is to prevent recurrence of the problem, the challenge lies not so much in who was involved as in focusing on what you can do to stop it from happening again. This focus will lead to gathering more factual information, which is the essence of understanding the problem first and foremost.

The “who” side of the question is pretty easy to determine, but if that is what we focus on, it is likely to limit thorough questioning and to lead quickly and easily down a blame path. Sanctions are given or jobs lost, all based on the knowledge of “who” was at fault. But where does this lead? Wouldn’t it discourage people from reporting mistakes or faults, since reporting brings unwanted consequences? Doesn’t it elevate risk, as there would now be a culture of hiding or covering up mistakes? When you ask questions, what are you likely to get? The truth?

Something else to consider is whether people intend to cause damage, create failures, injure themselves or hurt others. Again, the overwhelming answer is “NO”. That people are involved in many incidents, and make mistakes, is seemingly the constant part of the equation. But that is the nature of the beast. People are fallible; they do make mistakes, and no matter how hard we try to control this aspect, the “human error” side of causes, the effort is forever doomed to failure. If we rely on trying to control people, our solutions will have no certainty in their outcome. Going down this path is simply not reliable.

Hierarchy of Control

This is echoed in the concept of the “Hierarchy of Control”, where each corrective action is placed within the hierarchy as an Elimination, Substitution, Engineering, Administrative or PPE (personal protective equipment) control.

The first three of these are perceived to be very strong, or hard, controls, with almost guaranteed, reliable, consistent results. They are, however, more time-consuming and typically involve spending money to achieve the desired outcome. Administrative controls and PPE are perceived to be soft controls. They are relatively quick to implement and don’t cost much, and yet if you were to ask “will they prevent recurrence?”, almost universally the response would be “NO”!

They may, however, satisfy the need to report: I have “ticked the box” and created the perception of having done something about the incident. To take this a step further, these “soft options” now get signed off by management who are fully cognisant of the Hierarchy of Control. If we keep taking the soft options, is it any wonder that we are still “fire fighting”? If we don’t fundamentally change or control the causes that create the problem, then the problem can still happen again, regardless of the “who”, the person involved. It could be anyone.
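Because the five levels of the hierarchy form an ordered scale, one way to make the hard/soft distinction concrete when reviewing corrective actions is to tag each proposed action with its level and list them strongest-first. This is a minimal sketch: the `Control` enum, the `review` helper and the example actions are hypothetical, and treating the first three levels as “hard” simply follows the grouping described in the Hierarchy of Control discussion above.

```python
from enum import IntEnum


class Control(IntEnum):
    """Hierarchy of Control levels, strongest first (lower value = stronger)."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


def is_hard(level: Control) -> bool:
    """Elimination, substitution and engineering count as hard controls."""
    return level <= Control.ENGINEERING


def review(actions: dict[str, Control]) -> list[str]:
    """Order proposed actions strongest-first and label each HARD or SOFT."""
    ordered = sorted(actions.items(), key=lambda item: item[1])
    return [
        f"{'HARD' if is_hard(level) else 'SOFT'} ({level.name.lower()}): {name}"
        for name, level in ordered
    ]


# Hypothetical corrective actions proposed after an incident
proposed = {
    "Write another procedure": Control.ADMINISTRATIVE,
    "Fit a guard with an interlock": Control.ENGINEERING,
    "Issue extra gloves": Control.PPE,
}
for line in review(proposed):
    print(line)
```

A list where every line reads SOFT is a quick signal that the actions may tick the box without changing the causes that created the problem.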

Creating another Procedure

How often have you heard or seen “create another procedure” as the response to a problem? Would you be certain that this will prevent recurrence of the problem? It could be said that you have tried to control the problem, and you can certainly show that you have done something. But would it be defensible in a court of law if someone were subsequently to get hurt? Is it feasible to expect someone to remember every single procedure for each of the many tasks they need to perform every single day? And we all know it is a soft control, an administrative one. So do the courts.

The Argument about Sanctions…

Who learns the most from the mistakes that are made? Isn’t it the person or people involved? This was put into perspective for me by another Apollo instructor at a conference in Indianapolis. He said to me: “If someone makes a mistake, and the cost of that mistake might be, say, $500,000, and you are so angered by this that you sack the person who made the mistake (quite possible, even probable)... it is like sending someone on a $500,000 training course and then sacking them the next day.”

Does this make any sense?


RCA DISCUSSION

What have you learned from conducting an RCA? Do you have any successful tips or feedback worth sharing or discussing? We look forward to reading your feedback via the comments below, or let’s connect on our LinkedIn Group – ARMS Reliability - Apollo Root Cause Analysis for further discussion.

 

FREE eBook - 6 Steps Beyond the 5 Whys

Posted by Melanie Bennett on Wed, Jan 22, 2014 @ 09:01 AM


When an incident or accident occurs at your workplace, what do you do to fix the problem?

In many cases, the "5 Whys process" is a proven and accepted means of getting to the root cause of an incident. But what do you do if this technique doesn't dive deep enough, and only presents further symptoms rather than the real cause or, indeed, causes?

This eBook reveals the benefits and limitations of the 5 Whys process, and then presents a useful method for taking the analysis further.


Investigating Incidents: How To Avoid Pitfalls and Perfect Your Process

Posted by Melanie Bennett on Sat, Jan 18, 2014 @ 07:01 AM

By Ned Callahan

Everybody agrees, don’t they, that the whole point of investigating safety incidents, whether injuries have actually been suffered or the potential for them was high, is to prevent their recurrence? Regrettably, the tendency to blame is more apparent in these cases than in, for example, mechanical failures or supply chain deviations, presumably because of the deeper emotional responses of the affected parties.

The significance of the particular event can then be intensified, because the variety and depth of the participants’ emotional responses are undeniably “real” and can, if not appropriately accommodated in the total incident management process, cloud the judgement of the investigator/s and even complicate the task for the team of analysts assembled for the RCA. Minimising the risk of friction, and avoiding undue “heat” generated by the harm (nearly) caused, can be achieved by promptly applying an investigation process which both encourages and relies upon the frank sharing of information in order to achieve the agreed objective.

A mature business will have a risk matrix which pre-determines the level at which the investigation is undertaken and therefore, which “tool” or methodology may be prescribed for the particular event. The previous deliberations about which method to use for what level/type of event will have been influenced by the organisation’s previous analysis history, incorporating the relative success or otherwise of previous investigations. These results will have been generated by multiple factors such as the quality of evidence, determined by the care taken in its collection and preservation, the rigour of the facilitation process, the relative “influence” of stakeholders and significantly, the co-operation of the incident actors, being the victim/s and witness/es.

An event that is the first of its type in the organisation, with a very minor injury and no time lost, may only require a “trouble-shooting” type of approach. The expectations of regulatory authorities in hazardous industries can be another influence on the choice.

But all that experience, whether positive, negative or mixed, can be neutralised by the arrival of a different principal with responsibility for the RCA process, one who has experience of another method, or specific training and expertise, and the clout to sway the choice. That choice may well be based simply on a personal preference arising from familiarity rather than on an objective assessment of alternatives.

Regardless of the methodology selected, the purpose must be to prevent recurrence, not to blame. If the investigation focuses primarily on “who” did or did not do something, the tenor of the subsequent analysis may become negative, and the opportunity to really learn from the experience will be subordinated to the search for a culprit. By the way, this “no blame” attitude does not exempt personnel who are repeatedly and wilfully negligent in the performance of their duties or associated activities in the workplace. The owners have a duty of care to provide a safe workplace for all, and if misbehaviour increases the risk of harm, they are obliged to respond. Reprimand is a reasonable sanction, and in the most severe but rare cases, dismissal might be reasonably justified. The justification would be a thorough, objective analysis; otherwise the organisation could find itself liable to unfair dismissal or similar charges.

The need for objectivity cannot be overstated, and it explains why best practice for significant events is to engage a third-party facilitator who has no “skin in the game”. If the broad business context for deep analysis is Continuous Improvement, the outcome must be enhanced safety of the workplace and of all the processes and equipment operations used by its employees.

Keeping in mind that every event is unique in some respects – the most obvious being that it happened at a different time to every other one (you know of) – the purpose of the RCA is to discover what is different or distinctive about this event. What are the other unique causes which might be effectively controlled or negated in order to significantly reduce the likelihood of a repetition or similar occurrence?

So, after the exhaustive process has been followed, with the facts associated with the incident recorded, the consequences measured and documented, the timeline and sequence of events mapped, and any cans of worms expertly opened and explored, you have discovered a number of causes. Typically and ideally, you will have discovered causes of which you were ignorant at the beginning of the analysis. These will only be discovered if the event is sliced thinly and every phase is considered very carefully. They ought to be documented in some graphical form so that the team’s understanding of the event can be shared and agreed as complete. The cause-and-effect chart or tree is the most common display form, and it needs to provide for displaying the pertinent evidence for each cause.
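A cause-and-effect chart of this kind is essentially a tree in which each cause can have its own causes and should carry the evidence that supports it. The sketch below shows one way to model that and to list the causes that still lack evidence. The `Cause` class, the `unsupported` helper and the example chart are hypothetical; the structure is deliberately simple and ignores any distinction between cause types that a real charting method or tool may make.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One cause in a cause-and-effect chart, with the evidence that supports it."""
    description: str
    evidence: list[str] = field(default_factory=list)
    causes: list["Cause"] = field(default_factory=list)  # causes of this cause


def unsupported(node: Cause) -> list[str]:
    """Walk the chart depth-first and list every cause with no evidence attached."""
    missing = [] if node.evidence else [node.description]
    for child in node.causes:
        missing.extend(unsupported(child))
    return missing


# Hypothetical chart and evidence, using causes similar to the incorrect-procedure example
chart = Cause(
    "Operator followed incorrect procedure",
    evidence=["incident report"],
    causes=[
        Cause(
            "Posted procedure was out of date",
            evidence=["photo of posted procedure"],
            causes=[Cause("No process to replace modified procedures")],
        ),
    ],
)
print(unsupported(chart))  # ['No process to replace modified procedures']
```

Any cause returned by `unsupported` is a natural action item: someone gathers the evidence, or the team reconsiders whether that cause belongs on the chart.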

It is imperative that all of the causes are revealed before you can be confident that prevention is assured. Being persistent in the quest for causes is a very desirable trait. Don’t stop too soon. Then, the existence of clearly defined relationships between the causes and their effects will provide the clarity necessary to instil confidence that the consequent solutions will be effective. It is the solutions, targeting specific causes, which combine to assure prevention, or at least, serious mitigation of the consequences.

But the job is incomplete. The solutions need to be implemented in a timely fashion to have an effect on the probability of recurrence. If, for example, one of the causes is the failure of some mechanism then identifying a solution for that may also entail deeper investigation to determine other failure modes which could have similar, potentially harmful effects. Note however that the investigation is not per se a solution even though it may provide data which leads one to alternative or complementary solutions.

Establishing priorities for that implementation, and assigning ownership and due dates for completion, provide the closure everybody needs. It will be a learning experience for all those intimately concerned, but it can and should be shared more widely in a large organisation. Nobody argues against a safe workplace; that attitude will reflect well on the organisation, and its standing in the community may well be heightened. A safe workplace also reduces the likelihood of interruptions to business, and this increased reliability will strengthen relationships with customers and suppliers alike over the long term.
