The Role of Quality in Crisis

This is the second installment of the discussions we presented following the 2011 Japan disaster (earthquake, tsunami, nuclear power crisis).

Here we look at the role of Quality in crisis. What lessons can we learn from such unthinkable events?

1. Centralized consensus vs. triage leadership in disaster preparedness and decision making.

One of the tenets of quality management is "Plan-Do-Check-Act." We find that when the planning has been done properly and consensus built among constituents, most processes will fulfill requirements, and Check-Act serves to fine-tune the process. In Japan, this consensus building is called "ne-mawashi," or going around the roots of a tree before transplanting it to make sure everything is OK.

While TQM experts praise consensus as good for planning, there is a downside that Dr. Deming warned about in chapter 6 of his book The New Economics: "with shared responsibility, no one is responsible." Thus, ne-mawashi can lead to finger pointing and blame instead of collaboration, as well as murkier accountability and delays in critical actions.

2. This raises the following quality questions:

(A) In a disaster, do we go back to Plan or do we go directly to Do-Check-Act (sometimes called Do-Redo) at the local level?

'Planning' may require subject matter experts, who may not be optimally located because the exact location of a disaster is unknown until it occurs. It also requires time, which may be limited by threats to life or by subsequent failures in other systems.

Also, in terms of 'planning' resources, are various emergency operations (such as fire, police, and medical) competing for the same resources, or should separate resources be planned for each? From a time perspective, should priority be given to allocating resources to those who are still alive and need immediate assistance, or should resources first be expedited to cooling nuclear fuel to address the medium-term risk to the life and livelihood of survivors?

In the case of Japan, were certain needs more urgent than others, such as the need to verify the emergency level vs. the need to issue a quick evacuation order, or the need to determine resources for disaster relief vs. the need to add resources to prevent a nuclear event? How should those priorities be set, by whom, and when? Should such priorities have changed the way the leaders approached the 'planning,' 'doing,' and 'checking'?

(B) In disaster preparedness, how has the extent of the disaster been predicted?

If the disaster falls within the predicted parameters, the planned response may be sufficient. If the disaster rises to unanticipated levels, however, as it did in Japan, the response plan can easily become insufficient.

"Beyond expectation" was how virtually everyone, from Tokyo Electric Power Company (the operator of the Fukushima power plant) to the government nuclear power regulators and safety commission, described the March 11 earthquake and tsunami in the Tohoku region, although retrospective review of historical data begins to hint otherwise.

The probability of a nuclear fatality was set in 2003 by the Japanese Nuclear Commission (JNC) to not exceed 1 × 10^-6 per year, or about 1 in a million per year. On the Japanese nuclear event, Nassim Nicholas Taleb, author of The Black Swan, cautions that model error causes underestimation of small probabilities and their contribution (see his web site). Such a highly improbable event with massive consequences is what Taleb calls a "Black Swan."

(C) Is standard FMEA practice adequate for a Black Swan event?

In FMEA (Failure Modes and Effects Analysis) we try to account for this Black Swan by looking not only at frequency of occurrence, but also at impact and detection. Assuming JNC's probability estimate for a nuclear fatality of 1 × 10^-6, the likelihood of an M9.0 earthquake at less than 1 per 100 years or 1 × 10^-2 (worst case prediction), and the likelihood of a 20 meter tsunami at less than 1 per 100 years or 1 × 10^-2 (worst case prediction), the probability of all three occurring simultaneously would be 1 × 10^-10, or 1 in 10,000,000,000 (one in ten billion).
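The multiplication above can be checked with a short script. This is only a sketch: it assumes the three events are statistically independent (an assumption implied by multiplying the figures, not something established in this discussion), and the variable names are illustrative.

```python
from fractions import Fraction

# Annual probabilities quoted in the text, treated here as exact values.
p_nuclear_fatality = Fraction(1, 10**6)  # JNC limit, 1 x 10^-6 per year
p_m9_earthquake = Fraction(1, 10**2)     # worst case, 1 per 100 years
p_20m_tsunami = Fraction(1, 10**2)       # worst case, 1 per 100 years

# ASSUMPTION: the events are independent, so the joint probability is the product.
p_joint = p_nuclear_fatality * p_m9_earthquake * p_20m_tsunami

print(f"Joint probability: 1 in {p_joint.denominator:,}")
```

If the events are in fact correlated, as when a large earthquake generates the tsunami, a simple product of this kind understates the joint probability.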

In Design FMEA, we might calculate a risk priority number (RPN) for such an event as:

  • Severity of Impact: Hazardous - without warning. Ranking 10 out of 10 (scale maximum);

  • Frequency of Occurrence: Remote - failure is unlikely (<1 in 1,500,000). Ranking 1 out of 10;

  • Detection: High chance of detecting failure mode (Japan has some of the best earthquake and tsunami detectors in the world, but radiation detection has proven to be less competent). Ranking 3 out of 10.

When calculating the RPN, the standard FMEA approach is to multiply the three rankings: 10 × 1 × 3 = 30 (out of a possible 1,000 points, or 3%). This low score would not normally catch the attention of engineers, because the low ranking for frequency of occurrence pulls down the overall RPN.
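The conventional calculation described above can be written out as follows. This is a minimal sketch; the function name classic_rpn and the 1 to 10 range check are illustrative choices, not part of any standard tool.

```python
def classic_rpn(severity: int, occurrence: int, detection: int) -> int:
    """Conventional RPN: the product of three 1-10 ordinal rankings."""
    for rank in (severity, occurrence, detection):
        if not 1 <= rank <= 10:
            raise ValueError("each ranking must be between 1 and 10")
    return severity * occurrence * detection


# Rankings from the example above: Severity 10, Occurrence 1, Detection 3.
rpn = classic_rpn(severity=10, occurrence=1, detection=3)
share_of_max = rpn / classic_rpn(10, 10, 10)

print(f"RPN = {rpn} ({share_of_max:.0%} of the maximum)")
```

Because the three rankings are multiplied, a single ranking of 1 caps the result at 100 out of 1,000 no matter how severe the impact is.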

(D) A more appropriate way to address this kind of FMEA is to weight the criteria of Severity, Frequency, and Detection using the Analytic Hierarchy Process (AHP), a method long suggested by Dr. Akao and other QFD experts.

As an example for nuclear plant design, this weighting might work out to Severity 77.7%, Frequency 15.5%, and Detection 6.9%, with an inconsistency ratio of 0.07. Then, to preserve accuracy, it would be necessary to convert the FMEA rankings to ratio scale first, instead of using the traditional RPN calculation, which improperly multiplies ordinal rankings.
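To illustrate how AHP weights of this kind can arise, the sketch below derives priorities from a pairwise comparison matrix using the common column-normalization approximation, then checks consistency. The comparison judgments (Severity 7 times as important as Frequency and 9 times as important as Detection; Frequency 3 times as important as Detection) are an assumption made for this example, not values stated in this discussion, although they happen to yield weights close to those quoted above. The function names are likewise hypothetical.

```python
# ASSUMED pairwise comparison matrix (rows/columns: Severity, Frequency, Detection).
# Entry [i][j] says how many times more important criterion i is than criterion j.
PAIRWISE = [
    [1.0, 7.0, 9.0],
    [1 / 7, 1.0, 3.0],
    [1 / 9, 1 / 3, 1.0],
]
RANDOM_INDEX_N3 = 0.58  # Saaty's random consistency index for a 3 x 3 matrix


def ahp_weights(matrix):
    """Approximate AHP priorities: normalize each column, then average each row."""
    n = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency_ratio(matrix, weights, random_index):
    """Estimate lambda_max from (A w)_i / w_i, then return CR = CI / RI."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / random_index


def weighted_rpn(weights, ratio_scores):
    """Weighted sum of ratio-scale scores, used in place of the ordinal product."""
    return sum(w * s for w, s in zip(weights, ratio_scores))


weights = ahp_weights(PAIRWISE)
cr = consistency_ratio(PAIRWISE, weights, RANDOM_INDEX_N3)

for name, w in zip(("Severity", "Frequency", "Detection"), weights):
    print(f"{name}: {w:.1%}")
print(f"Consistency ratio: {cr:.2f}")
```

The weighted_rpn helper would then combine these weights with ratio-scale scores for Severity, Frequency, and Detection. Those scores have to come from a separate conversion of the 1 to 10 rankings, which is not given here, so this sketch does not attempt to reproduce a final AHP-based RPN.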

The RPN using AHP would then equal 40.1%, making it much more likely to catch the attention of engineers (see below). The AHP tools to do this kind of FMEA are detailed in the QFD Black Belt® Course.

3. "Fu-an" System for Reporting Concerns

Perhaps a new technique, which I call the "fu-an" system (uneasiness reporting system), might be useful for workers who worry that something might be wrong but cannot articulate the problem or solution.

Japanese TQM uses a suggestion system called "tei-an" which allows front-line workers to identify problems and suggest improvements. Of course, this requires some process knowledge by the workers so that they can test the improvements before suggesting them. Often, this is combined with another Japanese TQM technique called "pokayoke" or mistake-proofing a process.

In complex systems like nuclear reactors, however, and where some workers are contractors or subcontractors as in this case, such knowledge and experience may be lacking or not communicated adequately among all levels of workers. In a fu-an system, workers could still register with management their uneasy feeling about anything related to their job, even when they lack the expertise to suggest a solution. Management then has an obligation to follow up.

In the U.S., we do have "whistle-blower" protection in many organizations, but that is an adversarial relationship. Like tei-an, a fu-an system should be collaborative.

(This particular discussion was followed by the newsletter "The Voice of Customer Issues in High Impact Projects.")

© QFD Institute | Glenn Mazur