Warning to Reader: This piece has a lot of free-association flow to it!
Oops. A few weeks ago a story emerged in the press that a B-52 had flown from North Dakota to Louisiana with half a dozen nuclear-armed missiles under its wing. The aircrew thought they were transporting disarmed missiles. This is a rather major uh-oh for the USAF, as in general they are supposed to keep track of nuclear warheads. (Yeah, I am understating this. I, by the way, can speak from a small amount of experience, as I once held a certification to deal with these things, so I have some idea how rigorous the procedures are.)
Normally the military deals with nuclear weapons issues with a simple “We do not confirm or deny…” but in this case they have released an unprecedented amount of information, including a confirmation that nukes were on a particular plane in a particular location at a particular time.
The news story of the report summarized a culture of casual disregard for the procedures – the standard work – for handling nukes. I quote the gist of it here:
A main reason for the error was that crews had decided not to follow a complex schedule under which the status of the missiles is tracked while they are disarmed, loaded, moved and so on, one official said on condition of anonymity because he was not authorized to speak on the record.
The airmen replaced the schedule with their own “informal” system, he said, though he didn’t say why they did that nor how long they had been doing it their own way.
“This was an unacceptable mistake and a clear deviation from our exacting standards,” Air Force Secretary Michael W. Wynne said at a Pentagon press conference with Newton. “We hold ourselves accountable to the American people and want to ensure proper corrective action has been taken.”
So what’s the point, and what has this got to do with lean manufacturing?
The right process produces the right result.
As true as this is, it isn’t the point. The point is that the Airmen didn’t follow the procedures. And now the Air Force will apply the “Bad Apple” theory, weed out the people who are to blame, re-emphasize the correct procedures everywhere else, and call it good.
How often do you do this when there is a quality problem, an accident or a near miss? How often do you cite “Human Error” or “not following procedures” or “didn’t follow standard work” as a so-called root cause?
You need to keep asking “why” some more, probably three or four more times.
To this end, I believe Sidney Dekker’s book “The Field Guide to Understanding Human Error” should be mandatory reading for all safety and quality professionals.
Dekker has done most of his research in the aviation industry, and mostly around accidents and incidents, but his work applies anywhere that people’s mistakes can result in problems.
In the USAF case cited above, there was (according to the reports in the open press) a culture of casual disregard for the established procedures. This probably worked for months or years because there wasn’t a problem. The “norms” of the organization differed from “the rules” and I would speculate there was considerable peer pressure, and possibly even supervisory pressure, to stick with the “norms” as they seemed to be adequate.
Admittedly, in this case, things went further than they normally do, but let’s take it away from nuclear weapons and into an industrial work environment.
Look at your fork truck drivers. Assuming they got the same training I did, they were taught a set of “rules” regarding always fastening seat belts, managing the weight of the load, keeping speed down and under control, and checking what is behind and to the sides before starting a turn (as the rear end swings out, the opposite of a car). All of these things are necessary to ensure safe operation.
Now go to the shop floor. Things are late. The place is crowded. The drivers are under time pressure, real or perceived. They have to continuously mount and dismount. The seatbelt is a pain. They get to work, have the meeting, then are expected to be driving, so there is no real time for the “required” mechanical checks. They start taking little shortcuts in order to get the job done the way they believe they are expected to do it. The “rules” become supplemented by “the norms.” This works because The Rules apply an extra margin of safety that is well above the other random things that just happen around us every day. The Norms, the way things are actually done, erode that safety margin a little bit, but normally nothing happens.
Murphy’s Law is wrong. Things that could go wrong usually don’t.
The “Bad Apple” theory suggests that accidents (and defects) are the fault of a few people who refuse to follow the correct procedures. “If only ‘they’ followed ‘the rules’ then this would not have happened.” But that does not ask why they didn’t do it that way.
Recall another couple of catastrophes: We have lost two Space Shuttle crews to the same problem. In both the Challenger and Columbia accident reports, the investigators cite a culture where a problem which could have caused an airframe loss happened frequently. Eventually concern about it became routine. Then, one time, other factors came into play, what usually happened didn’t happen, and we were wringing our hands about what went wrong this time. Truth is it nearly happened every time. But we don’t see that because we assume that every bad incident is an exception, the result of something different this time. In reality, it is usually just bad luck in a system which eroded to the point where luck was relied upon to ensure a safe, quality outcome. In these cases they didn’t single out “bad apples” because the investigations were actually done pretty well. Unfortunately the culture at NASA didn’t adjust accordingly. (Plus Space Flight involves the management of unimaginable amounts of energy, and sometimes that energy goes where we don’t want it to.)
So – those quality checks in your standard work. Do you have explicit time built in to the work cycle to do them? Are your team members under pressure, real or perceived, to go faster?
What happens if there is an accident or a defect? Does the single team member who, today, was doing the same thing that everyone does every day get called out and blamed? Just look at your accident reports to find out. If the countermeasure is “Team Member trained” or “Team Member told to pay more attention” or just about anything else that calls out action on a single Team Member then… guilty.
What about everybody else? Following an incident or accident, the organization emphasizes following The Rules. They put up banners, have all-hands meetings, maybe even tape signs up in the work place as reminders and call them “visual controls.” And everything goes great for a few weeks, but then the inevitable pressure returns and The Norms are re-asserted.
Another example: Steve and I were watching an inspection process. The product was small and composed of layers of material assembled by machine. Sometimes the machine screwed up and left one out. More rarely, it screwed up and doubled something up. As a countermeasure, the Team Member was to take each item and place it on a precise scale, note the weight, and compare the weight to a chart of the normal ranges for the various products.
There were a couple of problems with this. First, the human factors were terrible. The scale had a digital readout. The chart was printed and taped to the table. The Team Member had to know what product it was, reference the correct line on the chart, and compare a displayed number with a set of printed numbers which were expressed to two decimal places. So the scale might say “5.42” and she had to verify whether that was in or out of the range of “5.38 – 5.45.”
Human nature, when reading numbers, is that you will see what you expect to see. You might realize that one was different only five or six reads later. So telling the Team Member to “pay more attention” if she made a mistake was unreasonable. Remember, she is doing this for a 12 hour shift. There is no way anyone could pay attention continuously in this kind of work. If a defective item got through, though, there would be a root cause of “Team Member didn’t pay attention.” She is set up to fail.
But wait, there’s more!
She was weighing the items two at a time. Then she was mentally dividing the weight by two, and then looking it up. Even if she was very good at the mental math and had the acceptable range memorized, that isn’t going to work. Plus, and this is the key point, in the unlikely but possible scenario where the machine left out a layer in one item, then doubled up the next, the net weight of the two defective items together would be just fine.
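To put rough numbers on that last point, here is a minimal sketch in Python. The 5.38 to 5.45 range is the one from the chart example above; the nominal item weight and the weight of a single layer are made up for illustration, and the function and variable names are mine, not anything from the actual work station.

```python
# Acceptable range for a single item, from the chart example in the text.
LOW, HIGH = 5.38, 5.45

# Hypothetical values, for illustration only.
NOMINAL = 5.42   # assumed weight of a good item
LAYER = 0.30     # assumed weight of one layer of material


def in_range(total_weight, count=1):
    """Return True if the average weight per item is inside the range."""
    per_item = total_weight / count
    return LOW <= per_item <= HIGH


missing_layer = NOMINAL - LAYER   # machine left a layer out
double_layer = NOMINAL + LAYER    # machine doubled a layer up

# Weighed one at a time, each defect is checked on its own.
single_results = [in_range(missing_layer), in_range(double_layer)]

# Weighed together, the two errors cancel in the total.
pair_result = in_range(missing_layer + double_layer, count=2)

print("single checks pass:", single_results)
print("pair check passes:", pair_result)
```

Weighed singly, both defective items fail the check. Weighed as a pair, the light one and the heavy one average out to a good item, and the check passes.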
“Why do you weigh two at a time?” Answer: “It’s faster.” This is true, but:
- It doesn’t work.
- She doesn’t need to go faster.
Her cycle time for weighing single items was well within the required work pace. But the supervisor was under pressure for more output because of problems elsewhere, and had translated that pressure to the Team Member in a vague “work faster if you can” way. It was the norm in that area, which was different from the rules.
Where is all of this going?
The Air Force has ruined 70 careers as a result of the cruise missile incident. They may have been right to do so; I wasn’t there, and this was a pretty serious case. But the fact that it got to this point is a process and system breakdown, and it goes way beyond the base involved.
Go to your own shop floor. Stand in the chalk circle. Watch, in detail, what is actually happening. Compare it with what you believe should be happening. Then start asking “Why?” and include:
“Why do people believe they have to take this shortcut?”