Bryan Zeigler has a great post, “Andon Calls and Muri,” on his “Lean is Good” blog. In it he describes Toyota’s phenomenal capacity for responding to problems, then brings us back to where the rest of us are with some really great questions:
If it is physically impossible to answer every andon call in order to work on every problem, is it best to fix the first one that comes sequentially? Then do workarounds and rework until we can respond to another one?
I have always used systems to prioritize which problems we work on, whether it be Pareto charts, value stream maps, or just plain standing in the circle. Once directions (or, as Toyota Kata describes them, process target conditions) are established, the highest priority items are “fixed,” and then we move on to the next most important challenge.
Working on all problems in the process would overburden the organization’s problem solvers. This would be a form of the dreaded muri, right? I’ve read and heard much about the Toyota staffing levels required to operate TPS effectively. Most accounts range from 5 to 7 employees under each leadership position at every level. Again, my experience is more like 30 to 40 employees under a first-line leader.
Two questions:
- What percentage of daily problems are organizations that you work in staffed to handle?
- What philosophies do you utilize to ensure you don’t introduce Muri to the problem solving teammates in the organization?
Great observation, and great questions. And Bryan is certainly not the only one who has had the insight to ask them.
At this point, I have to issue a bit of a disclaimer. I have spent a full day on the shop floor in Bryan’s workplace. I can visualize exactly what he is talking about. Unfortunately, schedules didn’t allow me to meet him, but I have a pretty good sense of the situation he is dealing with.
Since pretty much everyone has these issues when they start to contemplate andon calls, let’s start by reviewing the theoretical base, then move into reality.
The core principle of jidoka is “Stop and respond to every abnormality.” That is what we are trying to do here. This means there must be a clear definition of what is “normal” and what is “abnormal.”
In the strictest sense, if it isn’t clearly defined as “normal,” it is ambiguous.
In a system where the most basic requirement is defined processes, ambiguity is a problem as well. Put another way, “ambiguous” is, by definition, “abnormal.”
So, in effect, we are asking the team member to let us know when anything is not clearly consistent with the defined norms.
The first response to an andon call is to clear the problem. If the “problem” is lack of clarity, then deal with that: replace uncertainty with clarity. Set hard criteria for what does, and does not, need your attention. This means taking management responsibility both for the fact that the problem exists and for the decision not to do anything about it right now.
The next step is critical: If you can’t solve it right now, contain it.
“Containing the problem” means establishing a temporary standard of some kind: an action that allows the team member to resume safe, defect-free work. You might have introduced some inefficiency into the process, but safety and quality take priority.
And here is the dangerous part. You have the problem more or less under control. You can easily walk away and move on to the next one.
But consider this: You have the tiger in a cage. You are in the cage with it. You have to keep feeding the tiger (with time and resources) so it doesn’t eat your process. The only way to make the tiger go away is to get to the root cause.
This temporary standard does, however, give you a measure of stability. You can organize your problem solving efforts and focus on the ones that are the most critical to you. Meanwhile, you are burning resources feeding all of those tigers.
Typically, a temporary countermeasure (problem containment) is some adjustment to the process. You have set a new standard work sequence that includes the steps required to keep this problem from affecting customers or escaping downstream.
Yes, it is a work-around. But it is one you developed deliberately for a specific reason, until you can get to clearing that issue for good.
As you continue to identify problems and at least get them contained in some way, continue to refine the things you want to call attention to.
First, be explicitly clear on what things must trigger an andon call. These are the things you really want to know about when they happen. They should certainly include any safety issue and any issue that threatens quality. They could also include an issue you are currently focused on resolving, such as late parts delivery, an upstream quality issue, or a piece of unreliable equipment.
Then establish the time trigger. To do this, you need to have three things pretty clear in your mind.
- A good idea of how long the process is supposed to take.
- A method for the team member to know when he is behind, and how much.
- A standard for how much delay you are willing to tolerate. Put another way, how long are you willing to let the team member get behind before he tells someone? My suggestion here is no more than you can help him make up. If he gets further behind than that, you are going to pass the problem downstream to another part of the process in the form of a late delivery.
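The three items above can be made concrete with a little arithmetic. The sketch below is a minimal Python illustration, not a prescribed method: it assumes the work is paced by a fixed planned time per unit, and the names `ProgressCheck`, `planned_cycle_s`, `catch_up_capacity_s`, and `time_trigger_exceeded` are hypothetical. “How far behind” is computed as elapsed time minus the planned time for the units already finished, and the tolerance is the delay the leader can help make up.

```python
from dataclasses import dataclass


@dataclass
class ProgressCheck:
    """Where one team member stands against the planned pace (hypothetical structure)."""
    planned_cycle_s: float  # how long one unit is supposed to take
    elapsed_s: float        # time since the start of the work period
    units_done: int         # units completed so far

    def delay_s(self) -> float:
        """Seconds behind plan: elapsed time minus the time the finished units should have taken.

        The unit currently in progress is ignored, so mid-cycle this slightly
        overstates the delay; it is exact at the moment a unit is completed.
        """
        return max(0.0, self.elapsed_s - self.units_done * self.planned_cycle_s)


def time_trigger_exceeded(check: ProgressCheck, catch_up_capacity_s: float) -> bool:
    """True once the team member is further behind than the leader can help make up."""
    return check.delay_s() > catch_up_capacity_s


# Example: 60 s per unit, 10 minutes in, 8 units done.
check = ProgressCheck(planned_cycle_s=60, elapsed_s=600, units_done=8)
print(check.delay_s())                   # 120
print(time_trigger_exceeded(check, 90))  # True
```

Here a 120-second gap exceeds a 90-second catch-up capacity, so the time trigger would fire; with a 180-second capacity it would not.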
Now you have some simple rules.
Please try to perform the standard work so we can see any problems with it.
- If any unsafe condition exists, stop and pull the andon. Wait until we can clear the hazard.
- Do not knowingly ship bad quality to the next process. Pull the andon so we can come, assess, and decide how we are going to deal with that.
- If you have this problem, this problem, or this problem, stop and pull the andon so we can come and clear the problem as well as understand it as soon as it happens.
- If you accumulate any delays longer than xx minutes, pull the andon.
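The simple rules above amount to a short checklist, and writing them as a function is one way to test that they leave nothing to the team member's judgment. This Python sketch is illustrative only: the `AndonReason` names, the `watched` problem set (borrowed from the earlier examples of late parts, upstream quality issues, and unreliable equipment), and the `delay_limit_min` parameter standing in for “xx minutes” are all assumptions.

```python
from __future__ import annotations

from enum import Enum


class AndonReason(Enum):
    SAFETY = "unsafe condition"
    QUALITY = "bad quality would go to the next process"
    WATCHED_PROBLEM = "a problem we have asked to hear about"
    DELAY = "delay beyond the agreed limit"


def andon_reasons(
    unsafe: bool,
    defect_found: bool,
    observed: set[str],
    watched: set[str],
    delay_min: float,
    delay_limit_min: float,
) -> list[AndonReason]:
    """Return every rule that says 'pull the andon'; an empty list means keep working."""
    reasons = []
    if unsafe:
        reasons.append(AndonReason.SAFETY)
    if defect_found:
        reasons.append(AndonReason.QUALITY)
    if observed & watched:  # any of "this problem, this problem, or this problem"
        reasons.append(AndonReason.WATCHED_PROBLEM)
    if delay_min > delay_limit_min:  # the "longer than xx minutes" rule
        reasons.append(AndonReason.DELAY)
    return reasons


watched = {"late parts delivery", "upstream quality issue", "unreliable equipment"}
print(andon_reasons(False, False, {"late parts delivery"}, watched, 3, 5))
```

Returning every reason rather than a single yes/no keeps the information the responder needs to clear and understand the problem.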
This puts you in control. You get to decide how much excess capacity (how many extra people) you carry to absorb delays. You get to decide what problems trigger a call. You get to decide what you can handle.
All I ask is:
- Do not tolerate unsafe conditions. Always stop the process and always initiate a call.
- Do not tolerate a process that routinely passes bad quality downstream. Always initiate a call. Don’t put the team member in a position where he has to judge what “good enough” is. Have a hard standard and stick to it.
- Always thank the team member for bringing problems to your attention. Never discourage an andon call.
- Never allow an andon call to go unanswered. Set a response time standard, measure it, and apply the same problem solving principles to that.
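Setting a response time standard and measuring it can start very simply. The Python sketch below assumes each call is logged as the seconds from pull to response, with `None` for a call nobody answered; `response_misses` and `pct_within_standard` are hypothetical names. Each miss it returns is itself an abnormality to work on with the same problem-solving approach.

```python
from __future__ import annotations

from typing import Optional


def response_misses(response_times_s: list[Optional[float]], standard_s: float) -> list[int]:
    """Indices of andon calls answered later than the standard or never answered (None)."""
    return [i for i, t in enumerate(response_times_s) if t is None or t > standard_s]


def pct_within_standard(response_times_s: list[Optional[float]], standard_s: float) -> float:
    """Share of calls answered within the response-time standard, as a percentage."""
    if not response_times_s:
        return 100.0
    missed = len(response_misses(response_times_s, standard_s))
    return 100.0 * (len(response_times_s) - missed) / len(response_times_s)


# One shift's calls, in seconds from pull to response; None means nobody came.
shift = [20, 45, None, 90, 30]
print(response_misses(shift, standard_s=60))      # [2, 3]
print(pct_within_standard(shift, standard_s=60))  # 60.0
```

Counting an unanswered call as a miss, rather than dropping it, keeps the measure honest about the rule that no call goes unanswered.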
The other thing I would suggest is a system to manage problem solving. There are some suggestions in this post on morning markets.
The key point is that any problem you decide not to work on has to have some kind of temporary countermeasure incorporated into your expectations. If you do something that adds time, you must allow time for it to be done. Doing otherwise introduces overburden or, to Bryan’s point, shifts the overburden from your problem solvers back to the team member.
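The “allow time for it” point is plain arithmetic: the standard work plus any containment steps has to fit inside the time the process is paced to. This is a hypothetical Python sketch, assuming work content is measured in seconds per cycle and that the available cycle time (for example, a takt time) is known; `overburden_s` is an illustrative name. Any positive result is overburden landing on the team member.

```python
from __future__ import annotations


def overburden_s(standard_steps_s: list[float],
                 containment_steps_s: list[float],
                 available_cycle_s: float) -> float:
    """Seconds per cycle by which the work content exceeds the time allowed (0 if it fits)."""
    work_content = sum(standard_steps_s) + sum(containment_steps_s)
    return max(0.0, work_content - available_cycle_s)


# 53 s of standard work plus an 8 s containment check, paced at 60 s per cycle.
print(overburden_s([20, 15, 18], [8], 60))  # 1
print(overburden_s([20, 15, 18], [], 60))   # 0.0
```

Without the containment step the work fits; with it, the team member is asked to find one extra second every cycle, which is exactly the hidden overburden the paragraph warns about.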
If you pay attention to what is really happening, and take management responsibility for all of the problems that the team members encounter, then (and only then) can you set rules about which ones you will, and will not, work on right now.
The hardest part of all of this? It is the “taking management responsibility” part. Getting an effective andon call process into place requires as much (actually more) process discipline in the leader’s ranks as it does on the shop floor. This is discipline not to panic, not to wish problems away, and to respond as though the team member is doing you a favor by calling out a problem rather than causing it.
An andon call process is a vital step toward truly engaging the team members. And it begins the shift from intermittent improvement to continuous improvement.
By setting up your rules for andon pulling, you are basically setting up a continuous Pareto. The problem with using a Pareto to solve problems is that it is a batch and queue operation. You have to build up a batch of information before you can start solving the problems. This means all of the problems are old. By setting up standard rules for your andon that basically say, “when the problem is this big, do this,” you get to work on the biggest problems.
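One way to picture a “continuous Pareto” is a tally updated the moment each andon call is recorded, so the ranking of problems is always current instead of waiting for a batch of data. The Python sketch below is an illustration only; `ContinuousPareto` and the problem labels are hypothetical, and it relies on the standard library's `collections.Counter`.

```python
from __future__ import annotations

from collections import Counter


class ContinuousPareto:
    """Running tally of andon calls by problem type, updated as each call comes in."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record(self, problem: str) -> None:
        """Log one andon call when it happens; no batch has to build up first."""
        self.counts[problem] += 1

    def top(self, n: int) -> list[tuple[str, int]]:
        """The n most frequent problem types so far, largest first."""
        return self.counts.most_common(n)


pareto = ContinuousPareto()
for call in ["late parts", "fixture jam", "late parts", "paint defect", "late parts", "paint defect"]:
    pareto.record(call)
print(pareto.top(2))  # [('late parts', 3), ('paint defect', 2)]
```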
You CAN respond to every problem without causing muri because you get to define what a problem is. Even Toyota does not respond to EVERY problem. They just happen to be responding to problems that seem a lot smaller than the ones we are responding to. My factory’s andon will go off when someone is 30 minutes behind, while Toyota’s will go off when they are 30 seconds behind. However, they are not choosing to respond to the problem that made them 3 seconds behind…
I have found this to be a very effective way of setting inventory levels. Ask the team that will respond to problems how many andon calls they can physically respond to and solve to root cause in a week. Then set your abnormal inventory levels to be hit that number of times, based upon historic fluctuations. Then keep lowering the water level as fluctuation decreases.
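The inventory-setting approach above can be sketched as a quantile calculation on historic readings. This is a hypothetical Python illustration, not the commenter's actual method: it covers only the low-inventory side, assumes the readings are evenly spaced samples over a known number of weeks, and assumes each reading below the trigger would have produced one andon call (in practice a run of consecutive low readings might be a single call). `abnormal_inventory_threshold` is an illustrative name.

```python
from __future__ import annotations


def abnormal_inventory_threshold(history: list[int], weeks: float, calls_per_week: int) -> int:
    """Low-inventory trigger level sized to the team's problem-solving capacity.

    Picks the level such that at most calls_per_week * weeks of the historic
    readings fell strictly below it. Assumes each low reading would have been
    one andon call.
    """
    if not history:
        raise ValueError("need historic inventory readings")
    budget = int(calls_per_week * weeks)  # calls the team could have solved over the period
    ordered = sorted(history)
    # Only readings at indices below the chosen one can be strictly smaller than it.
    return ordered[min(budget, len(ordered) - 1)]


# Hypothetical readings taken over two weeks; the team can solve one call per week.
readings = [40, 35, 50, 20, 45, 30, 55, 25]
level = abnormal_inventory_threshold(readings, weeks=2, calls_per_week=1)
print(level)                             # 30
print(sum(r < level for r in readings))  # 2
```

With two weeks of readings and capacity for one solved call per week, the trigger lands at 30, which two historic readings fell below. As causes of fluctuation are removed, the same calculation can be rerun on recent history; how far to lower the water level itself is a judgment this sketch leaves out.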
Great insights, guys. Sometimes, in striving for perfection, I lose sight of what is normal for the current state.
Thanks again, Bryan