Andon – The Lean Thinker

August 13, 2010 (updated January 9, 2019)

“We CAN’T Just Stop The Line!”

I suppose, at some level, that makes emotional sense. After all, the idea is to keep production moving.

But the logical follow-on question is: “OK, so if the team member encounters a problem that is going to force her to work around things, to do the work in a way that wasn’t planned, what do you want her to do?”

August 11, 2010 (updated January 9, 2019)

5S Audits – Part III

I would like to thank everybody for a really engaging dialog in the previous two posts about 5S audits.

Now I would like to dig in and look at what an “audit” is actually finding, and how we are responding to those issues.

Our hypothetical production area is getting an audit. The checklist says things like “There are no unnecessary items in the work area” and “There is a location indicated for all items.”

If there are unnecessary things in the work area, or things are not in their designated locations, what happens?

Of course, the checklist is filled out and a score is assigned.

But what has been learned about the process?

In one of the comments, I asked something like “When was the problem first noticed?”

The core purpose of 5S is to establish a testable condition that asks the question: “Does the team member have everything he needs, and nothing he doesn’t, where he needs it, when he needs it, to carry out his process as we understand it?”

One of the primary purposes of marking out the locations is to indicate the standard so that someone can notice right away that the standard has been broken. What should happen right then and there?

Since we define a “problem” as “any departure from the standard or specification,” and we have taken the first step of removing ambiguity from the situation (by deciding what should be here, and marking it out), we want an immediate response to the problem.

Ideally this means that the team member would indicate trouble (andon call, or other means) as soon as he discovered that his air gun was missing, or didn’t work.

The back-up to this is the team leader’s standard work. His eyes should be scanning for situations where there is a problem that the team member hasn’t called out. This is why the standards are marked out, posted, etc.: to make this job easy for him. His immediate response would be to (1) seek to understand the situation (what pulled the team member off his standard work, and where the problem originated), and (2) correct the situation. Sometimes that’s it. Other times, there is another problem to dig into.

It could be that something about the work process or conditions has changed and the team member is improvising a bit. That would bring extra stuff into the area, for example. I recall a great example where we pulled all of the thread cutting tools out of assembly so we could better detect when assembly was getting defective fabricated parts. It worked by forcing the process to stop and triggering an andon call, since assembly could not proceed if the threads were not cut.

At the same time, if a thread cutter found its way back into the assembly area, we would know we had two problems. First, we had defective parts. But more important, the process of telling us about that problem had been bypassed.

The back-up to the team leader’s standard work is the supervisor’s standard work. She is looking two levels down, but her response is going to be different. Unless safety or quality is jeopardized, the supervisor is going to find the team leader and (1) Seek to understand what pulled the team leader off of his standard work, and (2) correct the situation.

If the next level up is spending any time at all out on the shop floor, it is the same thing – maybe once a day – seeking out verifiable evidence that things are working as they should be. In the absence of positive evidence of control, we must assume there are hidden problems.

Now, if the audit finds something like this:

Then it isn’t about the tape being out of place, nor is it a question about where the screwdriver is. What we have discovered is that none of the checks have been made, or if they have, no one has done anything about them.

Someone said “If we don’t do audits then 5S deteriorates.” OK – but why does 5S deteriorate?

Simply put, it is going to deteriorate, just as your process does, a little bit every day. Disorder is always being injected into everything. Your process will never, ever be stable on its own. No matter how good you are, the next level of granularity will show up as deterioration.

This is the “chatter” that Steven Spear talks about.

The question comes down to your core intention for the audit.

If you are assessing how well the area manager is coaching and teaching his people to see and respond to problems so that you can establish a target condition for his learning, and then develop his capabilities accordingly… there are better ways (in my opinion) to do that.

If you are assigning a numeric score in the hope that, by measuring something you can influence behavior, it might work, but people can come up with ingeniously destructive ways to achieve the numeric goals. As a thought experiment – how might an area manager get a high score on his 5S audit in ways that run completely counter to the goals of 5S, people development or “lean?”

The bottom line is that “Audit 5S” is not something that you should accept as a given. Rather, it is a proposed countermeasure to some problem. But if you start with a clear problem statement like “Team members are bringing thread taps into the assembly line,” ask “Why” five times, and get to a root cause of that problem, you are unlikely to arrive at a monthly or periodic 5S audit as a countermeasure – nor are you ever going to need one.

The problem?

I think we feel the need to do audits because we have no process to immediately detect, correct and solve the little problems that happen every day. These little issues are the ones that cause the 5S erosion. Because we don’t have a process to deal with them one-by-one, we have to have an elaborate process that disrupts our normal work flow and takes them on in big batches.

Does that sound like a “lean” process to you?

How might we relentlessly drive the “audit” process closer to the ideal of one-by-one confirmation?

That would be “lean thinking.”

January 20, 2010 (updated January 20, 2025)

Overburdened with Andon Calls

Bryan Zeigler has a great post on his “Lean is Good” blog site. Titled “Andon Calls and Muri,” he describes Toyota’s phenomenal capacity for responding to problems, and then takes us back to where the rest of us are with some really great questions:

If it is physically impossible to answer every andon call in order to work on every problem, is it best to fix the first one that comes sequentially? Then do work arounds and rework until we can respond to another one?

I have always used systems to prioritize what problems we work on whether it be pareto charts, value stream maps, or just plain standing in the circle. Once directions, or as Toyota Kata describes them, process target conditions, are established and the highest priority items are “fixed” and then we move on to the next most important challenge.

Working on all problems in the process would overburden the organization’s problem solvers. This would be a form of dreaded muri right? I’ve read and heard much about the Toyota staffing levels required to operate TPS effectively. Most range from 5 to 7 employees under each level of leadership position. Again, my experiences are more like 30 to 40 employees under a 1st line leader.

Two questions:

What percentage of daily problems are organizations that you work in staffed to handle?

What philosophies do you utilize to ensure you don’t introduce Muri to the problem solving teammates in the organization?

Great observation, and great questions. And Bryan is certainly not the only one who has had the insight to ask them.

At this point, I have to issue a bit of a disclaimer. I have spent a full day on the shop floor in Bryan’s workplace. I can visualize exactly what he is talking about. Unfortunately schedules didn’t allow me to meet him, but I have a pretty good sense of the situation he is dealing with.

Since pretty much everyone has these issues when they start to contemplate andon calls, let’s start by reviewing the theoretical base, then move into reality.

The core principle of jidoka is “Stop and respond to every abnormality.” That is what we are trying to do here. This means there must be a clear definition of what is “normal” and what is “abnormal.”

In the strictest sense, if it isn’t clearly defined as “normal” it is ambiguous.

In a system where the most basic fundamental is to define processes, ambiguity is a problem as well. Put another way, “ambiguity” is, by definition, “abnormal.”

So, in effect, we are asking the team member to let us know when anything is not clearly consistent with the defined norms.

The first response to an andon call is to clear the problem. If the “problem” is lack of clarity, then deal with that. Replace uncertainty with clarity. Set some kind of hard criteria for what does, and does not, need your attention. This means taking management responsibility for the fact that the problem exists and that you aren’t going to do anything about it right now.

The next step is critical: If you can’t solve it right now, contain it.

“Containing the problem” means establishing a temporary standard of some kind: some action that allows the team member to resume safe, defect-free work. You might have introduced some inefficiency into the process, but safety and quality take priority.

And here is the dangerous part. You have the problem more or less under control. You can easily walk away and move on to the next one.

But consider this: You have the tiger in a cage. You are in the cage with it. You have to keep feeding the tiger (with time, resources) so it doesn’t eat your process. The only way to make the tiger go away is to get to the root cause.

This temporary standard does, however, give you a measure of stability. You can organize your problem solving efforts and focus on the ones that are the most critical to you. Meanwhile, however, you are burning resources feeding all of those tigers.

Typically a temporary countermeasure (problem containment) is some adjustment to the process. You have set a new standard work sequence that includes the steps required to keep this problem from affecting customers or escaping downline.

Yes, it is a work-around. But it is one you developed deliberately for a specific reason, until you can get to clearing that issue for good.

As you continue to identify problems and at least get them contained in some way, continue to refine the things you want to call attention to.

First, be explicitly clear on what things must trigger an andon call. These are the things you really want to know about when they happen. For sure it should be any safety issue and any issue that threatens quality. It could be an issue you are currently focused on resolving, such as late parts delivery, an upstream quality issue, or a piece of unreliable equipment.

Then establish the time trigger. To do this you need to have three things pretty clear in your mind.

A good idea of how long the process is supposed to take.
A method for the team member to know when he is behind, and how much.
A standard for how much delay you are willing to tolerate. Put another way, how long are you willing to let the team member get behind before he tells someone? My suggestion here is no more time than you can help him catch up. If he gets further behind than that, you are going to pass the problem to another part of the process downstream in the form of a late delivery.

Now you have some simple rules.

Please try to perform the standard work so we can see any problems with it.

If any unsafe condition exists, stop and pull the andon. Wait until we can clear the hazard.
Do not knowingly ship bad quality to the next process. Pull the andon so we can come, assess, and decide how we are going to deal with that.
If you have this problem, this problem, or this problem, stop and pull the andon so we can come and clear the problem as well as understand it as soon as it happens.
If you accumulate any delays longer than xx minutes, pull the andon.
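As a rough illustration, the rules above can be read as a priority-ordered check: safety first, then quality, then the specific problems you have listed, then accumulated delay. The sketch below is only one way to write that down. The `StationStatus` fields, the function name `andon_reason`, and the 3-minute limit in the usage line are all made up for the example; the delay limit in particular is the "xx minutes" that each area has to decide for itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StationStatus:
    """Snapshot of one work station. All field names are hypothetical."""
    unsafe_condition: bool = False
    known_defect: bool = False
    listed_problem: Optional[str] = None  # e.g. "late parts delivery"
    minutes_behind: float = 0.0


def andon_reason(status: StationStatus, max_delay_minutes: float) -> Optional[str]:
    """Return why the andon should be pulled, or None to keep working.

    The checks run in the same order as the rules in the text, so a
    safety issue is always reported ahead of anything else.
    """
    if status.unsafe_condition:
        return "unsafe condition: stop and wait until the hazard is cleared"
    if status.known_defect:
        return "quality: do not pass the defect to the next process"
    if status.listed_problem is not None:
        return f"listed problem: {status.listed_problem}"
    if status.minutes_behind > max_delay_minutes:
        return (f"delay: {status.minutes_behind:g} min behind "
                f"(limit {max_delay_minutes:g} min)")
    return None


if __name__ == "__main__":
    # A team member 4 minutes behind, with an assumed 3-minute limit.
    print(andon_reason(StationStatus(minutes_behind=4), max_delay_minutes=3))
```

The point of writing it this way is that the team member never has to judge what is "good enough": every branch is a hard yes or no that management has decided in advance.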

This puts you in control. You get to decide how much excess capacity (how many extra people) you carry to pad delays. You get to decide what problems trigger a call. You get to decide what you can handle.

All I ask is:

Do not tolerate unsafe conditions. Always stop the process and always initiate a call.
Do not tolerate a process that routinely passes bad quality downstream. Always initiate a call. Don’t put the team member in a position where he has to judge what “good enough” is. Have a hard standard and stick to it.
Always thank the team member for bringing problems to your attention. Never discourage an andon call.
Never allow an andon call to go unanswered. Set a response time standard, measure it, and apply the same problem solving principles to that.
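Measuring the response time standard does not need anything elaborate. A minimal sketch, assuming you log one number per andon call (seconds until someone arrived) and record `None` when nobody came at all; the function name, the dictionary keys, and the 60-second standard in the usage line are illustrative, not from any particular system:

```python
from statistics import median


def response_report(response_seconds, standard_seconds):
    """Summarize andon response times against a response-time standard.

    `response_seconds` holds one entry per call: seconds until a responder
    arrived, or None if the call was never answered.
    """
    answered = [s for s in response_seconds if s is not None]
    return {
        "calls": len(response_seconds),
        "unanswered": len(response_seconds) - len(answered),
        # A call is late only if it took longer than the standard.
        "late": sum(1 for s in answered if s > standard_seconds),
        "median_seconds": median(answered) if answered else None,
    }


if __name__ == "__main__":
    # Five calls in a shift, one never answered, assumed 60-second standard.
    print(response_report([30, 45, None, 120, 60], standard_seconds=60))
```

Any non-zero "unanswered" or "late" count is itself a problem to work with the same problem solving approach as anything else on the line.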

The other thing I would suggest is a system to manage problem solving. There are some suggestions in this post on morning markets.

The key point is that any problem you decide not to work on has to have some kind of temporary countermeasure incorporated into your expectations. If you do something that adds time, you must allow time for it to be done. Doing otherwise is introducing overburden – or to Bryan’s point, shifting the overburden from your problem solving back to the team member.

If you pay attention to what is really happening, and take management responsibility for all of the problems that the team members encounter, then (and only then) can you set rules about which ones you will, and will not, work on right now.

The hardest part of all of this? It is the “taking management responsibility” part. Getting an effective andon call process into place requires as much (actually more) process discipline in the leader’s ranks as it does on the shop floor. This is discipline not to panic, not to wish problems away, and to respond as though the team member is doing you a favor for calling out a problem vs. causing it.

An andon call process is a vital step toward truly engaging the team members. And it begins the shift from intermittent improvement to continuous improvement.

June 15, 2009 (updated January 9, 2019)

Get Specific

A couple of days ago I had an interesting session with an improvement team in a fairly large company. They have been working on this for almost 10 years. While they have made some spot progress, they are clear that they have spent a lot of money but have not yet established what they call a “lean culture.” Their implied question was “How do we get there?”

My question was “When you say ‘a lean culture,’ exactly what are you thinking about?”

What do people do? How do they behave?

“People find and eliminate waste every day.”

OK, so if they were doing that, what would you see if you watched?

There was a bit of a struggle to articulate an answer.

I see this all of the time. We rely on the jargon or general statements to define the objective, without really digging down the next couple of layers and getting clear with ourselves about what the jargon means to us. This is especially the case when we are talking about the people side of the system.

But the people are the system. They are the ones who are in there every single day making it all happen. It is people who do all of the thinking.

Consider these steps:

Define Value.
Map your value streams.
Establish flow.
Pull the value through the value stream.
Seek perfection.

This is the implementation sequence from Lean Thinking by Womack and Jones, which has been the guideline for a generation+ of practitioners.

Learning to See taught that generation (and is teaching this one) to establish a current-state map of the value stream, and then design the future state to implement as flow is established. The follow-on workbooks focused on establishing flow and pull, and did it very well.

While not the only way to go about this, it does work for most processes to establish flow in materials and information.

But what do people do every day to drive continuous improvement, and how are those efforts organized, harnessed, and captured to put the results where they can truly benefit customers and the business?

Here are some things to think about.

What exactly is the target condition for your organization? Can you describe what it will look like? Can you describe it in terms of what people experience, and do, every day?

When your people go home to their families and share what they did at work today, what will they talk about? And I don’t just mean the engineers and managers. What will the front-line value-creating people remember from the workday?

How will they talk about problems?

If your target future state now includes changes in how people work, ask yourself more questions.

When, exactly, are they going to do these things you described? By “when” I mean what time, starting when, ending when.

What, exactly, do they do when they encounter a problem during production?

How, exactly, do you expect the organization to respond to that problem? Who, exactly, is responsible to work through the issue and get things back on track? How long do they have to do it? If the problem is outside their scope, what is supposed to happen? How, exactly, does additional support get involved?

If these new activities involve new skills, when and how, exactly, are people supposed to learn them and practice them to get better at it? Who is supposed to teach them, when, where, and how? How will you verify that the new skills are being used, and are having the effect you intend?

“If we do this, what will happen?”

And then what? And then what?

Think it through.

The “people” future state is far more important than the future state of the material and information flow.

December 3, 2007 (updated January 9, 2019)

Really Long Takt Times

One question I see coming up a lot in various forums is how to deal with issues unique to very long takt times. By “very long” I usually hear about many hours, sometimes days, occasionally weeks. Because it comes up fairly often, I thought I would take a shot at addressing it here.

I think the biggest hurdle for people to get over is that the issues are largely the same as with shorter takt times. They are just harder to see because the work starts to lose a human time scale. The trick is to get it back onto a time scale that people can relate to.

By this I mean that a person, generally, loses a sense of how long something is taking once it goes beyond a dozen minutes or so. In contrast, the stereotypical automobile line has a takt of about 60 seconds. Once an auto assembly worker loses 3 or 4 seconds of time, there is really no way she will be able to complete the programmed work cycle without help or stopping the line for at least a few seconds.

As work cycles get longer, though, the work remaining until “done” gets more and more disassociated from “now” and the idea of the necessity to maintain a particular work pace becomes abstract. This is less of a technical issue than one of human psychology. People, in general, tend to believe they can finish something in time long after that is no longer true. (Ask any college freshman working on a term paper.)

The countermeasure is the same as a manager would apply to any long project: milestones.

When the takt times are relatively short, the “milestones” are the takt intervals themselves. Each takt time signals a stage of work that must be complete. If this is not true, the line will (should) be stopped at that point. (Remember – “Never pass along a defect” and this includes incomplete work.) The problem will be corrected, and the cause understood. Oh – actually this is not quite true. A Toyota assembly line has the work zone divided into 10 sub-intervals, and the worker has a good idea what work should be completed at what point.

However, since most of us are likely just beginning: if your takt time is longer than a couple of dozen minutes, define the work in stages. In one operation I suggested the following:

Take about 85% of your long takt time, and divide that into quarters. Define what job should normally be complete by the time each of those check points comes up. As an example – if the takt time was 100 minutes, then determine the expected work completion at roughly 20, 45, 65 and 85 minutes (the exact quarter points are 21.25, 42.5, 63.75 and 85). Give the Team Member a way to know where he is at that point vs. the expectation, and a way to call for help if he is off by even a little bit. He should also call for help before that point if he is disrupted by something that he knows will cause a delay.

This is just a starting point to start to stabilize the system and build your support structure. If you reach the point where things are running smoothly at this level of granularity, then cut those time intervals in half.
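The arithmetic is simple enough to do on paper, but here is a small sketch of it for anyone building a check sheet or a timer. The function name `checkpoints` and its parameters are just illustrative. It returns the exact, evenly spaced points; in practice you would round them to something convenient for the floor, as the 100-minute example does.

```python
def checkpoints(takt_minutes, fraction=0.85, stages=4):
    """Evenly spaced check points covering `fraction` of the takt time.

    With the defaults this is "about 85% of the takt time, divided into
    quarters". Doubling `stages` cuts the intervals in half.
    """
    if takt_minutes <= 0 or not 0 < fraction <= 1 or stages < 1:
        raise ValueError("need takt_minutes > 0, 0 < fraction <= 1, stages >= 1")
    step = takt_minutes * fraction / stages
    return [step * i for i in range(1, stages + 1)]


if __name__ == "__main__":
    # Starting point for a 100-minute takt time: four check points.
    print(checkpoints(100))
    # Once things run smoothly at that level, halve the intervals.
    print(checkpoints(100, stages=8))
```

The remaining 15% of the takt time is deliberately left without a check point. It is the room available to help the team member catch up before the delay gets passed downstream.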

At each point you will find more problems. The problems are likely to be smaller, but there will be more of them. All of those problems are sources of friction, and therefore wasted motion and time, in your system.

BUT – before you start down this road, have a few things in place first.

Establish credibility for the concept that you are genuinely doing this to see problems that are making the workers’ jobs difficult. If you use it, just once, to initiate a negative consequence for “not working fast enough” then forget it.
Actually work the problems. This means work them to eliminate the causes. Put in a process for managing the problems, and make it visible so that the people working can see you are working on them. Again, this is to maintain credibility. If problems get recorded and sunk into a black hole (like a database in a computer somewhere), then you are not assuring the people on the line that you actually care.
Build your immediate response (escalation) system. This means team leaders (first responders) who can, and do, respond to help calls quickly. The only thing worse than having no way to call for help is to call and have no one respond. Again, the system loses credibility after about the third andon pull with no response.
Don’t worry too much about every detail within the work interval. The important thing, at first, is to make sure that the same things get done within that interval. Detailed sequence standardization will come in time.

Summary: The key to managing really long takt times is to break the work into time-based intervals, and manage to those, rather than the entire work cycle itself.