High-Speed Automation

While we lean practitioners seem to have earned a reputation for disdain of high-speed automation, industries like mass-produced consumables and food and beverage would not be viable without that approach.

These plants are capital intensive, and the main focus of the people is to keep the equipment running. I hinted at some of these things a couple of years ago from the Czech Republic.

Here are some more recent thoughts.


Even though it is about equipment, it is still about people.

This is not a paradox at all. People are the ones who get cranky equipment to run, scurrying about clearing jams and removing product that got mangled. Until you have a “lights out” plant, people are critical to keeping things running.

Robust problem solving and improvement skills are more critical.

In a purely manual world, you can get away with burying issues under more people and more inventory.

With interconnected automated equipment, not so much. The hardware has to run. It all has to run or there is no output.

How the organization responds to a technical problem makes the difference between quickly clearing the issue and struggling with it for a couple of hours while everything else backs up.

This is where standard processes are critical, not only to short-term success, but also to capture new information as it is learned. This is the “chatter as signal” issue I have written about a couple of times.

Quoting from the above link:

Most organizations accept that they cannot possibly think of everything, that some degree of chatter is going to occur, and that people on the spot are paid to deal with it. That is, after all, their job. And the ones that are good at dealing with it are usually the ones who are spotlighted as the star performers.

The underlying assumptions here are:

  • Our processes and systems are complex.
  • We can’t possibly think of and plan for everything that might go wrong.
  • It is not realistic to expect perfection.
  • “Chatter is noise” and an inevitable part of the way things are in our business.

Those underlying assumptions say “Our equipment is complicated and difficult to adjust. All we can do is try stuff until it runs.”

That assumption lets people off the hook of actually understanding the nuances of the equipment, as well as off the hook of a disciplined approach to troubleshooting. The assumption essentially says “We can’t do anything about it.”

A dark side of this designed ignorance is that the only thing leaders are really able to do is hover about and apply psychological pressure to “do something” or, at best, contribute to the noise of “things to try.”

Neither of those is particularly helpful for an operator who is trying to get the machine running. Both of those actually have a built-in implication that the operator (1) Does not know how to do his job or (2) Is somehow withholding his expertise from the situation.

But we get a different result from the alternative assumptions:

On the other hand, the organizations that are pulling further and further ahead take a different view.

Their underlying assumptions start out the same, then take a significant turn.

  • Our processes and systems are complex.
  • We can’t possibly think of and plan for everything that might go wrong.
  • But we believe perfection is possible.
  • “Chatter is signal” and it tells us where we need to address something we missed.

What does this look like in practice?

A known, verified starting condition for all settings.

A fixed troubleshooting checklist for common problems (one that starts with “Verify the correct initial settings”).

What things should be verified, in what sequence? (Understand the dependencies).

If a check reveals an issue, what immediate corrective action should be taken?

I would also strongly recommend using the format of a Job Breakdown (from TWI Job Instruction) for all of this. That format makes the material much easier to teach, but more importantly, it really forces you to think things through.

Of course, the checklist is unlikely to cover everything, at least at first. But it does establish a common baseline, and documents the limit of your knowledge.

The end of the operator checklist then defines the escalation point – when the operator must involve the next level of help.
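To make this concrete, here is a minimal sketch of how such a checklist might be structured, with the escalation point built in as the end of the list. The step names and structure are hypothetical, not from any particular plant:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckStep:
    """One line of the operator troubleshooting checklist."""
    description: str            # what to verify, in dependency order
    check: Callable[[], bool]   # returns True if the condition is correct
    corrective_action: str      # immediate action to take if the check fails

def run_checklist(steps: list[CheckStep]) -> None:
    """Walk the checks in sequence; escalate when the list is exhausted."""
    for step in steps:
        if not step.check():
            print(f"Issue found: {step.description}")
            print(f"Immediate action: {step.corrective_action}")
            return
    # No check caught the problem: we have reached the documented limit
    # of our knowledge. This is the defined escalation point.
    print("End of checklist - involve the next level of help.")
```

The first item in the list would be the “Verify the correct initial settings” check described above.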

It takes robust problem solving skills (and the willingness to take the time to use them) to develop these processes, but doing so can save a mountain of time that pays back many times over.

The alternative is taking the time to mess with things until it sort of works, and never really understanding what was done or what had an effect – every single time there is an issue.

Cry once, or cry every day.

What does this have to do with improvement?

The obvious answer is that, if done well, it will save time.

The more subtle effect is that it sharpens the organization’s knowledge base, as well as its ability to really understand the nuances of the equipment. But this must be done on purpose. It isn’t going to happen on its own.

By getting things up and running sooner and reducing the duration of stoppages, it increases equipment capacity.

But more importantly, all of this increases people capacity.

It gives people time to think about the next level of problems rather than being constantly focused on simply surviving the workday. Of course, you need the right organizational and leadership structure to support that.

If you want to go faster, stop.

Mark’s post on The Whiteboard tells a pretty common story. The good news is that this company has more business than they can handle. Pretty good results in these times. The bad news is that they are having problems ramping up production to meet the demand. In Mark’s words:

I’m working for a company that is very, very busy. They developed a new process that is the first of it’s kind and have taken the market share away from their competition. But they have not spent enough time making the process robust enough to handle the increase demand and the scrap costs are going out of the roof. Currently about 65K a day. Any suggestions? Our number 1 scrap producer is a machine that can not perform at the same capability as when Engineering did their run off…

At the risk of coming across as flip, the very first thing to do if a machine starts producing scrap is to shut it down. It is better to make nothing, because nothing is cheaper than stuff you can’t use.

However, it goes deeper than that.

Engineering had done a “run off” (which I presume was a test at theoretical speeds). Now actual performance isn’t meeting expectations. This is a problem.

But let’s rewind a bit and talk about how to manage a production ramp-up. Hopefully it is a problem more people will be having as the economy begins to recover.

Although this is in the context of a machine, exactly the same principles apply to any type of production. Only the context and the constraint change.

Presumably there was some speed for this machine where it didn’t produce scrap, or the scrap was minimal. Going back to that time, here is what should have happened.

Promise production at the rate the machine is known to support.

Now crank up the speed a bit and see what happens. In the best case, you are overproducing a bit, but you are learning what the machine is actually capable of doing.

Crank it up a little more. Oops, scrap.

STOP!

Because you have been running a little faster than required, you have bought a little time. Understand why that scrap happened. Dig into the problem solving. Try to replicate the problem under controlled conditions. LEARN.

Hopefully you can find the cause and fix it.

Try it. Run the machine again, at the faster speed. Scrap? Back around to the “problem solving” cycle. Repeat until you can reliably run at the faster speed without scrap.

Then, and only then, promise the higher rate, because now you can reliably deliver it.

Then notch it up a bit until you encounter the next problem.

This cycle of promising only what you can actually deliver protects the customer while you are pushing the envelope internally to discover the next problem.
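As a thought experiment, that cycle can be sketched as a simple loop. Everything here is illustrative: `machine.run_at()` and `machine.fix_cause()` are hypothetical stand-ins for a trial run and for the full problem-solving cycle:

```python
def ramp_up(machine, start_rate: float, step: float, target_rate: float) -> float:
    """Promise only rates the machine has proven it can run cleanly."""
    promised = start_rate                       # known-good rate: safe to promise
    trial = start_rate
    while promised < target_rate:
        trial = min(trial + step, target_rate)  # notch the speed up a bit
        while machine.run_at(trial) > 0:        # did the trial run produce scrap?
            machine.stop()                      # STOP: making nothing beats making scrap
            machine.fix_cause()                 # replicate, understand, fix - LEARN
        promised = trial                        # then, and only then, promise this rate
    return promised
```

The point of the sketch is the ordering: the promise is only raised after the faster rate has run without scrap, never before.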

The alternative? Make a promise knowing you actually have no clue whether or not you can meet it.

But that’s what they did. So now they are burning a lot of money every day making scrap material.

The same principles apply, however. They are already not delivering what they promised. So throttle things back to the point where they can predict the results, and go from there. Pretending they can run faster than they can is not accomplishing anything other than burning money. Deal with facts, no matter how uncomfortable.

If you make a schedule based on what you wish you could do, you will have a schedule you wish you could meet.

No matter what, each time scrap is produced, the fact must be acknowledged. That allows the immediate response that is framed around a simple question:

“How the hell did this happen?”

Put another way, “What have we just learned about the limits of this process?”

It is only within that framework that you actually get any better. Anything else is relying on luck, and in this case at least, that didn’t work.

Andon Leadership

On a world-class automobile assembly line, the actual work is continuously being compared to the planned work. In each work zone, there is a planned sequence of tasks which are expected to produce a specified output.

If there is any departure at all from the planned sequence, if things get behind the planned timeline, if the necessary conditions are not there, or if any process step does not complete as required, then either the Team Member or an automated system turns on a help call: an andon.

The response is immediate. The first response is within seconds with a priority of clearing the problem. The line itself is still moving, but if the problem is not cleared before the end of the takt time the line automatically stops – things go from “yellow” to “red.” When that happens, the responsibility for the problem also shifts up a level in the response chain.

The priority is still to clear the problem quickly – and clearing the problem means to restore conditions required for safety and quality without compromise.

Once the problem is cleared and the line restarted, the rest of the problem solving process engages to find the cause of the problem and address it in the system itself. If the problem is outside the bounds, or outside the capability, of the original Team Leader to solve, then his chain-of-support will engage to help him solve it so that his skills can be improved.
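A rough sketch of that escalation logic, with illustrative states and timings (this is a simplification, not a description of any actual andon controller):

```python
import enum

class AndonState(enum.Enum):
    GREEN = "running to plan"
    YELLOW = "help called; line still moving"
    RED = "line stopped; ownership escalated one level"

def zone_state(problem_cleared: bool, elapsed: float, takt: float) -> AndonState:
    """State of one work zone at a given moment within the current takt cycle."""
    if problem_cleared:
        return AndonState.GREEN
    if elapsed < takt:
        return AndonState.YELLOW  # first response within seconds; clear it fast
    return AndonState.RED         # fixed-stop reached: the line stops automatically
```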

In summary – we have a sequence of tasks to be accomplished over about six hours that, in the end, will result in a car. That sequence of tasks is further broken down into sub-intervals where progress against the plan is checked. If problems develop, the system responds: it swarms the problem to clear it, understands it, and adjusts the process if necessary.

None of the above should be a surprise.

But what happens in your management monthly reviews?

Huh? How are management monthly reviews related to an assembly line?

Let’s see. In a reasonably functional organization the business plan is (or should be!) a series of tasks to be accomplished over a period of time that, in the end, will deliver specified results. That sequence of tasks is further broken down into sub-intervals where progress against the plan is checked.

But in a typical monthly review, the analogy ends there. When problems develop, the reasons are discussed, mentioned, and worked around. In dysfunctional organizations only the objectives are discussed, and in truly dysfunctional organizations the objectives themselves are adjusted to match what is accomplished.

Shift your thinking a bit, and apply the assembly line analogy to its end-state.

The monthly review is the fixed-stop position. Up to this point, the person responsible for the task “owns” it. He should be checking progress and ensuring that things are getting done as specified, and that the results being achieved are as anticipated (predicted). He should be monitoring conditions and making sure that the operating assumptions are holding. Ideally those checks are built in to the process and planning so that they are at least semi-automatic.

If all is “green” going into the monthly review, then things are great.

But if something is “off” (yellow), that is equivalent to an andon call. The responsible person is the first responder. He has until the next review to clear the problem. Think about the ramifications here. This means that if the reviews with the boss are every month, the responsible manager had better not wait until the week before to find out what is happening so he can report on it. He has to stay in touch with what is happening.

If the problem is not cleared by the monthly review, the fixed-stop position, then the “andon” goes red.

The reviewing manager now “owns” the problem. His job is now to ensure that there are effective countermeasures in place to get things back on track. That does not absolve the responsible manager, but rather, this becomes a “check” and an “act” for his professional development by the more senior leader.

Again, think of the ramifications. The responsible manager must not only be in touch with what is happening, he also needs to make sure his people are being developed and pushed to fully understand the problems as they occur.

The job of these leaders, at each level, is not only to keep intimately in touch with what is going on, but also be fully aware of the skills, gaps and development of their people. If someone should be able to handle an issue, but can’t, that is a skill gap that, like any other gap, must be addressed (in this case by developing the person).

In mediocre organizations, professional development is owned and overseen by H.R. and may be tied to the annual review process. In organizations that “get it” professional development is a natural part of the leadership process, and happens at all levels in the natural course of getting things done.

A Systematic Approach to Part Shortages – Part 2

For kanban to work well, there has to be a solid foundation under it. That foundation is production leveling or heijunka.

Before I get too far into this, though, I would like to point something out: at the mention of leveling, people who are only just learning about kanban will point out all of the good reasons why leveling is difficult. Here is a key point: the problems caused by running kanban without good leveling pale in comparison to the total chaos that ensues if you try to run MRP without leveling. I’ll stay out of that little rabbit hole until another day, though.

Production leveling has two parts.

  1. Leveling the production volume.
  2. Leveling the production mix.

The operation I described in Part 1 was relatively small, so it was a simple matter to set up a totally manual system to do this. By small I mean they had two major assembly lines running at rates on the order of 10 units / day. The product was about the size and complexity of a medium to large-sized photocopier (though not a photocopier). The assembly lines had about half a dozen positions each. There were several hundred parts from about as many suppliers. (Different story.)

The objective in leveling volume was for the production line to see demand as an image of the takt time, and to protect that signal from variation in actual orders and shipping. At the same time, the shipping dock was to see deliveries to the finished goods buffer at takt time, regardless of minor and medium problems in production.

To accomplish this they separated the “big lump” of inventory that typically existed in shipping into two physically separate buffers.

The Withdrawal Loop

Customers, unfortunately, rarely order at takt time. The purpose of the buffer in shipping was to absorb this variation and make the actual demand appear as if it arrived exactly at takt. The organization also tried to take out some of the bigger spikes in customer orders by working with dealers to get more transparency into actual customer order patterns, as well as by trying to level actual promise-to-ship dates at least weekly if they couldn’t get to daily. That helped a lot. A more sophisticated order entry system would have worked better, but that luxury wasn’t in place yet.

Back to the buffers. Each unit in shipping had a withdrawal kanban card attached to it. As orders were released, a unit would be pulled from this buffer and shipped. The withdrawal card went back to the production control department. Those cards were placed in the inventory management box. This box had a series of slots that indicated authorized inventory levels. A card in a slot indicated inventory they didn’t have; an empty slot indicated inventory on hand.

There were limit markers near each end of the row of slots. As long as the end of the row of cards stayed between those limit markers, everything was regarded as OK. They did not try to chase a particular level of inventory with production.
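A minimal sketch of that box logic, with made-up names and limits (the real thing was a physical box of slots, not software):

```python
def box_status(cards_in_box: int, authorized: int, low_mark: int, high_mark: int) -> str:
    """Each card sitting in the box is a unit NOT on hand; an empty slot is a unit on hand.

    The limit markers define a band; inside it, nobody chases a number."""
    on_hand = authorized - cards_in_box
    if on_hand < low_mark:
        return "below the limit marker: risk of starving shipments"
    if on_hand > high_mark:
        return "above the limit marker: overproducing"
    return "OK: take no action; do not chase a particular inventory level"
```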

The scheduled production rate was 10 units / day.

Each morning Production Control would take 10 cards from their box and put them into the leveling box in shipping. That box had slots that corresponded to times of day. The cards were evenly distributed at the takt-time interval. As that time came up, shipping would take the withdrawal card from the box, go to the end of the production line, attach their card to a unit, and move it to the shipping buffer.
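The arithmetic behind the leveling box is simple. A sketch, assuming a single eight-hour (480 minute) working day, which the post does not actually specify:

```python
from datetime import datetime, timedelta

def leveling_slots(day_start: datetime, available_minutes: int, daily_rate: int) -> list[datetime]:
    """Spread the day's withdrawal cards evenly: one slot per takt interval."""
    takt = timedelta(minutes=available_minutes / daily_rate)  # takt time
    return [day_start + i * takt for i in range(daily_rate)]

# 480 available minutes / 10 units per day -> one card every 48 minutes
slots = leveling_slots(datetime(2010, 1, 4, 6, 0), 480, 10)
```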

All of this card shuffling seemed like a lot of trouble, but it served a purpose: it hid the irregularities of shipping schedules and actual order dates from assembly. Assembly saw a clean, paced signal exactly at takt time. The process was designed so that assembly saw a perfect customer, even if the customers were far from perfect.

If management didn’t like the size of the shipping buffer, they knew exactly what problem(s) must be solved to reduce it – they needed to improve the dealer ordering and management processes so dealers would stop using deep reorder points and ordering weeks’ worth of product at once.

The Production Loop

When units were withdrawn from the end of the line, they were actually pulled from a FIFO buffer. In this case, the buffer held about 4 hours of production. Why? Most problems in production were cleared within that time. Only a bigger problem would starve the buffer and affect the withdrawal loop. Thus the purpose of this buffer was to make assembly appear as a perfect supplier to their perfect customer. They could supply exactly at the agreed-upon takt time.

Each of these units had a production kanban card attached to it. When shipping came to pull a unit, they would pull the production card and leave it in a kanban post. They would attach their withdrawal card and take the unit. Switching the cards transferred ownership of the product from one loop to the next. Since a kanban card authorizes a specific quantity to be in a specific location, anyone who wants to take something somewhere else needs to attach a card authorizing them to do so. That was the case here.

The production cards went to the front of the assembly line. There were three slots there. One green, one yellow, one red. If everything was running smoothly, the card would go into the green slot, and when the next unit was started, the card would be pulled from the box and attached to the unit.

If the line were a little bit behind, there might still be a card in the green slot. Then the next card would go into the yellow slot. This would automatically signal the assembly manager that there was something that needed some attention.

The next card would end up in the red slot. This was the point when, if they weren’t already there for a known problem, they were in “line stop” mode. Anyone who could be helping to clear the problem should be helping to clear the problem. Why? The money machine has stopped running. Everyone is now being paid only because the shareholders are lending them money. The idea is to get the money machine running as quickly as possible, and it is the most important thing. This was a simple phased escalation process, and was part of their overall andon / escalation system.
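The slot logic itself is almost trivial; a sketch, illustrative only:

```python
def place_card(slots: dict[str, str | None]) -> str:
    """Drop the next production card into the first open slot.

    Green = running normally; yellow = the assembly manager takes a look;
    red = line-stop mode, and everyone who can help, helps."""
    for color in ("green", "yellow", "red"):
        if slots[color] is None:
            slots[color] = "card"
            return color
    raise RuntimeError("behind even the red slot - escalation has already failed")

slots = {"green": None, "yellow": None, "red": None}
print(place_card(slots))  # "green" when everything is running smoothly
```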

Did it work?

All I can say is that it worked a hell of a lot better than what they were doing before. It took two or three serious tries to get this into place and keep it working, and they probably fell off the wagon a couple of times after that. There were always immense pressures to “reduce inventory” at the end of the quarter, for example, which would have management directing people to starve out the shipping buffer, or to push product out early. But, in general, when it was working, overtime was lower, things were more predictable, and problems were identified very quickly.

But…

Yes, there is a lot of manual work involved. But I want to be really clear: the total time spent moving all of these cards around was a fraction of the time that had previously been spent investigating status, working action messages, making calls to find out what was happening, and so on. For some reason people seem to think that deliberate activities raise the total amount of labor involved, and that somehow, the time spent running after information and chasing problems is free.

Setting a standard and following it injects an element of stability and calm into an otherwise chaotic workplace. Once this basic foundation is in place it is far easier to improve overall efficiency because now there is an actual process to improve.