Chatter as Signal

As I promised, I am going to continue to over-play the afternoon my team spent with Steven Spear.

In his forthcoming book “Chasing the Rabbit” (to be published in the fall), he profiles what is different about the companies that seem to increase their lead over competitors with ease, even when there is no apparent external advantage.

One of the core concepts he discussed was the nature of complexity in organizations, processes and products. The way this complexity is managed and handled is what distinguishes the leaders from the pack of competitors fighting and jostling for second place.

In a complex system, there are invariably things people miss. Something is not defined, is ambiguous, or just plain wrong. These little things cause imperfection in the way people do things. They encounter these unexpected issues, and have to resolve them to get the job done.

This is “chatter” in Spear’s words. The sound made when imperfect parts try to mesh together.

Most organizations accept that they cannot possibly think of everything, that some degree of chatter is going to occur, and that people on the spot are paid to deal with it. That is, after all, their job. And the ones that are good at dealing with it are usually the ones who are spotlighted as the star performers.

The underlying assumptions here are:

  • Our processes and systems are complex.
  • We can’t possibly think of and plan for everything that might go wrong.
  • It is not realistic to expect perfection.
  • “Chatter is noise” and an inevitable part of the way things are in our business.

On the other hand, the organizations that are pulling further and further ahead take a different view.
Their underlying assumptions start out the same, then take a significant turn.

  • Our processes and systems are complex.
  • We can’t possibly think of and plan for everything that might go wrong.
  • But we believe perfection is possible.
  • “Chatter is signal” and it tells us where we need to address something we missed.

We have all heard about Toyota’s jidoka and andon processes, so let me bring out another example that Spear used.

The U.S. Navy has been operating nuclear reactors with a 100% (reactor) safety record for nearly (over?) 50 years. And they operate a lot of nuclear reactors. When they started, they were in totally new and unfamiliar territory – they were doing things that had never been done before. In fact, no one was even sure if it was possible.

They asked the question: How should this nuclear reactor be operated? They answered it with a set of incredibly specific procedures which everyone was expected to follow – exactly, without deviation in any way. These procedures represent the body of experience and knowledge of the U.S. Navy for operating nuclear reactors.

Here is the key point: ANYONE who departs from the procedure, in any way, no matter how trivial or minor, must report “an incident” which rockets up the chain of command. The reasons for the departure are understood. If something was outside the scope of the procedure, the procedure is revised to cover it. If something was unclear, it is clarified.

This may not be the Toyota Production System at work, but it is a version of something that makes it work: Jidoka.

If the process is not working, can not work, or conditions are not exactly as specified for the process to succeed, then STOP the process, understand the condition, correct it, restore the system to safe, quality operation, and address the reason it was necessary to do this.

Chatter is signal.

So – at a Toyota assembly line in Japan some years ago, I observed a Team Member drop a bolt. He pulled the andon cord and signaled a problem.

Really Long Takt Times

One question I see coming up a lot in various forums is how to deal with issues unique to very long takt times. By “very long” I usually hear about many hours, sometimes days, occasionally weeks. Because it comes up fairly often, I thought I would take a shot at addressing it here.

I think the biggest hurdle for people to get over is that the issues are largely the same as with shorter takt times. They are just harder to see because the work starts to lose a human time scale. The trick is to get it back onto a time scale that people can relate to.

By this I mean that a person, generally, loses a sense of how long something is taking once it goes beyond a dozen minutes or so. In contrast, the stereotypical automobile line has a takt of about 60 seconds. Once an auto assembly worker loses 3 or 4 seconds of time, there is really no way she will be able to complete the programmed work cycle without help or stopping the line for at least a few seconds.

As work cycles get longer, though, the work remaining until “done” gets more and more disassociated from “now” and the necessity to maintain a particular work pace becomes abstract. This is less of a technical issue than one of human psychology. People, in general, tend to believe they can finish something in time long after that is no longer true. (Ask any college freshman working on a term paper.)

The countermeasure is the same as a manager would apply to any long project: milestones.

When the takt times are relatively short, the “milestones” are the takt intervals themselves. Each takt time signals a stage of work that must be complete. If this is not true, the line will (should) be stopped at that point. (Remember – “Never pass along a defect” and this includes incomplete work.) The problem will be corrected, and the cause understood. Oh – actually this is not quite true. A Toyota assembly line has the work zone divided into 10 sub-intervals, and the worker has a good idea what work should be completed at what point.

However, since most of us are likely just beginning – if your takt time is longer than a couple of dozen minutes, then define the work in stages. In one operation I suggested the following:

Take about 85% of your long takt time, and divide that into quarters. Define what job should normally be complete by the time each of those check points comes up. As an example – if the takt time was 100 minutes, then determine the expected work completion at 20, 45, 65 and 85 minutes. Give the Team Member a way to know where he is at that point vs. the expectation, and a way to call for help if he is off by even a little bit. He should also call for help before that point if he is disrupted by something that he knows will cause a delay.
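
If it helps to see the arithmetic, here is a minimal sketch in Python. The 85% figure, the number of stages, and the rounding are the illustrative assumptions from the example above, not a formula from anywhere official:

    def milestone_checkpoints(takt_minutes, planned_fraction=0.85, stages=4):
        """Divide the planned portion of a long takt time into evenly
        spaced checkpoints (all parameters are illustrative)."""
        planned = takt_minutes * planned_fraction
        return [round(planned * (i + 1) / stages) for i in range(stages)]

    # A 100-minute takt gives checkpoints at roughly 21, 42, 64 and 85 minutes,
    # close to the 20, 45, 65 and 85 used in the example above.
    print(milestone_checkpoints(100))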

This is just a starting point to stabilize the system and build your support structure. If you reach the point where things are running smoothly at this level of granularity, then cut those time intervals in half.

At each point you will find more problems. The problems are likely to be smaller, but there will be more of them. All of those problems are sources of friction, and therefore of wasted motion and time, in your system.

BUT – before you start down this road, have a few things in place first.

  • Establish credibility for the concept that you are genuinely doing this to see problems that are making the workers’ jobs difficult. If you use it, just once, to initiate a negative consequence for “not working fast enough,” then forget it.
  • Actually work the problems. This means work them to eliminate the causes. Put in a process for managing the problems, make it visible so that the people working can see you are working on them. Again, this is to maintain credibility. If problems get recorded and sunk into a black hole (like a database in a computer somewhere), then you are not assuring the people on the line that you actually care.
  • Build your immediate response (escalation) system. This means team leaders (first responders) who can, and do, respond to help calls quickly. The only thing worse than having no way to call for help is to call and have no one respond. Again, the system loses credibility after about the third andon pull with no response.
  • Don’t worry too much about every detail within the work interval. The important thing, at first, is to make sure that the same things get done within that interval. Detailed sequence standardization will come in time.

Summary: The key to managing really long takt times is to break the work into time-based intervals, and manage to those, rather than the entire work cycle itself.

A Systematic Approach to Part Shortages – Part 2

For kanban to work well, there has to be a solid foundation under it. That foundation is production leveling or heijunka.

Before I get too far into this, though, I would like to point something out: At the mention of leveling, people who are only just learning about kanban will point out all of the good reasons why leveling is difficult. Here is a key point: The problems caused by running kanban without good leveling pale in comparison to the total chaos that ensues if you try to run MRP without leveling. I’ll stay out of that little rabbit hole until another day though.

Production leveling has two parts.

  1. Leveling the production volume.
  2. Leveling the production mix.

The operation I described in Part 1 was relatively small, so it was a simple matter to set up a totally manual system to do this. By small I mean they had two major assembly lines running at rates on the order of 10 units / day. The product was about the size and complexity of a medium to large-sized photocopier (though not a photocopier). The assembly lines had about half a dozen positions each. There were several hundred parts from about as many suppliers. (Different story.)

The objective in leveling volume is for the production line to see demand as an image of the takt time, and to protect that signal from variation in actual orders and shipping. At the same time, the shipping dock was to see deliveries to the finished goods buffer at takt time, regardless of minor and medium problems in production.

To accomplish this they separated the “big lump” of inventory that typically existed in shipping into two physically separate buffers.

The Withdrawal Loop

Customers, unfortunately, rarely order at takt time. The purpose of the buffer in shipping was to absorb this variation and make the actual demand appear as if it arrived exactly at takt. The organization also tried to take out some of the bigger spikes in customer orders by working with dealers to get more transparency into actual customer order patterns, as well as trying to level actual promise-to-ship dates at least weekly if they couldn’t get it to daily. That helped a lot. A more sophisticated order entry system would have worked better, but that luxury wasn’t in place yet.

Back to the buffers. Each unit in shipping had a withdrawal kanban card attached to it. As orders were released, a unit would be pulled from this buffer and shipped. The withdrawal card went back to the production control department. Those cards were placed in the inventory management box. This box had a series of slots that indicated authorized inventory levels. A card in one of the slots indicated inventory we didn’t have; an empty slot indicated inventory on-hand.

There were limit markers near each end of the row of slots. As long as the end of the row of cards stayed between those limit markers, everything was regarded as OK. They did not try to chase a particular level of inventory with production.

The scheduled production rate was 10 units / day.

Each morning Production Control would take 10 cards from their box and put them into the leveling box in shipping. That box had slots that corresponded to times of day. The cards were evenly distributed at the takt-time interval. As that time came up, shipping would take the withdrawal card from the box, go to the end of the production line, attach their card to a unit, and move it to the shipping buffer.
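
As a rough sketch of that leveling step (the shift start time and working minutes here are assumptions for illustration, not their actual numbers):

    from datetime import datetime, timedelta

    def level_withdrawal_cards(cards_per_day, shift_start, working_minutes):
        """Spread the day's withdrawal cards across the shift,
        one leveling-box slot per takt interval."""
        takt = timedelta(minutes=working_minutes / cards_per_day)
        return [(shift_start + i * takt, f"withdrawal card {i + 1}")
                for i in range(cards_per_day)]

    # 10 cards over a 450-minute working day = one slot every 45 minutes.
    for slot_time, card in level_withdrawal_cards(10, datetime(2008, 1, 7, 7, 0), 450):
        print(slot_time.strftime("%H:%M"), card)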

This seemed like a lot of trouble, but it served a purpose. It was to hide the irregularities of shipping schedules and actual order dates from assembly. They saw a clean, paced signal exactly at takt time. The process was designed so that assembly saw a perfect customer, even if the customers were far from perfect.

If management didn’t like the size of the shipping buffer, they knew exactly what problem(s) must be solved to reduce it – they needed to improve the dealer ordering and management processes so dealers would stop using deep reorder points and ordering weeks worth of product at once.

The Production Loop

When units were withdrawn from the end of the line, they were actually pulled from a FIFO buffer. In this case, the buffer held about 4 hours of production. Why? Most problems in production were cleared within that time. Only a bigger problem would starve the buffer and affect the withdrawal loop. Thus the purpose of this buffer was to make assembly appear as a perfect supplier to their perfect customer. They could supply exactly at the agreed-upon takt time.

Each of these units had a production kanban card attached to it. When shipping came to pull a unit, they would pull the production card and leave it in a kanban post. They would attach their withdrawal card and take the unit. Thus switching the cards transfers ownership of the product from one loop to the next. Since a kanban card authorizes a specific quantity to be in a specific location, if someone wants to take something somewhere else they need to attach a card authorizing them to do so. That was the case here.

The production cards went to the front of the assembly line. There were three slots there. One green, one yellow, one red. If everything was running smoothly, the card would go into the green slot, and when the next unit was started, the card would be pulled from the box and attached to the unit.

If the line were a little bit behind, there might still be a card in the green slot. Then the next card would go into the yellow slot. This would automatically signal the assembly manager that there was something that needed some attention.

The next card would end up in the red slot. This was the point when, if they weren’t already there for a known problem, they were in “line stop” mode. Anyone who could be helping to clear the problem should be helping to clear the problem. Why? The money machine has stopped running. Everyone is now being paid only because the shareholders are lending them money. The idea is to get the money machine running as quickly as possible, and it is the most important thing. This was a simple phased escalation process, and was part of their overall andon / escalation system.
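
A small sketch of that three-slot logic, with the responses paraphrased from the description above (the wording is mine, not their actual procedure):

    def place_production_card(cards_already_waiting):
        """Return the slot a returning production card lands in and the
        expected response, based on how many cards are already waiting."""
        if cards_already_waiting == 0:
            return "green", "normal - attach the card to the next unit started"
        if cards_already_waiting == 1:
            return "yellow", "assembly manager alerted - something needs attention"
        return "red", "line-stop mode - everyone who can help is helping"

    for waiting in range(3):
        slot, response = place_production_card(waiting)
        print(f"{waiting} card(s) waiting -> {slot}: {response}")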

Did it work?

All I can say is that it worked a hell of a lot better than what they were doing before. It took two or three serious tries to get this into place and keep it working, and they probably fell off the wagon a couple of times after that. There were always immense pressures to “reduce inventory” at the end of the quarter, for example, which would have management directing the line to starve out the shipping buffer, or to push product out early. But, in general, when it was working, overtime was lower, things were more predictable, and problems were identified very quickly.

But…

Yes, it looks like there is a lot of manual work involved. But I want to be really clear – the total time spent moving all of these cards around was a fraction of the time that had previously been spent investigating status, working action messages, making calls to find out what was happening, etc., etc. For some reason people seem to think that deliberate activities raise the total amount of labor involved, and that somehow the time spent running after information and chasing problems is free.

Setting a standard and following it injects an element of stability and calm into an otherwise chaotic workplace. Once this basic foundation is in place it is far easier to improve overall efficiency because now there is an actual process to improve.

A Systematic Approach to Part Shortages – Part 1

The short story of assembly problems is lack of parts. Part shortages drive all kinds of waste, including: juggling the schedule; expediting; bigger lots or batches – and all of these things end up causing shortages later on in a self-reinforcing death spiral.

So how did an assembly shop which built about 10 units / day, and suffered between a dozen and 20 line-stopping part shortages a day end up eliminating all but a few (3-5) a week?

Three things, more or less at the same time. This post talks about the first:

Implement a kanban system to replace MRP ordering. They systematically studied how kanban is supposed to work, and, over a few months, put in a kanban system which I am proud to say was really pretty good. The assembly line was fed by kit carts which were picked at takt time from a small supermarket on the shop floor. The supermarket held a day or two of parts. The parts with local suppliers were replenished right from the receiving dock. Parts which still had to be purchased in larger quantities were stored in a warehouse area, and the shop floor supermarket was replenished from the warehouse daily.

The daily warehouse replenishment established the concept of isolating the problem: it allowed them to set up the shop floor supermarket as if all of their suppliers were delivering daily.

All parts in the shop-floor supermarket and the warehouse were under kanban control. This means they had kanban cards physically attached to the parts (if they were separate) or the containers.

Some things they learned over time:

  • The rules of kanban state that the card should be pulled and placed in the post when the first part is removed from the container. They quickly learned this was far more likely to happen if they secured the card in a place where it was in the way of either opening the container (over the folding lid, for example) or had to be moved (e.g. picked up) to get the first part out. At that point the card is in the person’s hand and he has to put it somewhere.
  • The number 1 reason for lost cards was that “put it somewhere” was a pocket.
  • The number 1 reason why the card ended up in a pocket was that the kanban post was more than a step away from the place where the card was pulled. That meant the person put it in the pocket “for a second” while he got the parts.

In the above case the countermeasure was simple: put kanban collection points everywhere parts are handled.

  • The first time they tried putting a pull system in for parts ordering they hadn’t put in heijunka (production volume and mix leveling) first. That was a problem, and “problem” is an understatement.

The countermeasure was (obviously) to simultaneously implement a schedule leveling system to drive the upstream system at takt. More about that in Part 2.

  • They invariably had some parts where they had more than they needed.

The countermeasure was “black cards” (though I would have preferred bright orange cards) that signified “excess inventory.” These cards allowed them to maintain kanban control of all inventory, but they did not signal replenishment.

When a card was pulled, the shop floor coordinator would scan a barcode on the card. This scan triggered an order release to the supplier, and authorized the supplier to ship the indicated quantity.

Actual card from this organization being scanned.

They had agreements with the suppliers that there would be an email acknowledgment of the order within 2 hours. When the card was scanned, it was placed in a slot labeled with the time when the acknowledgment was expected. When (if) that time passed and the acknowledgment had not been received, the card went to the buyer who phoned the supplier. “Did you get my order? I need the acknowledgment within 2 hours like we agreed.”
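
The tracking logic amounts to putting a due time on every scanned card. A minimal sketch, assuming the 2-hour acknowledgment window described above (the function and field names are mine):

    from datetime import datetime, timedelta

    ACK_WINDOW = timedelta(hours=2)  # the agreed acknowledgment window

    def overdue_acknowledgments(scanned_cards, now):
        """Return the cards whose supplier acknowledgment is overdue."""
        return [card for card in scanned_cards
                if not card["acked"] and now - card["scanned_at"] > ACK_WINDOW]

    cards = [
        {"part": "A-100", "scanned_at": datetime(2008, 1, 7, 8, 0), "acked": True},
        {"part": "B-200", "scanned_at": datetime(2008, 1, 7, 8, 30), "acked": False},
    ]
    # At 11:00, B-200 is 30 minutes past its window -> the buyer makes the call.
    for card in overdue_acknowledgments(cards, datetime(2008, 1, 7, 11, 0)):
        print("Call the supplier about", card["part"])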

This served two purposes. First, it verified receipt of the order and eliminated a known cause of shortages. Second it “trained” the suppliers that this time the customer actually expected them to honor the agreement. They really didn’t want that call from the buyer who had better things to do.

Once the acknowledgment was received, the card went to the receiving dock. Here it was placed in a slot that indicated the day (and later on, the time window) when those parts were supposed to arrive.

Like the above case, if the time passed, the card went to our poor hapless buyer. He phoned the supplier with a simple question: “Where’s my parts?”

This reinforced that, once again, there was an expectation to honor agreements. They really didn’t care that much (at this point) what the supplier’s lead time was. Only that it was honored. The main objective when starting out was simple consistent execution.

When the parts came in, the card was retrieved, matched with the order to verify, then scanned again to trigger a receipt transaction. If there were exceptions – guess what – another phone call.

The card was then attached to the container. Since the card specified the storage location, put-away was fairly straightforward. No location lookups required.

The previous condition had been that there was no matching of receipts against expectations. Thus if parts were late, or didn’t show up at all, no one noticed until they ran out. Big problem. By trapping and surfacing problems at the two main failure points in the system, most of those problems went away after a few months.

For the purists who are reading – yes, this process has some compromises and probably is a bit obsessive on checks. Call those training wheels until there is a sense of balance. All I can say is that it worked and, in the long run, ended up being a lot less work than chasing down and expediting shortages all day.

The Seventh Flow

Those of you who are familiar with Shingijutsu’s materials and teaching (or at least familiar with Nakao-san’s version of things) have heard of “The Seven Flows.” As a brief overview for everyone else, the original version, and my interpretations are:

  1. The flow of people.
  2. The flow of information.
  3. The flow of raw materials (incoming materials).
  4. The flow of sub-assemblies (work-in-process).
  5. The flow of finished goods (outgoing materials).
  6. The flow of machines.
  7. The flow of engineering. (The subject of this post.)

A common explanation of “the flow of engineering” is “the footprints of the engineer on the shop floor.” I suppose that is nice-sounding at a philosophical level, but it doesn’t do anything for me because I still don’t get what it looks like (unless we make engineers walk through wet paint before going to the work area).

Common interpretations are to point to all of the great gadgets, gizmos and devices that it does take an engineer (or at least someone with an engineer’s mindset, if not the formal training) to design and produce.

I think that misses the point.

All of those gizmos and gadgets should be there as countermeasures to real, actual problems that have either been encountered or were anticipated and prevented. But that is not a “flow.” It is a result.

My “put” here is that “The Flow Of Engineering” is better expressed as “The Flow of Problem Solving.”

When a problem is encountered in the work flow, what is the process to:

  • Detect that there even is a problem. (“A deviation from the standard”)
  • Stop trying to blindly execute the same process as though there was no problem.
  • Fix or correct the problem to restore (at a minimum) safety and protect downstream from any quality issues.
  • Determine why it happened in the first place, and apply an effective countermeasure against the root cause.

If you do not see plain, clear, and convincing evidence that this is happening as you walk through or observe your work areas, then frankly, it probably isn’t happening.

Other evidence that it isn’t happening:

At the cultural and human-interaction level:

  • Leaders saying things like “Don’t just bring me the problem, bring a solution!” or belittling people for bringing up “small problems” instead of just handling them.
  • People who bring up problems being branded as “complainers.”
  • A system where any line stop results in overtime.
  • No simple, on/off signal to call for assistance. No immediate response.
    • If initially getting help requires knowing who to phone, and making a long explanation before anyone else shows up, that ain’t it.
  • “Escalation” as something the customer (or customer process) does when the supplying organization doesn’t respond. Escalation must be automatic and based on elapsed-time-without-resolution.

Go look. How is your “Flow of Problem Solving”?

What Nukes?

Cruise Missiles

Warning to Reader: This piece has a lot of free-association flow to it!

Oops. A few weeks ago a story emerged in the press that a B-52 had flown from North Dakota to Louisiana with half a dozen nuclear-armed missiles under its wing. The aircrew thought they were transporting disarmed missiles. This is a rather major oh-oh for the USAF, as, in general, they are supposed to keep track of nuclear warheads. (Yeah, I am understating this. I, by the way, can speak from a small amount of experience, as I once held a certification to deal with these things, so I have some idea how rigorous the procedures are.)

Normally the military deals with nuclear weapons issues with a simple “We do not confirm or deny…” but in this case they have released an unprecedented amount of information, including a confirmation that nukes were on a particular plane in a particular location at a particular time.

The news story of the report summarized a culture of casual disregard for the procedures – the standard work – for handling nukes. I quote the gist of it here:

A main reason for the error was that crews had decided not to follow a complex schedule under which the status of the missiles is tracked while they are disarmed, loaded, moved and so on, one official said on condition of anonymity because he was not authorized to speak on the record.

The airmen replaced the schedule with their own “informal” system, he said, though he didn’t say why they did that nor how long they had been doing it their own way.

“This was an unacceptable mistake and a clear deviation from our exacting standards,” Air Force Secretary Michael W. Wynne said at a Pentagon press conference with Newton. “We hold ourselves accountable to the American people and want to ensure proper corrective action has been taken.”

So what’s the point, and what has this got to do with lean manufacturing?

The right process produces the right result.

As true as this is, it isn’t the point. The point is that the Airmen didn’t follow the procedures. And now the Air Force will apply the “Bad Apple” theory, weed out the people who are to blame, re-emphasize the correct procedures everywhere else, and call it good.

How often do you do this when there is a quality problem, an accident or a near miss? How often do you cite “Human Error” or “not following procedures” or “didn’t follow standard work” as a so-called root cause?

You need to keep asking “why” some more, probably three or four more times.


To this end, I believe Sidney Dekker’s book “The Field Guide to Understanding Human Error” should be mandatory reading for all safety and quality professionals.

Dekker has done most of his research in the aviation industry, and mostly around accidents and incidents, but his work applies anywhere that people’s mistakes can result in problems.

In the USAF case cited above, there was (according to the reports in the open press) a culture of casual disregard for the established procedures. This probably worked for months or years because there wasn’t a problem. The “norms” of the organization differed from “the rules” and I would speculate there was considerable peer pressure, and possibly even supervisory pressure, to stick with the “norms” as they seemed to be adequate.

Admittedly, in this case, things went further than they normally do, but let’s take it away from nuclear weapons and into an industrial work environment.

Look at your fork truck drivers. Assuming they got the same training I did, they were taught a set of “rules” regarding always fastening seat belts, managing the weight of the load, keeping speed down and under control, and checking what is behind and to the sides before starting a turn (as the rear end swings out – the opposite of a car). All of these things are necessary to ensure safe operation.

Now go to the shop floor. Things are late. The place is crowded. The drivers are under time pressure, real or perceived. They have to continuously mount and dismount. The seatbelt is a pain. They get to work, have the meeting, then are expected to be driving, so there is no real time for the “required” mechanical checks. They start taking little shortcuts in order to get the job done the way they believe they are expected to do it. The “rules” become supplemented by “the norms.” This works because The Rules apply an extra margin of safety that is well above the other random things that just happen around us every day. The Norms – the way things are actually done – erode that safety margin a little bit, but normally nothing happens.

Murphy’s Law is wrong. Things that could go wrong usually don’t.

The “Bad Apple” theory suggests that accidents (and defects) are the fault of a few people who refuse to follow the correct procedures. “If only ‘they’ followed ‘the rules’ then this would not have happened.” But that does not ask why they didn’t do it that way.

Recall another couple of catastrophes: We have lost two Space Shuttle crews to the same problem. In both the Challenger and Columbia accident reports, the investigators cite a culture where a problem which could have caused an airframe loss happened frequently. Eventually concern about it became routine. Then, one time, other factors came into play, what usually happened didn’t happen, and we were left wringing our hands about what happened this time. Truth is, it nearly happened every time. But we don’t see that because we assume that every bad incident is an exception, the result of something different this time. In reality, it is usually just bad luck in a system which eroded to the point where luck was relied upon to ensure a safe, quality outcome. In this case they didn’t single out “bad apples” because the investigations were actually done pretty well. Unfortunately the culture at NASA didn’t adjust accordingly. (Plus space flight involves the management of unimaginable amounts of energy, and sometimes that energy goes where we don’t want it to.)

So – those quality checks in your standard work. Do you have explicit time built in to the work cycle to do them? Are your team members under pressure, real or perceived, to go faster?

What happens if there is an accident or a defect? Does the single team member who, today, was doing the same thing that everyone does every day get called out and blamed? Just look at your accident reports to find out. If the countermeasure is “Team Member trained” or “Team Member told to pay more attention” or just about anything else that calls out action on a single Team Member then… guilty.

What about everybody else? Following an incident or accident, the organization emphasizes following The Rules. They put up banners, have all-hands meetings, maybe even tape signs up in the work place as reminders and call them “visual controls.” And everything goes great for a few weeks, but then the inevitable pressure returns and The Norms are re-asserted.

Another example: Steve and I were watching an inspection process. The product was small and composed of layers of material assembled by machine. Sometimes the machine screwed up and left one out. More rarely, it screwed up and doubled something up. As a countermeasure, the Team Member was to take each item and place it on a precise scale, note the weight, and compare the weight to a chart of the normal ranges for the various products.

There were a couple of problems with this. First, the human factors were terrible. The scale had a digital readout. The chart was printed and taped to the table. The Team Member had to know what product it was, reference the correct line on the chart, and compare a displayed number with a set of displayed numbers which were expressed to two decimal places. So the scale might say “5.42” and she had to verify whether that was in or out of the range of “5.38 – 5.45”

Human nature, when reading numbers, is that you will see what you expect to see. You might only realize it was different after five or six more reads. So telling the Team Member to “pay more attention” if she made a mistake was unreasonable. Remember, she is doing this for a 12 hour shift. There is no way anyone could pay attention continuously in this kind of work. If a defective item got through, though, there would be a root cause of “Team Member didn’t pay attention.” She is set up to fail.

But wait, there’s more!

She was weighing the items two at a time. Then she was mentally dividing the weight by two, and then looking it up. Even if she was very good at the mental math and had the acceptable range memorized, that isn’t going to work. Plus, and this is the key point, in the unlikely but possible scenario where the machine left out a layer in one item, then doubled up the next, the net weight of the two defective items together would be just fine.
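
The arithmetic is easy to demonstrate. In this sketch the acceptable range comes from the example above; the individual item weights are made up to show a missing layer and a doubled layer cancelling each other out:

    LOW, HIGH = 5.38, 5.45   # acceptable range from the chart

    def in_range(weight):
        return LOW <= weight <= HIGH

    missing_layer = 5.20     # illustrative weight of an item missing a layer
    double_layer = 5.62      # illustrative weight of an item with a doubled layer

    # Checked one at a time, both defects are caught.
    print(in_range(missing_layer), in_range(double_layer))   # False False

    # Weighed together and divided by two, the defects cancel out and pass.
    print(in_range((missing_layer + double_layer) / 2))      # True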

“Why do you weigh two at a time?” Answer: “It’s faster.” This is true, but:

  • It doesn’t work.
  • She doesn’t need to go faster.

Her cycle time for weighing single items was well within the required work pace. But the supervisor was under pressure for more output because of problems elsewhere, and had translated that pressure to the Team Member in a vague “work faster if you can” way. It was the norm in that area, which was different from the rules.

Where is all of this going?

The Air Force has ruined 70 careers as a result of the cruise missile incident. They may have been right to do so, I wasn’t there, and this was a pretty serious case. But the fact that it got to this point is a process and system breakdown, and it goes way beyond the base involved.

Go to your own shop floor. Stand in the chalk circle. Watch, in detail, what is actually happening. Compare it with what you believe should be happening. Then start asking “Why?” and include:

“Why do people believe they have to take this shortcut?”

“Sticky” Visual Controls

The textbook purpose of visual controls is “to make abnormal conditions obvious to anyone.” But do your visual controls pass the Sticky test, and compel action?

Simple: Does your control convey a single, simple message? Or does it “bury the lead story” in an overwhelming display of interesting, but irrelevant, information? According to Spear and Bowen (“Decoding the DNA of the Toyota Production System”), information connecting one process to another is “binary and direct.” The signal is either “On” – something is required – or “Off” – nothing is required. There is no ambiguity.

Take a look at some of your visual controls. Do they pass the test? Do they clearly convey that something needs attention, or is that fact subject to interpretation?

Unexpected: Why would a visual control need to be “unexpected?” Consider the opposite. Who pays attention to car alarms these days? Yes, they are annoying, but because they so often mean nothing, nobody pays attention to them. We expect car alarms to be false alarms. If your visual control is to mean something, you must respond each time it tells you to. If it is a false alarm, you have detected a problem. Congratulations, your system is working. But it will only continue to work if you follow-through: STOP your routine; FIX or correct the condition; INVESTIGATE the root cause and apply a countermeasure. All of this jargon really means you must adjust your system to prevent the false alarm. Failure to do so will render the real alarm meaningless. It will “Cry Wolf” and no one will take it seriously.

Concreteness: Is it very clear? Do people relate to what your visual control is telling them? Does the Team Leader know that the worker in zone 4 needs help, and that the line will stop in a few minutes if he doesn’t get it?

Credibility: If the condition is worsening, does your visual control show it? Does it warn of increased risk? A typical example would be an inventory control rack with a yellow and red control point on it. Yellow means “Do something.” Red means “You better start expediting or making alternate plans because you are going to run out.” Setting the red limit too far up, though, sends out false alarms (see Unexpected), and eventually everyone “knows” the process can eat a little into the red with no problem. Why have yellow? What visual control can you put at the yellow line that tells you someone has seen it and is responding to the problem? (Left as an exercise for the reader.)

Emotions: How does your visual control compel action? Does it penetrate consciousness? A few words of warning on an obscure LCD panel aren’t going to mean very much unless someone reads them. How do you get the attention of the person who is supposed to respond? “He should have paid more attention” is the totally wrong way to approach missed information.

Stories: I really connected with this one. Stories are a great way to teach. Simulations are interactive stories. When teaching the andon / escalation process in a couple of different plants we divided the group into small teams, gave them a real-life defect or problem scenario and had them construct a stick-figure comic book that told the story of what would happen. That has proven a great way to reinforce and personalize the theoretical learning.

I will admit that these analogies can be a bit of a stretch, but the real issue is there. Visual controls are critical to your operation because they highlight things that must compel a response.

Your system is not static, or even really stable. It is either improving continuously through your continuous intervention, correction and improvement based on the problems you discover; or it is continuously deteriorating because those little problems are slowly eroding the process with more and more work-arounds and accommodations.

Go to your work area and watch. What happens when there is a problem or break in the standard? What do people do? Can they tell right away that something is out of the ordinary? How can they tell? For that matter, how can you tell by watching? If you are not sure, then first work to clarify the situation and put in more visuals. That will force you to consider what your standard expectations are, and think about responding when things are different than your standard.

5S – Learning To Ask “Why?”

This photo of a shadow board could have been taken anywhere, in any factory I have ever seen. The fact that I do not have to describe what is out of place is a credit to the visual control. It is obvious. But one of my Japanese senseis once said “A visual control that does not trigger action is just a decoration.”

What action should be triggered? What would the lean thinker do?

The easy thing is to put the tape where it belongs.
But there is some more thinking to do here. Ask “Why?”

Why is the tape out of place? Is this part of the normal process? Is the tape even necessary? If the Team Member feels the need to have the tape, what is it used for? If the Team Member needs the tape there, has the process changed? Or did we just design a poor shadow board?

That last question is important because when you first get started, it is usually the case. We make great looking shadow boards, but the tools and hardware end up somewhere else when they are actually being used.

Why? Where is the natural flow of the process?

Before locking down “point of use” for things, you need to really understand the POINT where things are actually USED. If the location for things like this does not support the actual flow of the normal process, then you will have no way to tell “the way things are” from “the way they should be.”

The purpose of 5S is not to clean up the shop. The purpose is to make it easier to stand in your chalk circle and see what is really happening. The purpose is to begin to ask “Why?”

By the way – if you see an office chair or a trash can being used as an assembly bench, you need to spend a little more time in your chalk circle. 🙂

Hidden Negative Consequences

“Stop the line if there is a problem” is a common mantra of lean manufacturing. But it is harder than first imagined to actually implement.

The management mindset that “production must continue, no matter what” is usually the first obstacle. But even when that is overcome, I have seen two independent cases where peer social pressure between the workers discouraged anyone from signaling a problem. The result was that only problems that could not be ignored would come to anyone’s attention – exactly the opposite of the intention.

Both of these operations had calculated takt time using every available minute in the day. (Shift Length – Breaks etc.) Further, they had a fairly primitive implementation of andon and escalation: The line stop time would usually be tacked onto the end of the shift as overtime. (The alternative, in these instances, is to fall behind on production, and it has to be made up sometime.)

So why were people reluctant to call for help? Peer pressure. Anyone engaging the system was forcing overtime for everyone else.

Countermeasure?

On an automobile line, the initial help call does not stop the line immediately. The Team Leader has a limited time to clear the problem and keep the line from stopping. If the problem is not cleared in time, the problem stops the line, not the person who called attention to it. This is subtle, but important. And it gives everyone an incentive to work fast to understand and clear the problem – especially if they know it takes longer to clear the problem if it escapes down-line and more parts get added on top.
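
A sketch of that timing rule (the length of the response window is an assumption for illustration; on a real line it is set by the fixed stop position, not an arbitrary clock):

    def line_stops(call_time_s, cleared_time_s, response_window_s=55):
        """The andon call starts a clock. The line keeps moving while the
        Team Leader works the problem; it stops only if the problem is not
        cleared before the response window runs out."""
        if cleared_time_s is None:
            return True
        return cleared_time_s - call_time_s > response_window_s

    print(line_stops(0, 40))    # cleared in 40 s -> False, the line keeps running
    print(line_stops(0, 70))    # cleared too late -> True, the problem stops the line
    print(line_stops(0, None))  # never cleared -> True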

There are a number of ways to organize your problem escalation process, but try to remember that the primary issue is human psychology, not technology.

Hoopla – Another Quality Story

As I said in a previous post, I am spending the majority of my time in China right now.

As part of his preparation for attending a corporate class, one of my kaizen specialists was reviewing some of the training materials. From the other side of the cubicle wall he asks, “What is ‘hoopla’?” Although his English is very good, he is Chinese, and “hoopla” just isn’t a word that they teach in the universities.

Now I know for sure he is not the first non-native English speaker to read that material, and I also strongly suspect many before him did not know what “hoopla” means. But the others just made a guess from the context and kept reading.

Derrick, though, applied the first two steps of jidoka – he Detected a problem – something wasn’t right, didn’t meet the standard, or seemed to be in the way. Then he Stopped the process, and called for assistance. Instead of guessing what to do, the Team Member pulled the andon (got his Team Leader’s attention) and pointed out the problem.

The standard in this case is that the person reading the material can understand it, or at least understand the words that are not specifically being explained by the material. In this case, that didn’t happen. “Hoopla” was not understood, so the andon was pulled.

The third step is Fix or correct the problem, restore the standard (without compromising safety or customer quality in any way) and re-start the process. I explained what “hoopla” means, and Derrick could keep reading.

The fourth step is Investigate the Root Cause, Apply a Countermeasure. I sent an email to our training developer and mentioned what has now become “the hoopla incident.” The ensuing discussion among the training developers has resulted in a set of standards and guidelines for writing materials. Among other things, it calls out the need to avoid idioms and slang that might not be understood by non-native speakers. It also addresses other issues which will both make the materials easier for non-native speakers to read and make it easier for translators working to port the material to German, French, Spanish or… Mandarin.

We should have some hoopla! The process worked – all because someone called out something he didn’t understand instead of just dealing with it on his own.