Systematic Problem Solving

If I were to look at the experience of the organization profiled in the last three posts “A Systematic Approach to Part Shortages” I believe their biggest breakthrough was cultural. By applying the “morning market” as a process of managing problems, they began a shift from a reactive organization to a problem solving culture.

I can cite two other data points which suggest that when an organization starts managing problem solving in a systematic way, their performance begins to steadily improve. Even managing problem solving a little bit better results in much more consistent improvement and less backsliding. Of course my personal experience is only anecdotal. That is certainly true by the time you read it here as I try to filter things. But consider this: The key difference with Toyota’s approach that Steven Spear pointed out in “Decoding the DNA of the Toyota Production System” (as well as his PhD dissertation) was that the application of rules of flow triggered problem solving activity whenever there was a gap between expected and actual process or expected and actual outcome.

Does this mean you should go out and implement morning market’s everywhere? Again, based on my single company data point, no. That doesn’t work any more than attaching kanban cards to all of your parts and calling it a pull system. It is not about the white boards, it is not even about reviewing the problems every day. Reactive organizations do that too. In most cases in the Big Company that implemented morning markets everywhere, that is what happened – they morphed into another format for the Same Old Stuff.

Here are some of the things my example organization did that I think contributed to success in their cultural shift.

  • They separated “containment” and “countermeasure” as two separate and distinct responses. This was to make sure that they all understood “containment” is what is done immediately to re-start production while preserving safety and quality. “Containment” was not a countermeasure as it rarely (if ever) actually addressed the root cause. It only isolated the effect of the problem as far upstream as possible. The rule of thumb was simple:
    • Containment nearly always adds time, cost, resources, etc.
    • A true countermeasure nearly always removes not only the containment, but reduces time, cost, resources.
  • They didn’t use the meetings to discuss solutions. They only addressed two things:
    • A quick update on the status of ongoing problem solving.
    • A quick overview of new problems from yesterday.

I think this is an important point because too many meetings get bogged down with people talking about problems, and speculating what the causes are. That is completely non-productive.

  • The actual people working on problems attended the meeting. I cannot over-emphasize how important this is. They did not send a single representative. Each person with expected activity reported his or her progress over the last 24 hours. It is difficult to stand in front of a group and say “I didn’t do anything.”
  • They blocked out time to work on problems. I probably should have put this one first. The manufacturing engineers and other professional problem solvers agreed not to schedule anything else for at least two hours every morning. This time was dedicated to working on the shop floor to understand the problem, and physically experiment with solutions. There was a lot of resistance to this. But over a couple of months it became close to the norm. It helped a lot because it started to drive the group to consider where they really spent their time vs. what they needed to get done. There was no doubt in the past that solving these problems was important, but it was never urgent. Nobody was ever asked why they weren’t working on a shop floor issue. That had been a “when I am done with everything else activity.” Gradually the group developed a stronger sense of the shop floor as their customer.
  • They didn’t assign problems until someone was available to work on them. This came a little later, when the problem-solvers were missing deadlines. The practice had been to assign a responsible person in the morning when the problem was first reviewed. Realistically a person can work on one problem a time, and perhaps work on another when waiting for something. They established a priority list. The priority was set primarily by manufacturing. When a problem-solver became available, the next item on the list was assigned. Once a problem was assigned, nothing would over-ride that assignment except a safety issue or a defect that had actually escaped the plant and reached a customer.
  • They got everyone formal training on problem solving with heavy emphasis on true root cause. People were expected to follow the method.
  • A problem was not cleared from the board until a long term countermeasure had been implemented and verified as working.

By blocking out time, they were able to establish some kind of expectation for productivity. After that, if problems were accumulating faster than they were being cleared, they knew they had a methods or resource issue. The same was true for their other tasks which were worked on during the rest of the day.

This was the start of establishing a form of standard work for the problem solvers.

21 Oct 08 – There is more on the subject here and here

What Nukes – a little more clear.

I re-read my “What Nukes?” post and realized I was really rambling. I want to reiterate a key point more clearly because I think it is important.

In the “Bad Apple” theory there is an implied assumption that the cause of an accident or other problem was one person who, at that moment in time, was not following the documented rules or procedures.

Except in the most egregious cases, such as deliberate misconduct, that is likely not the case. Most organizations have a set of “norms” that operate at some level of violation of the written or established procedures. The reasons for this are many, but usually it is because good people are doing the best they can, in the conditions they are given, to get the job done.

Failure to follow the rules does not result in an accident or incident.

Have you every run a red light or a stop sign? It happens thousands of times every day. It almost never results in an accident. Only when other contributing conditions are ripe will an accident result. Running a stop sign AND a car coming through the intersection.

The same goes for quality checks, and the more reliable an “almost 100%” process becomes, the more vulnerable you are. If a defect is only rarely produced, it is unlikely that any kind of human-based inspection will catch it. The faster the work cycle, the more this is true. The mind numbs, it is impossible to always pay attention to the detail, and the mind sees what it expects. “Failure to pay attention” is never an adequate root cause. It is blaming an unlucky Team Member for an omission that everyone makes every day just going through life. It is just, in this case, “there was a car coming through the intersection.” It is bad luck. It is being blamed for red beads in Deming’s paddle experiment.

So attaching the failure of an individual, while it is easy, avoids the core issue:

People’s failure in critical processes is a SYSTEM PROBLEM. You must investigate from the viewpoint of the person at the pointy end. What did he see? What did he perceive? What did he believe was happening and why was that belief reasonable given his interpretation of the circumstances at the time.

The post about “sticky visual controls” got to this. Your mistake-alerts or problem signals must penetrate conciousness and demand attention if they do not actually shut down the process.

Standards Protect the Team Members

One of my kaizen-specialists-in-training just came to me asking for help. The Team Members he is working with are not seeing the need to understand sources of work variation.

I hear that a lot, both in companies I have worked in and in the online forums. Everyone seems to think it is a problem in their company, their culture – that they are unique with this problem.

The idea of a unique problem is variation on the “our process / environment / product is different so ____ won’t work here.” Someday I will make a list of the standard management “reasons why not” but that isn’t the topic of this post.

I told him:

  1. This is not unique to China, or to this facility. The same resistance a always comes up, and nearly always comes up the same way once the Team Members begin to realize we are serious.
  2. There is no way to just change people’s minds all at once.

Here is something to explain to the concerned Team Members: The standard process is there to protect the team member. If there is a problem, and the standard process was followed, then the only focus for investigation can be where the process itself broke down. Countermeasures are focused on improving the strength of the process.

If, on the other hand, the process was not followed (or if there is no process), then the team member is vulnerable. Instead of the “Five Why’s” the investigation usually starts with the “Five Who’s” – who did it? Countermeasures focus on the individual who happened to be doing the work when the process failure occurred.

As you introduce the concept of standard work into an area that is not used to it, it is probably futile to try to tighten down everything at once. The good news is that you really don’t have to.

Start with the key things that must be done a certain way to preserve safety and quality. If they are explained well and mistake-proofed well, there is usually little disagreement that these things are important.

The next step is to make it clear that the above are totally mandatory. If anything gets in the way of doing those operations exactly as specified, then STOP. Do not just work around the problem, because doing so makes you (the Team Member) vulnerable to the Five Who’s inquisition.

If you focus here for a while, you will start to get more consistent execution leading to more consistent output, which is what you want anyway.

Then start looking at consistent delivery and all of a sudden the concept of variation in time comes into play. Why was this late? The welder ran out of wire, I had to go get some more, I couldn’t find the guy with the key to the locker…… Go work on that. At each step you must establish that the point of all of this is to build a system that responds to the needs of the people doing the work.