The Value of Failure

When I joined IBM in 1974, Tom Watson Jr. was still its figurehead, despite having resigned as the second president of IBM three years earlier. He had followed in the footsteps of his father, Thomas Watson Sr., the founder of IBM, in 1952. Senior was a shrewd businessman, while his son was an innovator.

Thomas Watson Jr. (left) and his father

Tom Jr. saw that computers were the future of IBM and started a huge research program that spent 9% of revenue (versus the typical 5-6%). It led to the 1960 introduction of the first commercial-use computer, the IBM 1401, which was mostly responsible for continuing IBM’s dramatic growth rate of 30%. With it came the first high-speed printer, the IBM 1403, a machine that I still worked on as an engineer in 1974. I am certain that the hydraulic-driven, ‘high-speed’ printer (600 lines per minute) was essential for the commercial success of computing at IBM.

Tom Watson’s most dramatic gamble was the development of the System/360, which would for the first time offer compatible computers of different power with a variety of I/O devices (Input/Output, such as card readers, printers, tape and disk storage) that could be configured as needed. When the system, designed by IBM chief architect Gene Amdahl, was delivered in 1964, much later than intended, it turned out to be a huge success. It was the predecessor to the S/370 and S/390, and its function set is in principle still the base of IBM’s current line of z/Series mainframes.

What IBM Taught Me

My first direct involvement in IBM’s mainframe technology was a 1978/9 assignment to IBM’s Havant (UK) plant to test newly assembled IBM 3033 mainframes. Those were fascinating years, as bugs were still chased through test routines that created code loops so that signal shapes could be measured with an oscilloscope on the cables between the processor boards.

Each production test engineer would follow a single machine from initial assembly to shipment. The first 3033 to be shipped from Havant to a German customer was in test for nine months! On my return to Austria, I could fix any problem in a 3033 processor in a fraction of the time our local specialists needed, because I had been debugging 8-10 hours a day for over a year. Each functional failure that I had to track down to its ultimate cause, whether a duff integrated circuit, a broken motherboard, or a bent or simply loose trilead (a kind of mini-coax cable), taught me how things should work far better than any course or manual could. I would not just follow the test sequence; I would run tests beyond the first failure and look for patterns. That was in principle wrong, because later test results might be follow-on failures of bugs detected earlier and thus point to the wrong function. But despite ignoring standard procedure I was faster and more successful in fixing problems than other engineers who had been doing this much longer. I didn’t think much of it and just did what worked best for me.

To my surprise I was soon assigned to work as a ‘jumper’ who would be called in to fix difficult problems on different machines on the factory floor. Without initially being aware of it, I employed a kind of holistic systems thinking for debugging the multi-CPU, parallel-processing mainframe. I realized only much later that the hardware alone was just complicated, but the combined layers of hardware function, ALU (Arithmetic Logic Unit) pico-code, processor micro-code, S/370 machine code, and finally the operating system with its thread scheduling, memory addressing and exception handling, actually represented complex emergence. I know today that looking for patterns was intuitively the right approach to identify the events that caused the malfunction. IBM researchers layered their learning experiences on top of each other, employing a hierarchical set of layers, each allowing new functionality to emerge higher up. Even on the test floor we would still recommend improvements in hardware and micro-code, thus showing emerging innovation too.

Lessons from a Different Culture: Saudi Arabia

From 1981 to 1984, I went on assignment as the Field Support Specialist for Saudi Business Machines, an IBM joint venture with the Juffali family. Part of my job was to train Arab field engineers from many Middle East countries in S/370 technology. Planning the daily course schedule according to Muslim prayer times was difficult but essential. I even had to develop the course materials myself, but despite the challenge of teaching culturally different mindsets, it was one of the most fun things I ever did. Without my technical experience from Havant I could not have done it. Only some time later did I realize that I had tried to give those young engineers a level of understanding that was no longer required.

1980: The IBM 3033 Multi-Processor - less CPU power than an iPhone

For later IBM mainframes such as the S/390, engineers no longer needed to know how a machine actually worked, as the hardware was much more integrated and tests would point to much larger replaceable units. Problems were pinpointed by swapping hardware pieces around. The running joke was: ‘How does an IBM hardware engineer fix a flat tire on his car? Simple! He switches the tire with another one and checks if the problem moves.’

Understanding why, for example, a different I/O load would change processor utilization required a grasp of the complexity of a large mainframe system with its processors, caches, memory and I/O systems, and of how the software linked all of them together. The hardware function was technically predictable, but how the customer programs would actually use it was not, especially in terms of how the operating system would react to a variety of events. That was the reason that on my return from Saudi Arabia I left hardware engineering to become a systems consultant. Fixing problems in complex processors had taught me invaluable lessons and changed my thinking, but I was no longer interested in working on ‘perfect hardware’, as all it required was to perfectly follow a maintenance procedure. You can see that I had this anti-process perspective already in the 1980s!

A $10 Million Education

But I feel that mine were not just chance experiences. When IBM hired me right out of college in 1974, the HR manager told me an anecdote about Tom Watson Jr. According to it, Tom Watson had called a VP to his office to discuss a failed development project that had lost IBM in the range of $10 million. Expecting to be fired, the VP presented his letter of resignation. Tom Watson Jr. just shook his head: “You are certainly not leaving after we just gave you a $10 million education.” In those days, failure was not a problem at IBM as long as it was turned into a learning experience. The important point is that the financial focus has to take a back seat. Before Steve Jobs returned, Apple was run by bean counters who demanded profits over customer value. On his return the company was just three months away from bankruptcy. Allowing people to fail and learn is not a blanket job guarantee for those who don’t care! Steve Jobs fired the exec responsible for the MobileMe disaster on the spot when he botched another service launch.

During my tenure in IBM’s Havant plant I had learned that I needed to turn my thinking upside down: failure is not the outlier, success is! Trying to understand why we couldn’t fully control and predict how a complex system would work led me to learn about evolutionary concepts and complex adaptive systems. Biologist E. O. Wilson’s book ‘Consilience’ was a milestone for me, as was Douglas R. Hofstadter’s ‘Gödel, Escher, Bach: An Eternal Golden Braid’, which led me to focus on artificial intelligence.

When I finally left IBM to start my own software business in 1988, not punishing my employees for failure was one of many principles of IBM’s company culture that I took along. I also told stories such as the one of the 3M Post-it note, which was created from a glue that wouldn’t really stick. This kind of free experimentation can also bring surprisingly positive side effects, such as Pfizer’s Viagra, originally intended to be a high blood pressure drug. The rest is history, as beautifully depicted in the recent movie ‘Love and Other Drugs’ with Anne Hathaway.

After a century of Tayloristic command-and-control management approaches, adaptive concepts are finally being considered in the business world. Tim Harford wrote in ‘Adapt: Why Success Always Starts with Failure’ that we can’t rely on expert advice, command economies, and top-down organizational structures when human endeavors exceed a certain complexity. It is an illusion to expect to be able to anticipate and plan for all consequences of decisions taken. Complexity doesn’t allow predictability and therefore hinders planned improvements. Innovation is only found in environments that support variation through experimentation on a small scale in randomized trials. Selection of successes is achieved through competition and diverse feedback.

Enter Adaptive Case Management

My own life experiences are the reason that I turned a scientific perspective into the business and software concept that is today at the core of our software solutions. Today I can use the example of the Apple App Store social network as vivid and verifiable proof of the success-through-failure approach. It provides an ecosystem of autonomous innovators who thrive through the power of evolution. The best apps will succeed, while many won’t. Steve Jobs himself failed multiple times until he succeeded. Thomas Edison went through 6000 failures to finally discover a usable light bulb. Both are well-documented history. While a guiding vision is certainly essential, one has to be willing to admit failure, learn from it and try again, and again, and again …

Quite obviously, wanting innovation is not just about inventing successful new things, but much more about a focus on customer value. Innovation does and must happen continuously on all levels, in the small and in the large, while not all innovations will succeed. Tom Watson Jr. supposedly said, “If you want to succeed faster, double your failure rate.” As an executive you have to allow and even promote the opportunity to fail, which is diametrically opposed to perfect business processes as demanded by BPM or Six Sigma. James March pointed to the importance of exploration and exploitation of knowledge in organizations in 1991. But how do you fail fast and ensure that the newly gained knowledge isn’t lost? Nobody likes to share his failures, right? That is why the concept of Adaptive Case Management offers a radical departure from the perfectly-optimized-process illusions. ACM enables large organizations to fail and innovate faster by ensuring that gained knowledge becomes transparent and reusable without needing a bureaucracy. The ability to ADAPT (change future process execution through learning by doing) is very different from Ad-Hoc or Dynamic processes.

A process optimization bureaucracy of any kind (e.g. BPM, Six Sigma or Lean) might ensure cost cutting but it will not support, promote or provide true knowledge-from-failure innovation. Perfect and cheap processes designed by an outside consultant are stale and dead. Yes, code-freeze kills the germs of infectious innovation! Giving the process owner authority to pursue assigned goals any way he wants, as long as he achieves outcomes, operational targets and handovers, is the kind of social empowerment needed for success. Autonomy is furthermore a key element in employee (and thus customer) satisfaction. Allow for a variety of processes and tasks to fail or succeed until the best ones sustain. For effectiveness you need to allow processes to be improved by the people who perform them. That is additionally the most natural and efficient approach to optimization. Governance should at most define the high-level Business Architecture and ontology to reduce ambiguity but not nail down low-level processes. As I posted recently: “Let’s face it, orthodox process flowcharting won’t survive the social and mobile revolution.” I am going to stick with that prediction.

Outside manufacturing, we deal with a business complexity and speed of change that make it nearly impossible to tell others exactly what to do. It is ridiculous that predictive analytics promote statistical correlation as causal decision points for processes. Knowledge workers improve outcomes through emotional inspiration and not Boolean if/then/else logic. We need to trust their skill. They listen to customers, translate goals into needed activities, and then execute based on their intuition and experience. Yes, many people in large organizations don’t care about outcomes because they are jaded by bureaucracy. And now we punish and mistrust them for our failure as executives to empower them? How will a rigid flowchart allow for innovation and make them less jaded? Yes, your people need understandable and documented objectives, targets and goals, but then authority, autonomy and means will be the only things that take them from careless to caring. ACM is therefore not about cost cutting but about empowering people to deliver value to their customers and making that transparent to management.

Soichiro Honda said: “Success represents the ONE percent of your work that results from the 99 percent that is called failure.”

17 Comments on “The Value of Failure”

  1. Sorry writing in German, but the article I am referencing is written in German too.

    Hello Mr. Pucher,
    yes, more and more people are becoming aware that we use our intuition, and more and more scientists recommend relying on it more. Last week I picked up on a talk by Alexande Tornow that illustrates well why we should use our intuition particularly in complex situations. Our conscious mind can evaluate just 3-4 interrelated parameters. Our co-pilot, by contrast, is able to handle many times that: http://www.saperionblog.com/lang/de/bpm-unsere-welt-ist-komplex-und-nicht-berechenbar-daher-wurden-wir-mit-intuition-ausgestattet/5196
    What he derives from this is interesting:
    The team leader should rely on his co-pilot, the team (which presupposes “mature” team members)
    Governments should rely on their co-pilot, the citizens (when you see how our elected representatives seem to be poking around in the fog …)

    Best regards, Martin Bartonitz

  2. Unfortunately too many people today are scared of failure, and what that may mean (often with companies getting rid of you). Because of this, so many then choose to “be safe” and never really innovate or think outside of the box…

    If we learn from something, is it really a failure after all? Great article…

  3. Max – Wonderful post.

    Failure is the flip-side of innovation. Failing is a function of trying, endeavoring to succeed. If you don’t fail on occasion you aren’t pushing boundaries and you don’t gain experience.

    Unfortunately risk-averse organizations try to ‘plan-away’ failure and put an emphasis on blame. Despite the extraordinary failure rate of this approach (75% and above in software projects over $1 million), the cost of bureaucracy, and the slow movement, this approach prevails.

    It’s a problem of organizational/societal culture – false assurances for perception of control, rather than experimentation in the endeavor to succeed.

    The time does seem ripe for change. While ACM can’t change culture directly, it is a technology enabler for this new world.

  4. The fear of failure from the top down stifles everything, so much so that the actual fear itself makes organisations too bureaucratic and therefore more likely to fail…

    I am a strong believer that failure is a part of moving forward. If you want to be innovative, get ahead of the competition and strive to be better / do something new, then you have to accept that you may well fail. Acceptance is key, once you accept that you may fail, or that you have, then you can learn, grow and move on…

  5. Max – really great post.

    You have hit on something very interesting and very important. I was reading the HBR Article on failure this morning (http://hbr.org/2011/04/strategies-for-learning-from-failure/ar/1) after prompting from Jacob Ukelson’s post (http://ukelson.wordpress.com/2011/04/23/preventable-faillure-unavoidable-failure-intelligent-failure/)

    IBM had a knowledge work culture, and your story of the $10M education is perfect to illustrate that. Not all companies have such a culture, and I believe it is one of the biggest barriers to adoption of ACM. Today, working in their inefficient way, all their mistakes are hidden, and they believe that is critical for continued employment. Taking up ACM means that their mistakes will be far more visible than before.

    Still, that is the point of ACM isn’t it? Organizational learning. Reminds me of Peter Senge and “The Fifth Discipline”. You can’t learn if you are not allowed to fail, and yet many of our IT systems seem designed to prevent failure at all costs. If you are running a factory you want to prevent failure. But who was it who said that business is a factory?

    Steven Spear is calling this a “High Velocity Organization” and that is actually the same thing, but with better marketing. ‘Learning’ sounds so nerdy, while ‘high velocity’ can appeal to any board of directors or investors.

    see: http://social-biz.org/2010/01/25/chasing-rabbits-with-bpm/

    So what Jacob says is right: the ACM crowd has been too technology focused. We need instead to figure out how to convince organizations to be open to learning from failure. Then, and only then, will technology to support knowledge work be employed.

  6. Pingback: Failure is Essential to Knowledge Work | Collaborative Planning & Social Business

  7. Max, your article really resonated with me. Thank you for posting. I personally tend to drive too hard for optimization (euphemism for perfection). That leads me towards gold-plating and really slows down my personal creativity and ability to be generative. I have to think about how I can incorporate this “willingness to fail” mindset into my daily habits.

  8. Great post Max,
    It reminds me of a phrase I read in one of the Lean books, where Mr. Toyota said that if nothing goes wrong you have to check the floor yourself, as most likely they are hiding something from you. (Under the Jidoka principle, an employee should pull the cord to stop the line and take countermeasures to improve.)

  9. Pingback: Business Process and Adaptive Case Management News and Information » Failure is Essential to Knowledge Work | Collaborative Planning …

  10. Excellent, there is nothing to add to that. This should be a binding part of our business and management culture. This is the direction in which we must focus and further develop our efforts, values and principles (unfortunately, in recent years “managers” have led us somewhat off course)

  11. Pingback: Preventing Failure vs Fixing Failure « Jacob Ukelson's Blog

  12. Hello Max:

    Glad you touch on ontology management in relation to process execution. As ACM tools are hitting the market, I cannot yet understand how they will support ontology management. When you build a case and start executing it, people will need to link data that has meaning. The thing is that my understanding of a concept differs from that of others, and after the case is closed I may later need to revisit it. How can ACM tools (and other complicated BPMS tools that force people to design data models to execute processes) truly support ambiguity reduction?

    Rather than linking to my blog, where I reflect on the need for ontology management, I would like to share this amazing challenge, Why the Semantic Web will never work by Jim Hendler www.http://trunc.it/iwcir, on what we still need to accomplish

    Now, how are ACM systems being prepared to deal with concept drift? Any thoughts? I think they are sending objects to the propeller …

  13. Pingback: Process Quotes of the week « Adam Deane

  14. Pingback: Adapt: Why Success Always Starts With Failure | Adaptive Case Management

  15. Pingback: Business Process and Adaptive Case Management News and Information » Why Success Always Starts With Failure

  16. Pingback: Understanding Failure of the Process Kind » Process for the Enterprise
