What I learned about personas and context from designing a life-critical system.
Everything we design affects people, but designing life-critical software requires a higher level of attention to human factors. I learned a lot from designing a better workflow for nuclear reactor emergencies.
LESSON: GOOD DESIGN IS ABOUT REAL PEOPLE
For most of us, the phrase “nuclear reactor safety” conjures images of dangerous radiation and the threat of Chernobyl-like meltdowns. Yet human error is the leading cause of traffic accidents, nuclear meltdowns and airplane crashes. It’s not that humans suck. It’s simply that most things are designed for how people are supposed to act instead of how they actually act. Designing for real human behavior is important in e-commerce and dating apps; in nuclear safety, it’s critical.
APPARENT PROBLEM: PREVENT NUCLEAR MELTDOWN
I wasn’t sure how I felt about nuclear power. As part of my work at the government ADAPT group (Advanced Decision Aids and Productivity Tools), I was asked to dream up a design to help reactor operators prevent a meltdown. As the project progressed, I learned that most reactors were well engineered and almost never had operational problems, and that nuclear energy is the same natural, efficient force that keeps the sun shining. However, it was also extremely dangerous and vulnerable to human error.
We’d like to think that the tech is so secure and automated that our fate isn’t in the hands of a Homer Simpson running a massive control room. It isn’t that simple. At the time, nuclear operators were smart, well-trained individuals. Almost 100% of the time, they monitored the control panels and made small adjustments. Emergencies were rare. During an emergency, doing the right thing quickly is essential, but determining the right thing was so complex that it required a set of policies and procedures in an emergency operating procedures (EOP) manual over 300 pages long. These guys knew NORMAL operations inside out, but emergencies were a different thing. Because emergencies were so rare, operators suddenly had to act quickly on procedures that didn’t come up 99.5% of the time.
Normal operations were very complex, and emergencies required a 300-page manual.
APPROACH: REFRAME THE PROBLEM
This is how I came to reframe the problem: good engineering and exhaustive training can’t stop human error, so designers need to account for real human behavior to mitigate it. In PHASE 1, we assembled an on-screen simulation of the reactor that was updated with live feeds from sensors. The idea was that a visual representation of the reactor would be simpler to understand than all the gauges, lights and dials on the control panel. Operators could even try adjustments on the simulated reactor before making them on the real thing. This was a leading-edge use of the relatively new graphical user interface. To build it, I learned to program the Apple Macintosh with colleagues from NASA, JPL, MIT and the current CTO of the USA.
PHASE 1 was impressive, but ultimately not helpful. As in the space program, every component in a nuclear reactor has to be MIL-SPEC, meaning technology around 20 years old whose every failure mode is known, so there are no surprises. Our new displays therefore couldn’t replace the complex control panels; they just added two MORE screens for overwhelmed operators to look at. Each screen was more complex than the one shown below:
REAL PROBLEM: HELP NUCLEAR OPERATORS IN A CRISIS
Studying the people and their context brought the problem into focus. In an emergency, you don’t have time to read a 300-page book. In an emergency, people make silly mistakes by acting too quickly, especially when emergencies ALMOST NEVER HAPPEN. Paramedics are good at handling emergencies because emergencies are their daily routine; for reactor operators, they were the exception. My redefined task was to design “something” that helped reactor operators do the right thing in a crisis.
NEW APPROACH: DESIGN BACKWARDS
Flowcharting the end result was very helpful in PHASE 2. For every possible nuclear emergency, the correct procedure was somewhere in that 300-page emergency manual. The problem was how to help operators calmly and correctly work through finding out what to do and then doing it. From reading the manual and talking to domain experts, I learned that each page of the manual linked to other pages that eventually led to the right procedure. That’s when I found HyperCard, Bill Atkinson’s brilliant, HTML-like hypertext tool that existed before the Web. I started recreating the long-winded emergency manual as a series of hyperlinked cards containing pictures and text. Each card asked the operator to check certain values on their panel and click a link based on what they found. Early tests looked promising, but I couldn’t fit enough onto the small black-and-white cards, so I upgraded to the more robust SuperCard and a larger screen.
SOLUTION: STREAMLINE THE FLOW
To fit 300 pages of material onto tiny B&W cards, I stripped the content down to a) what to check, b) a picture of the gauges you’d be checking and c) a series of choices based on what the operator saw. This was my first exposure to what we now call content strategy and information architecture. Since SuperCard gave me much bigger cards, I was tempted to put all the detail from the manual back in. Fortunately, my understanding of the operator persona and the results of early testing stopped me.
In a crisis, operators didn’t need explanations of what everything meant. Extra content was patronizing to expert reactor operators who knew how things worked and just needed to know what to do. When my testers tried the cards in a simulated emergency, we found that too much detail on a card increased the time it took them to make a decision AND decreased their ability to choose the right path. Every picture and bit of text became a tactical decision. I started using concise phrases instead of complete sentences, shorter action links and zoomed-in pictures. If it didn’t help them decide the next step, I axed it.
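The card structure described above (what to check, a picture of the gauges, and a set of choices that each link to another card) amounts to a small decision graph. Here is a minimal Python sketch of that idea, assuming a plain dictionary of cards. The card ids, gauge names and procedure labels are hypothetical placeholders, not content from the real emergency manual or the original HyperCard/SuperCard stacks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Card:
    """One screen in the (hypothetical) card network."""
    check: str = ""                  # a) what to check on the panel
    image: str = ""                  # b) picture of the gauges being checked
    choices: Dict[str, str] = field(default_factory=dict)  # c) observation -> next card id
    procedure: Optional[str] = None  # set only on terminal cards that name a procedure


# Hypothetical example network: placeholder gauges and procedures only.
CARDS: Dict[str, Card] = {
    "start": Card("Check gauge A", "gauge_a.png", {"high": "b1", "normal": "b2"}),
    "b1": Card("Check light B", "light_b.png", {"on": "p1", "off": "p2"}),
    "b2": Card("Check dial C", "dial_c.png", {"rising": "p3", "steady": "p4"}),
    "p1": Card(procedure="Procedure 1"),
    "p2": Card(procedure="Procedure 2"),
    "p3": Card(procedure="Procedure 3"),
    "p4": Card(procedure="Procedure 4"),
}


def walk(cards: Dict[str, Card], observations: List[str],
         start: str = "start") -> Tuple[List[str], Optional[str]]:
    """Follow one link per observation until a card names a procedure.

    Returns the visited card ids and the procedure found (None if the
    observations ran out before reaching a terminal card).
    """
    path = [start]
    card = cards[start]
    for obs in observations:
        if card.procedure is not None:
            break  # already at a procedure; ignore extra observations
        next_id = card.choices[obs]  # KeyError if the observation isn't an option
        path.append(next_id)
        card = cards[next_id]
    return path, card.procedure


path, procedure = walk(CARDS, ["high", "off"])
print(path, procedure)  # ['start', 'b1', 'p2'] Procedure 2
```

Each observation follows exactly one link, so the number of clicks grows with the depth of the path rather than the size of the manual. That is one way to see how a deep network of knowledge can still be navigated in a handful of steps.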
After many wrong-headed moves, we pulled out a win. My design ended up being a simple interface into a deep network of knowledge. At each stage, the operator was asked to make observations and click a link based on what they saw. In most cases, the correct procedure could be located in 5 steps without rushing. It was nice when my boss liked my work, but I was a bit shocked when the International Atomic Energy Agency saw a presentation and endorsed it for real reactors.
LESSON 1 LEARNED:
Before designing, we must understand users and context.
In this case, everything designed and built before modelling the persona of the operator and the context of the problem was pointless. This may seem obvious in retrospect, but it wasn’t at the time. Even now, I sometimes have to argue for user research. I wasted time attempting to design before studying the most important tasks for the most important users.
LESSON 2 LEARNED:
To design something useful, strip the problem down to what’s essential.
My biggest takeaway was how simple and obvious the solution became when I reduced the problem to “what needs to be on this screen to help someone decide what to do next”. Most of the text in the emergency manual was a distracting repetition of facts or non-actionable platitudes. Operators didn’t need to read what they already knew. They just needed to skim some clear options to zero in on the right corrective procedure.
LESSON 3 LEARNED:
Flow is the most important aspect of design.
No printed manual could achieve what our system did. The hyperlinked cards allowed operators to FLOW through the steps of diagnosing and solving problems in a natural, quick and frictionless way. It was also a concrete lesson in the power of hypertext (later popularized by SGML-based HTML) and the importance of flow design.
Working on extreme problems was a great start to my career. I valued being able to work at a secure government facility involved with space shuttles and particle physics, but it was remote, isolated and very cold. By the time I left, I had learned a lot about power generation. Oil, coal, solar and even hydro harm the environment. Nuclear is the most efficient, but short-term thinking led the world to choose cheaper, toxic reactors over cleaner breeder reactors that can burn nuclear waste. I don’t know if nuclear is a good or bad thing, but I believe that long-term thinking is required for sustainable choices.
Everything is Apollo 13 today.
It’s funny when clients tell me “It’s not like we’re putting people on the moon” as justification for scrimping on research, analysis and usability testing, as if it won’t matter. The only time it doesn’t matter is when you don’t care about making money. Every time a customer struggles to use your site or app, your business is at risk. It’s 6 times more expensive to attract a new customer than to keep an existing one, and a 5% increase in customer retention can increase profits by 25% to 95% (Bain & Co.).
When users sigh deeply in defeat, or loudly curse at your site (“why is this thing making it so hard?”), they aren’t going to call you up and tell you; they’ll tell friends and co-workers that they just switched to a competitor. Every product faces intense competition for users’ dollars and short attention spans. Unless users are forced to use your stuff, you’re a few bad experiences away from being chucked. In this day and age, everything is Apollo 13. Devote resources to proper user-centered design or expect customers to leave for a competitor who does.