It took me too long to finally read Dr. Atul Gawande's Checklist Manifesto, a surprisingly interesting story of how asking surgeons and nurses to follow a simple list of checks, adding just a little bit of red tape, can save lives and save lots of money. It starts with the pioneering work done by Dr. Peter Pronovost at Johns Hopkins, where they used checklists to catch stupid mistakes in essential ICU practices, proving that adding some trivial structure could make a real difference in life-critical situations. The book walks through how checklists are relied on by pilots, and how experts at Boeing and other companies continuously perfect these checklists to handle any problem in the air. The rest of the book focuses on the work that Dr. Gawande led at the World Health Organization to develop and test a simple, 3-section, 19-step, 2-minute checklist for safe surgery to be used in hospitals around the world.
A few points especially stood out for me:
First, that so many problems and complications at hospitals happen not because doctors and nurses don't know what to do, but because they forget to do what they already know how to do: they forget trivial basic details, or "lull themselves into skipping steps even when they remember them". And that this applies to complex problems in many other disciplines.
Second, that following a simple set of basic checks, just writing them down and making sure that people double-check that they are doing what they know they are supposed to be doing, can make such a dramatic difference in the quality of complex work, work that is so dependent on expert skill.
It made me think more about how checklists can help in building and maintaining software. If they can find ways to make checklists work in surgery, then we can sure as hell find ways to make them work in software development.
There are already places where checklists are used or can be used in software development. Steve McConnell's Code Complete is full of comprehensive checklists that programmers can follow to help make sure that they are doing a professional job at every step in building software. These checklists act as reminders of best practices, a health check on how a developer should do their job — but there's too much here to use on a day-to-day basis. In his other book Software Estimation: Demystifying the Black Art he also recommends using checklists for creating estimates, and offers a short checklist to help guide developers and managers through choosing an estimation approach.
One place where checklists are used more often is in code reviews, to make sure that reviewers remember to check for what the team (or management, or auditors) have agreed is important. There are a lot of arguments over what should be included in a code review checklist, what's important to check and what's not, and how many checks are too many.
In the "Modern Code Review" chapter of Making Software, Jason Cohen reinforces Dr. Gawande's finding that long checklists are a bad idea. If review checklists are too long, programmers will just ignore them. He recommends that code review checklists should have no more than 10 items. What surprised me is that according to Cohen, the most effective checklists are even shorter, with only 2 or 3 items to check: micro-checklists that focus in on the mistakes that the team commonly makes, or the mistakes that have cost them the most. Then, once people on the team stop making these mistakes, or if more important problems are found, you come up with a new checklist.
Cohen says that most code review checklists contain obvious items that are a waste of time like "Does the code accomplish what it is meant to do?" and "Can you understand the code as written?" and so on. Most items on long checklists are unnecessary (of course the reviewer is going to check if the code works and that they can understand it) or fuss about coding style and conventions, which can be handled through static analysis checkers. Checklists should only include common mistakes that cause real problems.
He also recommends that rather than relying on general-purpose checklists, programmers should build their own personal code review checklists, taking an idea from the SEI Personal Software Process (PSP). Since different people make different mistakes, each programmer should come up with their own short checklist of the most serious mistakes that they make on a consistent basis, especially the things that they find that they commonly forget to do, and share this list with the people reviewing their code.
There are still places for longer code review checklists, for areas where reviewers need more hand holding and guidance, like checking for security vulnerabilities in code. OWASP provides a simple and useful secure coding practices quick reference guide which can be used to build a checklist for secure code reviews. This is work that programmers don't do every day, so you're less concerned about being efficient than you are about making sure that the reviewer covers all of the important bases. You need to make sure that security code reviews are comprehensive and disciplined, and you may need to provide evidence of this, which makes a tool like Agnitio interesting. But even in a detailed secure code review, you want to make sure that every check is clearly needed, clearly understood and genuinely important.
Checklists help you check for what isn't there?
In 11 Best Practices for Peer Code Review, Jason Cohen highlights that checklists are important in code reviews because they help remind reviewers to look beyond what is in the code for what isn't there:
"Checklists are especially important for reviewers, since if the author forgot it, the reviewer is likely to miss it as well".
But can we take this one step farther (or one step back) in software development? Can we go back to fundamental work where programmers commonly make important mistakes, and come up with a simple checklist that will stop people from making these mistakes in the first place? Like the ICU central intravenous line problem that Dr. Pronovost started with: a basic but important practice where adding simple checks can save big.
The first problem that comes to my mind is data validation, the cause of roughly half of all software security problems according to Michael Howard at Microsoft. Data validation, like a lot of problems in software security, is one of the worst kinds of problems. It's fundamentally, conceptually simple: everyone understands (or they should) that you need to validate and filter input data, and escape output data. It's one of the basic issues covered in AppSec training from SANS and in books on secure development. But it's like surgery: you have to take all of it seriously, and you need to do it right, every little thing, every time. There are a lot of finicky details to get right, a lot of places to go wrong, and you can't afford to make any mistakes.
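To make "validate input, escape output" a little more concrete, here is a minimal sketch in Python of what those two checks might look like. The username rule and the function names are my own assumptions for illustration, not something taken from OWASP, SANS, or any of the books mentioned here, and a real application would need checks for every input and every output context.

```python
import html
import re

# Hypothetical rule, assumed for this example only:
# a username is 3 to 20 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_username(raw):
    """Check 1: accept input only if it matches an allowlist pattern."""
    if not isinstance(raw, str) or USERNAME_PATTERN.fullmatch(raw) is None:
        raise ValueError("invalid username")
    return raw


def render_comment(username, comment):
    """Check 2: escape untrusted data at the point of output (HTML here)."""
    return "<p><b>{}</b>: {}</p>".format(
        html.escape(username), html.escape(comment)
    )
```

The first function rejects anything that is not explicitly allowed rather than trying to list everything that is dangerous. The second escapes data where it is written out, so that even input which passed validation cannot be interpreted as markup.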
This is the kind of work that cries out for a checklist, a clear, concrete set of steps that programmers can follow. Or like pilots, a set of different checklists that programmers can follow in different situations. So that we build software right in the first place. It works in other industries. Why can't it work in ours?