2019-11-25 -- Testing priorities
Software testing philosophy is a softball subject for blog posts, and after a month of fighting with a NAS build, I’m in the mood for an easy write-up.
This is an adaptation of a thread from work where I reflected a bit on testing practices: why we test things and, most importantly, which things we as professional developers should emphasize and put in the extra effort for.
Fuckhueg disclaimer: The following all assumes that the software being worked on is not library code, produced to serve as a fundamental reusable component for other folks. It similarly assumes the software isn’t being developed under a regulatory regime (say, United States DoD or FDA) where testing values and procedures are spelled out. Either of those domains likely requires a different mindset about testing.
When do we test?
We test our software to verify…
- …that a business workflow is supported properly (acceptance testing)
- …that modules/routines conform to a behavior contract (integration testing)
- …that tricky things are implemented properly (unit testing)
- …that some subset of the system meets a resource constraint (performance testing)
- …that some subset of the system stands up under adverse conditions (reliability testing)
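To make the first and third categories concrete, here’s a hedged sketch in Python — every name here (`round_to_cents`, `Cart`, the prices) is invented for illustration. A unit test pins down one tricky implementation detail; an acceptance-style test walks a money-generating workflow end to end the way a user would.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# A "tricky thing" worth a unit test: banker's rounding on money values,
# which is easy to get subtly wrong with floats.
def round_to_cents(amount: str) -> Decimal:
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# A minimal workflow worth an acceptance test (all hypothetical).
class Cart:
    def __init__(self):
        self.items = []

    def add(self, name: str, unit_price: str, qty: int) -> None:
        self.items.append((name, Decimal(unit_price), qty))

    def checkout(self) -> Decimal:
        return round_to_cents(str(sum(p * q for _, p, q in self.items)))

# Unit test: verifies the tricky detail in isolation.
assert round_to_cents("2.675") == Decimal("2.68")  # half-even, not half-up

# Acceptance test: drives the whole checkout workflow as a user would.
cart = Cart()
cart.add("widget", "1.25", qty=3)
cart.add("gizmo", "9.99", qty=1)
assert cart.checkout() == Decimal("13.74")
```

Note that only the second assertion says anything a customer (or the business) would recognize; the first exists purely for the developers.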
Why do we test?
In short, we do testing to convince ourselves and our business friends that our software does what we say it does.
Note that convincing ourselves is quite, quite different from proving it to ourselves. If we actually cared about correctness, we’d do something like:
- do a bunch of up-front systems analysis work to rigorously define the problem space and business needs
- use methods like TLA+ or Z3 to verify all states and modes of our software
- write the software in something with excessive capability for error-checking and logical verification (Haskell, for example)
- spin the software off into ASICs so that it couldn’t change in any meaningful way and thus violate our modeling
- use statistical process control and related reliability engineering methods to put a strong confidence interval on our system under normal and exceptional circumstances
This would guarantee the creation of artifacts that would be extremely well-characterized, reliable to some arbitrary degree, inflexible to business demands, exceedingly slow to produce, and expensive to commission. In short, “a museum piece from the very start”.
In most cases, the above is too much. We would be better served by just convincing instead of proving, and that means we can maybe pick our battles a bit more—we just need to figure out which things are convincing enough and we can focus on them more.
What do we test?
If we rank the aforementioned types of tests by what our business friends care about, it looks something like:
- Acceptance testing (since eventually customers get angry if the software can’t do what they ask it to under any circumstances)
- [some gap]
- Reliability testing (customers that start using software to remove a pain get upset if the software doesn’t reliably remove the pain)
- Performance testing (if SaaS, business folks want to cut costs which means fewer servers and cheaper deployments; if not SaaS, customers will make the same call for their personal machines)
- [huuuuuge gap]
- Integration testing (business people don’t care about how many stages are in the sausage factory—customers never ask)
- Unit testing (business people really don’t care about the bolts holding the conveyor belts in the sausage factory together)
Let’s look at how we developers probably rank the same things:
- Unit testing (since it’s straightforward, usually, and the functions are like right there)
- [minor gap]
- Performance testing (since it’s usually not too hard to run a function or module through its paces while wrapped in a timing block)
- Reliability testing (it’s always kinda fun to try and feed our code things it doesn’t expect)
- [moderately-sized gap]
- Integration testing (it’s a pain in the ass putting all the modules together and getting services to communicate)
- [moderately-sized gap]
- Acceptance testing (ugh checklists are boring to write and clicking stuff isn’t fun at all and if we change anything we’re gonna have to redo all this work and it takes so long to run)
The only thing both developers and business people agree on, it seems, is that integration testing sucks rocks to do.
(Notice, incidentally, that in neither case did we talk about what is actually effective testing—merely what the two groups care about more. Software defect rate is predicated more on size of system and lines of code than on any particular testing focus, I believe.)
So, with two very different views of how testing should be done, what should we as professionals do?
What to test
I submit that, from what I’ve seen in my career and elsewhere, defect-free reliable performant software that doesn’t meet the business requirements does not matter to the business. Any testing beyond the bare minimum required to convince the business folks that the software is ready (because, ultimately, they pay us and not vice versa) is internal overhead and should be streamlined where possible.
Given the above ranking, the first thing that should be tested (using manual testing if needed) is acceptance of a software implementation of a workflow customers pay money for. If that isn’t being tested and verified, every other form of testing is just academic.
I’ll stake out an even stronger claim, to make sure that the window is properly ajar:
Any testing code written before complete verification of all customer-facing revenue-generating processes is professionally negligent.
What does that mean, exactly? What would that look like?
The process I’ve seen (and used) is something like:
1. Identify all the customer interactions and workflows with the software.
2. Tag as “essential” all workflows that involve the transfer of money to your employer.
3. Tag as “non-essential” all workflows that do not culminate in the transfer of money to your employer.
4. Tag as “dangerous” all workflows that could result in the wrong amount of money leaving your employer.
5. Write a manual QA runbook that brings a tester (human, not software!) through all of the “essential” and “dangerous” workflows, with GO/NO GO standards.
6. Prior to each public release of the software, run through this runbook. After deployment to a live environment (if applicable), do the same thing.
7. Note how colossal a pain in the ass step 6 is, so start writing acceptance tests that pretend to be a user; you may find that the runbook from step 6 provides excellent pseudo-code. Switch over to these as soon as reasonable.
8. Ask the business if you need to test the “non-essential” workflows. They’ll probably say that can wait.
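The tagging scheme above can be sketched in a few lines. Everything here is hypothetical — the workflow names are invented, and in practice the runbook and its results would live outside the code — but it shows the shape of the release gate:

```python
# Hypothetical workflow inventory with the three tags from the process above.
WORKFLOWS = {
    "sign_up_and_pay": "essential",      # money comes in
    "issue_refund":    "dangerous",      # wrong amount could leave
    "change_avatar":   "non-essential",  # no money involved
}

def release_gate(results: dict) -> str:
    """GO/NO GO decision: every essential and dangerous workflow must pass.

    `results` maps workflow name -> True/False as recorded by the human
    tester working the runbook (or, later, by the automated acceptance
    suite). Anything untested counts as a failure.
    """
    must_pass = [name for name, tag in WORKFLOWS.items()
                 if tag in ("essential", "dangerous")]
    failed = [name for name in must_pass if not results.get(name, False)]
    return "GO" if not failed else "NO GO: " + ", ".join(failed)

# All money-touching workflows verified: ship it.
assert release_gate({"sign_up_and_pay": True, "issue_refund": True}) == "GO"

# Refunds untested: hold the release, even if avatars work beautifully.
assert release_gate({"sign_up_and_pay": True, "change_avatar": True}).startswith("NO GO")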
Once that’s done, we have a basic sanity check that the business will be appeased. Then—arguably, only then—we can test everything else, in decreasing order of importance to ourselves.
Why don’t we (industry) test this way?
Good question. I blame it on the confluence of a few factors, many of which are under our control and several of which are not:
- Writing good, detailed runbooks for acceptance testing is boring and hard. They also require constant updating to stay relevant and serve their purpose.
- Unit testing can be done without ever leaving the warm hobbit-hole of our codebases. It’s super easy to quickly throw a doctest onto an Elixir time function, but it’s super hard to get a businessperson to sit still and haggle out the finer points of how a customer changes their billing information.
- Large projects with moving parts and distributed systems are annoying to set up and hard to test. If your company has fallen for the containers and/or microservices memes without careful preparation, it can be absurdly tricky to produce meaningful performance data and reliability data by simulating things like network partitions, transient service outages, and so forth. It is so tricky that there is at least one person making good money verifying small pieces of such systems.
- Testing/QA folks have basically no glamour in modern workflows. In many places, QA work is foisted off on to product managers or shoved—much as with devops replacing sysadmins—into the responsibilities of developers. In the games industry, QA is horrible.
- Developers present testing to business folks as expensive and hard. The case for a basic sanity check of the stuff customers pay money for is rarely made; instead, testing balloons into a boogeyman, viewed with the same suspicion as developer talk the business reads as sandbagging, like “refactoring” or “specification”.
- Business folks change their minds and workflows in order to maximize the flux through the money funnel. Testing may finish just in time to be changed wildly by some new business experiment.
- As businesses grow, they usually add features instead of subtracting them. Feature accretion monotonically increases testing difficulty, possibly superlinearly if the features interact.
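As an aside, the doctest point above translates directly to Python, which makes the asymmetry easy to see. This `seconds_between` helper is hypothetical, but note how little friction there is between writing the function and testing it — no businessperson required:

```python
import doctest

def seconds_between(start: int, end: int) -> int:
    """Hypothetical time helper, tested inline the doctest way.

    >>> seconds_between(10, 25)
    15
    >>> seconds_between(25, 10)
    Traceback (most recent call last):
        ...
    ValueError: end precedes start
    """
    if end < start:
        raise ValueError("end precedes start")
    return end - start

# Running the inline examples costs one line; compare that with
# scheduling a meeting to pin down the billing-change workflow.
assert doctest.testmod().failed == 0
```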
There are probably dozens more reasons, but those are enough to mull over for now. They suggest that our industry gravitates toward the rabbit’s foot of unit testing and test coverage because it’s easier than changing our practices to allow a proper culture of quality engineering. How can we fix these factors? That’s a whole different blog post.
You may be shaking your head at this point. “This doesn’t sound like the good test-driven development practices I’ve heard Uncle Bob talk about.”
And you’re right! I think that this advice flies in the face of everything I’d come to accept about how Professional Software People Writing Professional Software Should Professionally Test.
But, here’s the key thing there: professional. It is unprofessional to get paid to waste your employer’s money by solving problems they don’t have without giving them the opportunity to intervene. It is unprofessional to deprioritize the core business needs that generate revenue in favor of some aesthetic of quality.
tl;dr: Be professional and test the needs—the real, cash-generating needs—of the business before working on anything else.