
Why 100% Test Coverage is Not Enough

As developers, running a large test suite and seeing every single test pass is often a little moment of joy. It is also, if you're not careful, one of software development's most reliable traps. High unit test coverage is a good thing, and you'll often see coverage thresholds enforced in build and release pipelines. But high coverage is not the same thing as a well-tested system, and people often learn the difference the hard way. What follows are some thoughts on the most common ways a test suite can feel thorough while leaving meaningful gaps exposed. At Menlo we practice Test-Driven Development, which helps address some of these pitfalls up front, but regardless of your unit-testing strategy, they are important things to consider.

The first trap is partial mocking. When you mock a dependency in a unit test, you are not testing how your code interacts with that dependency; you are testing how your code interacts with your assumption of how that dependency works. Consider a method that calls a database layer and processes the result. You mock the database call to avoid unnecessary, repeated transactions against a real database server, and have it return a clean, well-formed object. The assertions pass, and the test is green. What the test cannot tell you is whether the real data access layer still returns an object that looks anything like your mock. If you were mocking a database call, perhaps the schema has changed since the last time this code was modified. Or maybe you were mocking the return value of a library call, and after a version upgrade the returned object is no longer the same shape. Your mock, frozen in time, reflects the world as it was when you wrote it; meanwhile, reality has moved on without telling your test suite. For some interesting reading on this subject, here is a link to a relevant Medium article.
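To make the drift concrete, here is a minimal Python sketch using `unittest.mock`. The function names and the schema change are invented for illustration; the point is only that the mocked test stays green while the real call path has already broken.

```python
from unittest.mock import patch

# Hypothetical data access layer. Imagine this once returned
# {"id": ..., "name": ...}, but a schema change renamed the field.
def fetch_user(user_id):
    return {"id": user_id, "full_name": "Ada Lovelace"}

# Production code still written against the OLD shape.
def greeting_for(user_id):
    user = fetch_user(user_id)
    return f"Hello, {user['name']}!"

# The unit test, written before the schema change, still passes,
# because the mock reflects the world as it was when we wrote it.
def test_greeting_with_mock():
    with patch(f"{__name__}.fetch_user") as mock_fetch:
        mock_fetch.return_value = {"id": 1, "name": "Ada Lovelace"}
        assert greeting_for(1) == "Hello, Ada Lovelace!"

test_greeting_with_mock()  # green, even though greeting_for(1) now
                           # raises KeyError against the real layer
```

Nothing in the mocked test exercises the real `fetch_user`, so the suite has no way to notice that its assumption about the return shape is stale.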

The second trap is more subtle, and in some ways more insidious: testing the code rather than the behavior. A unit test should assert that a piece of software does what it is supposed to do for the people or systems that depend on it. Too often, tests end up asserting that a piece of software does what it currently does, which is a very different thing. These tests become tightly coupled to implementation details: the name of a private method that is called, the order of internal operations, the specific way a loop is structured. When a developer refactors that code in the future, the tests break, not because anything went wrong, but because the tests were shadowing the implementation rather than testing the behavior. One way to avoid this problem is to write the test before you write any production code; that way, the only thing to think about when writing the test is the behavior.
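A small sketch of the difference, using an invented shopping-cart class: the first test pins down a private helper and will break under any refactor of the internals, while the second asserts only the observable behavior and survives.

```python
# Hypothetical example; the class and tests are illustrative only.
class Cart:
    def __init__(self):
        self._items = []

    def add(self, price, qty=1):
        self._items.append((price, qty))

    def _line_total(self, price, qty):  # private implementation detail
        return price * qty

    def total(self):
        return sum(self._line_total(p, q) for p, q in self._items)

# Brittle: asserts *how* the total is computed. Inline _line_total
# during a refactor and this test breaks with behavior unchanged.
def test_line_total_helper():
    assert Cart()._line_total(5, 2) == 10

# Robust: asserts *what* the cart does for its callers.
def test_total_is_sum_of_line_items():
    cart = Cart()
    cart.add(5, 2)
    cart.add(3)
    assert cart.total() == 13

test_line_total_helper()
test_total_is_sum_of_line_items()
```

Writing the second style of test first, before `Cart` exists, is exactly what makes coupling to a private helper impossible: there is no implementation yet to couple to.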

The third trap is the absence of integration tests, which catch exactly the kind of failures that the partial mocking of the first trap hides. Individual units can be perfectly correct in isolation and still fail catastrophically when asked to work together. An integration test validates the handoff, the contract, between components and layers of your code. In a hypothetical example: two services, each with its own passing suite, exchange a timestamp. One formats it as a Unix epoch; the other expects ISO 8601. Every unit test in both codebases passes, but the integration fails, and without an integration test covering the handoff, you may find out the hard way when a user reports that their data is wrong or missing.
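The timestamp handoff might look like this in Python (the two service functions are invented for illustration): each side is internally consistent and fully unit-tested against its own format, and only a check that runs the real handoff end to end reveals the broken contract.

```python
from datetime import datetime, timezone

# Hypothetical "Service A": serializes timestamps as a Unix epoch.
# Its unit tests, written against epoch integers, all pass.
def publish_event(ts: datetime) -> dict:
    return {"created_at": int(ts.timestamp())}

# Hypothetical "Service B": parses ISO 8601 strings.
# Its unit tests, written against ISO strings, also all pass.
def consume_event(event: dict) -> datetime:
    return datetime.fromisoformat(event["created_at"])

# An integration-style check that exercises the actual handoff.
def timestamp_contract_holds() -> bool:
    ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
    try:
        consume_event(publish_event(ts))
        return True
    except (TypeError, ValueError):
        # fromisoformat received an epoch int, not an ISO string
        return False

print(timestamp_contract_holds())
```

Neither service's unit suite would ever construct an event the way the other service actually produces it; the integration check is the only test that can see the mismatch.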

Taken together, these pitfalls are easy to fall into precisely because the feedback they give you feels real. A green test suite seems to say that the changes you have made are sound and haven't disturbed functionality in other areas of the code. The problem is that unit tests on their own can only tell you so much. They are not designed to catch the gaps between services, the drift between mocks and reality, or the edge cases nobody thought to write down. Practical, simple integration tests, a more skeptical eye toward mocks, and a few tests that deliberately try to break things go a long way toward increasing your true test coverage.