
The testing pyramid and why you shouldn’t use it

Yep, you heard it here first: I have beef with the testing pyramid, and I'm going to tell you what's wrong with it, and what you should be doing instead. But first, to frame the discussion: what are we actually trying to achieve by writing tests? Hopefully we've all heard that you should write tests (and if you haven't, YOU SHOULD WRITE TESTS), but this is often stated as dogma and not really explained. So let's go back to a simpler time, before I was ranting about testing strategies, and consider the question:

Why did the first person to start writing tests do it?

Their boss was yelling at them for breaking the product again, and they'd spent weeks going back and forth fixing issues QA found. So they decided to write code that verified their other code worked. Fast forward a little while that person experimented and improved their testing practices, and we see a few different things accomplished by their efforts:

  1. Less manual testing required
  2. Enable continuous deployment
  3. Speed up development lifecycle
  4. Catch regression early
  5. Document behavioral decisions

Types of Tests

So those are our goals; how best do we go about achieving them? Let's start by seeing how different types of tests help accomplish those goals. Notably, I'm only going to consider user-facing software, as library development comes with its own set of concerns that don't map onto all of this exactly. For simplicity I'm going to split tests into 3 categories:

  • Unit tests, which attempt to test a specific piece of code and make heavy use of mocking for dependencies in order to achieve isolation.
  • Integration tests, which attempt to test an arbitrary grouping of components with the goal of minimizing required mocking, but do not attempt to act as a user.
  • End-to-end (E2E) tests, which attempt to impersonate a user as they use your system in different ways.

How does each type of test help us accomplish our objectives?

Unit Tests

First, let's consider unit tests. Because unit tests are code-driven and not based on any particular user action, they don't really save us manual QA or enable continuous deployment. No matter how many unit tests we write, we still need a manual testing cycle to ensure that the components are hooked up together correctly.

We also can't usually use them to document decisions about the system's behavior, because individual components rarely contain the entirety of the logic for a particular feature.

The jury is a little more split on speeding up the development lifecycle and catching regressions early. When refactoring within an individual component, unit tests often accomplish both of those goals quite well, but if your refactor moves logic from one component to another, having to move and rewrite the tests as well can be a productivity drag.
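When the unit really is isolated and has genuinely tricky logic, though, a focused unit test earns its keep. Here's a rough sketch of that situation (the prorate function, its rules, and the billing module are invented purely for illustration):

// a hypothetical pure function with enough edge cases to justify isolated unit tests
import { expect } from 'chai';
import { prorate } from './billing'; // assumed to return a charge in cents

describe('prorate', () => {
  it('charges nothing when the plan is cancelled on its start date', () => {
    expect(prorate({ monthlyPriceCents: 3000, daysUsed: 0, daysInMonth: 30 })).to.equal(0);
  });

  it("rounds partial cents down in the customer's favor", () => {
    // 1000 * 10 / 31 = 322.58..., rounded down to 322
    expect(prorate({ monthlyPriceCents: 1000, daysUsed: 10, daysInMonth: 31 })).to.equal(322);
  });
});

No mocking is needed here because the unit has no dependencies, and the edge cases are complicated enough that the tests genuinely speed up working on it.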

Integration Tests

Integration tests can be hit or miss depending on how good a job you do of identifying your arbitrary groupings. Since these are still code-based tests and not generally based on user flows, they aren't going to replace manual QA, but if your groupings are well designed you can cover certain scenarios that would otherwise need a manual check.

Consider the example of an API endpoint tested as a whole. This would allow you to test for many odd edge cases around filtering and the like, but wouldn't let you verify that filtering on the frontend works. So it helps a little with manual QA, but not a ton.

Same story for continuous deployment. Integration tests also tend to speed up the development lifecycle, because you can easily check whether you broke anything by running them, and they tend to be fast enough to run on a local developer's machine without issue. The same goes for catching regressions early and documenting behavioral decisions.
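To make the endpoint example above concrete, here's roughly what such a test might look like. This sketch assumes an Express app exported from ./app, a Sequelize-style test database behind ./db, and supertest for driving HTTP requests, all illustrative choices rather than prescriptions:

// a hypothetical integration test for GET /users?role=admin, hitting a real test database
import request from 'supertest';
import { expect } from 'chai';
import app from './app';
import db from './db';

describe('GET /users', () => {
  beforeEach(async () => {
    // seed real rows instead of stubbing the data layer
    await db.users.bulkCreate([
      { name: 'Ada', role: 'admin' },
      { name: 'Grace', role: 'member' },
    ]);
  });

  afterEach(() => db.users.destroy({ where: {} }));

  it('filters users by role', async () => {
    const res = await request(app).get('/users').query({ role: 'admin' });
    expect(res.status).to.equal(200);
    expect(res.body).to.have.length(1);
    expect(res.body[0].name).to.equal('Ada');
  });
});

Nothing inside the request handler is stubbed, so the route, the query parsing, and the database query all get exercised together.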

E2E Tests

That leaves us with E2E tests. Since E2E tests simulate actual user actions, they can reasonably replace a manual QA scenario, and it's not hard to imagine that with enough of them we wouldn't feel the need to manually QA our changes. They also enable continuous deployment, but with a downside: almost all E2E frameworks introduce some flakiness into your tests. So they do enable continuous deployment on a green run, but they require constant vigilance against flakiness.

Generally speaking, they don't speed up development, because the setup is typically (although not always) a little too complex and/or slow to run easily on a developer machine. They do a fantastic job of catching regressions and documenting behavior, because the scenarios read like actual bugs as reported by customers.
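For a sense of what one of those scenarios looks like, here's a minimal sketch using a browser-driving framework like Playwright (the URL, labels, and names are made up for the example):

// a hypothetical E2E scenario: a user filters the users list in the browser
import { test, expect } from '@playwright/test';

test('filtering the users list by role', async ({ page }) => {
  await page.goto('https://app.example.com/users');
  await page.getByLabel('Role').selectOption({ label: 'Admin' });
  await expect(page.getByRole('row', { name: 'Ada' })).toBeVisible();
  await expect(page.getByRole('row', { name: 'Grace' })).not.toBeVisible();
});

Notice that the test reads like a bug report: "I filtered by Admin and saw the wrong people in the list."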

E2E tests are also a great way to test for and monitor regressions in your API's performance. Check out our Understanding API Performance Metrics blog post by liblab engineer Olufemi Thompson to learn more.

How do different test types help us?

So for those keeping track, here's how each type of test stacks up against our 5 goals:

Test type                     | Unit      | Integration | E2E
Save manual testing           | No        | Sometimes   | Always
Enable continuous deployment  | No        | Sometimes   | Always
Speed up development          | Sometimes | Yes         | No
Catch regressions             | Sometimes | Yes         | Always
Document behavioral decisions | Rarely    | Sometimes   | Always

So based on this we can come to some conclusions about when to write each kind of test. Unit tests should be reserved for units that are isolated and complicated enough that testing them in isolation speeds up development. Integration tests should be our general default style of testing, because they can actually help with all of our goals. E2E tests should be used for replacing manual QA steps and documenting odd behaviors, as they generally do everything we want, but at the cost of slowing down development.

So our ideal test suite consists of some unit tests for especially complicated pieces, E2E tests for the specific scenarios we would otherwise have manually QA'd, and integration tests for everything else.

Down with the test pyramid!

Now this leads us to my beef with the testing pyramid. For those unaware, the testing pyramid says that we should have mostly unit tests, a smaller set of integration tests, and an even smaller set of E2E tests (like a pyramid 🙂). This means that most of your tests accomplish at most 2 of our 5 goals. That is a ton of wasted effort, and it leads to people writing tests like this:

// code under test (getUsers.js)
import db from './db';

export const getUsers = async (req, res) => {
  res.send(await db.users.findAll());
};

// test file
import { expect, use } from 'chai';
import sinonChai from 'sinon-chai';
import { stub } from 'sinon';
import db from './db';
import { getUsers } from './getUsers';

use(sinonChai);

describe('getUsers', () => {
  it('stubs everything into oblivion', async () => {
    // every dependency is stubbed, so nothing real is exercised
    const usersStub = { findAll: stub().resolves([]) };
    stub(db, 'users').value(usersStub);

    await getUsers(stub(), { send: stub() });

    expect(usersStub.findAll).to.have.been.calledOnce;
  });
});

Hopefully the problems with testing like this immediately jump out at you. This test accomplishes 0 of our 5 goals. It does not prevent someone from having to manually QA this. It doesn't make refactoring this code in the future any easier, because any actual change to the code would absolutely require this test to be rewritten. It just restates the code, with stubs standing in for the real dependencies.
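Compare that to an integration-style test of the same handler, in the vein of the earlier sketch, where the route is exercised against a real test database (again assuming supertest, an Express app exported from ./app, and a Sequelize-style ./db):

// a hypothetical integration test of the same endpoint, with nothing stubbed
import request from 'supertest';
import { expect } from 'chai';
import app from './app';
import db from './db';

describe('GET /users', () => {
  it('returns the users that exist in the database', async () => {
    // assumes the test database starts empty
    await db.users.create({ name: 'Ada' });

    const res = await request(app).get('/users');

    expect(res.status).to.equal(200);
    expect(res.body.map((u) => u.name)).to.deep.equal(['Ada']);
  });
});

If the handler's internals change but the behavior stays the same, this test keeps passing; if the behavior breaks, it fails. That's what we actually wanted from our five goals.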

In conclusion, unit tests should be used sparingly, and only when the unit in question is important enough to have its own isolated logic. E2E tests should be used to help your poor QA team keep up with all the code you are pushing out. Everything else should be integration tested. And then, once you write those brilliantly integration-tested APIs, you can completely skip writing both code and tests for your SDKs by asking liblab to generate them for you!