Consider this a retrospective of my (failed) usage of snapshots. But let’s back up a bit….
What even are snapshots and how did they come to be so prevalent?
The rise of snapshots
Not every language is born equal when it comes to test tooling. On one hand, you have languages like PHP that have been dominated by a single testing framework for close to 20 years now: everybody agrees PHPUnit is the tool to use, and while alternatives exist, it remains the de facto standard.
So when Jest released snapshot testing in 2016, it did so to an already very large user base. More importantly, it did so to a user base that wasn't composed solely of people knowledgeable about testing, but also of people very new to it. With the way snapshot testing was presented at the time, it seemed like the holy grail we didn't know we had been waiting for. Suddenly, you could test your components in a way that didn't really require you to actually test them: drop one in, call `toMatchSnapshot`, and call it a day. What an incredible advance; testing is now super easy, right?
The snapshot fallacy
Of course, when Facebook released snapshot testing and explained they were using it extensively, they probably didn't consider that most people, lacking that knowledge, would take it and run with it, completely missing the point of what it was designed for.
Snapshot tests, be they visual or textual, were first designed as a quality-assurance tool. They were a way to easily and quickly check that, while working on a given feature, you wouldn't break things in obvious ways. They work by taking a snapshot of what a page/component looks like (or renders like) at a given moment, then checking future runs against that snapshot to warn you when things no longer match.
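The mechanism is simple enough to sketch in a few lines of plain JavaScript. This is a toy, in-memory version (real Jest serializes rendered output and persists it to `__snapshots__/*.snap` files on disk), but the comparison logic is the same idea:

```javascript
// Toy, in-memory model of how a snapshot matcher works. Not Jest's
// actual implementation; snapshots live in a Map to stay self-contained.
const snapshots = new Map();

function matchSnapshot(name, rendered) {
  const serialized = JSON.stringify(rendered, null, 2);
  if (!snapshots.has(name)) {
    // First run: record the snapshot and pass.
    snapshots.set(name, serialized);
    return { pass: true, written: true };
  }
  // Subsequent runs: pass only if the output is byte-for-byte identical.
  return { pass: snapshots.get(name) === serialized, written: false };
}

// First render is recorded; an identical render passes; any change,
// however harmless, fails the check.
const first = matchSnapshot('Button', { tag: 'div', text: 'Save' });
const same = matchSnapshot('Button', { tag: 'div', text: 'Save' });
const changed = matchSnapshot('Button', { tag: 'section', text: 'Save' });
```

Note that the last comparison fails even though a `div` becoming a `section` is exactly the kind of change that is usually intentional, which is the crux of what follows.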
But they were never really meant to replace tests; they were meant to complement them and catch the most obvious mistakes in a quick and clear manner. There can be so much that goes wrong past the initial render that thinking of snapshots as the only thing you need is of course incorrect. Snapshots will catch bugs, obviously, but they will also let so much pass that they can never be entirely relied upon. Now here comes the problem: a lot of people, me included, were fully aware of that fact going in. Yet when I look at how I’ve used snapshots over the years, I can’t help but notice I made that very same mistake a lot of the time.
When it becomes so easy to test any component in one simple line, it becomes easy to technically “test” everything and it creates the false assumption that one snapshot test is better than no tests: because you’ve ensured the component isn’t going to break in a major way, every other kind of test suddenly loses its priority. It creates a false safety net with which you can say “I’ll add more complete tests later. We have the snapshot tests in the meantime.”
The reality, of course, is much starker, because not only are snapshot tests not the silver bullet people make them out to be, it's also very easy to misuse them. When Jest unveiled snapshot testing, they said clearly that it was something you had to pay attention to. Snapshots were never meant to be a generate-and-forget kind of thing: the intent was that, when reviewing a pull request, the developer would actively review the changed snapshots and judge for themselves whether the change was correct. But because snapshots are all or nothing, they also very, very often change in ways that are absolutely normal. Maybe you've changed a div to a section in your layout, and suddenly your entire snapshot suite fails, so you just mass-update the snapshots and call it a day.
Once that has happened a good hundred times, you slowly lose the habit of actually checking your snapshots. You see that they’ve changed, just update them, and job done. Which, of course, defeats their entire purpose. Suddenly your safety net has holes so big you’re not sure it still even does anything, and inevitably, something much more nefarious than a div change passes through the armor of your test suite.
When I reached that point, I felt slightly betrayed by textual snapshots (even though they weren’t at fault). The rational reaction to that should have been “I need more actual tests instead of just snapshots.” We already had a decent amount of behavioural tests (using React Testing Library), but we were just not using nearly as many of them as we should have.
My reaction instead was to double down on my own mistake. I naively thought: “If textual snapshots failed me, maybe visual snapshots are the answer?” It made sense at the time. If the reason I didn’t pay attention to text snapshots was because of invisible changes, then taking a screenshot of the actual page would mean snapshots would fail on actual visible problems. So I started looking into the subject. At the time we had an extensive Storybook of all our components so I decided to simply use the Puppeteer plugin to take a screenshot of every Storybook page and use that as visual snapshots.
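For context, the setup looked roughly like this. It is a sketch, not our actual code: the `iframe.html?id=<story-id>` URL is Storybook's standard way of rendering a single story without the surrounding UI, the port is Storybook's default, and `takeScreenshot` stands in for a real Puppeteer helper (`page.goto` followed by `page.screenshot`):

```javascript
// Sketch of the "screenshot every Storybook story" loop. Assumptions:
// Storybook's default dev-server port and its iframe.html URL scheme.
const STORYBOOK_URL = 'http://localhost:6006';

function storyUrl(storyId) {
  // iframe.html renders a single story without the Storybook chrome.
  return `${STORYBOOK_URL}/iframe.html?id=${encodeURIComponent(storyId)}`;
}

// takeScreenshot is a hypothetical, injected helper that would open the
// URL in Puppeteer and call page.screenshot(); injecting it keeps this
// sketch runnable without a browser.
async function snapshotAllStories(storyIds, takeScreenshot) {
  const shots = [];
  for (const id of storyIds) {
    shots.push({ id, image: await takeScreenshot(storyUrl(id)) });
  }
  return shots;
}
```

Each resulting image then served as the stored "expected" snapshot for future runs.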
But there's a reason textual snapshots took the spotlight from visual ones: the latter are much, much harder to pull off than I anticipated. Because you're comparing what pages or components look like once rendered, you're at the mercy of whichever rendering engine takes the snapshot, which OS it runs on, and so on. So when we started generating screenshots locally on OS X and comparing them to the ones CI would generate on Linux, they started failing: Linux renders fonts very slightly differently, and that 1% difference had huge ramifications for how layouts overflow and text wraps.
Because solving that properly would have required an even bigger time investment, and I was already too deep to back down, we decided to band-aid it by simply upping the failure threshold of the snapshots. You can probably see where this is going: we had to raise the tolerance so much to get past the OS rendering differences that, in the end, the snapshots once again became ineffective for the task at hand. Small important changes would fly below the failure threshold, while big unimportant changes (changing the navigation bar, for example) would fail the whole suite.
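To make that failure mode concrete: image-comparison tools such as jest-image-snapshot boil the diff down to a ratio of differing pixels and compare it against a configurable threshold, so raising the threshold to absorb font-rendering noise also absorbs small real regressions. A simplified model, with flat pixel arrays standing in for real images:

```javascript
// Why a high failure threshold backfires: the comparison only measures
// the fraction of pixels that differ, not whether the difference matters.
function diffRatio(imageA, imageB) {
  // Images modeled as flat arrays of pixel values of equal length.
  let differing = 0;
  for (let i = 0; i < imageA.length; i++) {
    if (imageA[i] !== imageB[i]) differing++;
  }
  return differing / imageA.length;
}

function snapshotPasses(imageA, imageB, failureThreshold) {
  return diffRatio(imageA, imageB) <= failureThreshold;
}

// Threshold cranked up to 5% to absorb OS font-rendering noise:
const baseline = new Array(100).fill(0);

const smallRealBug = baseline.slice();
smallRealBug[0] = 1; // only 1% of pixels: a broken icon slips through
const bugSlipsThrough = snapshotPasses(baseline, smallRealBug, 0.05); // true

const bigIntendedChange = baseline.map((_, i) => (i < 10 ? 1 : 0)); // 10% changed
const intendedChangeFails = !snapshotPasses(baseline, bigIntendedChange, 0.05); // true
```

The threshold is a single global dial: there is no setting that lets the broken icon fail while the intentional navigation-bar redesign passes.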
We also encountered other difficulties setting this up. Amongst other things, we had to make sure the screenshots were taken only after asynchronous actions and loaders had finished, because we were now working against a real DOM instead of a shim of one.
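The fix was always some variant of polling until the page settled before shooting. Puppeteer has built-ins for this (`page.waitForSelector` and friends); the underlying idea is just a generic wait helper along these lines:

```javascript
// Poll a condition (loader gone, data rendered) until it holds or a
// timeout expires; only then is it safe to take the screenshot.
function waitFor(condition, { timeout = 5000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  return new Promise((resolve, reject) => {
    const poll = () => {
      if (condition()) return resolve();
      if (Date.now() > deadline) {
        return reject(new Error(`waitFor timed out after ${timeout}ms`));
      }
      setTimeout(poll, interval);
    };
    poll();
  });
}
```

In our case the condition was typically "no loading spinner is left in the DOM" for the story being captured.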
Eventually the problem I mentioned happened (bugs passing through snapshots). It hit me like a ton of bricks. I had no “Option C”, no further plan. The only thing I was left with was a bag of my own bad decisions. I started talking a bit about the issue with colleagues but soon realized the issue lay neither in textual nor visual snapshots but in the way we had been writing tests.
Snapshots are awesome. They may not always be easy to pull off. They may not solve all your problems. But they do solve a lot of problems and I now consider them an incredibly useful part of a healthy test suite. Not a required one, but a welcome one. The emphasis is on healthy test suite. Don’t start with your snapshots. Don’t even necessarily add them for every component. Instead, whenever you have to assert that large sets of data stay “put”, whenever you want a visual gallery of all the pages in your application, whenever you think to yourself “I wish I could ensure this particular piece of logic/UI didn’t change” then absolutely do use snapshots. They’re here for that and they’re pretty good at their job by now.
Always be certain that the weight of making sure your logic stays consistent doesn't rest on snapshots. We have much better tools for that (acceptance tests, for example). Yes, they require more work, but ultimately they will save your ass in a million ways that snapshots never can.
All this to say: don’t be like me. Don’t use a screwdriver to hammer in a nail.
Use a hammer.