It's Time to Disrupt Visual Regression Testing

I've been using visual regression tools off and on since 2017 when building Design Systems. Seven years later in 2024, things haven't changed a whole lot. While the current offerings are better than they were in 2017, there's a lingering cross-operating-system problem, which is where the paid offerings come in. Getting approval to pay for visual regression testing can be difficult in some organizations, so it's tempting to look at open source solutions. It turns out the major open source options have drawbacks as well.

I think it's time the visual regression testing space was disrupted. I understand visual regression testing is an incredibly complex problem to solve, but I believe there's a clever solution somewhere on the horizon.

What is Visual Regression Testing?

You're likely already writing unit and integration tests, maybe even with 100% code coverage, for your Design System components. How are you verifying your components visually as changes are made? This is where visual regression testing comes in.

It takes screenshots of your components with each pull request so you can be notified when a component's presentation changes. Did you update a CSS variable that turned the Button's border red? Was that intentional? You'll get notified on your pull request and can make that decision. It's another automation tool that helps build better Design Systems.

As Design System authors, we care about how things visually appear to the user. By testing only functionality in your unit and integration tests, you're covering just one area of your work; the other is the presentational piece. Does this component match the designs? Did you inadvertently break another component's visual presentation by changing shared styles?

Unit and integration testing tools don't allow for these types of assertions; they can only exercise the component. Without verifying how a component is presented, you're left with manual visual regression testing. And oftentimes, visual bugs slip through the cracks. We are all human. We make mistakes! Why not let the robots help us cut down on those mistakes as much as possible?
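To make this concrete, here's a minimal sketch of what a visual assertion looks like, using jest-image-snapshot with Puppeteer. The local Storybook URL and story ID are hypothetical:

```ts
import puppeteer from 'puppeteer';
import { toMatchImageSnapshot } from 'jest-image-snapshot';

// Teach Jest the image-diffing assertion
expect.extend({ toMatchImageSnapshot });

it('Button has no visual regressions', async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Hypothetical: a locally served Storybook story rendered in isolation
  await page.goto('http://localhost:6006/iframe.html?id=button--primary');
  const image = await page.screenshot();
  // First run writes a baseline image; subsequent runs pixel-diff against it
  expect(image).toMatchImageSnapshot();
  await browser.close();
});
```

The baselines get checked in to your repository, and the test fails whenever a new screenshot drifts from its baseline.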

What options are currently available in this space? I've listed some of the major players. There are quite a few more out there, but I didn't want this article to become massive. Let's dive in from a high level.

Chromatic

Chromatic is a great product. It is tightly integrated with Storybook. A personal drawback for me was being reliant on Storybook's play function to interact with a component, which sometimes leads to extra stories being written only for tests. The play function can become massive if you rely on interactions to reach different visual states.
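For illustration, here's a rough sketch of that pattern, assuming a React Button component and Storybook 8's @storybook/test helpers; the component and story names are hypothetical:

```ts
// Button.stories.ts
import type { Meta, StoryObj } from '@storybook/react';
import { within, userEvent } from '@storybook/test';
import { Button } from './Button';

const meta: Meta<typeof Button> = { component: Button };
export default meta;

// A story that exists only so the hovered state can be snapshotted
export const Hovered: StoryObj<typeof Button> = {
  play: async ({ canvasElement }) => {
    const canvas = within(canvasElement);
    // Drive the component into the visual state we want captured
    await userEvent.hover(canvas.getByRole('button'));
  },
};
```

Multiply this by every state (focused, pressed, disabled, and so on) and the test-only stories pile up quickly.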

The good news is that Chromatic now plays nicely with Playwright and allows you to do targeted snapshots. This looks pretty promising! It's especially appealing if you're already using Playwright.

It integrates well with CI. On the Standard plan, you get to test across all major browsers. Their UI looks really nice. The Enterprise option includes single sign-on. There's a lot of great stuff here. Overall, Chromatic is a very solid choice, but it has the drawback of being a paid service.

Percy

Percy has been around for quite some time and was bought by BrowserStack in 2020 (congrats 🎉!). I was using it back in 2017 with Toyota's Loom Design System. It has all of the advantages of Chromatic listed above. The thing I liked about it was how it met you where your tests were.

You import their percySnapshot function. You call that function within your existing tests. You run their percy exec command to run your tests, and that's it. It's very nice. Percy is another solid choice, but like Chromatic, it costs money.
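As a sketch of that workflow with Playwright (the test body and URL here are hypothetical), it looks roughly like this:

```ts
import { test } from '@playwright/test';
import percySnapshot from '@percy/playwright';

test('Button visual snapshot', async ({ page }) => {
  // Hypothetical: wherever your component is rendered locally
  await page.goto('http://localhost:3000/button');
  // Uploads a snapshot to Percy for comparison against the baseline
  await percySnapshot(page, 'Button - primary');
});
```

Then you wrap your normal test command: npx percy exec -- npx playwright test.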

Playwright

Playwright's component testing is still experimental as of this writing. Because of that, some organizations may not choose it for writing unit and integration tests, but it is great for end-to-end tests. There's no denying that!

Playwright offers visual comparisons out of the box. On the surface, this sounds great! But it has a major issue that is well documented in GitHub issues and discussions (1, 2, 3).
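The built-in API itself is pleasantly small; a minimal example (with a hypothetical URL and snapshot name) looks like this:

```ts
import { test, expect } from '@playwright/test';

test('Button visual comparison', async ({ page }) => {
  // Hypothetical: a locally served page rendering the component
  await page.goto('http://localhost:3000/button');
  // First run generates button.png as the baseline; later runs diff against it
  await expect(page).toHaveScreenshot('button.png');
});
```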

Playwright and other tools have an issue with headless Chrome and an OS dependency. As an example, say you're developing on a Mac and you check in your visual regression baseline images for Chrome. Then CI runs on Linux, comparing against those baseline images taken on macOS, and guess what? Your tests fail. Every time. What's the recommended solution?

"Your CI environment is different from local one, it's probably a linux box while your local machine is a mac. The rendering is very sensitive to the host environment (and even to whether your mac has power adapter plugged in or not), it may be that on the CI machine the page uses different fonts too. To make the visual tests reproducible we recommend generating expectations on the same machine where you run the tests. There is a also a bunch of options that allow make toHaveScreenshot less strict, you can toy with those."

– Playwright GitHub Issue

I'll chat more about this in a moment, but man, this was a bummer. At first, I was so impressed with Playwright's visual testing. Then I pushed things to CI and it all fell apart. I feel like I'm knocking Playwright a bit too much here, but I really do appreciate everyone who works on the project and think it's a really great solution! I wrote Selenium + WebDriverIO tests for many years starting back in 2012, and I really wish we had Playwright back then. But there's no denying the visual testing area could use some improvement, either via documentation or technical changes.

So What's the Problem, Tony?

To me, the problem with Percy and Chromatic is that they're third-party services. Tinfoil-hat thinking, maybe, but these services take screenshots of your products, so there could be some hesitancy about working on a top-secret, soon-to-be-announced feature, screenshotting it, and uploading it to a third-party service. Some organizations may, understandably, feel uneasy about that.

The other issue with these offerings is that even though I believe the pricing is very reasonable, it can sometimes be a no-go for approval within an organization. The manual labor involved in stopping visual regressions is very costly: engineering time, QA time, the embarrassment of a customer seeing a bug you didn't catch, and the list goes on. But quantifying this cost can be difficult when suggesting the org pay for a service. It's tough.

These are the two bottlenecks I've hit since 2017 when discussing visual regression testing. It's less of a technical problem and more of an organizational approval roadblock. Once again, totally understandable! Every organization operates a bit differently when it comes to approving this type of thing.

How about Playwright? It doesn't upload things to a third party, and all of the images are stored in the repository you're already working in. Well, as mentioned above, it ain't perfect either. You have to make some tradeoff decisions when it comes to developer experience.

Requiring the use of Docker, as recommended in the GitHub discussions above, to get around the browser/OS issue is a bummer. Docker adds complexity, especially for less experienced engineers, and it can be a pain to use. You also need to ensure your local and CI images stay in sync with one another.
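For context, the workaround looks something like this: generate your baselines inside Playwright's official Linux image so they match what CI produces. The image tag below is an assumption; pin it to whatever Playwright version you have installed.

```sh
# Run the test suite inside the official Playwright Linux image so
# locally generated baselines match CI's rendering environment.
# v1.42.0 is a placeholder; use the tag matching your Playwright version.
docker run --rm -v "$(pwd)":/work -w /work \
  mcr.microsoft.com/playwright:v1.42.0-jammy \
  npx playwright test --update-snapshots
```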

For some, using Docker may be a completely acceptable solution. Maybe you're already paying for it. That's totally fine! My issue with the Docker approach is that we keep throwing more tooling onto the pile to solve a single testing problem, adding new complexity each time. In an ideal world, we wouldn't have to do that; we could keep using the tooling we're already familiar with.

Back to Playwright's OS issue. You could switch your CI over to macOS boxes (macos-latest), or whatever OS everyone on your team uses, but that's not ideal for contributors who may not have access to that OS. For example, if your repository is a public library and your team develops solely on Mac, then a Linux user comes along and hits issues when they go to check in their snapshot screenshots. There are ways around this, of course, but it's a process.

Another solution I've heard is to adjust Playwright's maxDiffPixels. Adjusting thresholds, in my opinion, is a no-go for Design Systems. We need to know when a 1-pixel border color changes. We can't lower our fidelity bar; it's extremely important to our work. If you're working outside of a Design System at the app level, adjusting the threshold may work just fine for you! But once again, in Design System land, I personally feel this is unacceptable.
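For reference, that knob lives in your Playwright config; the value below is just an illustration:

```ts
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      // Tolerate up to 100 differing pixels before failing; plenty of room
      // for a 1-pixel border color change to slip through unnoticed
      maxDiffPixels: 100,
    },
  },
});
```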

Am I nitpicking? Maybe. But this has been my experience over the last 7 years.

Disruption

New JavaScript frameworks pop up frequently. I'd love to see some disruption in the visual regression testing space. Maybe something out there already exists and I've missed it? Definitely let me know!

My ideal solution would be an open source (of course!), free offering that is completely self-managed. One that does not require a subscription or any payment. Although I know I'd throw some cash at an open source project if one came along!

A solution where images are not uploaded to a third party. This completely removes the middleman situation outlined above, where your top secret features are being uploaded to some service. If you want to push your images or test reports somewhere, go for it, but you should be in control of where things live.

A solution that doesn't require Docker or any other heavy dependencies. Adding the overhead of Docker or any other containerization option isn't ideal.

A solution that meets users where their tests are: if you've already got a bunch of tests in vitest, great. If you're using jest, awesome. Forcing folks to migrate to Playwright or Storybook testing is a big lift and an instant non-starter for many. Being test-framework agnostic would be great.

A solution that can run across all major browsers. Even across the major OSes if you'd like.

I have an idea I've been cooking up along with Haley Ward that is purely GitHub Actions based. By pushing the work solely to CI, you get around the cross-OS issues mentioned above. The thought is that you can use whatever testing framework and visual testing tool you want: Playwright, jest-image-snapshot, web-test-runner, whatever. As long as it generates some sort of report.

That report can then be optionally deployed somewhere via a GitHub Action, or you can pull down the GitHub artifact yourself to view the diff. The images can be checked in to source control, or not; it's up to you. It'll rely on GitHub comments and reactions to approve or deny the visual changes, blocking pull requests until the appropriate number of visual approvals occurs. A rough sketch of that approval check is below.
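This is very much an idea, not a shipped tool, but the reaction-based gate could look something like the following; the comment marker, approval threshold, and function name are all hypothetical:

```ts
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const REQUIRED_APPROVALS = 2; // hypothetical team policy

// Pass only once enough 👍 reactions land on the visual diff report comment
async function visualChangesApproved(owner: string, repo: string, prNumber: number): Promise<boolean> {
  const { data: comments } = await octokit.rest.issues.listComments({
    owner,
    repo,
    issue_number: prNumber,
  });

  // Find the bot comment that carries the generated visual diff report
  const report = comments.find((c) => c.body?.includes('<!-- visual-diff-report -->'));
  if (!report) return false;

  const { data: reactions } = await octokit.rest.reactions.listForIssueComment({
    owner,
    repo,
    comment_id: report.id,
  });

  return reactions.filter((r) => r.content === '+1').length >= REQUIRED_APPROVALS;
}
```

A check like this would run on every reaction or comment event and flip the PR's status check once the threshold is met.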

A GIF from It's Always Sunny in Philadelphia.

We'll see where we land. Maybe the above gif is me. It could be way more complex than we think and we could miss our mark. Oh well, at least we're trying! Stay tuned!

To wrap things up, visual regression testing plays a crucial role, particularly for Design Systems. It ensures visual bugs are caught early, before they reach users. Although visual regression tools are sometimes passed over due to their cost and complexity, I'm hopeful this field will evolve and become more accessible over time. As always, thanks for reading. See you soon!

👋