
Disrupting Visual Regression Testing

A few months ago I wrote about how visual regression testing is ripe for some disruption. It's clunky, the paid offerings are very expensive, and there are cross-OS issues. On top of that, not many folks have documented how to solve these issues without reaching for something like Docker.

At the end of that article, I floated an idea Haley Ward and I had cooked up. A few days later she wrote some GitHub Actions to prove it out. I'm happy to report back that it's generally working as expected.

But I'm also here to write about another solution, one my team adopted over in Glide Core, the Web Component Design System from CrowdStrike. All thanks to clintcs for working through all of this and implementing it. It's really nice!

The Problem

As a quick recap, the main issue with Playwright's visual regression testing is cross-OS compatibility. For example, you're developing on a Mac while CI runs on Linux. When you check in your images and let CI run, there will be a pixel diff and failing tests, even if there were no visual changes.
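
For context, a visual test in Playwright looks roughly like the sketch below. The story URL and names are placeholders I made up, not Glide Core's actual tests.

```ts
// visual.test.ts: a minimal sketch of a Playwright visual test, assuming a
// component story is served locally. The URL and names are hypothetical.
import { expect, test } from '@playwright/test';

test('button appears as expected', async ({ page }) => {
  await page.goto('http://localhost:6006/iframe.html?id=button--primary');

  // Compares the page against a stored baseline image and fails on any pixel
  // diff beyond the configured threshold. The baseline reflects whatever OS
  // rendered it, which is why a Mac-generated image rarely matches Linux CI.
  await expect(page).toHaveScreenshot('button-primary.png');
});
```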

You could adjust the pixel diff threshold to account for this, but then you risk missing real changes you actually want to be notified about. When it comes to Design System work in particular, catching those changes is crucial. Feel free to read my other article for more information if you're curious.
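
For reference, loosening the comparison looks something like this in playwright.config.ts. The numbers are purely illustrative.

```ts
// playwright.config.ts: a sketch of loosening the screenshot comparison to
// tolerate cross-OS rendering noise. The looser these values, the more real
// regressions slip through unnoticed.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      // Allow up to 2% of pixels to differ before the assertion fails.
      maxDiffPixelRatio: 0.02,
      // Per-pixel color tolerance: 0 is exact, 1 accepts anything.
      threshold: 0.2,
    },
  },
});
```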

The Solution

Similar to Haley's solution, CI has to be the source of truth. You can still run the Playwright tests locally, but don't check in any of the images it generates. We push everything over to CI.
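
One way to keep those local images out of the repo is to point every snapshot at a single directory you can gitignore. A sketch, with a directory name I made up:

```ts
// playwright.config.ts: a sketch that collects every screenshot under one
// directory so it can be gitignored locally. The template also drops the
// default platform suffix from filenames, since CI is the only environment
// that writes baselines. The directory name is an assumption.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  snapshotPathTemplate: 'tests/__screenshots__/{testFilePath}/{arg}{ext}',
});
```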

This process requires access to a storage mechanism for the images Playwright generates. AWS, Cloudflare, or any other provider will do. We use Cloudflare for open source projects, so I'll be referring to R2 from here on out.
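
Because R2 is S3-compatible, pulling the baselines down before a test run can be as simple as the sketch below. The bucket name, prefix, environment variables, and local directory are all assumptions, and pagination is skipped for brevity; this is not the exact script we run.

```ts
// fetch-baselines.ts: a sketch of downloading baseline screenshots from R2
// before running the visual tests. R2 speaks the S3 API, so the AWS SDK
// works against it. Bucket, prefix, and paths below are hypothetical.
import { mkdir, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import { GetObjectCommand, ListObjectsV2Command, S3Client } from '@aws-sdk/client-s3';

const s3 = new S3Client({
  region: 'auto',
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID!,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
  },
});

const BUCKET = 'visual-baselines'; // hypothetical bucket name
const LOCAL_DIR = 'tests/__screenshots__'; // wherever Playwright expects snapshots

// List everything stored under the main/ prefix and write each image to the
// local snapshot directory so Playwright compares against it.
const { Contents = [] } = await s3.send(
  new ListObjectsV2Command({ Bucket: BUCKET, Prefix: 'main/' }),
);

for (const { Key } of Contents) {
  if (!Key) continue;
  const object = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key }));
  const target = join(LOCAL_DIR, Key.replace('main/', ''));
  await mkdir(dirname(target), { recursive: true });
  await writeFile(target, Buffer.from(await object.Body!.transformToByteArray()));
}
```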

We also deploy Playwright's test report and add it as a comment on the PR. This helps with our review process. A reviewer of a PR will review the code, view the changes in Storybook, and review the visual test report.
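
Posting the link back to the PR is a small step. A sketch with Octokit might look like the following, where the environment variables are assumptions supplied by the workflow:

```ts
// comment-report.ts: a sketch of commenting the deployed report URL on the
// pull request. The token, PR number, and report URL are assumed to be
// provided by the CI workflow; this is not Glide Core's exact script.
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const [owner, repo] = process.env.GITHUB_REPOSITORY!.split('/');

await octokit.rest.issues.createComment({
  owner,
  repo,
  issue_number: Number(process.env.PR_NUMBER),
  body: `⚠️ Visual changes detected. Review the test report: ${process.env.REPORT_URL}`,
});
```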

[Image: a GitHub pull request comment with a warning sign and a URL to review the test report]

At a high level, here is how we solved the visual testing problems described in the other article:

  • Store baseline images from Playwright in R2.
  • On PR creation, fetch the baseline images before running your tests.
  • Run your visual tests.
    • Because the baseline comes from R2, the PR is compared against the baseline from main.
    • We leverage sharding to speed up the process.
  • Deploy the test report (optional).
  • Approving the PR is approving the visual updates that come with that PR.
    • Rather than having an explicit "I approve of these visual updates" on top of approving the PR itself, Clint had the great thought that approving the PR means approving everything in that PR - the code, any visual updates, all of it. This greatly simplifies things.
  • On PR merge to main, update the baseline images in R2 to reflect these changes (sketched below).
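
Here's a sketch of that last step: after a merge to main, the screenshots from that run are pushed back to R2 as the new baseline. As before, the bucket and directory names are assumptions rather than our actual setup.

```ts
// update-baselines.ts: a sketch of uploading the merged run's screenshots to
// R2 so they become the new baseline. Requires Node 20+ for recursive readdir.
import { readdir, readFile } from 'node:fs/promises';
import { join } from 'node:path';
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3';

const s3 = new S3Client({
  region: 'auto',
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID!,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
  },
});

const BUCKET = 'visual-baselines'; // hypothetical bucket name
const LOCAL_DIR = 'tests/__screenshots__'; // same hypothetical directory as before

// Upload every screenshot under the main/ prefix, overwriting the old baseline.
const files = await readdir(LOCAL_DIR, { recursive: true });

for (const file of files) {
  if (!file.endsWith('.png')) continue;
  await s3.send(
    new PutObjectCommand({
      Bucket: BUCKET,
      Key: `main/${file}`,
      Body: await readFile(join(LOCAL_DIR, file)),
      ContentType: 'image/png',
    }),
  );
}
```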

We added a merge queue to prevent issues with multiple PRs going in at the same time. Without a merge queue, PRs merging in quick succession could clobber the baseline images. A merge queue ensures only one PR is merged into main at a time, which allows the baseline images to be fully updated before moving on to the next merge to main.

If you'd rather read the code, check out our workflows here. The power of open source!

Hopefully this article helps save others time and effort when they want to add visual regression testing to their project.


👋
