Thought-provoking post from DHH on the broad failure of system tests, defined in this context as web UI tests driven by a headless browser.
Finding a good way to test UIs is a problem that people have been trying to solve since the moment I stepped out of university and into a software engineering job. Back then, even though no good answer had emerged yet, I assumed that someone would eventually figure it out. Now, almost twenty years later, with even a halfway good answer still elusive, I’m not so optimistic.
Our latest round of test strategy uses Playwright, which describes itself as “reliable end-to-end testing for modern web apps”. I haven’t found it particularly so:
git clone && make test
as you can get.
All in all – and I’m trying my best to make sure that I’m not exaggerating – the false positive rate on failures is something like 99%. I actually don’t recall ever seeing a true positive in the sense that a test case caught something I broke by accident.
DHH’s prescription seems extreme at first glance:
HEY today has some 300-odd system tests. We’re going through a grand review to cut that number way down. The sunk cost fallacy has kept us running this brittle, cumbersome suite for too long.
But for a test suite that slows development and only prevents a regression once in a blue moon, isn’t it the only rational answer?