No Docker; Onionskin Stacks

Union Square in the rain

Readers –

I’ve been thinking all morning about a single short topic that I could use to close out the year with one last issue of Nanoglyph. My mind wandered between a few ideas before settling on the evergreen vitriolic battlefield of microservice versus monolith versus monorepo. But as I was jotting down notes, I could already feel it ballooning to a multi-thousand-word essay (of “monolithic proportions”?) that’d never ship on time.

So instead, I’m going to leave you with a short screed on what may be one of our most controversial internal tech choices at Crunchy (or within its cloud division at least): no Docker, no containers.

OCI containers have, in less than a decade, become the gold standard for deployment. Originally you’d hear about them only in association with Docker, but they’ve since grown into a broad standard that’s in use by the vast majority of major tech companies. Every major cloud provider offers a service that deploys containers. Cutting-edge operational paradigms like Kubernetes and serverless use containers as foundational building blocks. Even companies like Heroku, which were using their own version of containers before Docker existed, have since retrofitted Docker-style containers into their product. In short, containers are the future, and they’re already everywhere.

Run a quick Google search and you’ll find them lauded with hundreds of operational, organizational, and technical benefits. Some major ones:

And yet, if you were to examine the repos for our backend, our API, or our frontend, you wouldn’t even find a Dockerfile. So what are we even doing over here?

In 029 I talked about the development experience at Stripe. Let’s briefly visit that again.

Developers would start their environment via one simple command: pay up. This would kick off a plethora of activity that among other things would:

Back in the old days, I used to know pretty much exactly how it worked. Big Ruby processes start up very slowly, so it’s not uncommon to start an environment once and fork pristine processes as required, a model established by Zeus. We’d layered on a NIH project called “Hera”, but it worked roughly the same. After getting Ruby and its dependencies up and running it was just a matter of spinning up the constellation of adjacent daemons – Mongo, ElasticSearch, Redis, etc., and bingo.

Our dev productivity team had been pushing remote development for some time, but it had significant downsides – it was slow, and naturally made it impossible to work offline, something that some of us still did back then. A small group of engineers collaborated to maintain an informal Hackpad titled “local development setup” with homegrown instructions on how to get the stack running minus the cloud bootstrap. Along with conveying additional speed and keeping us effective in low connectivity environments, there was another side benefit – every person who’d run through that document had a better understanding of how the stack worked than the other 95% of the engineering contingent.

But as time went by, the stack got deeper. The stack got wider. The stack grew adornments, and it grew thorns. As the flywheel accelerated, trying to keep pace with changes made in the cloud became increasingly untenable, and one by one, those of us who’d been running local were thrown off by centrifugal force, landing on the blessed path of centrally managed development. Eventually, I was running pay up just like everyone else – and just like everyone else, not really understanding the specifics of what was happening within.

And for a company of that size, this might’ve been the right answer. Engineers run a command, a whole bunch of magic occurs behind the scenes, and from there they have a mostly functional development environment. This is very similar to the model put forward by containers – run something like docker compose up, and in one command you’ve got your whole platoon of services up and running just like that. It’s fast, and anyone can do it.

But you know what they say about things that sound too good to be true. The model is largely functional, but it comes with the bad along with the good.

One problem is that thanks to the near-perfect opacity, the majority of users don’t understand how anything works, and lose both the ability to diagnose problems and any hope of divining their way to a solution. In the case of pay up, the underlying infrastructure was so complex that the only remediation for 95%+ of the org when encountering a problem was to report it to someone else and get them to fix it. Not only does this mean that problems now eat at least two people’s time (and usually more), but it’s also a vicious cycle: problem appears, problem is reported, debugging skills atrophy, problem appears, problem is reported, …

An opaque stack also means that significant complication can be hidden below the surface thanks to the sophisticated facade. This often includes complication that by all rights shouldn’t exist – akin to cleaning your room by shoving everything under the bed instead of being forced to address each item head on.

Back to Crunchy: an alternative to the ease-of-use of a single Docker command to do setup is to keep your stack so thin that you can see through it. But also strong and lightweight – an onionskin.

Here are our README instructions for bootstrapping and running the API’s test suite:

$ psql < sql/raise_databases.sql
$ migrate -source file://./migrations -database $TEST_DATABASE_URL up
$ go test ./...

That’s it – three commands.

Granted, it depends on a few external prerequisites (Postgres, Go, direnv, and migrate), but these are all common software that most engineers at the company already have, that are easy to install if they don’t, and that rarely need upgrading.
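If you wanted to confirm those prerequisites before running the three commands, a check could be as small as the sketch below. To be clear, this isn’t something from our README: the missing_tools helper is a hypothetical name made up for illustration, and I’m assuming Postgres is represented by the psql client being on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original README): print the name of
# every tool passed as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumption: the psql client stands in for "Postgres is installed".
missing=$(missing_tools psql go direnv migrate)

if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing prerequisites:" $missing >&2
else
    echo "all prerequisites found"
fi
```

It stays in the spirit of a see-through stack: one function, one loop, nothing hidden.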

Go helps a lot in keeping things this simple – if this is the first time go test is being run, the command will automatically detect that dependencies need to be installed and go fetch them. Also, practically every Go dependency is written in Go, so installing those dependencies works with almost 100% reliability.

Go is good, but that said, our Ruby app (the database state machine) isn’t too far off:

$ asdf install
$ gem install bundler
$ bundle install
$ ALLOW_DB_LOCAL_SETUP=true bundle exec rake db:localsetup
$ bundle exec rspec

It needs Postgres and asdf to fetch Ruby, but not much else.

Not visible in these command sets are the improvements to ease of setup that have trickled into many modern stacks over the years. Circa 2013 you would have wanted Docker to compose your Ruby environment because there were so many steps to get to a successful installation. Nowadays, between improvements in version managers, package managers, and more streamlined dependency sets (e.g. jettisoning pain-in-the-rear dependencies like Nokogiri that never quite compile right), it’s much more plausible to be running a thin Ruby stack with no orchestration involved.

So why are we avoiding containers? Am I filibustering to gloss over what can only be explained by an elaborate rationalization for Neo-Luddism? Well, that’s not how we’d put it at least. In a nutshell:

A key element in making sure this works is keeping our stacks aggressively thin. A notable omission from both of the above is Redis, a component so common these days that it’s probably found in the majority of production stacks around the world. (And no Kafka either!) I like Redis a lot, and we may yet bring it or other elements in eventually, but we’re living on a just-Postgres model for as long as possible.

None of this means that we’re excluding the possibility of using containers either. For now we still deploy on Heroku via git push (maybe our second most controversial tech decision), but if we migrated somewhere else, it’s likely we’d write some thin Dockerfile shims because as stated above, OCI is more or less the de facto standard of cloud deployment, and isn’t going anywhere.

So there you have it. These days, every engineer and their dog/cat/cockatiel will preach the virtues of minimizing dependencies and keeping things simple, but few actually do it. Maybe we don’t either, but we’re giving it our best shot.

A few links from around the web:

Happy New Year – see you in 2022.