Lowbeer's Tipstaff; Cool Tools

A lizard in San Bruno park

Readers –

I recently enjoyed Why William Gibson Is a Literary Genius from The Walrus.

While I can’t purport to be totally crazy about Gibson’s books, his cultural significance is unquestionable: he coined the term cyberspace and more or less single-handedly launched the entire genre of cyberpunk. You also have to appreciate just how novel so much of his work is – The Peripheral revolves around a future version of Earth networking to a past version of Earth by way of a never-explained server mechanism in China. The entirety of Zero History is the story of its characters searching for the fashion designer behind a “secret brand” of clothing called Gabriel Hounds.

One of the most distinguishing characteristics of Gibson’s prose is how challenging it is. He doesn’t spell anything out, instead relying on the reader to understand what happened in specific scenes by way of inference, and to extrapolate the overall narrative by way of clues planted along the way. I liked The Peripheral, but I was just re-reading my own review of it, and remembered that I’d felt that there was a little too much inference required:

Between invented terminology not explained until hundreds of pages later (the record by my casual read being “homes”, defined on page 350), dense prose, and dialog that seems to suggest that the characters are deliberately trying to confuse each other, this book is difficult to read. The first 100 pages are practically incomprehensible, although it gets easier. […]

As an example, the article’s author describes a scene in The Peripheral where Ainsley Lowbeer, an inspector in 22nd century London, has a shape-shifting weapon called a “tipstaff” which can summon a drone strike. None of this is said explicitly (the following is a quote from the book):

[…] morph again, becoming a baroque, long-barreled gilt pistol, with fluted ivory grips, which Lowbeer lifted, aimed, and fired. There was an explosion, painfully loud, but from somewhere across the lower level, the pistol having made no sound at all. Then a ringing silence, in which could be heard an apparent rain of small objects, striking walls and flagstones. Someone began to scream.

“Bloody hell,” said Lowbeer, her tone one of concerned surprise, the pistol having become the tipstaff again.

The article makes the case for how Gibson manages to say so much in so few words:

The succession of double-take-inducing details is exquisitely managed. (It’s as if someone called in an airstrike on a rotary phone.) Gibson doesn’t explain how the tipstaff works or why it assumes the look of a “baroque” pistol; alert readers will get that tipstaffs are the products of nanotech and nostalgia, of advanced societies that have aestheticized how they do harm. A lesser writer, of course, would’ve insisted that the pistol do the firing, but Lowbeer’s ornamental weapon disgorges nothing, not a peep. The explosion is elsewhere, and Gibson is mindful that explosions have epilogues, the follow-up sound of raining objects “striking walls and flagstones. Someone began to scream.”

Like the title claims, genius.

Welcome to Nanoglyph, a newsletter about cyberpunk and data warehousing. This week: how Stripe taught its employees to use SQL, and why this is a very, very good idea, along with, in the same theme as Lowbeer’s tipstaff, a laundry list of cool internal tools for inspiration.

If you’re wondering why I write so much about an organization which I no longer work for, the answer involves the fallibility of memory – I want to get words to paper before it all fades into a foggy haze (some of which is already happening). Don’t worry, I won’t subject you to too much more of this – my plan right now is a very complimentary issue this week, a less-complimentary one next week, and then into new territory.

Perhaps one of the more interesting systems we had in place at Stripe was an incredibly comprehensive data warehouse. The company prides itself on data-driven decision-making and design, and the warehouse was the backbone that made that philosophy possible.

Unlike what you might’ve seen in the past with more traditional businesses, the warehouse had been with Stripe since quite early on – it was well-concreted by the time I started in 2015. Likely the major impetus for the early adoption was that the company was powered by Mongo. With a relational database you can get away for a long time using it in hybrid form to power both transaction-processing and analytics, but with Mongo, you really have no plausible way of querying data beyond the most primitive operations – you can ask your cluster to execute some custom JavaScript, but nothing like SQL exists.

Nightly ETL jobs would dredge tracts of Mongo pages, transform them into a more agnostic format, and bulk load them into a warehouse so that we could get at the information more easily. Originally, this was a lot of extra infrastructure for the dubious privilege of getting to use Mongo as a datastore, but it did come with the benefit of establishing a warehouse and a well-worn path of getting data into that warehouse early on, and that led to even more adapters until it became the one-stop shop for everything as all kinds of data sources were connected – Kafka queues, JIRA, GitHub Enterprise, Workday, etc. – if you could think of it, it was probably in there.
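To make the shape of that extract-transform-load path a little more concrete, here's a minimal sketch in Python. Everything in it is an assumption for illustration: the sample documents, the `flatten` helper, the `charges` table, and the use of an in-memory SQLite database as a stand-in for a real warehouse. It isn't how Stripe's jobs worked, just the general idea of turning nested documents into flat rows that SQL can get at.

```python
import sqlite3


def flatten(doc, prefix=""):
    """Flatten a nested dict into one level, joining keys with underscores."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


# Hypothetical documents standing in for what a document store might hold.
documents = [
    {"_id": "ch_1", "amount": 1200, "customer": {"id": "cus_a", "country": "US"}},
    {"_id": "ch_2", "amount": 450, "customer": {"id": "cus_b", "country": "CA"}},
    {"_id": "ch_3", "amount": 990, "customer": {"id": "cus_a", "country": "US"}},
]

# Transform: nested documents become flat rows with a shared column set.
rows = [flatten(doc) for doc in documents]
columns = sorted(rows[0])

# Load: bulk insert into a table (in-memory SQLite as a stand-in warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(f"CREATE TABLE charges ({', '.join(columns)})")
placeholders = ", ".join("?" for _ in columns)
warehouse.executemany(
    f"INSERT INTO charges VALUES ({placeholders})",
    [tuple(row.get(col) for col in columns) for row in rows],
)

# Once loaded, an aggregate that would be awkward against documents is one line of SQL.
totals = warehouse.execute(
    "SELECT customer_country, SUM(amount) FROM charges "
    "GROUP BY customer_country ORDER BY customer_country"
).fetchall()
print(totals)
```

A real pipeline would also have to deal with schema drift, arrays, and incremental loads, all of which this sketch ignores.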

Some examples of things that a person in my position might ask the warehouse about:

Originally, this all operated on what was perhaps the world’s most tortured Redshift cluster. It was impressive what this thing could do considering the sheer volume of data and load that we were throwing at it, but using it was rocky – queries would run for minutes at a time if there was any other load in the cluster, and often just time out completely. An “observatory” tab was added to our custom UI so that users could go in and kill other people’s stuck queries so that theirs might succeed.

Eventually, an alternative implementation on top of Presto was introduced, and after a period of hybrid support for both systems, we transitioned entirely to that as it proved to be far more reliable and friendly for wide simultaneous use.

A Stripe UI gave its internal users an interface not dissimilar from Heroku Data Clips. We could put an SQL query into an editor in the browser, and have it dispatched to the cluster. After results came back, they could be tabulated and plotted, then annotated with a title and description (so you could find them again later), shared via link, or forked for refinement. The interface was very raw early on, but was eventually given a makeover and some Stripe-style pizzazz.

It was a solid setup, but I’d hazard a guess that many major tech companies have something similar, especially with the glut of analytical products that are on the market these days. But something those other companies don’t have, and what Stripe got really right, wasn’t technical, it was organizational.

At most companies (and this applies even to tech companies in the valley), data is a human service as much as a technical one. People in sales, marketing, and sometimes even engineering hand off analytical requests to a team which then figures out how to get a result. They’ll also be using data warehousing tools, but with a team of specialists running them. This works fine, and has the advantage that most of the company never has to get its hands dirty, but comes with all the obvious downsides – slow turnarounds, resourcing problems as data teams can only take on so much, and information simply being used less – when it’s painful to get and you can’t do it yourself, after a while you’re only going to be sending over the most important requests.

What Stripe got right: making analytics a self-serve process. Every employee could access the data warehouse and run their own SQL, and while it was expected that people would collaborate or ask for help while doing so, it was not expected for them to dump the work on someone else.

SQL is a high bar, even for engineers, and especially for non-technical people. The basic case of pulling data from a table is pretty easy, but once you get into CTEs, inner versus left versus cross joins, complex aggregates, window functions, etc., it can warp the mind. Positing that non-technical people should be able to learn it is an aggressive position, but incredibly, it worked. I’d attribute a lot of that success to the power of the example – although many people wouldn’t be up for writing their own complex, deeply nested query from scratch, they were up for using someone else’s as prior art, and manipulating that to get the desired outcome.
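For a sense of where that bar sits, here's a small sketch of the kind of query that combines several of those features: a CTE, a left join, an aggregate, and a window function. The `merchants` and `charges` tables and their data are hypothetical, made up for this example, and it runs against in-memory SQLite from Python rather than a real warehouse (window functions need SQLite 3.25 or newer).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE merchants (id TEXT PRIMARY KEY, country TEXT);
    CREATE TABLE charges (id TEXT PRIMARY KEY, merchant_id TEXT, amount INTEGER);
    INSERT INTO merchants VALUES
        ('m_1', 'US'), ('m_2', 'US'), ('m_3', 'CA'), ('m_4', 'CA');
    INSERT INTO charges VALUES
        ('ch_1', 'm_1', 500), ('ch_2', 'm_1', 700),
        ('ch_3', 'm_2', 300), ('ch_4', 'm_3', 900);
""")

query = """
    -- CTE: total charge volume per merchant. The LEFT JOIN keeps merchants
    -- with no charges at all, which an INNER JOIN would silently drop.
    WITH merchant_totals AS (
        SELECT m.id AS merchant_id, m.country, COALESCE(SUM(c.amount), 0) AS total
        FROM merchants m
        LEFT JOIN charges c ON c.merchant_id = m.id
        GROUP BY m.id, m.country
    )
    -- Window function: rank each merchant within its own country.
    SELECT merchant_id, country, total,
           RANK() OVER (PARTITION BY country ORDER BY total DESC) AS rank_in_country
    FROM merchant_totals
    ORDER BY country, rank_in_country
"""

results = db.execute(query).fetchall()
for row in results:
    print(row)
```

Writing this from a blank editor is a lot to ask of someone new to SQL. Swapping `country` for another column in an existing copy of it is not, which is roughly the prior-art effect described above.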

This “teach a man to fish” philosophy made a huge difference. Being able to do something for yourself versus asking someone else to do it is a night and day difference in terms of whether it’s likely to happen, and the common case on the ground was that everyone was running data all the time. Every company in Silicon Valley claims to be data driven, but Stripe really is a data-driven organization if there is such a thing.

(I’ll caveat by saying some of what I’ve said here got less true as the organization got bigger. Access to parts of the warehouse was request-only or restricted depending on the sensitivity of the data. We also did achieve final retrograde form with hands-off middle management who would delegate rather than do.)

A ridgeline in San Bruno park

Stripe’s data warehouse is great, but was only one amongst many powerful internal tools. Here are a few examples of others for inspirational purposes:

I should caveat that Stripe has a lot of uncool tools too – at one point we went full Atlassian for reasons that I still don’t understand (I swear that in aggregate the use of JIRA in the country’s biggest enterprises must be reducing GDP by a full single digit), but the company has a lot of great stuff. It was very good at periodically observing pain points that were costing productivity, and tasking people to reduce them, thereby producing more compound leverage across the whole organization.

Speaking of cool tools, Postgres is a cool tool. Last week I published two articles on it:

With Apple’s annual iPhone announcement event having come and gone already (“California Streaming” – give whoever came up with that one a raise – perfect), I was just reflecting on how far tooling in the form of hardware’s come in just a decade or so.

The first laptop I ever bought myself was Apple’s plastic MacBook in my last year of university. It was okay. Good for the time, but weird issues around the plastic discolouring and peeling back, and although you’d get a few classes worth of power out of it, you didn’t leave home without a charging brick. I would’ve been using a Motorola KRZR at the time, with which I’d occasionally text people at three words per minute.

Fast forward to today, the iPhone 13 pushes the 12’s already-tremendous battery life another 2.5 hours, and is a few orders of magnitude more powerful than my old MacBook. I’m still using a five-year-old iPad Pro, but it’s still perfect, with almost edge-to-edge screen and charged only once a week. Laptops were the final frontier in decent on-the-go battery, but Apple’s M1 finally cracked that one too – usable all day as long as you remember to plug the computer in overnight.

The world is more messed up than ever, and the very fabric of our civilization might be coming apart, but g’damn do we ever have some cool tools.

Until next week.

1 We’d often offer tacit support for old language versions considered deprecated by core because so many users have trouble upgrading in a timely manner.