Designing With Data

It happened again, just a few weeks ago.

I drew a great sketch in my sketchpad that could be The Visualization. The one that would help my users make sense of their data, simplify complexity, and wipe away all the layers of confusion. 

I could almost hear the angels singing in the background as I coded up the first draft and spun up the system. I was ready to send the triumphant Slack announcement – maybe phrased with a humble “I think I might have made a little progress” but really knowing the answer was Right Here, within my grasp. It looked good on some test data I had made up, and that seemed to confirm I was doing the right thing.

Then I threw actual user data into the visualization and everything collapsed. 

Lines everywhere. Colors that made no sense. Axes that bunched lots of data into one corner and left everything else empty.

It turns out I had held some beliefs about how my data would behave, and had made some implicit assumptions about its distributions; the real data behaved nothing like that. There was no way my clever idea could show what I hoped – at least, not without some serious re-thinking.


I’m not giving you details on this particular failed attempt because I could have written this a week ago, or a year ago, or five years ago, and it would have been true every time. The clever graph simplification that failed because the data wasn’t really as acyclic as it seemed. The clustered bar chart that fell apart because some users had thousands of categories. The comparison that tried to compare dozens to millions.

Nor am I alone in making this mistake. A few years ago, a colleague had the clever idea of playing back the history of bug fixes as a game of Space Invaders: relentless bugs marching downward; heroic developers shooting them down one at a time. Then they looked more carefully at the data: sometimes hundreds of bugs would be wiped away with a single check-in (or erased with a WONTFIX), while some bugs would linger for months or years.

Reality, it turns out, is never as well behaved as design wants it to be. This seems to be a particularly prevalent problem with data-driven design.

I’m far from the first to notice this! The paper Data Changes Everything (Walny et al., 2019) points out the mismatches that arise when designers imagine what a visualization might look like without a good sense of how the actual data will behave. Among other things, the paper suggests building stronger collaborations between designers and data engineers.

I would generalize this, though: in data-intensive projects, your users’ data will behave in entirely unexpected ways.


I care about this because I create analytics tools for users, and I’ve been re-learning this lesson for my entire career. I started off as a researcher at Microsoft, designing data visualizations. There, I designed tools for people to carry out data analyses – and found unexpected data gremlins hiding in almost everything I touched, from map data to user log data.

Then I went to Honeycomb, a company that builds tools for DevOps. I had a more specific audience and more specific goals – and I still re-learned this lesson. Our users’ data embodied different assumptions than the ones we had made in designing our system, and we needed to design our visualizations to be robust to the data they actually had.


This is the first in a set of blog posts where I’ll try to tell a few different stories about real data colliding with our expectations.

I’ll give you a spoiler, though: my advice is going to be the same – iterate rapidly.

My goals are:

  • Get real data flowing through your prototypes and designs as quickly as possible

  • Use the data to solve real problems as quickly as possible

  • Get real users to ask questions of their data with your system as fast as possible

I’m willing to compromise on a lot to get these steps done. Do we need to write the prototype in a Python script that draws a PDF file? Good enough. Does the output require us to reassemble images into a flipbook? No problem. Whatever it takes to get a sense, as rapidly as possible, about whether our assumptions match the way the data works.

Let’s talk.