Accelerate Product Adoption with the Diffusion of Innovations

In my last entry, I talked about how I used “Capability Maturity Models” to better make sense of interview results. In this entry, I’d like to explore how the Diffusion of Innovations helped us think about a direction for product strategy.

Moment.dev asked me to help define a clear product direction: the basics of the tool were in place, but how would we convert that into users? We felt that if we had a clear story about who would use Moment, we’d be better able to focus the tool. In other words, a clear adoption strategy would shape product design.

There were many directions we could go. One strategy might seem good because it would look exciting to the first customer at a company; a different strategy might seem good because it could lead to very powerful usage patterns in the medium term. To decide how to move forward, we needed to juggle many factors: how did we picture Moment being adopted at a customer’s site? How quickly did we need users to see results?

We needed a way to keep these tradeoffs straight.

The Diffusion of Innovations

I’ve been a fan of the Diffusion of Innovations – both the book and the theory – since Barry Wellman introduced it to me in graduate school. The USDA had originally funded much of the Diffusion of Innovations line of research in the 1950s, trying to figure out how to help farmers modernize their methods. The book breaks down the attributes of innovations to help understand which ones are likely to be adopted, and the attributes of the people adopting them. You might recognize the book from the language about “early adopters” and “late majority,” which became popular from that research. 

The other part of the book identifies five different attributes of an innovation:

  • Perceived advantage: can the user tell that the innovation works better than the status quo?

  • Complexity: how hard is the innovation to use?

  • Trialability: can I try the innovation out – and turn back if it doesn’t work for me?

  • Compatibility: how much does the innovation require me to change about the way I work?

  • Observability: can other people see that I’m using the innovation?

“Observability” is the term used by Rogers, and it is entirely distinct from the sense of observability used in the distributed-systems monitoring community.

Using the Diffusion of Innovations at Moment.dev

How do we apply this to Moment? Moment.dev provides an infrastructure combining text and apps: end users can edit pages, adding interactive elements that run as code. We saw in the last blog entry that Moment supports gradual automation, where subsequent users can improve documentation into automation.

Moment for Interactive Runbooks

What if we provided a tool to help make instructions for common internal tasks, such as runbooks for incident reviews or step-by-step guides for internal processes? As ops teams dealt with incidents, they might choose to improve the pages that they used often. Teams might even be able to review which pages were being executed often, to know what needed improvement and to better locate their internal pain.

We loved the idea that some of the runbooks would gradually improve into live notebooks. For example, if a step was “confirm that the deploy had completed”, that might be improved to a live indicator in the page that would show whether the deploy had in fact completed. If a second step required the user to find a process and restart it, that could be replaced with a button.
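
To make that concrete, here’s a minimal sketch of what “promote a manual step into a live indicator” might look like in code. The status URL, response shape, and function name are all hypothetical – this is not Moment’s API, just an illustration of a documented step turning into something executable.

    import json
    from urllib.request import urlopen

    def deploy_completed(service: str) -> bool:
        """Return True if the latest deploy of `service` reports success.

        The endpoint and response shape here are invented for illustration.
        """
        with urlopen(f"https://deploys.example.internal/status/{service}") as resp:
            status = json.load(resp)
        return status.get("state") == "succeeded"

A page could render that result as a green or red indicator instead of a checklist item, and the “find the process and restart it” step could be wired to a button in the same way.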

  • Perceived advantage is pretty good once the runbook is established – our interviews showed that some users struggled with runbooks becoming obsolete. An automated runbook would attract use, and hopefully be kept up to date.

  • Complexity would be a challenge: users would need to learn the Moment way of creating live examples. 

  • Trialability would also be a challenge: the default state of Moment was just documents, so users wouldn’t see the advantages of Moment quickly.

  • Compatibility would be fairly high: we could build an automatic upgrade that would help users get from their current runbooks to our system. 

  • Observability was very good: users would see the runbook and be able to see any live examples that had been created.

Enhancing Runbooks - Kubernetes Upgrades

Knowing these strengths – and weaknesses – what could we do to ease adoption? Are there any design enhancements we could make to runbooks to make them more trialable and less complex?

We thought about providing an out-of-the-box solution that would be easy to use from the first click. We wanted to find a task that we could prepare a runbook for – and that people could use with minimal configuration. Optimally, we’d find a task that:

  • Is annoying to carry out

  • Has some steps that can be automated

  • Scaffolds users into exploring Moment and learning how to create their own automations

The idea is that these three criteria would address the weaknesses. Having a packaged solution would help improve the perceived advantage; trialability would be improved by helping customers get started rapidly; and complexity would be greatly reduced when users only had to do a moderate amount of configuration.

We began to consider some tasks that might fit well in that space. Upgrading Kubernetes clusters seemed like a good candidate: a few interviewees had talked about what it took to properly test a Kubernetes upgrade. They reported that the process required lots of tracking of progress through the steps, including collaborating with other teams and asking each team to run through a set of acceptance tests.

This begins to sound like a good task for Moment. If we could build a tool that made it reasonable to run a Kubernetes upgrade in Moment, then perhaps this could help with trialability and complexity. We would have to carefully design the system to be ready to support this set of tasks.
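
To give a flavor of the kind of step such a runbook might automate, here is a small sketch that checks whether every node in a cluster reports the target version after an upgrade. It shells out to kubectl; the version-prefix convention and the surrounding workflow are assumptions, and a real upgrade runbook would involve many more steps and checks.

    import json
    import subprocess

    def nodes_not_yet_upgraded(target_version: str) -> list:
        """Return the names of nodes whose kubelet isn't yet on the target version."""
        out = subprocess.run(
            ["kubectl", "get", "nodes", "-o", "json"],
            capture_output=True, text=True, check=True,
        ).stdout
        nodes = json.loads(out)["items"]
        return [
            node["metadata"]["name"]
            for node in nodes
            if not node["status"]["nodeInfo"]["kubeletVersion"].startswith(target_version)
        ]

    # e.g. nodes_not_yet_upgraded("v1.29") might return ["worker-7", "worker-12"]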

A Design Path Forward

To be clear, we don’t want to build a version of Moment that can only do Kubernetes upgrades. Rather, we want to make sure that this starting scenario works well out of the box, so that users can start with this, and begin to support their other needs, too. We also need to consider whether there are features that would support not just this scenario, but others, too. For example, what features can we add to Moment to best support coordinating large projects like this?

This more specialized tool offers some great opportunities – but it comes with a trap. Users might decide that Moment is only a tool for Kubernetes upgrades. In gaining easier adoption, we might discourage users from being as creative as we want to allow them to be. It will be important to design the next steps to ensure users can grow and develop in the tool.

Harnessing the Power of Frameworks

The Diffusion of Innovations framework proved invaluable in organizing the complex questions facing our product strategy at Moment. It allowed us to compare strategies, understand trade-offs across multiple dimensions, and rapidly identify testable product directions.  This approach can streamline decision-making and accelerate innovation within any organization.

If you're looking to enhance your product development process and make strategic decisions with greater clarity, I'd be delighted to explore how these frameworks could benefit your team. As a consultant, I bring expertise in applying these methodologies to real-world products and can help you unlock your product's full potential.

Let's connect and discuss how I can help your team achieve its goals.

Making Sense of Automation with Maturity Models

Ever spent hours talking to users, only to end up feeling you've got nothing but a bigger pile of information? Raw interview data is a treasure trove, but doesn't give you the roadmap. That's where analysis frameworks come in –  they help you turn those messy stories into actionable insights. 

In this blog entry, I’m going to introduce a less familiar, but highly valuable analytical lens – the Capability Maturity Model (CMM). And so I’ll tell the story of how I got to help a modern startup by unearthing government measures of software design from the late ’80s.

Popular User Research Techniques

Once you've collected your user stories and done a round of coding to coalesce and consolidate your data, what comes next?

I like to think about what classes of information I’ve gotten.

  • Do your users break themselves into groups, based on shared needs and behaviors? A Persona approach can help.

  • Are users hinting at a deeper layer of goals than the surface tasks? Dig deeper to uncover the “Job To Be Done.”

  • Are you finding a “right way” to use your product – and now evaluating whether users are getting there? That’s getting close to locating a North Star.

Each of these techniques has its strengths – you pick the one that’s most appropriate for the problem at hand.

The right analysis approach depends on the kind of insights we're seeking. In a recent experience, CMMs provided unique value in assessing users' process maturity. Let's explore how it works!

Background: How a Startup's Automation Needs Inspired This Analysis

Moment Software is building a tool that makes it easy to embed code in documents. The tool targets infrastructure teams that manage internal processes and runbooks, and write internal software for their companies. One theme I repeatedly heard at Moment was that we were building a tool to help create automation in documents.

At first, I didn’t get it: documents feel like the definition of static material, while applications are dynamic. Documents live in document repositories; code is stored in source directories.  

Over time, I realized the two of those aren’t as far apart as I might think. Documentation shows how to carry out a series of steps. Automation packages those steps together.  This became clear during the interviews: if a task were infrequent, someone would document how to do it. As the task became more common, people would write scripts to automate it. In fact, I began to hear some interviewees complain about a whole backlog of “to-be-automated” tasks!

How Documentation Grows into Automation

Here’s an example of how documentation evolves into automation, drawn from my time at Honeycomb. 

We had a documentation page on resetting your development environment, including how to reset your local database. It listed the tables to drop and configuration information.

Periodically, someone would need that reset. They would follow the instructions, and sometimes even improve them. Here’s how it evolved.

  • It started as a set of instructions (“from the devdb database, drop the tables users and queries”)

  • Someone added a code segment for each step (“use devdb; drop table users; drop table queries”)

  • Someone else created a script, checked it into the scripts repo, and changed the documentation to just say “run scripts/droptables.sh”

  • Finally, yet another person incorporated it into the main control interface. They moved the script and changed the documentation again (“click the button labeled ‘reset DB’”).

There was no top-level mandate, just many iterations of users making their lives a little easier. Note how the code moved a few times – from documentation (where users would cut and paste it) into the script folder and then to the control interface. 
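
For concreteness, here is roughly what the script stage of that evolution looks like – written here as a small Python sketch rather than the shell script the team actually used, with sqlite3 standing in for whatever database driver the real devdb required. The table names come from the example above.

    import sqlite3

    TABLES_TO_DROP = ["users", "queries"]

    def reset_dev_db(conn):
        """Drop the dev tables so the environment can be rebuilt from scratch."""
        cur = conn.cursor()
        for table in TABLES_TO_DROP:
            cur.execute(f"DROP TABLE IF EXISTS {table}")
        conn.commit()

    if __name__ == "__main__":
        # sqlite3 is a stand-in; the documented steps targeted the team's devdb
        reset_dev_db(sqlite3.connect("devdb.sqlite"))

The later stages simply relocate this same logic: first into the scripts repo, and then behind a button in the control interface.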

Capability Maturity Models

I was looking for a way to analyze how this process of documentation turning into automation evolved – and to pin down why people might get stuck along the way. That’s when I remembered an older concept called a “Capability Maturity Model” (CMM). It emerged from 1980s studies of software processes, but it became a powerful tool for analyzing processes like this one!

Essentially, a CMM describes an organization’s skill level in carrying out a process, using five maturity levels:

  1. Initial: It's been done at least once (probably with a lot of improvisation)

  2. Repeatable: Someone documented it; others can follow the steps

  3. Defined: Clear procedure, maybe with some basic automation bits

  4. Capable: The process is streamlined, heavily reliant on systems

  5. Efficient: Fully mature, may even run automatically.

This framework gives us a clear vocabulary to describe where an organization has focused its efforts. The droptables script above is a classic illustration of moving from stage 1 to stage 4.

five levels of capability maturity models

We can apply CMMs to lots of processes. A few years ago, Liz Fong-Jones and I talked about how Honeycomb had a “deploy on green” philosophy: a goal that passing tests (and a peer review) should leave a developer confident enough to deploy a code change to production. At Honeycomb, deployment was at maturity level 4: one click would trigger a continuous-integration action and start the entire process flowing. 

In contrast, our rollback process was much less mature – maybe level 2, just documented.   Since our deployments were so reliable, if we needed to change something, we usually fixed forward.  Rollback wasn't a frequent need, so we hadn't invested effort in making it as smooth.

(I’ve learned that some organizations use CMMs as a way to abuse — or discipline — product teams, by turning the model into a dashboard. That is a very different use, and I’m not sure it’s a valuable use of the model.)

Mapping Maturity to Code Locations

During interviews, I asked people where they'd look to figure out a process.  Here's the pattern I noticed:

  • Level 1: "The Wild West" – think Slack scrollback or quickly edited wikis – stuff that changes fast and gets lost easily.

  • Level 2: Processes get moved to docs repositories or organized knowledge bases (like Notion) for more permanence.

  • Level 3-4: Documentation AND Code – processes now straddle two worlds: their descriptions get more formal, and there’s code involved.

  • Level 5: Fully Automated – processes might even run themselves via control panels.

The Problem: This system is chaotic!  Users I interviewed complained about wasting time searching,  sometimes even finding outdated instructions before realizing there's a better way to get the job done. Have you ever had that experience?

This insight is where the CMM analysis started paying off – it showed us a clear path for Moment to make a real difference.

Putting CMMs to Use

Now that we had the CMM framework, we saw the underlying problem clearly: these maturity levels exist, but users get stuck jumping between them!

Here's where Moment makes a huge difference:  by allowing code to be embedded in documentation, we make that evolution less jarring.  Anyone editing a page can automate bit by bit, without big rewrites or needing to switch between totally separate systems.

The Moment approach gives teams flexibility. In Moment, a whole range of processes can coexist: simple notes, a partial script that carries out one annoying step, or a fully automated script – they all can work. Things that are used frequently can naturally evolve towards higher maturity levels.

The CMM didn't just highlight possibilities; it showed us potential challenges, too. Stage 4 users worry about things like version control and editing history – Moment needs to be able to answer those concerns. And as something hits Stage 5, there needs to be a smooth "graduation" path from Moment into broader automation workflows.

Conclusion

The concept of “Capability Maturity Models” gave us a powerful lens for understanding how documentation transforms into automation. It put the challenges users faced with those transitions into clear focus – a huge help when thinking about both Moment's marketing and design goals! We could communicate how users at different maturity levels would all benefit from our product.

Speaking of those challenges... are you finding it hard to pinpoint where your users get stuck on their workflow journey?   That's where bringing in my expertise makes a massive difference.  Drop me a line, and let's see how insights from users can level up your next project.

And stay tuned for my next blog! I'll dive into a completely different framework that helped  Moment chart its strategic course.

How to Ask the Right Question

Have you ever wrapped up a user interview feeling like you didn't really learn what you needed?  The key to great interviews isn't technique alone – it's asking the right questions. I've spent years refining how I approach user interviews. I’d like to discuss common pitfalls – and what you can do to ensure your interviews get you the information crucial to your decisions.

Before an interview, define your core goal. What critical information do you need? Who will ultimately use those findings?  Interviews meant to aid sales differ vastly from those meant to guide product design. Having a clear purpose leads to asking better questions.

When I did a recent project with Moment Technologies to shape product strategy, we had to revise our initial questions substantially.  Our first questions were likely to lead users to give us misleading answers. Let's break down some common pitfalls:

Overly specific questions

Sometimes, we have a feature or a product direction in mind when we start interviews. Asking about those specific features is not likely to work. Asking hypothetical questions – “would you pay for this feature?” – rarely gives meaningful answers. Users aren’t great at predicting their future selves, and you and your interviewee almost certainly have different ideas of what a product with that feature might look like.

For a better approach, reframe the conversation around pain points. If you know what your product is meant to help with, you can learn how users interact with that problem. For Moment, we started exploring the concept of  "toil" –  DevOps lingo for those annoying manual tasks that aren’t quite worth automating. We learned a lot about our users' daily challenges, which let us start figuring out how to tune our tool for their work.

Confirmation Bias and Leading Questions

Beware the trap of asking what you want to hear – and then hearing what you expect! Our preconceived ideas can seriously derail interviews. Maybe you're hoping for positive feedback on a pet feature,  so you accidentally phrase questions to steer users that way. It's surprisingly easy to slip into without even realizing it. These sneaky biases mess with your results. 

I’ve been happiest with the results of studies where any answer is a surprise. It’s hard to steer users wrong when you genuinely just want to know what they think.

Using Your Own Vocabulary

Watch out for that jargon! In the interviews I carried out with Moment, after stewing in terms like "toil" and "automatable" for a week, we nearly forgot those are our internal lingo. Turns out, some users had totally different definitions, which started to skew our results. We were able to get back on track, but it’s important to look out for this one. Try to understand the world from your users' point of view. What language do they use to describe their day-to-day problems?

Overly personal personas

Persona methods can get us into our users’ heads and can give a rich sense of how your product can fit into their work. Some persona presentations go deep on rich and descriptive persona examples: “Jill lives on a homestead farm with three chickens”. The trap is when teams dive too deep into irrelevant questions. Before you ask Jill about her egg-laying situation, focus on how work fits into her life. Is she at home or office-bound? Does she have set hours, or does her work come at irregular intervals?

Let me share a story from my time at Honeycomb. We wanted to understand our users’ work schedules and when they picked up the product. We did get some stories about people’s personal lives — but mainly because they were explaining how they’d used the product on a plane or while picking up groceries. The real insight of our interviews was that we found two distinct core behaviors: people who used Honeycomb actively during the development process and those who turned to it only after something broke. These were very different mindsets – and they sparked some great conversations about what features could support each work mode. 

A notebook organizes interview notes.  The pages are well-organized, and concepts are well-separated

Wrapping It Up

Interviewing users can be tricky, but the payoff is huge! Everyone loves talking about their work – and you might be surprised at the gold you uncover, leading to products people genuinely love.

Transforming those raw insights into a product roadmap is another skill. Do you need help crafting an impactful, data-driven strategy? That's where I come in! Drop me a line, and let's see how user interviews can supercharge your next project.

Speaking of those Moment interviews – the next step is to organize the raw material I pulled out of those conversations into insights. In the next blog, I'll dive deep into applying analysis methods to turn those interviews into powerful actions!

Growing into Production

In my entry on “Measure, Design, Build,” I talked about the prototyping process: how we get from data, users, and an interesting problem into a workable prototype. What’s the next step?

Optimally, this process becomes a step in organic growth: bringing in the additional skills that we need to make the new feature real, one step at a time.

Growing the SLO project

When Liz Fong-Jones and I created the Honeycomb SLO feature, for example, we kicked it off with three days of intense meetings in a Seattle co-working space. (By the way: if you ever have a chance to get to work with Liz? Absolutely worth it.) We looked at existing user feedback, finding limitations in the trigger product. We sketched possible UIs. We mocked up the core algorithms and equations — first in a spreadsheet, then a Python notebook. Those first steps got us far enough to learn about a lot of unstated assumptions in the algorithms. We were able to use the sheets to figure out how to model prediction, and to figure out what parameters the algorithm needed.
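
To give a sense of the kind of arithmetic those spreadsheets and notebooks were modeling, here is a minimal sketch of an error-budget calculation. It is illustrative only – not Honeycomb’s actual SLO algorithm, which also had to handle prediction, burn rates, and much messier data.

    def error_budget_remaining(total_events: int, bad_events: int,
                               target: float = 0.999) -> float:
        """Fraction of the error budget left over a window.

        target=0.999 means 99.9% of events in the window should succeed.
        """
        allowed_failures = total_events * (1 - target)  # the full error budget
        if allowed_failures == 0:
            return 1.0 if bad_events == 0 else 0.0
        return 1 - (bad_events / allowed_failures)

    # 10M events in a 30-day window with 4,200 failures against a 99.9% target:
    # 4,200 of an allowed 10,000 failures used, so 58% of the budget remains.
    print(error_budget_remaining(10_000_000, 4_200))  # 0.58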

Then we started to build it into a product. We had three steps in mind:

  1. Be able to run dogfood SLOs on Honeycomb data

  2. Be able to hand-hold a few intrepid customers through SLOs

  3. Any customer can create an SLO

The first thing we needed was a back-end engineer, who could start building out the query caching machinery. (This turned out to be rather hard! I wrote about some of our lessons — and how we spent $10,000 in a day.) I had enough coding prowess to start building out the front end, but another engineer hopped in to start wiring us into the production database. Liz put on her infra hat and started figuring out how we’d need to arrange databases and servers to support our expected user loads.

We now had four people on the SLO team. As we crossed the first threshold and had an internally available tool, we encouraged the front- and back-end engineering teams at Honeycomb to incorporate SLOs into their practice, and they started adding SLO alerting to their on-call monitoring and handoff cycle. Liz stepped away, driving the growth of DevRel at Honeycomb.

The project continued to grow: a product manager to help manage internal usage and prioritize the growing to-do lists. Design resources. Another front-end engineer to polish the graphs and get BubbleUp more fully incorporated. Step by step, we moved toward release, and the team grew to fulfill our needs. We brought in a Solutions Architect to help us coordinate Phase 2, hand-holding our first customers.

By now, the design was stable. I was less and less useful as the engineers discussed query policies, API changes, and challenges incorporating various libraries. I transitioned into writing documentation and working with customers, and started up my next project.

Growing into product

It feels like I’ve seen two different patterns for staffing projects. When you understand the scope of the project, it makes sense to assign a team at the start — say, “a designer, a PM, two devs” — and build out the feature with the team.

But other projects are harder to shape and scope. This pattern seems like a useful one for projects that still have some risks: we bring in resources on parts that we have either successfully de-risked, or where we need the expertise to build out the next step. When building SLOs, we couldn’t have used a full team at the start: there were too many dependencies to work out, too many pieces to put in place.

I love watching a project grow — and I love when I get a real expert on a topic outside my domain who can pick up some early decision and sand off the sharp edges, turning it into smooth, well-operating code. And, always with some sadness, I love that moment when I realize the fledgling product can now fly without me, and it’s time to start on the next phase.

Measure, Design, Try: On Building New Things

My passion is helping users make sense of data - at every stage from ingestion and processing, through analysis, and (especially) exploration and visualization. Often, that entails creating new ways to interact with the data – visualizations that bring out new insights, or query tools that make it easy to ask important questions.

I’ve been reflecting recently on my research process – both of the “user research” and “academic research” varieties. I’ve gotten to experience both: my career started out in academia during my PhD research, then continued at Microsoft Research (MSR), which operates as a sort of hybrid between academia and industry. Since then, I’ve worked in industry settings. These settings have distinct constraints and objectives: academic settings have an explicit goal of advancing the state of the art, usually through publishing research papers; in industry, we want to create a product that will drive sales.

I would suggest, though, that there is more overlap than it may seem: while the business contexts are very different, the real goal in both is to improve the user experience.

But how different are the academic and user research processes? I’ve been interested to realize that the project cycles look fairly similar. A full project is situated in a cycle:

  • Measure what users are doing. Build a model of user needs and goals through qualitative and quantitative data analysis,

  • Design a solution to find approaches that speak to these needs,

  • Try solutions, building the lowest-fidelity prototypes to test them quickly

… and then

  • Measure whether the change worked, again with qualitative and quantitative techniques.

The Measure - Design - Try cycle

I was trained on this cycle – a typical academic design-study paper proposes a problem, proposes a design solution, prototypes it, and then measures whether it succeeded. A typical academic paper documents one round through the cycle, defending each step in terms of the literature.

In industry, the cycle looks similar. We’d start by trying to understand a user need, based on signals from sales, product management, or user observations. It starts as a mystery, following the footprints left in users’ data – “if users really are having trouble with this feature, then we should see a signal that manifests in their data this way”.

Optimally, we’d then be able to reach out to the users who were running into the issue and talk to them – one wonderful thing about working on a SaaS product is that we were able to follow up with those users.

We could then identify the underlying user need. In the BubbleUp story, for example, I show that what we thought was a user need for query speed was actually a need for understanding data shapes.

A successful project builds just enough to test out the idea, gets it in front of users, and iterates. There are many forms of carrying out user research to test out a prototype – ranging from interviews, to lab experiments, to deploying placeholder versions behind a feature flag. The goal is to find signal as quickly as possible that an idea is working, or not, as the design change marches toward release (or is mercifully killed.)

Over time, the prototypes begin to look more like real code. Hopefully, the process of putting versions into user hands is beginning to show what work will have to be done to the real product.


While there are lots of differences between the academic and industry contexts, the commonalities far outweigh the differences. Some of the lessons that I’ve learned from both:

  • Choosing an appropriate problem is a blend of intuition, qualitative, and quantitative signals – and must be validated with data. It’s far too easy to pick the low-hanging, most obvious challenge.

  • Finding a solution in the design space is where domain expertise can be invaluable. My background in visualization often means that I can point to other solutions in the space, and can figure out how to adapt them. As the adage goes, “great artists steal” – a solution that someone else has used is likely to be both more reliable and more familiar.

  • It’s critical to prototype with ecologically valid data. It’s far too easy to show how good a design looks with idealized lorem-ipsums and well-behaved numbers – does that model the user data?

  • The best way to get that ecologically valid data is to iterate. Get feedback from users as rapidly as possible – and, if possible, reflecting their own data. Learn, iterate, tweak the prototype, and try again. (Even in academic papers, the section about “we built a prototype to test a hypothesis” usually hides a dozen rounds of iteration and redesign, as ideas that looked great on paper turn out to encounter subtleties when they become real.)

  • When you know you’re going to measure your prototype, you build in instrumentation from the beginning – state up front what you’ll want to measure, and then make sure you leave in hooks to read it back later.
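
As a tiny example of what “leave in hooks” can mean in practice, here is a sketch of the sort of lightweight instrumentation I like to drop into prototypes: every interesting interaction gets appended to a log that can be read back later. The event names and file path are placeholders, not any particular product’s telemetry.

    import json
    import time

    LOG_PATH = "prototype_events.jsonl"  # placeholder location

    def record_event(name: str, **fields) -> None:
        """Append one interaction event so usage can be measured later."""
        event = {"ts": time.time(), "event": name, **fields}
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(event) + "\n")

    # e.g. record_event("histogram_clicked", dimension="endpoint")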

Conclusion

This cycle of measure-design-try-measure is the core of every project I’ve built. Knowing that we’re working in this cycle drives the decision cycle – what questions do we need to answer at each iteration? How much prototype is enough? Reframing the discovery process through this lens helps guide the design process to create new, more exciting tools.

A Honeycomb Story: Five Different Kinds of Bar Charts

This blog entry continues my discussion of Designing With Data. You might enjoy Part I: “Designing With Data.”

I worked for Honeycomb.io as the first Design Researcher; I was brought in to help the company think about how data visualization could support data analytics.

At Honeycomb, I helped create an analysis tool called BubbleUp — there’s a great little video about it here. BubbleUp helps tame high-cardinality and high-dimensionality data by letting a user easily do hundreds of comparisons at a glance. With BubbleUp, it’s easy to see why some data is acting differently, because the different dimensions really pop out visually.

Today, I want to talk about three behind-the-scenes pieces of the BubbleUp project.

The Core Analysis Loop

When I arrived at Honeycomb, the company was struggling to explain to customers why they should take advantage of high-cardinality, high-dimensionality data – that is, events with many columns and many possible values. Honeycomb’s support for this data is a key differentiator. Unfortunately, it is hard to build an interface that makes that kind of data easy to handle – users can feel they’re searching for a needle in a haystack, and often have trouble figuring out which questions will give them useful results.

Indeed, we saw users doing precisely that kind of fishing: they would use the GROUP BY picker on dimension after dimension.

I interviewed several different users to understand this behavior. At heart, the question they were asking was, “Why was this happening?” I found that they were all struggling with the same core analysis loop: they had the implicit hypothesis that something was different between bad events and good events, but weren’t sure what it was.

A graph showing 99th percentile latency for requests to a web service. Note that latency gradually increases to 2 seconds, then drops dramatically. This and following screenshots from Honeycomb.io

This, for example, is a screenshot of Honeycomb’s interface, showing the P99 of latency — that is, the speed of the slowest 1% of requests to a service. This gives a sense for how well the service is performing.

It’s pretty clear that something changed around 1:00 — the service started getting slower. This chart is an aggregation of individual events, each of which represents a request to the service. Each event in the dataset has lots of dimensions associated with it — the IP address of the requester, the operating system of the service, the name of the endpoint requested, and dozens of others — so it’s reasonable to believe that there is some dimension that is different.

It’s also pretty clear that the user can easily describe what happened: “some of the data is slower.” The user could point to the slow data and say “these ones! I want to know whether the events with latency over 1 second are different from the ones with latency around one second.”

We can look at a different dimension by dividing the data with a GROUP BY.

In the slider below, I’ve picked a few of the dimensions and did a GROUP BY on them. You can see that some dimensions don’t really give good signal, but one of them really does. That dimension would be the key to the puzzle — What would it take to find that more easily?

(By the way — you can try this yourself, at Honeycomb’s sandbox, which comes pre-loaded with data to explore.)
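
Here is a rough sketch, in pandas, of the loop users were effectively running by hand: mark the slow events, then GROUP BY one dimension after another and see whether the slowness concentrates in a single value. The column and dimension names are invented.

    import pandas as pd

    events = pd.read_csv("events.csv")      # one row per request
    slow = events["duration_ms"] > 1000     # the "bad" events in the spike

    for dim in ["endpoint", "build_id", "availability_zone"]:
        # fraction of each group's events that are slow
        breakdown = slow.groupby(events[dim]).mean().sort_values(ascending=False)
        print(dim)
        print(breakdown.head(), "\n")

BubbleUp’s job, in effect, is to replace that manual loop with a single view over every dimension at once.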

Rapid Prototyping

I love histograms as an overview technique – they show all the different possible values of a dimension. They give a quick overview of what the data looks like, and they are easy to compare to each other. Better yet, for my purposes, they don’t take up much space.

These histograms do, in fact, take up space on your bed when you cuddle with them. (Technically, these are “evil distributions,” visualized as smooth histograms. These are from Nausisca Distributions, but unfortunately not for sale anymore.)

You can compare histograms in a couple of different ways: you can put them side-by-side, or you can overlay them. (The pillows illustrate overlaid histograms, with dotted lines showing different parameterized versions of the distributions.)

I pulled sample data from our dogfood system. The quickest tool I had at hand was Google Sheets: I threw the data in, drew histograms one at a time, choosing whatever settings seemed to make something useful, and cut and pasted them over each other. (I could tweak parameters later. Right now, it was time to just build out a version and see if it worked.) Even with this primitive approach, I could see that some dimensions were visibly different, which meant that maybe we could use this technique to explain why anomalies were happening.

One of the first sketches of comparing distributions in Google Sheets, comparing anomalous (orange) data to baseline (blue) data.

First, it’s pretty clear that there’s some signal here. The “endpoint” dimension shows that all of the orange data has the same endpoint! We’ve instantly located where the problem lies. Similarly, the “mysql_dur” dimension shows that the data of interest is much slower.

We can also see that there are a few tricks that will need to be resolved for this technique to work:

  • On the top-right, we see that the orange dataset has much less data than the blue dataset – perhaps only six values on that dimension. We’d definitely have to make sure the two scales were comparable.

  • The middle-left shows a bigger problem: there’s only one distinct value for orange. It’s squinched up against the left side, which is why I drew the orange in this sketch as a glow.

The fact that we got real data into even this low-fidelity sketch meant that we could start seeing whether our technique could work.

To learn more about future blockers, I implemented a second version as a Python notebook. That let me mass-produce the visualizations, and forced me to face the realities of the data. Axes turned into a mess. Lots of dimensions turned out to be really boring. Some dimensions had far too many distinct values to care about. Some had too few — or none at all.

An image from a python notebook prototyping BubbleUp.

The Python notebook let me start exploring in earnest. I added a handful of heuristics to the notebook to make them less terrible — hiding axes when there were too many values, dropping insignificant data, and scaling axes to match — and now we had something we could start experimenting with.
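
For flavor, here is a simplified sketch of the kind of per-dimension comparison the notebook produced: overlaid, normalized bar charts of baseline versus anomalous events, with a crude heuristic to skip dimensions that have too many distinct values. The threshold and the inputs (two pandas DataFrames) are assumptions, not BubbleUp’s actual code.

    import matplotlib.pyplot as plt

    MAX_DISTINCT = 75  # arbitrary cutoff for "too many values to read"

    def compare_dimension(baseline, anomaly, dim):
        """Overlay normalized value counts of one dimension for two DataFrames."""
        values = sorted(set(baseline[dim]) | set(anomaly[dim]), key=str)
        if len(values) > MAX_DISTINCT:
            return  # skip unreadable dimensions for now
        base = baseline[dim].value_counts(normalize=True).reindex(values, fill_value=0)
        anom = anomaly[dim].value_counts(normalize=True).reindex(values, fill_value=0)
        xs = range(len(values))
        plt.bar(xs, base, alpha=0.5, label="baseline")
        plt.bar(xs, anom, alpha=0.5, label="anomaly")
        plt.xticks(xs, values, rotation=90)
        plt.title(dim)
        plt.legend()
        plt.show()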

My first step was to build a manual testing process. I’d export data from a customer account, import it into the notebook, export the output as a PDF, and email the PDF to a customer. Laborious as it was, it got us feedback quickly.

This experience proved to me that our users were, in fact, able to identify anomalous behavior in their data — and that this visualization could pin it down.

Of course, that also meant we found yet more strange ways that our customers’ data behaved.

Handling Edge Cases

The version of BubbleUp Honeycomb ships today is very different from this primitive Python notebook. The early iterations helped us figure out where the challenges were going to be.

We came up with answers for …

  • Times when the two groups were radically different in size

  • Times when one group was sparsely populated, and the other was dense

  • How we order the bars, including which graphs should be alphabetical, numerical, or by value.

  • What we do about null values.

  • How we order the charts relative to each other.

One interesting implication is that BubbleUp today actually has five distinct code paths for drawing its histograms! (I certainly didn’t anticipate that when I put together that Google sheet.)

  • Low-cardinality string data is ordered by descending frequency of the baseline data, and gets an X axis

  • If there are more than 40 strings, we stop drawing the X axis

  • … and if there are more than ~75, we trim off the least common values, so the graph can show the most common of both the baseline and the outliers

  • Low-cardinality numeric data is not drawn proportionately, but is ordered by number — with HTTP error codes, for example, there’s no reason to have a big gap between 304 and 404.

  • High-cardinality, quantitative data is drawn as a continuous histogram, with overlapping (instead of distinct) bars.
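
A rough sketch of the decision structure that list implies might look like the following. Apart from the 40 and ~75 cutoffs mentioned above, the boundaries are guesses; this shows the shape of the logic, not BubbleUp’s actual code.

    def choose_histogram_style(values, is_numeric, continuous_cutoff=100):
        """Pick a drawing path based on a dimension's type and cardinality."""
        distinct = len(set(values))
        if is_numeric and distinct >= continuous_cutoff:
            return "continuous histogram with overlapping bars"
        if is_numeric:
            return "bars ordered by numeric value, no proportional gaps"
        # string dimensions, ordered by descending baseline frequency
        if distinct > 75:
            return "trim least-common values, no x axis"
        if distinct > 40:
            return "all values, no x axis"
        return "all values, with x axis"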

None of this would have been possible without getting data into the visualization rapidly, and iterating repeatedly: there’s simply no alternative to experimenting with user data, interviewing users, and iterating more.

The BubbleUp interface today. It’s gotten a lot better than that first Python notebook, and shows data in a much easier-to-understand way.


Designing With Data

It happened again, just a few weeks ago.

I drew a great sketch in my sketchpad that could be The Visualization. The one that would help my users make sense of their data, simplify complexity, and wipe away all the layers of confusion. 

I could almost hear the angels singing in the background as I coded up the first draft and spun up the system. I was ready to send the triumphant Slack announcement – maybe phrased with a humble “I think I might have made a little progress” but really knowing the answer was Right Here, within grasp. It looked good on some test data that I made up, which seemed to confirm I was doing the right thing.

Then I threw actual user data into the visualization and everything collapsed. 

Lines everywhere. Colors that made no sense. Axes that bunched lots of data into one corner and left everything else empty.

It turns out I had some beliefs about how my data would behave, and had made some implicit assumptions about distributions; real data acted nothing like them. There was no way my clever idea could show what I hoped – at least, not without some serious re-thinking.


I’m not giving you details on this particular failed attempt because I could have written this a week ago, or a year ago, or five, and it would have been true every time. The clever graph simplification failed because the data wasn’t really as acyclic as it seemed. The clustered barchart fell apart because some users had thousands of categories. The comparison broke down because it tried to compare dozens to millions.

Nor am I alone in making this mistake. A few years ago, a colleague had a clever idea of playing back the history of bug fixes as a game of space invaders: relentless bugs marching downward; heroic developers shooting them down one at a time. Then they looked more carefully at the data: sometimes hundreds of bugs would be wiped away with a single check-in (or erased with a WONTFIX); some bugs would linger for months or years.

Reality, it turns out, is never as well behaved as design wants it to be. This seems to be a particularly prevalent problem with data-driven design.

I’m far from the first to notice this! The paper Data Changes Everything (Walny et al., 2019) points out the mismatches that come when designers imagine what a visualization might look like without a good sense of how the actual data will behave. Among other things, the paper suggests building stronger collaborations between designers and data engineers.

I would generalize this, though: in data-intensive projects, your users’ data will behave in entirely unexpected ways.


I care about this because I create analytics tools for users. I’ve been re-learning this lesson for my entire career. I started off as a researcher at Microsoft, designing data visualizations. There, I designed tools for people to carry out data analyses – and found unexpected data gremlins hiding in almost everything I touched, from map data to user log data.

Then I went to Honeycomb, a company that builds tools for DevOps. I had a more specific audience, and goals – and I still re-learned this lesson. Our users’ data embodied different assumptions than we had made in designing our system, and we needed to design our visualizations to be robust to their needs.


This is the first in a set of blog posts where I’ll try to tell a couple of different stories about reality and data colliding with previous expectations.

I’ll give you a spoiler, though: my advice is going to be the same – iterate rapidly.

My goal is:

  • Get real data flowing through your prototypes and designs as quickly as possible

  • Use the data to solve real problems as quickly as possible

  • Get real users to ask questions of their data with your system as fast as possible

I’m willing to compromise on a lot to get these steps done. Do we need to write the prototype in a Python script that draws a PDF file? Good enough. Does the output require us to reassemble images into a flipbook? No problem. Whatever it takes to get a sense, as rapidly as possible, about whether our assumptions match the way the data works.
Let’s talk.