Visualization Genres: Exploration

In the last entry, I introduced three genres of visualization: presentation, exploration, and monitoring. I’d like to dive into each of these in a little more detail.

The first genre I want to talk about is data exploration. Exploration – formally, “Exploratory Data Analysis” – pares visualization down to its barest elements: it asks nothing but how to deliver an insight rapidly.

An analyst starts an exploration with almost no idea of the outcome. They might start with a concrete question – “how are sales doing in our new store?” – but will rapidly begin to explore possibilities, trying to unpack reasons, correlates, and related factors. Are sales up? What’s changed? Why did this month look better than last, and can we replicate it? They will ask dozens of questions, trying to learn what is going on in the data and why, trying to build a story.

A Python notebook showing a pairplot (a grid of plots, one for each pair of dimensions) used to rapidly explore distributions and relationships.

Some explorations may even start with a blank slate, knowing little about the data. The first steps in an exploration might be to evaluate the quality and meaning of the data – checking what the dimensions of the data are, how the values are distributed, and how different dimensions relate to each other.  An exploration might require merging data from multiple sources, cleaning stray values, and looking at multiple histograms, scatterplots, and different groupings of the data.
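Those first steps can be sketched in a few lines. This is only an illustration, not a recommended tool: the rows are made-up values, the column names are hypothetical, and I've stuck to the Python standard library so it runs anywhere. In practice an analyst would reach for a notebook and a plotting library instead.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical sample: made-up monthly figures, for illustration only.
rows = [
    {"visits": 120, "sales": 30, "returns": 3},
    {"visits": 150, "sales": 41, "returns": 2},
    {"visits": 90, "sales": 22, "returns": 4},
    {"visits": 200, "sales": 55, "returns": 1},
    {"visits": 170, "sales": 47, "returns": 2},
    {"visits": 110, "sales": 28, "returns": 5},
]


def text_histogram(values, bins=4):
    """Bucket values into equal-width bins and return {bin_start: count}."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    return {lo + i * width: counts[i] for i in range(bins)}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


# Step 1: what are the dimensions?
columns = {name: [row[name] for row in rows] for name in rows[0]}
print(f"{len(rows)} rows, dimensions: {list(columns)}")

# Step 2: how are the values distributed?
for name, values in columns.items():
    print(name, "range", min(values), "to", max(values))
    for start, count in text_histogram(values).items():
        print(f"  {start:8.1f} | {'#' * count}")

# Step 3: how do the dimensions relate to each other?
for a, b in combinations(columns, 2):
    print(f"{a} vs {b}: r = {pearson(columns[a], columns[b]):+.2f}")
```

The last loop is the text-only cousin of a pairplot: it walks every pair of dimensions and reports a single number where the plot would show a scatter.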

Explorations are usually led by data analysts or subject-matter experts. Their tools are data-oriented, including R, Jupyter notebooks, Tableau, or Power BI. Tools for data exploration prioritize rapid iteration: it should be easy to change axes, swap out dimensions, or change groupings, and each change should take just a few keystrokes or a click of the mouse.

To make sure they can keep diving into the data, the analyst carrying out an exploration almost always runs it against a static excerpt of data, ensuring that their discovery is easy to replicate – and ensuring that their data excerpt can be computed quickly. Once they’ve found an insight, an analyst will often then run it against a different time period or a different sample of their data to validate that their insight is robust.
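As a rough sketch of that workflow, assuming synthetic data and a hypothetical helper named split_excerpt: a fixed random seed freezes the excerpt so the exploration can be repeated exactly, and a second, disjoint sample is held back to check whether the finding survives.

```python
import random
import statistics


def split_excerpt(records, size, seed):
    """Return (excerpt, holdout): two disjoint samples, reproducible for a given seed."""
    rng = random.Random(seed)
    shuffled = records[:]  # copy so the original order is left untouched
    rng.shuffle(shuffled)
    return shuffled[:size], shuffled[size:2 * size]


# Synthetic data, generated here only for illustration: daily sales for two stores.
gen = random.Random(0)
records = [
    {"store": store, "sales": gen.gauss(mean, 10)}
    for store, mean in (("old", 100), ("new", 120))
    for _ in range(500)
]


def mean_gap(sample):
    """Mean sales in the new store minus mean sales in the old store."""
    new = [r["sales"] for r in sample if r["store"] == "new"]
    old = [r["sales"] for r in sample if r["store"] == "old"]
    return statistics.mean(new) - statistics.mean(old)


# Explore on the frozen excerpt, then re-run the same question on the holdout.
excerpt, holdout = split_excerpt(records, 200, seed=1)
print(f"gap in excerpt: {mean_gap(excerpt):.1f}")
print(f"gap in holdout: {mean_gap(holdout):.1f}")
```

If the two numbers roughly agree, the insight is less likely to be an accident of the excerpt. Validating against a different time period, as described above, would follow the same pattern with a date filter in place of the random split.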


It’s worth talking in some detail about data exploration because people find it … underwhelming. At Microsoft, I often had colleagues come to my office with a dataset that they wanted to explore. What new insights would they find?

Somewhat to their disappointment, I didn’t turn to flying dots or animated graphs or 3D visualization — despite the fact that I was working on those very technologies. Instead, I’d start with a few histograms, getting to know the distribution.

Now, later, when we were presenting the results — but that’s next week’s posting.


There are a few sub-genres of exploration that are worth noting. The description above suggests data exploration at the frontier, where you could be faced with almost anything. Often, though, that’s not quite true.

Some tools support context-specific data exploration. An MRI machine display allows an analyst to scan through layers of sophisticated imagery, but does not allow generalized display. Monitoring and analytics tools, like Datadog or Google Analytics, allow users to ask questions tuned specifically to their domains, such as DevOps or web traffic. In each of these domains, the tools differ from general-purpose exploration in some interesting ways:

  • The tool might offer domain-specific visualizations. A system monitoring tool might offer trace visualizations or flame graphs; a geographic tool might natively offer maps; the MRI display makes sense of underlying data by visualizing and partially analyzing the results.

  • The tool might offer opinionated computations: while “95th percentile” might not be a one-liner in general math libraries, it is fundamental to the DevOps workload.
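To make the “95th percentile” example concrete, here is a minimal sketch using the nearest-rank definition. That is only one of several conventions for computing percentiles, the function name is my own, and the latency values are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("percentile of empty data is undefined")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with two slow outliers.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900,
                14, 13, 17, 15, 12, 13, 16, 14, 15, 13]

print("median:", percentile(latencies_ms, 50))
print("p95:   ", percentile(latencies_ms, 95))
```

The median here looks perfectly healthy while the 95th percentile lands on one of the outliers, which is why a DevOps tool puts that number front and center. Python's standard library also offers statistics.quantiles, which interpolates between data points and so can give slightly different answers.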

Overall, context-specific exploratory visualizations trade generality for depth: analysts can ask only a more restricted set of questions, but in exchange they get rich, domain-specific tools. I’m going to talk more about tools that sit between genres at the end of this series.

The genre of exploration overall privileges rough-and-ready computation over precise details, looks to come up with insights rapidly, and is largely uninterested in the aesthetics of the visualization.

In my next entry, I’ll talk about the next genre: data presentation — a very different domain.
