I was recently asked to put together a ‘data visualization best practices’ document for my analytics colleagues. It was a great experience, helping me to formalize a lot of lived experience. Plus, it was a lot of fun! I’m sharing that document here in the hopes that this will help at least one person think about data in a new way.
Caveat #1: I recognize that this isn’t new information and that there are countless books out there on this topic (plus extensive conversation in the data community on Twitter and elsewhere). This is my take, based on years of hands-on experience and standing on the shoulders of giants.
Caveat #2: I’m not great at ‘pop vizzes’ / infographics – this is very much focused on business-oriented viz techniques.
First and foremost: KEEP IT SIMPLE
- A simple, elegant, minimalist graph is easy to read and interpret, freeing up your mind to think about what it actually means.
- A cluttered, overcrowded graph with too much information actually makes it harder to understand what is being communicated.
- “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” – Antoine de Saint-Exupéry
Watch out for Chart Junk
- Just like Lean has a “value-added ratio” for work / effort, data visualization has a value-added ratio for pixels / ink. Effectively, how much of your graph is telling your user something valuable, and how much is distracting them?
- “Chart Junk” refers to anything that does not communicate relevant information, or that distracts users. Avoid / minimize Chart Junk at all costs!
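For reference, the “value-added ratio” for pixels / ink described above closely mirrors what Edward Tufte calls the data-ink ratio in The Visual Display of Quantitative Information (listed in the reading list at the end):

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

Here is a rough illustration with made-up numbers. Suppose background fill, heavy borders, and dark grid lines account for 30% of a chart's ink, and removing them loses no information. In that case the ratio is at most 0.7, and stripping out that Chart Junk moves it toward 1.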
Things to get rid of, or make lighter
- Background fill
- Labels: instead of labeling every point, try labeling just the last point, or include a small table below the graph.
- Grid lines: instead of grid lines, use targets, goals, or benchmarks. These are much more meaningful for interpretation. Even so, keep it simple and use one (maybe two) reference line(s). If you must use grid lines, make them very light.
ALWAYS KEEP YOUR AUDIENCE IN MIND
- If you’re making a graph for a large audience, or for people who don’t do a lot of analytics (e.g. doctors and nurses), keep it simple. If you’re using something other than a bar chart, line chart, or simple dot plot (like the squares in the KPI dashboard), people are likely to get confused.
- This is not meant to be derogatory: everyone has different strengths, and that’s why we make a good team. You wouldn’t want me (a data analyst) to change your PICC line!
- If you’re making a graph for a smaller audience of more technically savvy users, you can get more sophisticated. That said, generally the simple approach is the best approach, even for advanced users. Strategically combining a few well-crafted bar charts, line charts, and scatter plots can provide an incredible amount of insight.
Types of charts and when to use them
- Bar Charts
- Probably the most common chart type, and for good reason. Almost always a great choice. Easily understood by a wide variety of audiences.
- Good for comparing categories or showing data over time.
- Bar charts are interpreted by the length of the bars, so you MUST anchor bar charts at zero. Some metrics have variation that is very small compared to the absolute value, making it hard to interpret graphs that are anchored at zero. Additionally, some metrics don’t have a relevant fixed anchor point. A human temperature of 0 °F is bad data, not a relevant comparison. A bar chart may not be a good choice in these scenarios.
- “Flipping and Sorting” – when using a bar chart for comparing categories, consider flipping your axes and sorting by performance. The Flip puts the category labels horizontally, so you don’t have to tilt your head to read them. The Sort helps users easily see how each bar performs relative to the entire cohort.
- Line Charts
- Line charts are probably the second most common chart type, again for good reason. A great choice and easily understood by a wide variety of audiences.
- Only use line charts for time series data. Line charts imply a connection between consecutive points, which only makes sense when the points are ordered in time.
- Which chart type for time series data?
- It depends. One guideline is to use bar charts for raw numbers and line charts for rates and percentages, but this is a very loose guideline. Use whichever technique best answers the question at hand.
- Scatter Plots
- Scatter plots are likely the third most common chart type. They’re a little more advanced than bar charts and line charts, but offer something completely different.
- Great for comparing two continuous variables across category values (e.g. “let’s do a scatter plot of all surgeons, comparing infection rate to surgery volumes”). Scatter plots help quickly and easily identify relationships, cohorts, and areas for improvement.
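The “Flip and Sort” advice and the zero-anchor rule for bar charts can both be handled in a small data-prep step before plotting. What follows is a minimal Python sketch under a few stated assumptions. The metric arrives as a plain dict of category → value. The unit names and numbers are made up. `flip_and_sort` and `bar_axis_limits` are hypothetical helper names, not part of any library. Finally, the plotting tool draws the first horizontal bar at the bottom of the axis, as matplotlib's `barh` does.

```python
# Hypothetical helpers for "Flip and Sort" bar charts; plotting is left to your tool.

def flip_and_sort(values, higher_is_better=True):
    """Order (category, value) pairs for a horizontal ("flipped") bar chart.

    Assumes the plotting tool draws the first bar at the bottom (as
    matplotlib's barh does), so the best performer is placed last and
    therefore lands at the top of the chart.
    """
    # Ascending when higher is better, descending when lower is better;
    # either way the best value ends up last.
    return sorted(values.items(), key=lambda kv: kv[1], reverse=not higher_is_better)


def bar_axis_limits(values, headroom=0.05):
    """Value-axis limits for a bar chart that always include zero.

    Bars are read by their length, so the axis must start at zero (or end
    at zero for negative values). `headroom` adds a little space past the
    longest bar.
    """
    low = min(0, min(values))
    high = max(0, max(values))
    return low, high + (high - low) * headroom


# Made-up example: patient satisfaction (%) by unit, higher is better.
satisfaction = {"4 North": 82, "ICU": 91, "ED": 74, "5 West": 88}
ordered = flip_and_sort(satisfaction)
labels = [name for name, _ in ordered]
widths = [value for _, value in ordered]
print(labels)  # ['ED', '4 North', '5 West', 'ICU']
print(bar_axis_limits(widths))
```

With matplotlib, for instance, you could call `ax.barh(labels, widths)` followed by `ax.set_xlim(*bar_axis_limits(widths))`. That draws ICU at the top and keeps the bars anchored at zero. Sorting in the data, rather than in the chart tool, keeps the order identical wherever the data is reused.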
Other graph types and visualization techniques
- There are many, many other visualization techniques, but bar charts, line charts, and scatter plots should probably make up at least 95% of your visualization work, no matter how advanced you are as an analyst. They’re just that useful!
- Dot Plots (Cleveland)
- A good alternative to bar charts if you need to zoom in: dots are read by position rather than length, so the axis doesn’t have to start at zero
- A good alternative to line charts when you’re not graphing time series data
- Heatmaps are great to compare one continuous variable across two categories (e.g. ED Arrivals by Day of Week and Hour of Day, or Discharges by Unit and Service), especially when you include the marginal distributions (the bars in the image below).
- Small Multiples
- Small multiples are incredibly useful. Basically, you reproduce one graph multiple times. Each graph has the same x and y axes (with the same ranges), and additional variables organize the graphs into columns and rows so you can compare them. A great way to quickly and easily identify high-level trends. Best used when you want to compare the inner graph across the outer variables (e.g., comparing profit over time across regions and sectors in the image below).
- Pie Charts
- Don’t use them! Pie charts are hard to read. A Flipped and Sorted bar chart is almost always a better choice. The only time you might consider a pie chart is for showing one percentage (e.g., “75% of people think cake is yummy”). But even in this case, you could probably just write the number and it’d be more effective.
- 3-D Charts
- Don’t use them! 3-D charts only make things harder to read. Unless you’re doing hardcore science requiring actual 3-D surface mapping, you’re better off keeping things in 2-D.
- Color is an incredibly useful tool for data analysis. But don’t overdo it: keep it simple, because a little color goes a long way!
- You should mostly use grey or another neutral color (black, navy blue, white, etc.).
- In addition to your neutral color, you should pick ONE (maybe two) accent color(s).
- If you want to highlight areas for improvement, use neutral for good and a vibrant accent color for bad.
- If you want to highlight successes, use neutral for bad and a vibrant accent color for good.
- Using a vibrant color to highlight problem areas and another vibrant color to highlight successes ends up not highlighting anything. If everything is emphasized, nothing is emphasized.
- Don’t use color just to use color; be very intentional about how you use it. Unnecessary color distracts users and makes graphs harder to interpret.
- Beware of red & green, also known as the “traffic light” color scheme. This is commonly requested by users (including senior leadership), but many people with red-green color deficiency / blindness (roughly 8% of men and 0.5% of women) can’t reliably tell green and red apart. Our job as analytics experts is to advocate for best practices, and that includes not using red & green. If you asked your doctor to surgically install a USB port in your neck, they’d tell you it was a bad idea. Don’t be afraid to be an expert.
- This somewhat depends on the particular shades of green and red that you use, but it’s just not worth it…
- There are a number of great tools for simulating color deficiency. Here are a few:
- Here are a few alternatives to the traffic light color scheme. First, I’d suggest just keeping it simple and using red & neutral (white or grey). Or, if you want to be optimistic, use green & neutral. Or try red & blue or green & purple instead of green & red. Orange & blue is also common. Or you could avoid the issue altogether, trying a different approach (e.g. bars vs goal). If you absolutely must use red & green, use symbols or something else to double encode your meaning (“design redundancy”).
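The neutral-plus-one-accent guidance and the “design redundancy” fallback can be folded into one small rule. This is a hedged Python sketch, not a standard. The hex codes, the `encode` function, and its `highlight` parameter are all hypothetical choices. Comparing each value to a single goal is also an assumption about your metric.

```python
# Hypothetical palette: one neutral and ONE accent color (hex values are assumptions).
NEUTRAL = "#9e9e9e"  # grey
ACCENT = "#d95f02"   # orange accent; replace with your own palette


def encode(value, goal, higher_is_better=True, highlight="problems"):
    """Return (color, symbol) for one data point compared with a goal.

    highlight="problems":  points that miss the goal get the accent color.
    highlight="successes": points that meet the goal get the accent color.
    Everything else stays neutral, so only one kind of point is emphasized.
    The symbol repeats the meaning without relying on color at all.
    """
    if highlight not in ("problems", "successes"):
        raise ValueError("highlight must be 'problems' or 'successes'")
    meets_goal = value >= goal if higher_is_better else value <= goal
    symbol = "✓" if meets_goal else "✗"
    emphasize = meets_goal if highlight == "successes" else not meets_goal
    return (ACCENT if emphasize else NEUTRAL), symbol


# Made-up example: hand-hygiene compliance (%) against a goal of 90.
for unit, pct in {"ICU": 94, "ED": 81}.items():
    print(unit, encode(pct, goal=90))
# ICU ('#9e9e9e', '✓')
# ED ('#d95f02', '✗')
```

The accent is tied to exactly one outcome, so the chart can't end up with two competing vibrant colors. The ✓/✗ symbol still carries the meaning for readers who can't distinguish the colors, or when the chart is printed in greyscale.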
In general, small design choices can have large impacts on interpretation. There usually isn’t just one right answer; it depends on the question being asked. Below, the graph on the left is better if the question is “how are we doing overall throughout the day?” but the graph on the right is better if the question is “how does our performance change throughout the day?”
Nitpicky formatting things to keep in mind (make sure these are nice to read and are consistent between graphs)
- Tooltips / hover text
- Axis titles
- Graph titles
- Date & number formats
- Axis line color and style
- Borders and background fill
- Capitalization (sentence vs title case, etc.)
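One practical way to keep date and number formats consistent between graphs is to define them once and reuse them for axis labels, data labels, and tooltips. Here is a minimal Python sketch, assuming your pipeline formats values in Python before handing them to the visualization tool. The helper names and the specific format choices are illustrative assumptions, not a standard: thousands separators, one-decimal percentages, and ISO 8601 dates.

```python
from datetime import date

# Hypothetical shared formatters: reuse these everywhere instead of
# writing ad hoc format strings in each chart.


def fmt_count(n):
    """Whole numbers with thousands separators, e.g. 12,345."""
    return f"{n:,.0f}"


def fmt_rate(p):
    """Proportions between 0 and 1 shown as percentages with one decimal."""
    return f"{p:.1%}"


def fmt_date(d):
    """Unambiguous ISO 8601 dates (YYYY-MM-DD)."""
    return d.isoformat()


print(fmt_count(1234567))          # 1,234,567
print(fmt_rate(0.125))             # 12.5%
print(fmt_date(date(2021, 3, 9)))  # 2021-03-09
```

Because every chart calls the same three functions, changing a convention later (say, to zero-decimal percentages) is a one-line edit rather than a hunt through every dashboard.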
Standing caveat: “it depends.” These are all best practices based on robust literature and many years of experience. That said, there are times when a best practice isn’t the best choice in a given situation. If you’re ignoring a best practice, just be sure that you’ve thought it all the way through and have a good reason.
These ideas are not new. They are based on a wealth of existing literature as well as years of applied visualization work for varied audiences. If you’re interested in digging deeper into this subject area, here are a few highly recommended places to start:
- Information Dashboard Design. Stephen Few.
- Visualizing Health and Healthcare Data. Betzendahl, Brown, Rowell.
- The Big Book of Dashboards. Wexler, Shaffer, Cotgreave.
- Storytelling with Data. Cole Nussbaumer Knaflic.
- How Charts Lie. Alberto Cairo.
- The Visual Display of Quantitative Information. Edward Tufte.
- The Future of Data Analysis. John Tukey.
- The Elements of Graphing Data. William Cleveland.
This isn’t meant to be a final word, but a part of an existing, vibrant conversation. So, let me know what you think! Did I miss your favorite graph? Your favorite author? Do you feel that I unfairly denigrated pie charts? Did you love this post?
Until next time. Jeff.