Python-bloggers

How to conditionally format a scatterplot in Excel

This article was first published on python - Stringfest Analytics , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

A scatterplot is a chart that displays the relationship between two variables by plotting data points along the X and Y axes, with each point representing a unique pair of values.

By conditionally formatting points based on their category, we can reveal patterns, clusters, or outliers, making it easier to interpret and draw insights from different groups within a single chart.

Excel doesn’t natively support conditional formatting for scatterplot points by category without manual adjustments, but with some creative Excel workarounds—such as creating separate series for each category—or by using Python for enhanced customization, we can produce scatterplots that visually differentiate each category effectively. In this post, we’ll walk through how to achieve this.

To follow along, download the exercise file:

In this dataset, we have information on total bill amounts, corresponding tips, and whether each receipt is from lunch or dinner. Creating a basic scatterplot in Excel to compare total bill versus tip was straightforward—just select the data, go to Insert > Recommended Charts, and choose a scatterplot from the options.

But how can we adjust this plot to color each point by meal type (lunch or dinner), helping us see if time of day impacts these values or the overall relationship? Let’s explore how to do it.

Total bill vs tip starter

Option 1: Legacy Excel and the NA() function

To add conditional formatting to a scatterplot in basic Excel, we’ll start by adding two extra columns to our table: Lunch and Dinner. The goal is to use conditional logic to populate these columns with tip values only for the corresponding meal type, mapping lunch tips to the Lunch column and dinner tips to the Dinner column.

Great! Let’s add the conditional formatting. This might look a bit different since we’re using structured table references, but the key concept is straightforward: if the value in the “Time” column is “Dinner,” then the formula will return a value in the “Dinner” column; otherwise, it will use the NA() function to signal to Excel that this value isn’t available, which prevents it from being plotted.

=IF([@time]=scatter_format[[#Headers],[Dinner]], [@tip], NA())

Now, write a similar function for the “Lunch” column, so it only returns a value where “Time” is “Lunch.” Once you’re done, select the data in the “Total Bill,” “Dinner,” and “Lunch” columns, and go to Insert > Recommended Charts to visualize it.

You should still see a scatterplot recommended, but this time it’s color-coded by time of day. Great work!

It appears that while both total bills and tips tend to be lower at lunch than at dinner, the overall relationship between the numbers follows a similar pattern for both meals.

Option 2: Python in Excel and the seaborn package

Now, let’s see how to accomplish this using Python in Excel. The biggest advantage of this approach is its flexibility. Unlike in Excel, where we need helper columns for conditional formatting, here we can simply add an extra argument to Seaborn’s scatterplot() function. This is particularly useful if we anticipate adding more categories to our data. For instance, if the restaurant starts serving breakfast, we won’t need to adjust the chart’s formatting manually.

With Seaborn, we can achieve this visualization in a single line of code. Here’s how it looks (the first line is just for data import, no visualization yet):

tips_df = xl("tips[#All]", headers=True)
sns.scatterplot(data=tips_df, hue='time', x='total_bill', y='tip')

We’ll get the following plot returned to us in Excel:

The main advantage here is definitely the ease of extensibility and the fact that we don’t need to create additional helper columns. However, there are a few limitations to keep in mind. For instance, the lack of tooltips and the inability to click on each data point for details can’t be addressed directly in Python. But we can enhance the chart by adding a title, improving the axis labels, and formatting the axes as currency.

This is where Python might start to feel a bit more cumbersome than Excel, as we’ll need to add several lines of code to achieve these adjustments:

You should now have a chart like the following:

To close, here’s a summary table comparing and contrasting these two methods for conditionally formatting a scatterplot in Excel:

Feature Native Excel Python in Excel (Seaborn)
Conditional Formatting Approach Requires adding helper columns (e.g., Lunch and Dinner) and using conditional formulas with NA() Conditional coloring done directly via hue argument
Ease of Setup Moderate – needs manual setup and workaround for each category High – single line of code for basic scatterplot with color differentiation
Customizability Limited – manual adjustments for new categories, series separation High – easily extendable with minimal code for multiple categories
Chart Interactivity Moderate – clickability, some built-in tooltips available Limited – lacks tooltips and direct interactivity in Excel
Required Skill Level Basic to Intermediate Excel knowledge Intermediate Python knowledge (Seaborn library)
Maintenance More time-consuming with new categories Easier to scale and maintain for additional categories

Which method do you think best suits your needs, and do you have any questions about implementing either approach? Let me know in the comments.

The post How to conditionally format a scatterplot in Excel first appeared on Stringfest Analytics.

To leave a comment for the author, please follow the link and comment on their blog: python - Stringfest Analytics .

Want to share your content on python-bloggers? click here.
Exit mobile version