Python-bloggers

Python in Excel: How to create custom bins and frequency tables

This article was first published on python - Stringfest Analytics , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

If you’ve ever spent time manually grouping data into custom ranges or bins in Excel, you know it can get tedious fast. Whether it’s categorizing sales by price tiers, analyzing customer segments, or bucketing data for visualization, Excel often requires multiple nested formulas, helper columns, and PivotTables. Fortunately, Python offers a simpler, cleaner alternative.

In this post, I’ll show you how to effortlessly create custom frequency bins on the Windsor housing dataset using Python and Pandas, highlighting quick wins that’ll save you time and hassle compared to traditional Excel methods.

You can follow along with the exercise file below:

 

The first step will be to read our data source into Python in Excel as housing_df.

housing_df = xl("housing[#All]", headers=True)

# Create custom intervals
bins = [0, 40000, 75000, 125000, float('inf')]
labels = ['Budget (≤40k)', 'Affordable (40–75k)', 'Midrange (75–125k)', 'High-end (>125k)']

Next, we’ll define custom price ranges or “bins” using the bins list: bins = [0, 40000, 75000, 125000, float('inf')]. These values represent the price cutoff points for categorizing houses. Notice the float('inf') part? That simply means “infinity,” effectively capturing all prices greater than $150,000. Finally, the labels Budget, Midrange, Premium, and Luxury make your groups easy to read, just like creating labels in an Excel lookup table.

Our next steps will be to create a column in the DataFrame with these labeled values and count up the results:

housing_df['PriceBin'] = pd.cut(housing_df['price'], bins=bins, labels=labels)

# Get counts per bin
price_counts = housing_df['PriceBin'].value_counts().sort_index()
price_counts

The first line takes the price column from our dataset (housing_df) and groups each home into our previously defined price ranges using pd.cut(). Next, we use value_counts() to quickly count how many homes fall into each price category. Sorting by index with sort_index() ensures the counts are displayed in the order of the price categories, from lowest to highest.

The result, price_counts, neatly summarizes the number of homes within each pricing bin:

Custom price intervals start

A nice step from here is to visualize the results. We’ll use price_counts.plot(kind='barh') to take the frequency counts of homes per price bin and create a horizontal bar chart. The subsequent lines (plt.title(), plt.xlabel(), and plt.ylabel()) add clear labels and a descriptive title, making the visualization easy to understand at a glance.

To see another practical use of Pandas binning, let’s take our lotsize column and quickly split it into four quartiles using pd.qcut(). This function automatically divides the data into four groups with approximately equal numbers of observations, labeling them ‘Q1’ through ‘Q4’:

housing_df['LotsizeQuartile'] = pd.qcut(
    housing_df['lotsize'],
    q=4,
    labels=['Q1','Q2','Q3','Q4']
)

housing_df['LotsizeQuartile'].value_counts().sort_index()

This code creates a new column LotsizeQuartile in our dataset, assigning each home’s lot size into one of the quartiles. The value_counts() method then counts how many homes fall into each quartile. While the quartiles aim to have an equal number of entries, slight differences can occur if multiple homes share identical lot sizes at the boundaries between quartiles, causing quartile bins to adjust slightly to accommodate ties.

Finally, let’s compare how these two custom categories relate to each other by creating a two-way frequency table using pd.crosstab():

pd.crosstab(housing_df['PriceBin'], housing_df['LotsizeQuartile'])

This two-way frequency table instantly reveals how pricing categories align with property sizes. For instance, you can quickly spot trends like most affordable homes having smaller lots (Q1 and Q2), or higher-end homes primarily appearing in larger lot quartiles (Q3 and Q4). Generating this cross-tabulation in Python is straightforward and automatic: no manual drag-and-drop PivotTable or custom lookup tables required.

If you enjoyed seeing how Python can simplify your Excel workflow and want more practical tips to boost your productivity, check out my mini-course “Python in Excel: Quick Wins” on Gumroad. It’s designed specifically for Excel users eager to leverage Python for immediate, real-world improvements, no programming experience needed. Get your copy now.

What questions do you have about creating custom frequency intervals in Python in Excel or about Python in Excel more generally? Let me know in the comments.

The post Python in Excel: How to create custom bins and frequency tables first appeared on Stringfest Analytics.

To leave a comment for the author, please follow the link and comment on their blog: python - Stringfest Analytics .

Want to share your content on python-bloggers? click here.
Exit mobile version