Python in Excel: How to do statistical bootstrapping with Copilot

Posted on August 17, 2025 by George Mount in Data science | 0 Comments

This article was first published on python - Stringfest Analytics, and kindly contributed to python-bloggers.

As analysts, we constantly report on KPIs and metrics critical to our businesses. That’s essential… but the numbers we present aren’t always as black-and-white as they seem. Every metric comes with uncertainty, errors, and limitations.

Bootstrapping is a simple yet powerful statistical method that helps you quantify how much you can trust the numbers you’re reporting, whether it’s an average, a total, or a rate. Instead of complicated statistical equations, bootstrapping leverages your actual data to create many simulated samples, then examines how those samples vary. That variation gives you a built-in “confidence range,” clearly showing how sure you can be about your results.

For Excel-savvy business users, bootstrapping is especially helpful whenever you’re analyzing a sample and need to confidently answer questions like “How sure are we?” This applies to everything from customer satisfaction surveys to early-stage product sales figures or performance benchmarking tests.

Let’s put bootstrapping into action with Python, Copilot, and Excel using a fuel mileage dataset. Combining Python’s statistical power with the simplicity of prompting through Copilot, we’ll investigate an important business question: Is there a significant difference in fuel mileage between vehicles from Europe versus those from Asia?

Follow along by downloading the file below:

Download the exercise file here

If you’re not familiar with using Advanced Analysis with Python in Copilot, check out this post:

How to get started with Advanced Analysis with Python for Copilot in Excel

Bootstrapping basics

Bootstrapping works by repeatedly resampling your dataset with replacement, so each resample looks like a new dataset of the same size. For each one, we calculate the statistic we care about, such as the mean mpg. After running this process 1,000 times, we get a distribution of results that shows how much our estimate can vary. The middle 95% of those values gives us a confidence interval, which is a simple way of saying, “Here’s the range where we expect the true value to fall.”
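To make the resampling step concrete, here is a minimal sketch of a single bootstrap resample. It uses only Python's standard library, and the mpg values are made up for illustration (they are not taken from the exercise file); the code Copilot generates in Excel will more likely use pandas and NumPy.

```python
import random
from statistics import mean

# Hypothetical mpg values for illustration only (not from the exercise file).
mpg = [31.0, 27.5, 33.2, 29.8, 36.1, 24.3, 30.4, 32.7]

rng = random.Random(42)  # seeded generator so the draw is repeatable

# One bootstrap resample: draw len(mpg) values WITH replacement,
# so some rows can appear more than once and others not at all.
resample = rng.choices(mpg, k=len(mpg))

print("original mean:", round(mean(mpg), 2))
print("resample     :", resample)
print("resample mean:", round(mean(resample), 2))
```

Repeating that draw 1,000 times and collecting the 1,000 means is all the bootstrap distribution is.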

That means our first prompt will look like this:

“Calculate the average mpg for cars from Asia and use bootstrapping with 1,000 resamples to create a 95% confidence interval. Set the random seed to 42 for reproducibility. Show the result clearly in Excel.”

A few key choices, as determined in the prompt, shape how bootstrapping works. Setting a random seed (like 42) ensures we can reproduce the same results every time. Choosing the number of resamples (1,000 in this case) balances accuracy with computation. More resamples smooth out the estimates, but take longer. Finally, resampling with replacement is important because it allows repeated rows in each sample, imitating the randomness of drawing from a larger population.
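The three choices above (seed, number of resamples, sampling with replacement) map directly onto a few lines of code. The following is a sketch of a percentile bootstrap under stated assumptions: the function name `bootstrap_mean_ci` and the data are hypothetical, and this is not necessarily the code Copilot writes behind the scenes.

```python
import random
from statistics import mean

# Hypothetical mpg values for illustration only (not from the exercise file).
mpg_asia = [31.0, 27.5, 33.2, 29.8, 36.1, 24.3, 30.4, 32.7]


def bootstrap_mean_ci(values, n_resamples=1000, confidence=0.95, seed=42):
    """Return (sample mean, lower bound, upper bound) via the percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed: same resamples on every run
    n = len(values)
    # Each resample draws n values with replacement; we record its mean.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    # Keep the middle `confidence` share of the sorted bootstrap means.
    tail = (1 - confidence) / 2
    lower = boot_means[round(tail * (n_resamples - 1))]
    upper = boot_means[round((1 - tail) * (n_resamples - 1))]
    return mean(values), lower, upper


avg, lower, upper = bootstrap_mean_ci(mpg_asia)
print(f"mean = {avg:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Changing `seed` shifts the bounds slightly, and raising `n_resamples` makes them more stable from run to run at the cost of more computation.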

Looking at the Copilot output, the average mpg for Asian cars is about 30.45, with a 95% confidence interval ranging from 29.08 to 31.94. In plain terms, if we repeated the data collection many times and built an interval this way each time, about 95% of those intervals would contain the true average mpg for Asian cars. The result doesn’t claim the “true” number is exactly 30.45, but it gives us a range where we can be about 95% confident the real value lies.

Bootstrapping versus traditional statistics

Traditionally, statisticians use formulas based on assumptions about the data, like it being normally distributed, or the sample size being large enough, to calculate confidence intervals. That approach works well in theory but often feels abstract, especially if you’re not fluent in statistical formulas.
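For comparison, a formula-based interval might look like the sketch below. It assumes a normal approximation (mean plus or minus z times the standard error) so that it runs with the standard library alone; with a small sample, a t critical value would be the more usual choice. The data and the function name `normal_approx_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical mpg values for illustration only (not from the exercise file).
mpg = [31.0, 27.5, 33.2, 29.8, 36.1, 24.3, 30.4, 32.7]


def normal_approx_ci(values, confidence=0.95):
    """Confidence interval for the mean using the normal approximation."""
    n = len(values)
    # Critical value: about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the mean = sample standard deviation / sqrt(n).
    margin = z * stdev(values) / sqrt(n)
    center = mean(values)
    return center - margin, center + margin


lower, upper = normal_approx_ci(mpg)
print(f"95% CI (normal approximation) = ({lower:.2f}, {upper:.2f})")
```

The formula is compact, but it leans on the assumption that the sample mean is approximately normally distributed.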

Bootstrapping offers a more hands-on alternative: instead of relying on theory, it resamples your data many times (1,000 in our case) to see how much your results vary in practice. It’s intuitive, flexible, and relies less on distributional assumptions, though it still assumes your sample is representative of the population. You get a range that reflects the actual variability in your data, not just a theoretical model of it.

Visualizing bootstrap results

Once we’ve calculated the average mpg and its confidence interval, the next logical question is “what do those bootstrapped results actually look like?” Numbers alone can feel abstract, but plotting a histogram of the bootstrap distribution gives us a visual sense of the variation in our estimates.

To that end, here’s our next prompt to Copilot:

“Plot a histogram of the bootstrap distribution of average mpg for cars from Asia.”

The resulting visualization reinforces what we saw numerically in the confidence interval (29.08 to 31.94). The histogram makes that result feel more tangible: not only do we know the likely range, but we can also see that values within it are not all equally likely. The bulk of the resampled means cluster tightly around the sample mean, giving us more confidence that ~30.5 mpg is a reliable estimate.

After plotting the bootstrap histogram, the next step is to connect it back to the summary numbers we first calculated. The histogram shows the full distribution of bootstrapped averages, but without a clear marker it can be hard to tell exactly where the 95% confidence interval begins and ends. Let’s ask Copilot to overlay the summary directly on the visualization:

“Add vertical lines to the histogram showing the lower and upper bounds of the 95% confidence interval.”

The histogram plus confidence lines make it obvious that the average isn’t a single “magic number” but a range of plausible values, with most of the probability mass clustered near the middle.
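The idea behind the histogram with confidence lines can be sketched without any plotting library. The example below bins the bootstrap means and prints a plain-text histogram, flagging the bins that contain the 95% bounds. It is a stand-in for illustration only: the data are hypothetical, and in Excel the chart would typically be drawn with a plotting library such as matplotlib or seaborn.

```python
import random
from statistics import mean

# Hypothetical mpg values for illustration only (not from the exercise file).
mpg = [31.0, 27.5, 33.2, 29.8, 36.1, 24.3, 30.4, 32.7]

rng = random.Random(42)
boot_means = sorted(mean(rng.choices(mpg, k=len(mpg))) for _ in range(1000))
# Approximate 2.5th and 97.5th percentiles of the 1,000 sorted means.
lower, upper = boot_means[25], boot_means[974]

# Split the range of bootstrap means into equal-width bins and count each bin.
n_bins = 12
lo, hi = boot_means[0], boot_means[-1]
width = (hi - lo) / n_bins
counts = [0] * n_bins
for m in boot_means:
    idx = min(int((m - lo) / width), n_bins - 1)  # clamp the max into the last bin
    counts[idx] += 1

# Print one row per bin; each '#' stands for 5 resamples.
for i, count in enumerate(counts):
    left = lo + i * width
    right = left + width
    flag = ""
    if left <= lower < right:
        flag += " <- lower 95% bound"
    if left <= upper < right:
        flag += " <- upper 95% bound"
    print(f"{left:6.2f} | {'#' * (count // 5)}{flag}")
```

The tallest rows sit in the middle and the flagged rows sit near the thin tails, which is the same picture the chart conveys.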

Comparing two bootstrapped distributions

Up to now, we’ve only been looking at Asian cars. That gave us a good grounding in what bootstrapping does, but real analysis often means comparing groups. The natural next step is to make the comparison to European cars:

“Repeat the bootstrapping process for cars from Europe (set the random seed to 42 for reproducibility). Plot the bootstrap distributions of average mpg for both Asian and European cars on the same chart, and add vertical lines showing the lower and upper bounds of each group’s 95% confidence interval.”

The combined histogram shows Asian cars averaging around 30–31 mpg and European cars lower, at about 27–28 mpg. The distributions are mostly separate, but they do meet at the edges. The lower bound of Asia’s confidence interval lines up almost exactly with the upper bound of Europe’s, which means the ranges just barely touch.

Because the intervals only touch at their edges, there is little support in the data for the idea that the two groups share the same true mean. Keep in mind that comparing two separate intervals is a conservative check: they can touch or even overlap slightly when the difference between groups is still statistically significant. In practice, the results point to Asian cars being more fuel efficient on average, with the caveat that the gap isn’t so wide that the two ranges are completely separate. This nuance is exactly why showing confidence intervals alongside averages matters. They keep us honest about what the data can and cannot say with certainty.
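A more direct way to compare the groups, not part of the prompts in this post, is to bootstrap the difference in means itself and check whether its interval excludes zero. The sketch below assumes two small made-up samples and a hypothetical helper named `bootstrap_diff_ci`.

```python
import random
from statistics import mean

# Hypothetical mpg values for illustration only (not from the exercise file).
asia = [31.0, 27.5, 33.2, 29.8, 36.1, 24.3, 30.4, 32.7]
europe = [26.0, 29.5, 24.8, 30.1, 27.3, 25.6, 28.9, 26.4]


def bootstrap_diff_ci(a, b, n_resamples=1000, seed=42):
    """Return (observed difference, lower, upper) for mean(a) - mean(b)."""
    rng = random.Random(seed)
    # Resample each group independently, with replacement, and record
    # the difference between the two resampled means.
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    # Middle 95% of the sorted differences (percentile method).
    lower = diffs[round(0.025 * (n_resamples - 1))]
    upper = diffs[round(0.975 * (n_resamples - 1))]
    return mean(a) - mean(b), lower, upper


diff, lower, upper = bootstrap_diff_ci(asia, europe)
print(f"difference = {diff:.2f} mpg, 95% CI = ({lower:.2f}, {upper:.2f})")
```

If the whole interval sits above zero, the data support a higher average for the first group; if it straddles zero, the difference is not clearly distinguishable from none.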

So far we’ve looked at charts to compare Asian and European cars, but when confidence intervals are close together, the lines on a plot can be hard to read. That’s why the next natural step is to summarize the results in a simple table:

“Create a table comparing the bootstrap results for Asian and European cars. Show the average mpg, the lower bound of the 95% confidence interval, and the upper bound of the 95% confidence interval for each group.”

The table shows Asian cars with an average mpg of about 30.45 and a 95% confidence interval ranging from 29.08 to 31.94. European cars average lower at about 27.60, with a confidence interval spanning 26.21 to 29.08. The key detail is that the upper bound for Europe (29.08) and the lower bound for Asia (29.08) meet almost exactly.
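A comparison table like this one can be assembled by running the same bootstrap once per group. The following sketch uses made-up data and a hypothetical helper, and prints a plain-text table; Copilot would more likely return a pandas DataFrame to the grid.

```python
import random
from statistics import mean

# Hypothetical mpg values for illustration only (not from the exercise file).
groups = {
    "Asia": [31.0, 27.5, 33.2, 29.8, 36.1, 24.3, 30.4, 32.7],
    "Europe": [26.0, 29.5, 24.8, 30.1, 27.3, 25.6, 28.9, 26.4],
}


def bootstrap_mean_ci(values, n_resamples=1000, seed=42):
    """Return (sample mean, lower, upper) for a 95% percentile bootstrap."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower = means[round(0.025 * (n_resamples - 1))]
    upper = means[round(0.975 * (n_resamples - 1))]
    return mean(values), lower, upper


# One row per group: name, average, lower bound, upper bound.
rows = []
for name, values in groups.items():
    avg, lower, upper = bootstrap_mean_ci(values)
    rows.append((name, avg, lower, upper))

print(f"{'Origin':<8}{'Mean mpg':>10}{'CI lower':>10}{'CI upper':>10}")
for name, avg, lower, upper in rows:
    print(f"{name:<8}{avg:>10.2f}{lower:>10.2f}{upper:>10.2f}")
```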

Again, this means the two groups are generally distinct, but because the intervals meet at the edge, they alone don’t let us say so with complete certainty. In practice, though, the evidence leans strongly toward Asian cars having higher fuel efficiency.

Interpreting the results

For our final prompt, let’s push the analysis into an accessible summary that anyone (executives, managers, or non-technical colleagues) can understand:

“Write a short plain-language interpretation comparing fuel efficiency for Asia vs Europe, explaining what the bootstrapped confidence intervals tell us about their typical performance.”

Copilot’s plain-language summary makes the big takeaway clear: Asian cars tend to get higher miles-per-gallon than European cars, and the confidence intervals show that while their ranges just barely touch, the evidence leans strongly toward Asian cars being more fuel efficient.

If you want more detailed interpretations tied directly to the numbers and visuals in your workbook, consider using Copilot Notebooks in Microsoft Copilot. Notebooks are designed for deeper, step-by-step analysis and can give you richer narrative explanations alongside the calculations.

How to get detailed Excel help with Copilot Notebooks

Conclusion

Bootstrapping is a practical first step into statistical analysis because it makes uncertainty visible. With Copilot, Python, and Excel, you can quickly simulate confidence intervals and compare groups in a way that gives more credibility to your results. Instead of a single answer, you’re showing the range of outcomes your data supports, which is far more useful for real decision-making.

Still, bootstrapping is just the beginning. Once you’re comfortable with resampling, the next logical tools to explore include permutation tests, which directly test whether group differences are significant; regression models, which help explain performance using multiple variables at once; and time series methods, which account for trends and seasonality in data collected over time. These approaches expand your toolkit from describing averages to testing relationships and even forecasting.

It’s important to remember the limits: bootstrapping cannot solve problems of poor or biased data, and Copilot can only assist. It cannot replace your judgment in interpreting results or applying them to business decisions.

For a deeper dive into structured analysis, Copilot Notebooks in Microsoft Copilot can guide you through step-by-step interpretations tied directly to your workbook. From here, you can grow into more advanced methods while still using Excel as a familiar front end for communicating your insights.

