Advanced analysis with Python in Copilot: How to work with time series data

Posted on October 1, 2024 by George Mount in Data science | 0 Comments

This article was first published on python - Stringfest Analytics , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Time series analysis lets analysts identify patterns, trends, and cyclic fluctuations over time. These insights are essential for accurate forecasting, strategic planning, and informed decision-making.

Despite its significance, working with dates and times in a sophisticated manner is often hindered by Excel, which has become the butt of many internet jokes due to its handling of dates and times:

Optimist: The glass is ½ full.
Pessimist: The glass is ½ empty.
Excel: The glass is January 2nd.
— [email protected] (@jxxf) May 7, 2022

Although Excel has traditionally struggled with handling time series data, integrating Python within Excel has significantly enhanced its capabilities. The popular Pandas package, widely used in this environment, is even named in part after “Panel Data,” a type of time series data.

To make things even more user-friendly, we can now leverage Copilot’s Advanced Analysis features—a generative AI-assisted tool—to run various time series analyses on our data with ease.

To explore this feature on a time series dataset, download the exercise file below, which contains a copy of a well-known monthly shampoo sales dataset:

Download the exercise file here

If you’ve never used this feature before and aren’t sure how to load the dataset into Copilot’s Advanced Analysis, check out the post below for step-by-step instructions:

How to get started with Advanced Analysis with Python for Copilot in Excel

One of the key points covered in that article is that the results from Advanced Analysis can be volatile from one session to another. This means you will likely get different results than what is shown here!

In my run, it began by providing some basic statistics about the data, including descriptive statistics and a basic plot—a pretty good start:

I’ve always found it overly complicated in Excel just to resample data, that is, to change the level of the date aggregation. For example, if I want to visualize the data by quarterly sales instead of monthly, all I have to do is ask Copilot something like, “Can you visualize total shampoo sales by quarter?” and I’ll get the following plot:

As a dataset showing sales over time, it stands to reason that a line chart would typically be used here. However, visualizing total sales over time as a bar chart can emphasize individual time periods, making it easier to compare sales across specific months or years. This approach highlights distinct intervals, which can be useful if the data is categorical or if the focus is on total sales in each period rather than the trend between them.

You could, of course, make those changes in the underlying Python code yourself, or you could see if Copilot can adapt and respond by providing you with a new chart option, which I’ll do:

Python advanced analysis from line to bar plot

Of course, we could continue describing and visualizing this dataset, but let’s dive into building forecasts. It’s always a good idea, when trying to build a predictive model, to take things step by step. For example, with time series data, it’s wise to first assess the stationarity of the dataset.

If you’re not familiar with this concept, stationarity in a time series means that its statistical properties—such as mean, variance, and autocorrelation—remain constant over time. Checking for stationarity is crucial before building a forecast because many forecasting models assume that the time series is stationary for reliable predictions.

While Copilot can definitely provide guidance here, it’s helpful to have some foundational knowledge of the subject as well.

In this case, I’m going to ask Copilot, “Can you check for the stationarity of this dataset?”

The results will include the augmented Dickey-Fuller test, which is a common method for assessing stationarity, along with some interpretation. Copilot determined that this data is not stationary on its own, which could cause issues for forecasting, but it also suggests some potential alternatives to handle this situation.

I’ll definitely keep this in mind, as everything’s being stored in the Copilot chat session for easy reference. Now, let’s ask Copilot:

“Based on what you’ve assessed about the stationarity of this data, use a sensible technique to forecast the next three months of sales. Visualize the results.”

This will prompt Copilot to apply a relevant forecasting method based on the stationarity assessment, likely using something like ARIMA or a similar model suitable for non-stationary data, and then generate a visual representation of the forecasted sales for the next three months:

The jump between the actual and forecasted values in the ARIMA forecast plot above could be due to the model being trained on non-stationary data, making it difficult to accurately predict the transition from historical to forecasted values. Another possibility is a mismatch between the last observed data point and the first predicted point, which may result from how the forecast horizon is set. If the model’s differencing or other parameters aren’t well-tuned, this could also cause a noticeable shift. This is a great opportunity to use Copilot for further analysis or to dig into the Python code to investigate whether there’s truly an issue here.

In conclusion, integrating Python into Excel opens up a wide range of possibilities for time series analysis, far beyond what Excel can traditionally handle on its own. With tools like Copilot, Excel users can now easily perform complex tasks such as resampling data, checking for stationarity, and building advanced forecasting models like ARIMA.

But the potential doesn’t stop there. For example, Excel users can utilize Python’s advanced time series decomposition techniques to break down data into trend, seasonal, and residual components, providing deeper insights into underlying patterns. Additionally, Python enables the application of machine learning models for more sophisticated predictions, anomaly detection to identify unusual events, or even Monte Carlo simulations to model future uncertainties. These techniques allow Excel users to explore time series data in ways that would have been challenging, if not impossible, with Excel alone.

What questions do you have about working with time series data in Python within Excel or using Copilot’s Advanced Analysis features? Let me know in the comments.

The post Advanced analysis with Python in Copilot: How to work with time series data first appeared on Stringfest Analytics.

To leave a comment for the author, please follow the link and comment on their blog: python - Stringfest Analytics .

Want to share your content on python-bloggers? click here.

Python-bloggers

Data science news and tutorials - contributed by Python bloggers

Advanced analysis with Python in Copilot: How to work with time series data

Related