MENU

Python in Excel: How to create regular expressions with Copilot

This article was first published on python - Stringfest Analytics , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

After years of being unable to work with regular expressions, Excel has finally introduced a set of convenient and useful functions for doing so:

However, around the same time, regular expressions also became accessible in Excel as a result of its integration with Python:

The addition of regex support in both Excel functions and Python integration demonstrates the broad relevance of regular expressions across various software environments, emphasizing their versatile role in tackling a shared array of text-processing tasks.

Yet, to echo an old programmer’s quip—having to rely on regex to solve a problem in the first place might just mean you’ve already got a problem to deal with!

As this saying suggests, although regex is a capable solution, it often brings a trade-off of increased complexity or unforeseen snags.

Keeping the trickiness of crafting regex in mind, I’d argue that for most Excel users looking to dabble in basic regex tasks, Python is actually the more approachable option compared to Excel’s built-in functions. This is largely because when you ask Copilot for regex assistance, it spits out Python code rather than Excel function syntax (Copilot just seems to play nicer with Python overall.).

Let’s put it to the test with the download file below, which contains some basic text info we’d like to clean up and extract data from based on text patterns (literally regex’s bread and butter!):

 

If you’ve never tried Copilot with Python, such as in Excel’s Advanced Analysis, check out this post to get the basics and start using it:

Let’s kick things off by using natural language to whip up some regex with Python in Excel! I’ll begin by asking for valid email addresses. I’ll tell it where to put the results, switch things up a bit by asking for just the domain instead of the full email addresses, and add that the row should stay blank if no match is found:

Create a new column with the domains of valid email addresses from the Contact Info column. If an entry isn’t a valid email, leave that row blank.

Copilot will now work its magic. keep in mind this is probabilistic, so your results might look a bit different from mine. You might need to tweak the prompt or adjust things, but in the end, you should see those domains pop up in a new column as asked.

Formula created by Python Copilot for RegEx

The cool part? The code it generates is right there for you to check out and reuse! You can dig into it by looking at the code cell in

A42
A42 here, which holds the resulting DataFrame (rows 44-50 give you a preview of the output stored there). If you want more room to explore this generated code, head up to the Formulas tab on the ribbon, click Python, and hit Editor. This will launch a handy text editor where you can view and edit the code.

It’s pretty wild that Copilot built this whole pattern on its own, defined a function, and applied it to your dataset to make a new column… all by itself. Don’t sweat it if you’re not sure you could’ve written out this code as fast as Copilot did… that’s kind of the whole point of generative AI! But if you’ve never seen a function defined in Python or the

apply()
apply() method used in Pandas, this might be a good time to brush up on some Python basics first, since otherwise—wise you’ll be hard pressed to tweak, explain and understand this code when push comes to shove.

Assuming that Python code didn’t intimidate you too much, let’s jump into another example. This time, I want to search for a partial match—meaning I’m checking if a cell contains the pattern, even if there’s other content in there too. Specifically, in our data, one row of the the Additional Info column has phone numbers mixed in with other text. My prompt this time will be:

Check if a Contact Info or Additional Info entry contains a valid US phone number as part of the entry. Add a new column called ‘Valid Phone’ that marks True for entries containing valid numbers and False otherwise.

Valid phone number TRUE/FALSE regex

Sometimes the difference between exact match and contains is a big deal, sometimes it’s not, and sometimes you’ll need to tweak your Copilot prompt or the generated Python code to nail what you’re after. But in this case, it’s pretty awesome that we can search for a match across two columns and whip up a true/false flag so fast.

Let’s keep the good times rolling! This time, I want to throw in a bit of wiggle room with our phrase matching. Specifically, I’m after valid websites, and I want to catch them in EITHER of the two columns. Plus, I want to snag URLs whether they’ve got protocols—like the https://www part—or not. So let’s fire up another prompt to bring this to life:

Check if a Contact Info or Additional Info entry contains a valid website URL as part of the entry. Add a new column called ‘Website’ with the found URLs. Support URLs with and without protocols (e.g., https://, www.).

And just like that, we’ve got a shiny new column with the results:

Regex to check for website output

I hope you’re getting the hang of this! Let’s do one more example. Regular expressions can often be used to spot sensitive stuff like a social security number or credit card numbers. In this case, we can basically do a regex-powered find and replace to swap that data with masked data. Let’s give it a shot with the credit card numbers in this data:

Identify credit card numbers (e.g., patterns like ‘4111-1111-1111-1111’ or ‘5500 0000 0000 0004’) in the ‘Additional Info’ column and replace them with a masked version (e.g., ‘XXXX-XXXX-XXXX-1111’). Create a new column called ‘Masked Info’ with the result.

Masked credit cards output

Great work! We now have that data masked, and we could even remove the original column if we wanted as we push this downstream so it doesn’t risk exposing any personally identifiable information.

There’s just so much you can do with regex—it’s an incredibly powerful tool, and in my view, it’s way more user-friendly to tackle with Python and Copilot than even with Excel’s slick new functions.

I’ll admit, though, sometimes Copilot doesn’t nail it perfectly, especially if you’ve got more advanced scenarios multiple matches in a cell and want to grab all of them, or just the first or last one, and so forth. But that’s no big deal—you can easily get what you want it by jumping to another gen AI tool like Microsoft 365 Copilot, tweaking the code, and pasting it right back into Excel. That’s why it’s so important to be comfortable and familiar with Python code for these copy-paste tweaks, to double-check your work, and more.

What questions do you have about regex with Python, Copilot, and Excel or this power trio in general? Let me know in the comments.

The post Python in Excel: How to create regular expressions with Copilot first appeared on Stringfest Analytics.

To leave a comment for the author, please follow the link and comment on their blog: python - Stringfest Analytics .

Want to share your content on python-bloggers? click here.