
Create A Pandas Dataframe AI Agent With Generative AI, Python And OpenAI

This article was first published on business-science.io, and kindly contributed to python-bloggers.

Hey guys, this is the first article in my NEW GenAI / ML Tips Newsletter. Today, we’re diving into the world of Generative AI and exploring how it can help companies automate common data science tasks. Specifically, we’ll learn how to create a Pandas dataframe agent that can answer questions about your dataset using Python, Pandas, LangChain, and OpenAI’s API. Let’s get started!


This is what you are making today

We’ll use this Generative AI Workflow to combine data (from CSVs or SQL databases) with a Pandas Data Frame Agent that helps us produce common analytics outputs like visualizations and reports.

Get the Code (In the AI-Tip 001 Folder)


SPECIAL ANNOUNCEMENT: AI for Data Scientists Workshop on December 18th

Inside the workshop I’ll share how I built a SQL-Writing Business Intelligence Agent with Generative AI:

What: GenAI for Data Scientists

When: Wednesday December 18th, 2pm EST

How It Will Help You: Whether you are new to data science or are an expert, Generative AI is changing the game. There’s a ton of hype. But how can Generative AI actually help you become a better data scientist and help you stand out in your career? I’ll show you inside my free Generative AI for Data Scientists workshop.

Price: Does Free sound good?

How To Join: 👉 Register Here


GenAI/ML-Tips Weekly

This article is part of GenAI/ML Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common Data Science and Generative AI coding tasks. Pretty cool, right?

Here is the link to get set up. 👇

Get the Code (In the GenAI/ML Tip 001 Folder)

This Tutorial is Available in Video (9 Minutes)

I have a 9-minute video that walks you through setting up the Pandas Data Frame Agent and running data analysis with it. 👇

Why Generative AI is Transforming Data Science

Generative AI, powered by models like OpenAI’s GPT series, is reshaping the data science landscape. These models can understand and generate human-like text, making it possible to interact with data in more intuitive ways. By integrating Generative AI into data science, you can:

Creating a Pandas dataframe agent combines a large language model with Pandas, so you can explore and interpret your data by asking questions in natural language instead of writing every transformation by hand.

What is a Pandas Data Frame Agent?

A Pandas Data Frame Agent automates common Pandas operations from Natural Language inputs.

It can be used to perform:

All from Natural Language prompts.
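To make the idea concrete, here’s a toy sketch of the loop such an agent runs: show the model a preview of the data, let it write Pandas code, execute that code against the data frame, and return the result. The fake_llm function is a hypothetical stand-in that returns hard-coded code instead of calling OpenAI, and the geography/sales columns are made up. A real agent also iterates, feeding results and errors back to the model; this toy makes a single pass.

```python
import pandas as pd


def fake_llm(question: str, df_preview: str) -> str:
    """Hypothetical stand-in for an LLM call: returns Pandas code as a string.

    A real agent would send the question plus the data preview to a model
    and get code back; here a single answer is hard-coded.
    """
    if "total sales by geography" in question.lower():
        return "df.groupby('geography', as_index=False)['sales'].sum()"
    raise ValueError(f"Toy LLM has no answer for: {question!r}")


def toy_dataframe_agent(df: pd.DataFrame, question: str):
    # 1. Give the "model" a small preview so it knows the column names.
    preview = df.head().to_string()
    # 2. The "model" writes Pandas code that answers the question.
    code = fake_llm(question, preview)
    # 3. Execute that code with `df` in scope.
    #    eval() runs arbitrary code: never do this with untrusted model output.
    return eval(code, {"pd": pd}, {"df": df})


df = pd.DataFrame({"geography": ["East", "West", "East"], "sales": [100, 50, 25]})
print(toy_dataframe_agent(df, "What are the total sales by geography?"))
```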

Make A Pandas Data Frame Agent

Let’s walk through the steps to create a Pandas data frame agent that can answer questions about a dataset using Python, OpenAI’s API, Pandas, and LangChain.

Quick Reminder: You can get all of the code and datasets shown in a Python Script and Jupyter Notebook when you join my GenAI/ML Tips Newsletter.

Code Location: /001_pandas_dataframe_agent

Step 1: Setting Up the Python Environment

First, you’ll need to set up your Python environment and install the required libraries.

pip install openai langchain langchain_openai langchain_experimental pandas plotly pyyaml
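Before importing anything, you can confirm the installs worked. This is a small standard-library sketch (check_installed is a helper name made up for illustration); note that the pyyaml package is imported under the module name yaml.

```python
import importlib.util

# pip package name -> module name you actually import (they differ for pyyaml)
REQUIRED = {
    "openai": "openai",
    "langchain": "langchain",
    "langchain_openai": "langchain_openai",
    "langchain_experimental": "langchain_experimental",
    "pandas": "pandas",
    "plotly": "plotly",
    "pyyaml": "yaml",
}


def check_installed(required=REQUIRED):
    """Return {pip_name: True/False} depending on whether each module can be found."""
    return {pip: importlib.util.find_spec(mod) is not None for pip, mod in required.items()}


missing = [name for name, ok in check_installed().items() if not ok]
print("Missing packages:", missing or "none")
```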

Next, import the libraries.

Then run this to access our utility function, parse_json_to_dataframe().

The last part is to set up your OpenAI API Key. Make sure to get an API Key from OpenAI’s API website.

Note: Replace ‘credentials.yml’ with the path to your YAML file containing the OpenAI API key or set the ‘OPENAI_API_KEY’ environment variable directly.
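Here’s a minimal sketch of the key setup, assuming the YAML file stores the key under an openai entry (that key name and the load_openai_key helper are hypothetical; adjust them to your own file). It prefers an OPENAI_API_KEY that is already set and only falls back to the YAML file otherwise.

```python
import os


def load_openai_key(path: str = "credentials.yml", key: str = "openai") -> str:
    """Return the OpenAI API key and make sure OPENAI_API_KEY is set.

    Uses the environment variable if it already exists; otherwise reads
    `key` from the YAML file at `path` (the "openai" key name is an assumption).
    """
    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:
        return api_key
    if os.path.exists(path):
        import yaml  # PyYAML; imported here so the env-var path works without it

        with open(path) as f:
            api_key = yaml.safe_load(f)[key]
        os.environ["OPENAI_API_KEY"] = api_key
        return api_key
    raise RuntimeError(f"Set OPENAI_API_KEY or create {path} with a '{key}' entry")
```

Call load_openai_key() once near the top of your notebook; LangChain’s ChatOpenAI reads OPENAI_API_KEY from the environment.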

Step 2: Loading and Exploring the Dataset

Load your dataset into a Pandas DataFrame. For this tutorial, we’ll use a sample customer data CSV file, but you could just as easily use any data that you can get into a Pandas DataFrame.
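Since the workflow can also start from a SQL database, here’s a self-contained sketch of that route using an in-memory SQLite table. The customers table, its columns, and its rows are invented for illustration; with a real database you would connect to your own data instead.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database with made-up rows so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, geography TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 50.0), (3, "East", 25.0)],
)

# Any query result becomes a regular DataFrame that the agent can work with.
sql_df = pd.read_sql_query("SELECT * FROM customers", conn)
conn.close()
print(sql_df)
```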

Run this code to load the customer dataset:
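The original loading code isn’t shown here, so this is a hedged stand-in: a hypothetical load_customer_data helper that reads a CSV and prints a quick overview. The sample rows and column names (customer_id, geography, sales) are made up; in practice you would pass the path to the tutorial’s CSV instead of the in-memory sample.

```python
import io

import pandas as pd


def load_customer_data(source) -> pd.DataFrame:
    """Read a CSV (file path or file-like object) and print a quick overview."""
    df = pd.read_csv(source)
    print(df.shape)   # (rows, columns)
    print(df.dtypes)  # column types the agent will see
    print(df.head())  # first five rows
    return df


# In-memory stand-in for the tutorial's CSV file (made-up data).
sample_csv = io.StringIO("customer_id,geography,sales\n1,East,100\n2,West,50\n3,East,25\n")
customer_df = load_customer_data(sample_csv)
```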

This dataset contains customer information, including sales and geography data.

Step 3: Create the Pandas Data Analysis Agent with LangChain

Initialize the language model and create the Pandas data analysis agent using LangChain.

This is what’s happening:

Pro-Tip: The secret sauce is to use the suffix parameter to specify the output format. Under the hood, this appends extra instructions to the agent’s default prompt template that describe how the answer should be returned.
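Here’s a sketch of Step 3, assuming recent releases of langchain_openai and langchain_experimental and an OPENAI_API_KEY in the environment. The model name, temperature, and suffix wording are my own choices, not necessarily the author’s. Newer langchain_experimental versions refuse to build this agent unless allow_dangerous_code=True, because it executes generated Python, so only use it on data and machines you trust.

```python
import pandas as pd

# Hypothetical suffix wording. It avoids curly braces because LangChain may
# call str.format() on the suffix to insert a preview of the data frame.
JSON_SUFFIX = (
    "Always return your final answer as a JSON object or a JSON list of records "
    "that can be loaded into a pandas DataFrame."
)


def build_pandas_agent(df: pd.DataFrame, model: str = "gpt-4o-mini"):
    """Create a LangChain pandas agent whose answers are nudged toward JSON.

    Imports are local so this module can be loaded without LangChain installed.
    """
    from langchain_experimental.agents import create_pandas_dataframe_agent
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model, temperature=0)
    return create_pandas_dataframe_agent(
        llm,
        df,
        agent_type="openai-tools",  # OpenAI tool calling instead of the default ReAct text format
        suffix=JSON_SUFFIX,         # extra output-format instructions added to the prompt
        verbose=True,
        allow_dangerous_code=True,  # the agent runs model-written Python code
    )
```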

Step 4: Interacting with the Pandas Data Frame Agent

Now, you can ask the agent questions about your data. Try running this code with a Natural Language analysis question:

“What are the total sales by geography?”

The agent processes the query and returns a response.
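In code, sending the question looks roughly like this sketch. It assumes agent is the executor returned by create_pandas_dataframe_agent; LangChain agent executors accept a dict with an input key and return a dict whose output key holds the final answer as a string. The ask helper is a name made up for illustration.

```python
def ask(agent, question: str) -> str:
    """Send a natural-language question to a LangChain agent and return its text answer."""
    result = agent.invoke({"input": question})  # returns a dict with "input" and "output"
    return result["output"]
```

For example, response = ask(agent, "What are the total sales by geography?") gives you the raw answer string, which is what the post-processing step has to work with.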

This is where Post Processing comes into play. Remember when I added the suffix parameter to return JSON? The Agent actually buries the JSON in a string.

That’s OK, because I have created a handy little parsing tool that extracts the JSON from the string and converts it to a Pandas Data Frame for us.
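The author’s parsing helper isn’t reproduced here. As a hedged stand-in, here’s my own minimal version with the same name, parse_json_to_dataframe, assuming the agent’s reply contains one JSON array or object somewhere in the text (possibly inside a markdown code fence). It scans for the first position where json.JSONDecoder.raw_decode succeeds and builds a data frame from whatever shape it finds.

```python
import json
import re

import pandas as pd


def parse_json_to_dataframe(text: str) -> pd.DataFrame:
    """Extract the first JSON object or array from an LLM reply as a DataFrame.

    Sketch only, not the author's utility. Handles three shapes:
      - a list of records:   [{"geography": "East", "sales": 125}, ...]
      - a dict of columns:   {"geography": ["East"], "sales": [125]}
      - a flat mapping:      {"East": 125, "West": 50}
    """
    decoder = json.JSONDecoder()
    # Try each "{" or "[" until one starts a valid JSON value.
    for match in re.finditer(r"[\[{]", text):
        try:
            data, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(data, list):
            return pd.DataFrame(data)
        if isinstance(data, dict):
            if all(isinstance(v, list) for v in data.values()):
                return pd.DataFrame(data)
            # Flat {key: value} mapping becomes a two-column frame.
            return pd.DataFrame(list(data.items()), columns=["key", "value"])
    raise ValueError("No JSON object or array found in the text")
```

Letting raw_decode decide where the JSON value ends is more forgiving than a single regular expression, since Python’s own parser handles nested brackets correctly.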

Step 5: Visualizing the Results

With the results in a Pandas data frame, we can report them. I’ll do this manually with Plotly, but a great challenge is to extend the code with an AI agent that writes the visualization code and executes it automatically.

This visualization provides a clear view of sales distribution across different geographical regions.
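A minimal sketch of the manual Plotly step, assuming the parsed data frame has geography and sales columns (hypothetical names; pass your own). Sorting before plotting puts the largest regions first, and that sorting step is split out so it can be reused without Plotly installed.

```python
import pandas as pd


def order_for_bar_chart(df: pd.DataFrame, value_col: str = "sales") -> pd.DataFrame:
    """Return a sorted copy with the largest values first."""
    return df.sort_values(value_col, ascending=False).reset_index(drop=True)


def plot_sales_by_geography(df: pd.DataFrame, x: str = "geography", y: str = "sales"):
    """Build a Plotly bar chart of y by x (column names are hypothetical defaults)."""
    import plotly.express as px  # imported lazily so order_for_bar_chart works without Plotly

    fig = px.bar(order_for_bar_chart(df, y), x=x, y=y, title=f"Total {y} by {x}")
    return fig
```

Calling fig = plot_sales_by_geography(result_df) and then fig.show() displays the chart, where result_df stands for the data frame returned by parse_json_to_dataframe().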


Conclusion

By integrating Generative AI with data science, you’ve created a powerful tool that can interact with your data in natural language. This Pandas data analysis agent simplifies the process of extracting insights and can help non-technical stakeholders automate common data manipulations and make data-driven decisions.

But there’s so much more to learn in Generative AI and data science.

If you’re excited to become a Generative AI Data Scientist with Python, then keep reading…

Become A Generative AI Data Scientist

The future of data science is AI / ML.

I’ve helped 6,107+ students learn data science, and now I’m helping them become Generative AI Data Scientists, skilled in combining Generative AI with ML. With this system they have:

Here’s the system they are taking to become Generative AI Data Scientists:

This is a Live 8-Week Generative AI Bootcamp for Data Scientists that covers:


Enroll In The Next Cohort Here
(And Become A Generative AI Data Scientist in 2025)
