Learning Guide: Introduction to Pandas, Half-day Workshop

[This article was first published on George J. Mount, and kindly contributed to python-bloggers].

In honor of National Panda Day, I have put together a Stringfest learning guide on pandas, the popular Python library for data analysis and manipulation.

Fun fact: the name pandas comes from “panel data,” a term from econometrics. The primary data structure in pandas is the DataFrame, which is two-dimensional and tabular. This is a common and useful way to arrange data for analysis, and Excel or SQL users will find many similarities to how pandas views and uses data, with some (useful) twists.

Take a look at the half-day workshop below and let me know what you think. My goal is for learners to come away with enough of a foundation in pandas to conduct exploratory data analysis in Python.

Introduction to Pandas workshop

Lesson 1: Up and running with NumPy

Objective: Student can create and operate on NumPy arrays

Description:

  • Installing NumPy
  • Creating arrays
  • Inspecting arrays
  • Reshaping arrays
  • Array mathematics
  • Random numbers

Exercises: Drills

Assets needed: None

Time: 35 minutes
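
To give a flavor of the Lesson 1 topics, here is a minimal sketch rather than the actual workshop drills. It assumes NumPy is already installed (for example with `pip install numpy`), and the array values and variable names are made up for illustration.

```python
import numpy as np

# Creating arrays: from a list, filled with zeros, and from a range
a = np.array([1, 2, 3, 4, 5, 6])
zeros = np.zeros((2, 3))
steps = np.arange(0, 10, 2)          # 0, 2, 4, 6, 8

# Inspecting arrays: shape, number of dimensions, element type
print(a.shape, a.ndim, a.dtype)

# Reshaping: turn the flat array into 2 rows by 3 columns
grid = a.reshape(2, 3)

# Array mathematics: operations apply element-wise
doubled = grid * 2
col_sums = grid.sum(axis=0)          # sum down each column

# Random numbers: a seeded generator makes the draws repeatable
rng = np.random.default_rng(seed=42)
draws = rng.integers(low=1, high=7, size=10)   # ten simulated die rolls
print(draws)
```

Note that `high` is exclusive in `rng.integers`, which is why the die rolls use `high=7`.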

Lesson 2: Introduction to Pandas

Objective: Student can import and create Pandas DataFrames

Description:

  • Installing Pandas
  • NumPy and Pandas
  • Series and DataFrames
  • Columns and indices
  • Creating DataFrames
  • Importing: CSV, Excel

Exercises: Drills

Assets needed: Baseball records

Time: 25 minutes
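
The Lesson 2 topics might look something like the sketch below. It assumes pandas is installed (for example with `pip install pandas`). The player names and numbers are invented placeholders, not values from the baseball records, and the CSV is read from an in-memory string so the example runs without any file on disk.

```python
import io

import numpy as np
import pandas as pd

# Series: a one-dimensional labeled array
hits = pd.Series([150, 175, 120], name="hits")

# DataFrame: a table built from a dict of columns (hypothetical data)
batting = pd.DataFrame({
    "player": ["player_a", "player_b", "player_c"],
    "team": ["X", "Y", "X"],
    "hits": [150, 175, 120],
})

# Columns and indices
print(batting.columns.tolist())
print(batting.index)

# NumPy and pandas: a column can be handed back as a NumPy array
hits_array = batting["hits"].to_numpy()

# Importing: read_csv accepts a path or any file-like object
csv_text = "player,team,hits\nplayer_a,X,150\nplayer_b,Y,175\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)

# Excel works along the same lines via pd.read_excel("some_file.xlsx"),
# where the file name is a placeholder; .xlsx files need the openpyxl package.
```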

Lesson 3: Exploring DataFrames

Objective: Student can inspect and explore Pandas DataFrames

Description:

  • Inspecting columns
  • Printing rows
  • Descriptive statistics
  • Checking for missing values
  • Retrieving columns

Exercises: Drills

Assets needed: Baseball records

Time: 40 minutes
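
As a rough illustration of the Lesson 3 topics, the sketch below explores a small hand-made DataFrame. The column names and values are hypothetical stand-ins for the baseball records, with one missing value added on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the workshop dataset
batting = pd.DataFrame({
    "player": ["player_a", "player_b", "player_c", "player_d"],
    "team": ["X", "Y", "X", "Z"],
    "hits": [150, 175, np.nan, 98],
    "home_runs": [20, 35, 12, 5],
})

# Inspecting columns: names, dtypes, and non-null counts
batting.info()
print(batting.dtypes)

# Printing rows: the first and last two
print(batting.head(2))
print(batting.tail(2))

# Descriptive statistics for the numeric columns
summary = batting.describe()
print(summary)

# Checking for missing values: count of NaN per column
missing_per_column = batting.isna().sum()
print(missing_per_column)

# Retrieving columns: single brackets give a Series, double give a DataFrame
hits = batting["hits"]
subset = batting[["player", "hits"]]
```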

Lesson 4: Basic DataFrame manipulation

Objective: Student can perform basic operations on Pandas DataFrames

Description:

  • Sorting and filtering rows
  • Modifying columns
  • Removing columns
  • Manipulating missing values
  • Removing duplicates
  • Aliasing modules

Exercises: Drills

Assets needed: Baseball records

Time: 45 minutes
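
The Lesson 4 operations could be sketched as follows. Again, the data is a hypothetical stand-in for the baseball records, with a duplicated row and missing values planted so each step has something to do. Each step returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd   # "np" and "pd" are the conventional aliases

batting = pd.DataFrame({
    "player": ["player_a", "player_b", "player_c", "player_c", "player_d"],
    "team": ["X", "Y", "X", "X", "Z"],
    "hits": [150.0, 175.0, np.nan, np.nan, 98.0],
    "at_bats": [500, 520, 410, 410, 390],
})

# Removing duplicates: the repeated player_c row is dropped
deduped = batting.drop_duplicates()

# Manipulating missing values: either fill them or drop the rows
filled = deduped.fillna({"hits": 0})
complete = deduped.dropna(subset=["hits"])

# Sorting and filtering rows
sorted_hits = complete.sort_values("hits", ascending=False)
team_x = complete[complete["team"] == "X"]

# Modifying columns: rename one column and change another's type
renamed = complete.rename(columns={"at_bats": "AB"})
as_int = complete.assign(hits=complete["hits"].astype(int))

# Removing columns
no_team = complete.drop(columns=["team"])

print(sorted_hits)
```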

Lesson 5: Intermediate DataFrame manipulation

Objective: Student can perform intermediate operations on Pandas DataFrames

Description:

  • Creating new columns
  • Reshaping: melting and pivoting
  • Aggregating
  • Merging DataFrames
  • Exporting DataFrames: CSV, Excel

Exercises: Drills

Assets needed: Baseball records

Time: 45 minutes
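
For Lesson 5, a minimal sketch of the same ideas is below. The two small tables and their column names are hypothetical, and the CSV export writes to an in-memory buffer so that nothing is saved to disk.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two related tables
batting = pd.DataFrame({
    "player_id": [1, 2, 3, 4],
    "team": ["X", "Y", "X", "Y"],
    "hits": [150, 175, 120, 98],
    "at_bats": [500, 520, 410, 390],
})
players = pd.DataFrame({
    "player_id": [1, 2, 3, 4],
    "name": ["player_a", "player_b", "player_c", "player_d"],
})

# Creating new columns from existing ones
batting["batting_avg"] = batting["hits"] / batting["at_bats"]

# Reshaping: melt from wide to long, then pivot back to wide
long_form = batting.melt(
    id_vars=["player_id", "team"],
    value_vars=["hits", "at_bats"],
    var_name="stat",
    value_name="value",
)
wide_form = long_form.pivot(index="player_id", columns="stat", values="value")

# Aggregating: total hits per team
team_hits = batting.groupby("team")["hits"].sum()

# Merging DataFrames: similar in spirit to a SQL left join
merged = batting.merge(players, on="player_id", how="left")

# Exporting: to_csv accepts a path or a file-like object
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# merged.to_excel("some_file.xlsx", index=False) is the Excel counterpart;
# the file name is a placeholder and .xlsx output needs the openpyxl package.
```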

By the way, the “baseball records” I refer to in the guide come from the Lahman baseball database, one of my all-time favorite datasets.

The only thing better than that dataset would be, well, a panda playing baseball… oh wait, that’s Pablo Sandoval.

This download is part of my resource library. For exclusive free access, subscribe below.
