In honor of National Panda Day, I have put together a Stringfest learning guide on `pandas`, the popular Python module for data analysis and manipulation.

Fun fact: the name `pandas` comes from so-called “panel data” in econometrics. The primary data structure of interest in `pandas` is the DataFrame, which is two-dimensional and tabular. This is a very common and useful way to arrange data for data analysis, and Excel or SQL users will find many similarities to how `pandas` views and uses data — with some (useful) twists.

Take a look at the half-day workshop below and let me know what you think. My goal is for learners to finish with enough of a foundation in `pandas` to conduct exploratory data analysis in Python.

### Lesson 1: Up and running with NumPy

Objective: Student can create and operate on NumPy arrays

Description:

• Installing NumPy
• Creating arrays
• Inspecting arrays
• Reshaping arrays
• Array mathematics
• Random numbers

Exercises: Drills

Assets needed: None

Time: 35 minutes
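
To give a flavor of what the Lesson 1 drills might look like, here is a minimal sketch that touches each topic once. The array values, variable names and seed are my own placeholders, not workshop materials.

```python
import numpy as np

# Creating arrays: from a Python list, and pre-filled with zeros
a = np.array([1, 2, 3, 4, 5, 6])
zeros = np.zeros((2, 3))

# Inspecting arrays: shape, number of dimensions, element type
print(a.shape, a.ndim, a.dtype)

# Reshaping arrays: six elements become 2 rows by 3 columns
m = a.reshape(2, 3)

# Array mathematics: element-wise operations and aggregations
doubled = m * 2
col_sums = m.sum(axis=0)  # sum down each column

# Random numbers: a seeded generator keeps results reproducible
rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 7, size=10)  # ten simulated die rolls (1 to 6)
print(rolls)
```

Installing is typically a one-liner such as `pip install numpy`, though the exact command depends on how your Python environment is set up.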

### Lesson 2: Introduction to Pandas

Objective: Student can import and create Pandas DataFrames

Description:

• Installing Pandas
• NumPy and Pandas
• Series and DataFrames
• Columns and indices
• Creating DataFrames
• Importing: CSV, Excel

Exercises: Drills

Assets needed: Baseball records

Time: 25 minutes
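
A rough sketch of the Lesson 2 ideas follows. The tiny table is made up for illustration: the player IDs, numbers and column names are assumptions and are not taken from the actual baseball files. The CSV is read from an in-memory string so the snippet runs without any files on disk.

```python
import io

import numpy as np
import pandas as pd

# A Series is a single labeled column of data
hits = pd.Series([150, 182, 97], name="H")

# A DataFrame is a table of columns that share one index
batting = pd.DataFrame({
    "playerID": ["player_a", "player_b", "player_c"],  # hypothetical IDs
    "yearID": [2019, 2019, 2020],
    "H": [150, 182, 97],
})

# Columns and indices
print(batting.columns.tolist())
print(batting.index)

# NumPy and Pandas: a column can be handed back as a NumPy array
values = batting["H"].to_numpy()

# Importing: read_csv accepts a file path or any file-like object
csv_text = "playerID,yearID,H\nplayer_a,2019,150\nplayer_b,2019,182\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)

# pd.read_excel("some_file.xlsx") works along the same lines, but needs
# an Excel engine such as openpyxl installed ("some_file.xlsx" is a
# placeholder name).
```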

### Lesson 3: Exploring DataFrames

Objective: Student can inspect and explore Pandas DataFrames

Description:

• Inspecting columns
• Printing rows
• Descriptive statistics
• Checking for missing values
• Retrieving columns

Exercises: Drills

Assets needed: Baseball records

Time: 40 minutes
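
Here is one way the Lesson 3 topics could look in practice. Again, this is a sketch on a small made-up table (with one missing value planted on purpose), not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the baseball records
batting = pd.DataFrame({
    "playerID": ["player_a", "player_b", "player_c", "player_d"],
    "yearID": [2019, 2019, 2020, 2020],
    "H": [150.0, 182.0, np.nan, 97.0],
})

# Inspecting columns: names, types and non-null counts
batting.info()

# Printing rows: the first two rows
print(batting.head(2))

# Descriptive statistics for the numeric columns
print(batting.describe())

# Checking for missing values: count of NaN per column
missing = batting.isna().sum()
print(missing)

# Retrieving columns: single brackets give a Series,
# double brackets give a DataFrame
hits = batting["H"]
subset = batting[["playerID", "H"]]
```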

### Lesson 4: Basic DataFrame manipulation

Objective: Student can perform basic operations on Pandas DataFrames

Description:

• Sorting and filtering rows
• Modifying columns
• Removing columns
• Manipulating missing values
• Removing duplicates
• Aliasing modules

Exercises: Drills

Assets needed: Baseball records

Time: 45 minutes
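
The Lesson 4 operations might be sketched like this. The data is invented for the example, including a deliberately duplicated row and an empty `note` column that exists only so there is something to remove.

```python
import numpy as np
import pandas as pd  # aliasing modules: "pd" is the conventional short name

# Hypothetical table with a duplicate row and a missing value
batting = pd.DataFrame({
    "playerID": ["player_a", "player_b", "player_b", "player_c"],
    "yearID": [2019, 2019, 2019, 2020],
    "H": [150.0, 182.0, 182.0, np.nan],
    "note": ["", "", "", ""],
})

# Removing duplicates: drops the repeated player_b row
deduped = batting.drop_duplicates()

# Removing columns
no_note = deduped.drop(columns=["note"])

# Manipulating missing values: treat a missing hit count as zero
# (a choice made for this example, not always the right one)
filled = no_note.fillna({"H": 0})

# Modifying columns: hit counts read better as whole numbers
filled = filled.astype({"H": int})

# Filtering rows, then sorting them
regulars = filled[filled["H"] > 100]
ranked = regulars.sort_values("H", ascending=False)
print(ranked)
```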

### Lesson 5: Intermediate DataFrame manipulation

Objective: Student can perform intermediate operations on Pandas DataFrames

Description:

• Creating new columns
• Reshaping: melting and pivoting
• Aggregating
• Merging DataFrames
• Exporting DataFrames: CSV, Excel

Exercises: Drills

Assets needed: Baseball records

Time: 45 minutes
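
Finally, a sketch of the Lesson 5 material. Both tables, their column names and the `AVG` calculation are assumptions made for illustration. The export writes to an in-memory buffer so nothing lands on disk.

```python
import io

import pandas as pd

# Hypothetical season-level records and a lookup table of names
batting = pd.DataFrame({
    "playerID": ["player_a", "player_a", "player_b", "player_b"],
    "yearID": [2019, 2020, 2019, 2020],
    "AB": [500, 480, 550, 520],
    "H": [150, 120, 182, 130],
})
people = pd.DataFrame({
    "playerID": ["player_a", "player_b"],
    "nameLast": ["Alpha", "Bravo"],
})

# Creating new columns: hits divided by at-bats
batting["AVG"] = batting["H"] / batting["AB"]

# Reshaping, melting: wide to long, one row per player-year-stat
long_form = batting.melt(
    id_vars=["playerID", "yearID"],
    value_vars=["AB", "H"],
    var_name="stat",
    value_name="value",
)

# Reshaping, pivoting: one row per player, one column per year
wide = batting.pivot(index="playerID", columns="yearID", values="H")

# Aggregating: totals per player
career = batting.groupby("playerID", as_index=False)[["AB", "H"]].sum()

# Merging DataFrames: attach names, much like a SQL left join
named = career.merge(people, on="playerID", how="left")

# Exporting DataFrames: to_csv accepts a path or a file-like object
buffer = io.StringIO()
named.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# named.to_excel("career.xlsx", index=False) is the Excel counterpart;
# it needs an engine such as openpyxl ("career.xlsx" is a placeholder).
```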

By the way, the “baseball records” I refer to in the guide come from the Lahman baseball database, one of my all-time favorite datasets.

The only thing better than that dataset would be, well, a panda playing baseball… oh wait, that’s Pablo Sandoval.