Python-bloggers

3 Ways to Read Multiple CSV Files: For-Loop, Map, List Comprehension

This article was first published on business-science.io and kindly contributed to python-bloggers.

Reading many CSV files is a common task for a data scientist. In this free tutorial, we show you 3 ways to streamline reading CSV files in Python. You’ll read and combine 15 CSV files using the top 3 methods for iteration: a for-loop, map(), and a list comprehension.

Python Tips Weekly

This article is part of Python-Tips Weekly, a bi-weekly video tutorial that shows you step-by-step how to do common Python coding tasks.

Here are the links to get set up. 👇

Video Tutorial
Follow along with our Full YouTube Video Tutorial.

This 5-minute video covers reading multiple CSV files in Python.


Read 15 CSV Files [Tutorial]

This FREE tutorial showcases the awesome power of Python for reading CSV files. We’ll read and combine 15 of them.

Before we get started, get the Python Cheat Sheet

The Python Ecosystem is LARGE. To help, I’ve curated many of the 80/20 Python Packages, those I use most frequently to get results. Simply Download the Ultimate Python Cheat Sheet to access the entire Python Ecosystem at your fingertips via hyperlinked documentation and cheat sheets.


Onto the tutorial.

Project Setup

First, load the libraries. We’ll import pandas and glob.

Second, use glob to extract a list of the file paths for each of the 15 CSV files we need to read in.
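
The original code for this step is not reproduced in the text, so here is a minimal sketch of the setup. The helper make_sample_csvs(), the file names (sales_01.csv, ...), and the columns (file_id, value) are hypothetical stand-ins for the tutorial's 15 files, written to a temporary folder so the example runs anywhere.

```python
import glob
import os
import tempfile

import pandas as pd


def make_sample_csvs(n_files=15):
    """Hypothetical helper: write n_files tiny CSV files and return their folder."""
    data_dir = tempfile.mkdtemp()
    for i in range(1, n_files + 1):
        df = pd.DataFrame({"file_id": [i, i], "value": [i * 10, i * 10 + 1]})
        df.to_csv(os.path.join(data_dir, f"sales_{i:02d}.csv"), index=False)
    return data_dir


data_dir = make_sample_csvs()

# glob.glob() returns every path that matches the wildcard pattern.
# Sorting makes the order deterministic, since glob does not guarantee one.
csv_files = sorted(glob.glob(os.path.join(data_dir, "*.csv")))

print(len(csv_files))  # 15
```

With your own data, point the pattern at your folder instead of the temporary one.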

Method 1: For-Loop

The most common way to read files repeatedly is with a for-loop. It’s a good starting point for beginners, but it’s not the most concise approach. We’ll show this way first.

We can see that this involves 3 steps:

  1. Instantiating an Empty List: We do this to store our results as we make them in the for-loop.

  2. For-Each filename, read and append: We read using pd.read_csv(), which returns a data frame for each path. Then we append each data frame to our list.

  3. Combine each Data Frame: We use pd.concat() to combine the list of data frames into one big data frame.

PRO-TIP: Combining data frames in lists is a common strategy. Use axis=0 to combine row-wise. It is the default for pd.concat(), but stating it makes the intent explicit.
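
The 3 steps above can be sketched as follows. This is an illustration rather than the tutorial's exact code: the helper make_sample_csvs(), the file names, and the columns are hypothetical, and ignore_index=True is an optional addition that gives the combined data frame a fresh 0..n-1 index.

```python
import glob
import os
import tempfile

import pandas as pd


def make_sample_csvs(n_files=15):
    """Hypothetical helper: write n_files tiny CSV files and return their folder."""
    data_dir = tempfile.mkdtemp()
    for i in range(1, n_files + 1):
        df = pd.DataFrame({"file_id": [i, i], "value": [i * 10, i * 10 + 1]})
        df.to_csv(os.path.join(data_dir, f"sales_{i:02d}.csv"), index=False)
    return data_dir


csv_files = sorted(glob.glob(os.path.join(make_sample_csvs(), "*.csv")))

# Step 1: instantiate an empty list to hold the results.
frames = []

# Step 2: for each file path, read the CSV and append the data frame.
for path in csv_files:
    frames.append(pd.read_csv(path))

# Step 3: combine the list of data frames row-wise into one data frame.
combined = pd.concat(frames, axis=0, ignore_index=True)

print(combined.shape)  # (30, 2): 15 files x 2 rows each, 2 columns
```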

Method 2: Using Map

The map() function is a more concise way to iterate. The advantage is that we don’t have to instantiate a list. However, it can be more confusing to beginners.

How it works:

Map takes in two general arguments:

  1. func: A function to iteratively apply

  2. *iterables: One or more iterables whose elements are supplied to the function, in the order of the function’s arguments.
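
As a small illustration of these two arguments, the sketch below applies map() to plain numbers before we use it on files. The add() function and the sample lists are made up for the example.

```python
def add(x, y):
    """Hypothetical two-argument function used to show multiple iterables."""
    return x + y


# One iterable: each element is passed to the function in turn.
squares = list(map(lambda x: x ** 2, [1, 2, 3]))
print(squares)  # [1, 4, 9]

# Two iterables: elements are paired up and passed as x and y, in that order.
sums = list(map(add, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```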

Now let’s apply map() to our list of CSV file paths.

We use 3 steps:

  1. Make a Lambda Function: This is an anonymous function that we create on the fly. Its first argument accepts each element of our iterable (each file path in our list of CSV file paths).

  2. Supply the iterable: In this case, we provide our list of CSV file paths. The map function will then supply each element to the function in succession.

  3. Convert to List: The map() function returns a map object. We can then convert this to a list using the list() function.

PRO-TIP: Beginners can be confused by the “map object” that is returned. Simply wrap the call in list() to extract the results of map() as a list.
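
A minimal sketch of these 3 steps follows. As before, the helper make_sample_csvs(), the file names, and the columns are hypothetical stand-ins for the tutorial's files, and the final pd.concat() call is carried over from Method 1.

```python
import glob
import os
import tempfile

import pandas as pd


def make_sample_csvs(n_files=15):
    """Hypothetical helper: write n_files tiny CSV files and return their folder."""
    data_dir = tempfile.mkdtemp()
    for i in range(1, n_files + 1):
        df = pd.DataFrame({"file_id": [i, i], "value": [i * 10, i * 10 + 1]})
        df.to_csv(os.path.join(data_dir, f"sales_{i:02d}.csv"), index=False)
    return data_dir


csv_files = sorted(glob.glob(os.path.join(make_sample_csvs(), "*.csv")))

# Steps 1 and 2: the lambda receives each file path from csv_files in turn.
mapped = map(lambda path: pd.read_csv(path), csv_files)

# Step 3: convert the map object into a list of data frames.
frames = list(mapped)

# Combine row-wise, as in Method 1.
combined = pd.concat(frames, axis=0, ignore_index=True)

print(combined.shape)  # (30, 2)
```

Because the lambda only forwards its argument, map(pd.read_csv, csv_files) should behave the same way here. The lambda form is handy when you want to pass extra arguments to pd.read_csv().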

Method 3: List Comprehension

Because we want a list as the result, we can use a list comprehension, which is even easier than map(). A list comprehension is a streamlined way of writing a for-loop that returns a list. Here’s how it works.

  1. Do this: Add the function that you want to apply on each iteration. The argument you pass to it must match your looping variable name (defined in the next step).

  2. For each of these: This is your looping variable name that you create inside of the list comprehension. Each of these is an element that will get passed to your function.

  3. In this: This is your iterable, the list containing each of our file paths.
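
The three parts above map onto a single line of code. The sketch below is an illustration under the same assumptions as before: make_sample_csvs(), the file names, and the columns are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd


def make_sample_csvs(n_files=15):
    """Hypothetical helper: write n_files tiny CSV files and return their folder."""
    data_dir = tempfile.mkdtemp()
    for i in range(1, n_files + 1):
        df = pd.DataFrame({"file_id": [i, i], "value": [i * 10, i * 10 + 1]})
        df.to_csv(os.path.join(data_dir, f"sales_{i:02d}.csv"), index=False)
    return data_dir


csv_files = sorted(glob.glob(os.path.join(make_sample_csvs(), "*.csv")))

#         do this            for each of these   in this
frames = [pd.read_csv(path)  for path            in csv_files]

# Combine row-wise, as in the other two methods.
combined = pd.concat(frames, axis=0, ignore_index=True)

print(combined.shape)  # (30, 2)
```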

Summary

There you have it. You now know how to read CSV files using 3 methods:

  1. For-Loops
  2. Map
  3. List Comprehension

But there’s a lot more to learning data science. And if you’re like me, you’re interested in a fast track system that will advance you without wasting time on information you don’t need.

The solution is my course, Data Science Automation with Python

Data Science Automation with Python Course

Tired of struggling to learn data science? Getting stuck in a sea of never-ending resources? Eliminate the confusion and speed up your learning in the process.

Businesses are transitioning manual processes to Python for automation. We teach you skills that organizations need right now.

Learn how in our new course, Python for Data Science Automation. Perform an end-to-end business forecast automation using pandas, sktime, and papermill, and learn Python in the process.
