Want to share your content on python-bloggers? click here.
Reading many CSV files is a common task for a data scientist. In this free tutorial, we show you 3 ways to streamline reading CSV files in Python. You’ll read and combine 15 CSV files using the top 3 methods for iteration.
Python Tips Weekly
This article is part of Python Tips Weekly, a bi-weekly video tutorial series that shows you, step by step, how to do common Python coding tasks.
Here are the links to get set up. 👇
Video Tutorial
Follow along with our full YouTube video tutorial.
This 5-minute video covers reading multiple CSV files in Python.
Read 15 CSV Files [Tutorial]
This FREE tutorial showcases the awesome power of Python for reading CSV files. We’ll read 15 CSV files in this tutorial.
Before we get started, get the Python Cheat Sheet
The Python Ecosystem is LARGE. To help, I’ve curated many of the 80/20 Python Packages, those I use most frequently to get results. Simply download the Ultimate Python Cheat Sheet to access the entire Python Ecosystem at your fingertips via hyperlinked documentation and cheat sheets.
Onto the tutorial.
Project Setup
First, load the libraries. We’ll import pandas and glob.
- pandas: The main data wrangling library in Python.
- glob: A library for locating file paths using Unix shell-style wildcard patterns (such as *.csv), not full regular expressions.
Second, use glob to extract a list of the file paths for each of the 15 CSV files we need to read in.
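Here is a minimal sketch of the setup, assuming all the CSV files sit in one folder. The tutorial’s actual folder and file names aren’t shown here, so this version first writes 15 small stand-in files (hypothetical names sales_01.csv to sales_15.csv, with made-up columns file_id and value) to a temporary folder, then uses glob to collect their paths.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data: 15 small CSV files in a temporary folder
folder = tempfile.mkdtemp()
for i in range(1, 16):
    sample = pd.DataFrame({"file_id": [i, i], "value": [10 * i, 10 * i + 1]})
    sample.to_csv(os.path.join(folder, f"sales_{i:02d}.csv"), index=False)

# "*" is a shell-style wildcard: match every file name ending in .csv.
# glob's result order depends on the file system, so sort for a stable order.
file_paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

print(len(file_paths))                  # 15
print(os.path.basename(file_paths[0]))  # sales_01.csv
```

With your own data, you would replace folder with the directory that holds your files and drop the stand-in loop.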
Method 1: For-Loop
The most common way to repetitively read files is with a for-loop. It’s a great way for beginners, but it’s not the most concise. We’ll show this way first.
This involves three steps:
- Instantiate an empty list: We do this to store our results as we make them in the for-loop.
- For each filename, read and append: We read each path using pd.read_csv(), which returns a data frame. Then we append each data frame to our list.
- Combine the data frames: We use pd.concat() to combine the list of data frames into one big data frame.
PRO-TIP: Combining data frames in lists is a common strategy. Use axis=0 to specify row-wise combining. It is already pd.concat()’s default, but writing it out makes the intent explicit.
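The three for-loop steps above can be sketched as follows. It is a minimal version that reuses the hypothetical stand-in files (sales_01.csv to sales_15.csv in a temporary folder) so it runs on its own. ignore_index=True is an optional extra not mentioned in the tutorial: it renumbers the combined rows 0 to 29 instead of repeating each file’s own 0, 1 index.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data: 15 small CSV files in a temporary folder
folder = tempfile.mkdtemp()
for i in range(1, 16):
    sample = pd.DataFrame({"file_id": [i, i], "value": [10 * i, 10 * i + 1]})
    sample.to_csv(os.path.join(folder, f"sales_{i:02d}.csv"), index=False)
file_paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

# Step 1: an empty list to collect one data frame per file
data_frames = []

# Step 2: read each file and append its data frame to the list
for path in file_paths:
    df = pd.read_csv(path)
    data_frames.append(df)

# Step 3: stack the data frames row-wise (axis=0) into one big data frame
combined = pd.concat(data_frames, axis=0, ignore_index=True)

print(combined.shape)  # (30, 2)
```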
Method 2: Using Map
The map() function is a more concise way to iterate. The advantage is that we don’t have to instantiate a list. However, it can be more confusing to beginners.
How it works:
map() takes two general arguments:
- func: A function to apply iteratively.
- *iterables: One or more iterables whose elements are supplied to the function, in the same order as the function’s arguments.
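A small, file-free sketch of those two arguments. The add() helper is hypothetical, chosen only to show how two iterables line up with two parameters: on each step, map() passes one element from the first list as a and one from the second list as b.

```python
# Hypothetical helper with two parameters, one per iterable
def add(a, b):
    return a + b

# map(func, *iterables): the first list feeds a, the second feeds b
totals = map(add, [1, 2, 3], [10, 20, 30])
print(type(totals).__name__)  # map

# list() pulls the results out of the map object
result = list(totals)
print(result)  # [11, 22, 33]
```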
Let’s try map() on our list of file paths.
We use three steps:
- Make a lambda function: This is an anonymous function that we create on the fly. Its first argument accepts each element of our iterable (each filename in our list of CSV file paths).
- Supply the iterable: In this case, we provide our list of CSV files. The map function then supplies each element to the function in succession.
- Convert to a list: The map() function returns a map object. We can then convert this to a list using the list() function.
PRO-TIP: Beginners can be confused by the “map object” that is returned. It is a lazy iterator, so nothing is computed until you consume it. Simply use the list() function to collect the results of map() in a list.
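The three map() steps above might look like this. It is a minimal sketch that again builds the hypothetical stand-in files so it runs on its own. Because map() is lazy, the files are only read when list() consumes the map object.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data: 15 small CSV files in a temporary folder
folder = tempfile.mkdtemp()
for i in range(1, 16):
    sample = pd.DataFrame({"file_id": [i, i], "value": [10 * i, 10 * i + 1]})
    sample.to_csv(os.path.join(folder, f"sales_{i:02d}.csv"), index=False)
file_paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

# Steps 1 and 2: a lambda applied to each path in the iterable
data_frame_map = map(lambda path: pd.read_csv(path), file_paths)

# Step 3: convert the lazy map object to a list (this is when files are read)
data_frames = list(data_frame_map)

# Combine row-wise, as in the for-loop method
combined = pd.concat(data_frames, axis=0, ignore_index=True)
print(combined.shape)  # (30, 2)
```

Since pd.read_csv() already takes the path as its first argument, map(pd.read_csv, file_paths) would do the same job without the lambda.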
Method 3: List Comprehension
Because we want a list back, an even easier option than map() is a list comprehension. A list comprehension is a streamlined way of writing a for-loop that returns a list. Here’s how it works:
- Do this: Add the function that you want to apply on each iteration. Its argument must match your looping variable name (next).
- For each of these: This is the looping variable that you create inside the list comprehension. Each element is passed to your function in turn.
- In this: This is your iterable, the list containing each of our file paths.
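Mapped onto code, the “do this / for each of these / in this” pattern reads as a single line. This is a minimal sketch that rebuilds the hypothetical stand-in files so it runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data: 15 small CSV files in a temporary folder
folder = tempfile.mkdtemp()
for i in range(1, 16):
    sample = pd.DataFrame({"file_id": [i, i], "value": [10 * i, 10 * i + 1]})
    sample.to_csv(os.path.join(folder, f"sales_{i:02d}.csv"), index=False)
file_paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

# [do this            for each of these  in this]
data_frames = [pd.read_csv(path) for path in file_paths]

# Combine row-wise, as in the other two methods
combined = pd.concat(data_frames, axis=0, ignore_index=True)
print(combined.shape)  # (30, 2)
```

The looping variable path appears twice: once as the function’s argument and once after for, which is why the two names must match.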
Summary
There you have it. You now know how to read CSV files using 3 methods:
- For-Loops
- Map
- List Comprehension
But there’s a lot more to learning data science. And if you’re like me, you’re interested in a fast track system that will advance you without wasting time on information you don’t need.
The solution is my course, Data Science Automation with Python
Data Science Automation with Python Course
Tired of struggling to learn data science? Getting stuck in a sea of never-ending resources? Eliminate the confusion and speed up your learning in the process.
Businesses are transitioning manual processes to Python for automation. We teach you skills that organizations need right now.
Learn how in our new course, Python for Data Science Automation. Perform an end-to-end business forecast automation using pandas, sktime, and papermill, and learn Python in the process.