How To Unlock The Power Of Datetime In Pandas

[This article was first published on Python – Predictive Hacks, and kindly contributed to python-bloggers]. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Dates are one of the most used features of Data Analysis. We will show you how you can parse, use, and analyze them with some simple but very useful functions. Let us first create a DateTime object to understand its concept a bit more.

import pandas as pd
import yfinance as yf
from datetime import datetime
import numpy as np

year=2021
month=1
day=1

#create a datetime object
date=datetime(year,month,day)

In the code above we set the year month and date (we can add more if we want like minutes, hours, etc.). A very useful feature of the DateTime object is that we can grab their attributes. This is used a lot with pandas when we want to create columns like “Month”, “Year”, etc.

print('day',date.day)
print('month',date.month)
print('year',date.year)
day 1
month 1
year 2021

String to Datetime in Pandas

In the real-world, in most cases, we are getting dates as strings. That’s why we will show you how to convert them into DateTime objects. You can use the pd.to_datetime function.

pd.to_datetime(['01-01-2021','01-02-2021','01-03-2021'])
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03'], dtype='datetime64[ns]', freq=None)

As you can see even if the date is in an unusual format, the to_datetime function can parse the dates. However, in our example, we have to set the format of the date to be sure that the first number is parsed as a month.

pd.to_datetime(['01-01-2021','01-02-2021','01-03-2021'],format="%m-%d-%Y")
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03'], dtype='datetime64[ns]', freq=None)

You can find all symbols for date formating in the following table

DirectiveDescriptionExample
%aWeekday, short versionWed
%AWeekday, full versionWednesday
%wWeekday as a number 0-6, 0 is Sunday3
%dDay of month 01-3131
%bMonth name, short versionDec
%BMonth name, full versionDecember
%mMonth as a number 01-1212
%yYear, short version, without century18
%YYear, full version2018
%HHour 00-2317
%IHour 00-1205
%pAM/PMPM
%MMinute 00-5941
%SSecond 00-5908
%fMicrosecond 000000-999999548513
%zUTC offset+0100
%ZTimezoneCST
%jDay number of year 001-366365
%UWeek number of year, Sunday as the first day of week, 00-5352
%WWeek number of year, Monday as the first day of week, 00-5352
%cLocal version of date and timeMon Dec 31 17:41:00 2018
%xLocal version of date12/31/18
%XLocal version of time17:41:00
%%A % character%
source https://www.w3schools.com/python/python_datetime.asp

Datetime functions in Pandas

Let’s get our data. For this post we will use the stock Prices of Tesla.

df=yf.download('TSLA',start='2017-11-07',  end='2021-01-01',progress=False)
df
DateTime in Pandas
DatetimeIndex: 291 entries, 2019-11-06 to 2020-12-31
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       291 non-null    float64
 1   High       291 non-null    float64
 2   Low        291 non-null    float64
 3   Close      291 non-null    float64
 4   Adj Close  291 non-null    float64
 5   Volume     291 non-null    int64  

As we can see, the index is already a Datetime Index.

If we are reading the data from a file, we can parse the dates easily by setting the parse_dates variable as a list of the column names of the Dates we want to parse.

#save a sample dataframe with dates
df.reset_index().to_csv('tesla.csv',index=False)

df=pd.read_csv('tesla.csv',parse_dates=['Date'])
df.info()
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       794 non-null    datetime64[ns]
 1   Open       794 non-null    float64       
 2   High       794 non-null    float64       
 3   Low        794 non-null    float64       
 4   Close      794 non-null    float64       
 5   Adj Close  794 non-null    float64       
 6   Volume     794 non-null    int64     

However, as we said before we may have a different format of dates and even if the default parser is great we may need to create our own just to be sure that the parsing is correct. The easiest way is by using lambda functions.

custom_dateparser = lambda x: datetime.strptime(x, "%Y-%m-%d")

df = pd.read_csv('test.csv', parse_dates=['Date'], date_parser=custom_dateparser)

Aggregate Dates

Having our index as DateTime can help us aggregate our data according to a rule using the resample function. For example, we may want to aggregate our data by year.

#Here the rule A is for End of  Year. In other words, group by Year.
df.resample(rule='A').mean()
Datetime In Pandas
#Q is for quarter end
df.resample(rule='Q').max()
Pandas Datetime overview

You can find all the availlable rules for the resample function in the following Table

AliasDescription
Bbusiness day frequency
Ccustom business day frequency
Dcalendar day frequency
Wweekly frequency
Mmonth end frequency
SMsemi-month end frequency (15th and end of month)
BMbusiness month end frequency
CBMcustom business month end frequency
MSmonth start frequency
SMSsemi-month start frequency (1st and 15th)
BMSbusiness month start frequency
CBMScustom business month start frequency
Qquarter end frequency
BQbusiness quarter end frequency
QSquarter start frequency
BQSbusiness quarter start frequency
A, Yyear end frequency
BA, BYbusiness year end frequency
AS, YSyear start frequency
BAS, BYSbusiness year start frequency
BHbusiness hour frequency
Hhourly frequency
T, minminutely frequency
Ssecondly frequency
L, msmilliseconds
U, usmicroseconds
Nnanoseconds
source: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

Shifting Dates

Using the same alias as resample we can shift the dates in our Data using the tshift function.

#M is for end of month
df.tshift(freq='M').head()
How To Unlock The Power Of Datetime In Pandas 1

It changes the index of the dataframe to be the end of the month for every date of the same month.

If you are new to Pandas, we may want to check the Pandas Group by Tips.

To leave a comment for the author, please follow the link and comment on their blog: Python – Predictive Hacks.

Want to share your content on python-bloggers? click here.