How to Create Dummy Pandas Data Frames

[This article was first published on Python – Predictive Hacks, and kindly contributed to python-bloggers]. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Sometimes we want to create dummy data frames, mainly for testing purposes. pandas makes this possible through the helper functions in its util.testing module.


Dummy Data Frame

By default, makeDataFrame() creates 30 rows and 4 columns named A, B, C and D, with a random alphanumeric index.

import pandas as pd
pd.util.testing.makeDataFrame()
 
                   A         B         C         D
1kd8TfzEiX  1.032690 -0.573985  0.671357 -0.005690
jvoji1iqSq -0.921305 -1.422119  0.799405 -0.757337
9kVZnQU29u  0.772675  1.559491 -0.585057 -1.661675
EUriSO6l9C -0.489347 -1.317456 -1.084417  0.217104
LuyVlcJAgF -1.559043  1.473184 -0.029968  0.250103
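The util.testing helpers were deprecated in pandas 1.0 and may be missing from newer installations, so it can be useful to know how to build a comparable frame by hand. The following is a minimal sketch using only NumPy and the public pandas constructor. The seed, the variable names and the way the index labels are generated are assumptions made for illustration, not a copy of what pandas does internally.

```python
import string

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n_rows, n_cols = 30, 4          # the defaults described above

# Random 10-character alphanumeric labels, similar in spirit to the index above
chars = list(string.ascii_letters + string.digits)
index = ["".join(rng.choice(chars, size=10)) for _ in range(n_rows)]

# Standard-normal values in columns A, B, C and D
df = pd.DataFrame(
    rng.standard_normal((n_rows, n_cols)),
    index=index,
    columns=list("ABCD"),
)
print(df.head())
```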

Dummy Data Frame with Missing Values

makeMissingDataframe() returns the same kind of data frame, with some values randomly replaced by NaN.

pd.util.testing.makeMissingDataframe()
  
                   A         B         C         D
8SscPSnyy3 -0.186894  0.867706  0.976297 -0.768294
h3cvhbkSWT       NaN  0.083227 -0.570344  0.633503
CI0V1MUGal -0.025917 -1.909735 -0.270712 -1.622608
IeLbykQMB2       NaN -0.414958  0.479902 -1.418628
QDn4bxJpAU -0.602611 -1.110227  0.425438 -0.467016
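The same effect can be approximated without the testing helpers by masking random cells of an ordinary frame. This is a sketch under stated assumptions: the density value of 0.9 (the fraction of cells that keep their value) and all variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 4)), columns=list("ABCD"))

density = 0.9  # assumed fraction of cells to keep; the rest become NaN
keep = rng.random(df.shape) < density

# DataFrame.where keeps values where the condition is True and
# fills the remaining cells with NaN
df_missing = df.where(keep)
print(df_missing.isna().sum())
```

Because the mask is drawn at random, the exact number of NaN values changes with the seed.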

Dummy Data Frame in Time-Series Format

Here the index is a series of dates.

pd.util.testing.makeTimeDataFrame()
 
                   A         B         C         D
2000-01-03  0.824257  1.367241  1.448037 -0.649556
2000-01-04 -0.640470  0.189239  0.681814 -0.737980
2000-01-05 -0.828875  1.239800  0.003776  0.744634
2000-01-06  1.056602  1.660839  0.546301 -0.521864
2000-01-07  0.285226 -0.269875  0.697068 -0.295571
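A date-indexed frame like this one can also be put together with the public pandas API. The sketch below assumes a business-day index starting on 2000-01-03, which matches the dates shown above; the seed and variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# 30 consecutive business days (weekends are skipped)
index = pd.bdate_range(start="2000-01-03", periods=30)

df_time = pd.DataFrame(
    rng.standard_normal((30, 4)),
    index=index,
    columns=list("ABCD"),
)
print(df_time.head())
```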

Dummy Data Frame of Mixed Types

It creates a small dummy data frame that mixes continuous, categorical (string) and date-time variables.

pd.util.testing.makeMixedDataFrame()
 
     A    B     C          D
0  0.0  0.0  foo1 2009-01-01
1  1.0  1.0  foo2 2009-01-02
2  2.0  0.0  foo3 2009-01-05
3  3.0  1.0  foo4 2009-01-06
4  4.0  0.0  foo5 2009-01-07
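Since this frame is small and deterministic, it is easy to reproduce by hand. The following sketch rebuilds the values shown above with the plain DataFrame constructor; the variable name df_mixed is an assumption, and the dates are generated as business days starting on 2009-01-01.

```python
import pandas as pd

df_mixed = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0, 4.0],                 # continuous
        "B": [0.0, 1.0, 0.0, 1.0, 0.0],                 # alternating 0/1
        "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],  # strings
        "D": pd.bdate_range("2009-01-01", periods=5),   # business days
    }
)
print(df_mixed.dtypes)
```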

Dummy Data Frame with Period Data

It creates a dummy data frame whose index is made of time periods rather than timestamps.

pd.util.testing.makePeriodFrame()
 
                   A         B         C         D
2000-01-03  1.586559  0.290612  0.609690 -0.155839
2000-01-04 -0.540105 -0.478986 -1.064901  1.302807
2000-01-05  1.113594  0.611258 -0.574987 -1.149406
2000-01-06 -0.841371 -0.294933  0.023008 -0.097956
2000-01-07 -0.080283  2.588833  0.005425  0.150920
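A period-indexed frame can be sketched with pd.period_range. Note the assumption here: this example uses a daily frequency ("D") starting on 2000-01-03, which reproduces the first five labels shown above but is not necessarily the frequency the testing helper uses. The seed and variable names are also illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# 30 consecutive daily periods; the daily frequency is an assumption
index = pd.period_range(start="2000-01-03", periods=30, freq="D")

df_period = pd.DataFrame(
    rng.standard_normal((30, 4)),
    index=index,
    columns=list("ABCD"),
)
print(df_period.head())
```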

More rows and columns?

To change the number of rows and columns from the defaults of 30 and 4 respectively, set testing.N to the number of rows and testing.K to the number of columns before creating the data frame.

pd.util.testing.N = 10
pd.util.testing.K = 5
pd.util.testing.makeDataFrame()
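An alternative to changing module-level settings is to pass the shape explicitly to a small helper of our own. The sketch below is one way to do that; make_frame, its parameters and the generated column names are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def make_frame(n_rows=30, n_cols=4, seed=None):
    """Hypothetical helper: standard-normal data of a chosen shape."""
    rng = np.random.default_rng(seed)
    # Generic column names so that any number of columns is supported
    columns = [f"col_{i}" for i in range(n_cols)]
    return pd.DataFrame(rng.standard_normal((n_rows, n_cols)), columns=columns)


df = make_frame(n_rows=10, n_cols=5, seed=0)
print(df.shape)
```

Passing the size as arguments keeps each call independent, so one test cannot change the shape used by another.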
 

