Import data into the querier (now on Pypi), a query language for Data Frames

[This article was first published on T. Moudiki's Webpage - Python, and kindly contributed to python-bloggers]. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

The querier is a query language that helps you to retrieve data from Python Data Frames. In this post from October 25, we presented the querier and different verbs constituting its grammar for wrangling data: concat, delete, drop, filtr, join, select, summarize, update, request. In this other post from November 22, we showed how our querier verbs can be composed to form data wrangling pipelines.

The querier is now available on Pypi, and can be installed from the command line as:

pip install querier

We now show how to import data from csv and SQL databases. In example 1, we import data from csv. Then in example 2 we import data from a relational database (sqlite3).

Example 1

Import data from csv, and chain the querier’s operations select, filtr, summarize.

import pandas as pd
import querier as qr
import sqlite3 
import sys


# data -----

url = ('https://raw.github.com/pandas-dev'
   '/pandas/master/pandas/tests/data/tips.csv')


# Example 1 - Import from csv -----

qrobj1 = qr.Querier(source=url)

df1 = qrobj1\
.select(req="tip, sex, smoker, time")\
.filtr(req="smoker == 'No'")\
.summarize(req="sum(tip), sex, time", group_by="sex, time")\
.df

print(df1)
   sum_tip     sex    time
0    88.28  Female  Dinner
1    61.49  Female   Lunch
2   243.17    Male  Dinner
3    58.83    Male   Lunch

Example 2

Import data from an sqlite3 database, and chain the querier’s operations select, filter, summarize.

# Example 2 - Import from sqlite3 -----

# an sqlite3 database  connexion
con = sqlite3.connect('people.db')
 
with con:
    cur = con.cursor()    
    cur.execute("CREATE TABLE Population(id INTEGER PRIMARY KEY, name TEXT, age INT, sex TEXT)")
    cur.execute("INSERT INTO Population VALUES(NULL,'Michael',19, 'M')")
    cur.execute("INSERT INTO Population VALUES(NULL,'Sandy', 41, 'F')")
    cur.execute("INSERT INTO Population VALUES(NULL,'Betty', 34, 'F')")
    cur.execute("INSERT INTO Population VALUES(NULL,'Chuck', 12, 'M')")
    cur.execute("INSERT INTO Population VALUES(NULL,'Rich', 24, 'M')")
    
# create querier object from the sqlite3 database 
qrobj2 = qr.Querier(source='people.db', table="Population")    

# filter on people with age >= 20
df2 = qrobj2.select(req="name, age, sex").filtr(req="age >= 20").df

print("df2: ")
print(df2)
print("\n")

# avg. age for people with age >= 20, groupped by sex
qrobj3 = qr.Querier(source='people.db', table="Population")  
df3 = qrobj3.select(req="name, age, sex").filtr(req="age >= 20")\
.summarize("avg(age), sex", group_by="sex").df

print("df3: ")
print(df3)
print("\n")
df2: 
    name  age sex
1  Sandy   41   F
2  Betty   34   F
4   Rich   24   M


df3: 
   avg_age sex
0     37.5   F
1     24.0   M

Note: I am currently looking for a gig. You can hire me on Malt or send me an email: thierry dot moudiki at pm dot me. I can do descriptive statistics, data preparation, feature engineering, model calibration, training and validation, and model outputs’ interpretation. I am fluent in Python, R, SQL, Microsoft Excel, Visual Basic (among others) and French. My résumé? Here!

To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki's Webpage - Python.

Want to share your content on python-bloggers? click here.