Want to share your content on python-bloggers? click here.
I’d like to work an example of using SQL WITH Common Table Expressions to produce more legible
SQL.
A major complaint with SQL is that it composes statements by right-ward nesting.
That is: a sequence of operations A -> B -> C
is represented as SELECT C FROM SELECT B FROM SELECT A
. However, the SQL 99 standard introduced the WITH
statement and common table expressions (ref). WITH
statements allow forward composition.
Let’s take a look at asking the data_algebra
(ref) to emit SQL with and without common table expressions.
First we set up some example data.
import sqlite3
from data_algebra.data_ops import *
import data_algebra.test_util
import data_algebra.SQLite
d = data_algebra.default_data_model.pd.DataFrame({
'x': [1, 2, 3]
})
d
x | |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
Next we set up our calculations. Please note the order they are performed.
ops = describe_table(d, table_name='d') .\
extend({'z': 'x + 1'}) .\
extend({'q': 'z + 2'}) .\
extend({'h': 'q + 3'})
ops
TableDescription( table_name='d', column_names=[ 'x']) .\ extend({ 'z': 'x + 1'}) .\ extend({ 'q': 'z + 2'}) .\ extend({ 'h': 'q + 3'})
res_pandas = ops.transform(d)
res_pandas
x | z | q | h | |
---|---|---|---|---|
0 | 1 | 2 | 4 | 7 |
1 | 2 | 3 | 5 | 8 |
2 | 3 | 4 | 6 | 9 |
expect = data_algebra.default_data_model.pd.DataFrame({
'x': [1, 2, 3],
'z': [2, 3, 4],
'q': [4, 5, 6],
'h': [7, 8, 9]
})
assert data_algebra.test_util.equivalent_frames(res_pandas, expect)
db_model = data_algebra.SQLite.SQLiteModel()
with sqlite3.connect(":memory:") as conn:
db_model.prepare_connection(conn)
db_handle = db_model.db_handle(conn)
db_handle.insert_table(d, table_name='d')
sql_regular = db_handle.to_sql(ops, pretty=True, use_with=False, annotate=True)
res_regular = db_handle.read_query(sql_regular)
sql_with = db_handle.to_sql(ops, pretty=True, use_with=True, annotate=True)
res_with = db_handle.read_query(sql_with)
assert data_algebra.test_util.equivalent_frames(res_regular, expect)
assert data_algebra.test_util.equivalent_frames(res_with, expect)
The standard nested SQL for these operations looks like the following.
print(sql_regular)
SELECT -- extend({ 'h': 'q + 3'}) "x", "z", "q", "q" + 3 AS "h" FROM (SELECT -- extend({ 'q': 'z + 2'}) "x", "z", "z" + 2 AS "q" FROM (SELECT -- extend({ 'z': 'x + 1'}) "x", "x" + 1 AS "z" FROM "d") "extend_0") "extend_1"
Notice the variables are generated in reverse order (h, q, z instead lf z, q, h).
The common table expression version looks like this, which involves less nesting and values move forward notation.
print(sql_with)
WITH "extend_0" AS (SELECT -- extend({ 'z': 'x + 1'}) "x", "x" + 1 AS "z" FROM "d"), "extend_1" AS (SELECT -- extend({ 'q': 'z + 2'}) "x", "z", "z" + 2 AS "q" FROM "extend_0") SELECT -- extend({ 'h': 'q + 3'}) "x", "z", "q", "q" + 3 AS "h" FROM "extend_1"
It is interesting to note when WITH
or common table expressions became widely available. The Wikipedia has the versions (and hence dates) (ref) showing when common table expressions are supported in the following
databases.
- Teradata (starting with version 14) (2012)
- Microsoft SQL Server (starting with version 2005)
- Oracle (with recursion since 11g release 2) (2009)
- PostgreSQL (since 8.4) (2009)
- MariaDB (since 10.2) (2017)
- MySQL (since 8.0) (2016)
- SQLite (since 3.8.3) (2014)
- DB2 (starting with version 11.5 Mod Pack 2 (ref) (2019)
Some of the cost of implementing common table expressions, is they are where databases allow recursive or fixed-point semantic extensions. From the database point of view these are major semantic changes, not mere notational conveniences.
Want to share your content on python-bloggers? click here.