Use ChatGPT to Translate SQL between DBMS

This article was first published on Technical Posts – The Data Scientist , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Each of modern DBMS such as MySQL, PostgreSQL, SQL Server and Oracle have its own specific dialect of SQL and procedure language, so queries and stored procedures need some adjustments when converting between them. For instance, it may use different data types, built-in functions or SQL clauses.

ChatGPT has proven capabilities in translation between different programming languages as well as code generation from scratch. This whitepaper explores how good it can translate SQL and procedure language between dialects of the most popular DBMS.

Example 1. Simple SELECT Query

Let’s start from translating this SELECT-query from MySQL to PostgreSQL, MS SQL and Oracle:

SELECT publisher_id,
       GROUP_CONCAT(book_id) AS book_ids -- merge book IDs per publisher
FROM books
WHERE publish_date BETWEEN now()-1 year AND NOW()
GROUP BY publisher_id;

In PostgreSQL: 

SELECT publisher_id,
       STRING_AGG(book_id::text, ',') AS book_ids -- merge book IDs per publisher
FROM books
WHERE publish_date BETWEEN NOW() - INTERVAL '1 year' AND NOW()
GROUP BY publisher_id;

As we can see, ChatGPT replaced GROUP_CONCAT by STRING_AGG, used explicit types casting to comply with PostgreSQL requirements of strict types matching and used keyword INTERVAL to get the same date as now of previous year. The query is converted correctly.

In MS SQL: 

SELECT publisher_id, 
       STRING_AGG(CAST(book_id AS VARCHAR), ',') AS book_ids -- merge book IDs per publisher 
FROM books
WHERE publish_date BETWEEN DATEADD(YEAR, -1, GETDATE()) AND GETDATE()
GROUP BY publisher_id;

For this conversion ChatGPT replaced GROUP_CONCAT by STRING_AGG casting book_id to VARCHAR for aggregation, converted NOW into GETDATE and used DATEADD(YEAR, -1, GETDATE()) to get the same date as now of previous year. The query is converted correctly.

In Oracle: 

SELECT publisher_id, -- Selects the publisher ID 
LISTAGG(book_id, ',') WITHIN GROUP (ORDER BY book_id) AS book_ids -- merge book IDs per publisher, ordered by book_id 
FROM books 
WHERE publish_date BETWEEN ADD_MONTHS(SYSDATE, -12) AND SYSDATE 
GROUP BY publisher_id;

For this conversion ChatGPT replaced GROUP_CONCAT by LISTAGG, converted NOW to SYSDATE and used ADD_MONTHS(SYSDATE, -12) to get the date 1 year ago. The query is converted correctly.

Ok, simple conversion goes well, let us try with something more sophisticated.

Example 2. Stored Procedure Returning a Result Set

Now ask ChatGPT to translate this stored procedure from T-SQL to PL/pgSQL

In MS SQL: 

CREATE PROCEDURE GetSalesByRange 
@DateFrom DATETIME, 
@DateTo DATETIME 
AS
SELECT 
C.CustomerID, 
C.FirstName, 
C.LastName, 
C.Email AS CustomerEmail, 
C.Mobile, 
C.Address, 
C.City, 
C.ZipCode, 
CY.[Name] AS Country, 
TT.TicketTypeID, 
TT.[Name] AS TicketType, 
PZ.PriceZoneID, 
PZ.[Name] AS PriceZone, 
ST.FinalPrice AS Price, 
ST.Created, 
ST.TransactionType, 
COALESCE(
VME.ExternalEventID, 
IIF(E.ExternalID = '', NULL, E.ExternalID), '0') AS ExternalID, 
E.EventID, 
ST.RowName, 
ST.SeatName 
FROM [Event] E WITH (NOLOCK) 
INNER JOIN EventCache EC WITH (NOLOCK) ON E.EventID = EC.EventID 
INNER JOIN SaleTicket ST WITH (NOLOCK) ON E.EventID = ST.EventID 
INNER JOIN Customer C WITH (NOLOCK) ON ST.CustomerID = C.CustomerID 
INNER JOIN Country CY WITH (NOLOCK) ON C.CountryID = CY.CountryID 
INNER JOIN TicketType TT WITH (NOLOCK) ON ST.TicketTypeID = TT.TicketTypeID 
INNER JOIN PriceZone PZ WITH (NOLOCK) ON ST.PriceZoneID = PZ.PriceZoneID 
LEFT OUTER JOIN VenueManagementEvent VME ON VME.EventID = E.EventID 
WHERE ST.Created BETWEEN @DateFrom AND @DateTo 
ORDER BY ST.Created 
GO 

And ChatGPT replies: 

CREATE OR REPLACE FUNCTION CustomerGetSalesByRange(
DateFrom TIMESTAMP,
DateTo TIMESTAMP
)
RETURNS TABLE(
CustomerID INT, 
FirstName TEXT, 
LastName TEXT, 
CustomerEmail TEXT,
   Mobile TEXT, 
Address TEXT, 
City TEXT, 
ZipCode TEXT, 
Country TEXT,
TicketID INT, 
TicketType TEXT, 
PriceZoneID INT, 
PriceZone TEXT, 
Price DECIMAL, 
Created TIMESTAMP, 
TransactionType TEXT, 
ExternalID TEXT, 
EventID INT, 
Section TEXT, 
RowName TEXT, 
SeatName TEXT
) AS $$
BEGIN

  — Return the query result

  RETURN QUERY

  SELECT
     C.CustomerID, C.FirstName, C.LastName, 
C.Email AS CustomerEmail, 
C.Mobile, C.Address, C.City, 
C.ZipCode, 
CY.Name AS Country, 
ST.TicketID, TT.TicketTypeID, 
TT.Name AS TicketType, 
PZ.PriceZoneID, 
PZ.Name AS PriceZone, 
ST.FinalPrice AS Price, 
ST.Created, ST.TransactionType,
     COALESCE(
VME.ExternalEventID, 
CASE WHEN E.ExternalID = '' THEN NULL ELSE E.ExternalID END, 
'0'
) AS ExternalID,
     E.EventID, 
ES."Name" AS Section, 
ST.RowName, ST.SeatName
  FROM "Event" E
     INNER JOIN EventCache EC ON E.EventID = EC.EventID
     INNER JOIN SaleTicket ST ON E.EventID = ST.EventID
     INNER JOIN EventSection ES ON ST.EventSectionID = ES.EventSectionID
     INNER JOIN Customer C ON ST.CustomerID = C.CustomerID
     INNER JOIN Country CY ON C.CountryID = CY.CountryID
     INNER JOIN TicketType TT ON ST.TicketTypeID = TT.TicketTypeID
     INNER JOIN PriceZone PZ ON ST.PriceZoneID = PZ.PriceZoneID
     LEFT JOIN VenueManagementEvent VME ON VME.EventID = E.EventID
  WHERE ST.Created BETWEEN DateFrom AND DateTo
  ORDER BY ST.Created;
END;
$$ LANGUAGE plpgsql;

In this conversion WITH (NOLOCK) hint is specific to SQL Server was removed since it doesn’t exist in PostgreSQL. DATETIME type was converted into TIMESTAMP. The IIF function of SQL Server was replaced by similar CASE WHEN pattern of PostgreSQL. Those transformation are all correct.

At the same time ChatGPT attempted to return a table to simplify using this function and made assumption about data types for this purpose. This is not correct in terms of semantics because the converter cannot get enough information from this code to make correct guess about data types.  

The workaround could be opening a cursor for result of the query and returning it to the caller. This time we put explicit request to use cursor in the command prompt of ChatGPT:

Convert this T-SQL code to PL/pgSQL, return cursor instead of table

Here is the result of conversion made by ChatGPT:

CREATE OR REPLACE FUNCTION GetSalesByRange(
DateFrom TIMESTAMP, 
DateTo TIMESTAMP
)
RETURNS REFCURSOR AS $$
DECLARE
    result_cursor REFCURSOR;
BEGIN

    — Open the cursor for the desired query

    OPEN result_cursor FOR

    SELECT 
        C.CustomerID, 
        C.FirstName, 
        C.LastName, 
        C.Email AS CustomerEmail, 
        C.Mobile, 
        C.Address, 
        C.City, 
        C.ZipCode, 
        CY."Name" AS Country, 
        TT.TicketTypeID, 
        TT."Name" AS TicketType, 
        PZ.PriceZoneID, 
        PZ."Name" AS PriceZone, 
        ST.FinalPrice AS Price, 
        ST.Created, 
        ST.TransactionType, 
        COALESCE(
            VME.ExternalEventID, 
            CASE WHEN E.ExternalID = '' THEN NULL ELSE E.ExternalID END, '0') AS ExternalID, 
        E.EventID, 
        ST.RowName, 
        ST.SeatName 
    FROM "Event" E
    INNER JOIN "EventCache" EC ON E.EventID = EC.EventID 
    INNER JOIN "SaleTicket" ST ON E.EventID = ST.EventID 
    INNER JOIN "Customer" C ON ST.CustomerID = C.CustomerID 
    INNER JOIN "Country" CY ON C.CountryID = CY.CountryID 
    INNER JOIN "TicketType" TT ON ST.TicketTypeID = TT.TicketTypeID 
    INNER JOIN "PriceZone" PZ ON ST.PriceZoneID = PZ.PriceZoneID 
    LEFT JOIN "VenueManagementEvent" VME ON VME.EventID = E.EventID 
    WHERE ST.Created BETWEEN DateFrom AND DateTo 
    ORDER BY ST.Created;

    — Return the cursor to the caller

    RETURN result_cursor;
END;
$$ LANGUAGE plpgsql;

Let’s examine the converted code. It declares a REFCURSOR variable named result_cursor. The cursor is opened with the query to retrieve sales data based on the provided date range (DateFrom and DateTo). The cursor is returned to the caller using the RETURN result_cursor statement. The conversion is correct.

ChatGPT Pitfalls and Solutions

Being a great assistant for code translation and migration tasks, ChatGPT or any other AI might face a challenge it cannot resolve. For example:

  1. Misinterpretation of the code design. As we can see from Example 2, sometimes ChatGPT may not correctly recognize the algorithm and/or design of the translated code. As a solution, you can provide detailed description of what the code is supposed to do, precisely specify meaning of input and output arguments. Split the code into simpler parts.
  2. Unfamiliarity with some features of DBMS. Different databases have distinguished proprietary features, which can lead to confusion or errors when translating queries, and ChatGPT may not be able to resolve all of this properly. Before migrating to a new database management system, make sure that you or your colleagues have deep expertise in specific features and syntax of that system.
  3. Performance Issues. Even if the same logic is implemented across different dialects, performance can be dramatically distinguished due to optimizations or inefficiencies in query planning, indexing, and caching mechanisms specific to each DBMS.

There are common recommendations to mitigate pitfalls:

  • After translating code, comprehensive testing is crucial. This includes functional testing to ensure that the translated code produces the same results, as well as performance testing to ensure that query efficiency is maintained.
  • For critical areas of translation (like data types, functions, and procedural language differences), refer to the official documentation of both the source and target DBMS.
  • While ChatGPT can assist with translations, manual intervention is often necessary to ensure that the translated code works optimally and behaves as expected in the new environment.

Audit of Translation

Besides direct translation of the specified SQL and procedure language fragments, ChatGPT can offer the following capabilities: 

  • Suggest Alternatives. Some proprietary features of the source DBMS may not have direct equivalents in the target. In this case, ChatGPT can suggest a workaround. 
  • Guide Through Complex Translation. ChatGPT can guide users through complex translations, highlighting challenges and potential issues in execution of resulting query or converted stored procedure, function, trigger, etc. This includes differences in processing transaction, error handling, indexing, and other nuances. 
  • Fine tune performance. ChatGPT can suggest ways to simplify overly complex queries by breaking down subqueries or removing unnecessary joins. For example, replacing correlated subqueries with JOIN operations can improve performance in many cases. If the query involves a large JOIN, ChatGPT might recommend the creation of indexes on the join keys. It can also point out when covering indexes might be beneficial to avoid extra table lookups.

By providing those insights and recommendations, ChatGPT helps to obtain smooth and safe translation of SQL queries or procedure language between popular RDBMS.

To leave a comment for the author, please follow the link and comment on their blog: Technical Posts – The Data Scientist .

Want to share your content on python-bloggers? click here.