Category Archives for Query Optimization

Repeating Calculations in SQL

Don’t Repeat Yourself (SQL Version)

Sometimes we have a calculation in a query, and we need to use the result of that calculation in multiple places.

The way I usually see that done is to repeat the calculation multiple times.  For example:

SELECT DISTINCT
    A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent) AS FirstValue,
    100 - (A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent)) AS SecondValue,
    25 * (A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent)) AS ThirdValue
FROM
    TableOne A
    INNER JOIN TableTwo B ON B.AKey = A.AKey
    INNER JOIN TableThree C ON C.BKey = B.BKey
    INNER JOIN TableFour D ON D.CKey = C.CKey

There’s one main calculation in there, which is being used for multiple columns:

A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent)

Besides being difficult to read and understand, it’s really easy to make a mistake when you have to do updates.

And this is a fairly simple, made-up example.  I’ve seen queries that used CASE statements to perform different calculations depending on various criteria, with the entire CASE statement repeated multiple times throughout the query.  Ugly stuff…

When writing code, we try to use the Don’t Repeat Yourself (DRY) Principle, but this applies to queries as well.

The Solution

What we can do is simplify the main query to get rid of the repeated code, and then wrap it with another query that uses those calculated values:

SELECT
    InnerValue, (100 - InnerValue) AS SecondValue, (25 * InnerValue) AS ThirdValue
FROM
(
    SELECT DISTINCT
        A.SomeNumber * LOG(B.NextNumber) + POWER(C.ThisNumber, D.Exponent) AS InnerValue
    FROM
        TableOne A
        INNER JOIN TableTwo B ON B.AKey = A.AKey
        INNER JOIN TableThree C ON C.BKey = B.BKey
        INNER JOIN TableFour D ON D.CKey = C.CKey
) AS T

The main calculation is now done in a subquery.  The outer query uses InnerValue for the SecondValue and ThirdValue columns, instead of repeating the calculation itself.

So now the calculation is written only once, and only needs to be updated in one place.  Plus, it’s much easier to read, understand, and debug.