CROSS APPLY Keeps Me Sane

There hasn’t been a great deal of routine from year to year throughout my career.  Microsoft has moved from VB6 to progressively more advanced incarnations of the .NET framework and versions of SQL Server have added Intellisense and more features than I could have imagined when I first started working with 6.5.  CRUD is a constant, though, particularly the “R” when I’ve needed to display information in a UI or generate reports whether I own the application in question or I’m working with systems developed by third parties.  Like many developers I’ve found myself working with tables that weren’t designed 100% in the way that I’d like or that users want to see the values so there’s always a certain amount of transformation and finagling that happens between the tables and the results that I need.  Add requirements to filter and sort query results on arbitrarily-transformed fields and the SQL involved can be more than a little convoluted to write, decipher, and maintain.  Enter CROSS APPLY.
I once inherited a report that was intended to calculate the average amount of time that it took for administrative users to complete their assigned tasks in a particular system. The basic structure was more or less sensible, making use of a derived table to get the aggregates for each user (simplified for brevity):
SELECT t.FullName, t.TotalTasks,
---additional fields
FROM
(
SELECT ul.FullName,
SUM (CASE WHEN ut.task_completed = 1 THEN 1 ELSE 0 END) CompletedTasks,
AVG(DATEDIFF (minute, ut.create_timestamp ,ut.complete_timestamp)) AverageMinutesToComplete
FROM dbo.user_list ul
LEFT OUTER JOIN dbo.user_tasks ut ON ut.owner_id= ul.[user_id]
WHERE ut.create_timestamp >= @startDate AND ut.create_timestamp <= @endDate
) t

Since the task table stores the creation and completion dates without the overhead of an additional column to hold the calculated “time to complete”, the only way to get the information we’re after is to perform the calculation in the moment for the smallest unit of time that we want to see and then build from there whenever we want to add hours, days, or weeks.

Seeing DATEDIFF used to get the average number of minutes it takes each user to complete their assigned tasks, it seems logical that we’d use the same approach to get those other units as they’re needed and then do the math so that instead of “1 hour and 65 minutes” the report should show “1 hour and 5 minutes”, “25 hours and 5 minutes” should be “1 day, 1 hour, and 5 minutes”, and so on.  Instead, the original query used a slightly different method for getting those values.

SELECT t.FullName, t.TotalTasks,
CASE WHEN
CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END > 0 THEN CAST(FLOOR (t.AverageMinutesToComplete/(24 *60.0)) as varchar(3)) + ' day'
+
CASE WHEN FLOOR (t.AverageMinutesToComplete/(24 *60.0)) >= 2 THEN 's ' ELSE ' ' END
ELSE '' END
AS AverageDaysToComplete,
CASE WHEN
CASE WHEN (t.AverageMinutesToComplete - ((CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * ( 24*60 ))) >= 60
THEN CAST (FLOOR(( t.AverageMinutesToComplete - ( (CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * (24* 60)) )/60.0) as varchar(2))
ELSE 0 END > 0 THEN
CAST
(
CASE WHEN (t.AverageMinutesToComplete - ((CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * ( 24*60 ))) >= 60
THEN CAST (FLOOR(( t.AverageMinutesToComplete - ( (CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * (24* 60)) )/60.0) as varchar(5 ))
ELSE 0 END
AS varchar(2)) + ' hour' +
CASE WHEN
CASE WHEN (t.AverageMinutesToComplete - ((CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * ( 24*60 ))) >= 60
THEN CAST (FLOOR(( t.AverageMinutesToComplete - ( (CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END) * (24* 60)) )/60.0) as varchar(2))
ELSE 0 END  >= 2
THEN 's ' ELSE ' ' END
ELSE '' END
AS AverageHoursToComplete,
---remainder of query

While those two statements handle the requirements that the report will display the time to completion as “2 days, 1 hour, and 5 minutes” by making sure that each full day’s worth of minutes is subtracted from the total before calculating the number of hours and then going on in a third statement to adjust for the full days and hours to find the number of minutes remaining, I find it much more difficult to read and hold in my head than some of the other options.

Using DATEDIFF to get the average number of days and hours before applying the math to display the values realistically would require less typing and make more sense than the massive amount of text above, but for the purposes of this example it can serve as a placeholder for transformations that are necessarily complex and don’t have alternatives built into T-SQL.  Taking the CROSS APPLY approach to shift those calculations out of the query’s field list and into the FROM clause alongside the tables and derived tables.

SELECT
--field list,
CASE WHEN days.AverageDaysToComplete >= 1 THEN
    CAST(days.AverageDaysToComplete as varchar(3))  + 'day' +
         CASE WHEN days.AverageDays >= 2 THEN 's ' ELSE ' ' END AverageDaysToComplete,
CASE WHEN hours.AverageHoursToComplete >= 1 THEN
    CAST(hours.AverageHoursToComplete as varchar(2))  + 'hour' +
         CASE WHEN hours.AverageHoursToComplete >= 2 THEN 's ' ELSE ' ' END AverageHoursToComplete ,
FROM
--derived table
CROSS APPLY (SELECT CASE WHEN t.AverageMinutesToComplete >= (24* 60)
THEN FLOOR (t.AverageMinutesToComplete/(24 *60.0))
ELSE 0 END AS AverageDaysToComplete ) days
CROSS APPLY (SELECT CASE WHEN (t.AverageMinutesToComplete - (days.AverageDaysToComplete * ( 24*60 ))) >= 60
THEN FLOOR(( t.AverageMinutesToComplete - (days.AverageDaysToComplete * (24* 60)) )/60.0))
ELSE 0 END AS AverageHoursToComplete) hours

In this instance , there’s no real performance gain when using the CROSS APPLY approach vs. leaving all of the heavy lifting inline with the rest of the fields in the SELECT but there are other instances where it may outperform a correlated subquery.  Apart from that, declaring those values once and treating them as columns that can be used anywhere throughout the query without the need to recalculate them every time they’re needed can prevent typos, logical errors, carpal tunnel, and migraines in this and numerous other cases.

Leave a Reply

Your email address will not be published. Required fields are marked *

*