2008-08-11

Oracle's Lag in MS SQL Server 2005

I am currently porting our new database from Oracle 10g to MS SQL Server 2005, and I have it all done except the views that use the Oracle LAG and LEAD functions (non-ANSI).

What these functions provide (for the MS SQL camp) is the ability to get the next or previous rows when sorted. In my case I have a value that is the ‘volume change since the start’ at time intervals, and I want the relative change between each interval.

So PL/SQL of the view is:

SELECT t.*,
    t.volume - LAG(volume) OVER (PARTITION BY group_number  ORDER BY timestamp) AS volume_change
FROM volume_table t;

The partition clause splits the data into different buckets, then each bucket is sorted, with all results returned.

Asking on the NZ .Net User Group mailing list I got a pointer to this MS feedback page, but the solution presented there gives me an error “Incorrect syntax near ‘ROWS’.“ when I run this query against SQL 2K5

SELECT MIN(volume) OVER(PARTITION BY group_number  ORDER BY timestamp
    ROWS BETWEEN 1 PRECEDING  AND 1 PRECEDING) change
FROM volume_table; GO

I had a side point showing why I wanted to avoid sub-select, as the performance of a different query had an orders of magnitude improvement from changing to using a LAG function, yet that same sub-select query runs just as fast as the “improved” Oracle statement in MS SQL Server, so I’ll just stick to the main topic, and post about that another day…

Chris recently showed how to use Common Table Expressions (CTE) (sort of auto-magic temp table) to find the first entries for a day, which is very close to what I was want, but the filtering is hard coded. I could not see how to make it dynamic, so I used the idea, and started massaging the concept, till I finally got what I wanted.

Conceptually the Oracle solution could be done using cursors under the hood to provide the rolling previous (LAG) rows, where-as here I’m doing many look-ups but the table is not getting re-created as in the nested select method.

So my code is as follows:

WITH Rows( vol_diff, time, rn, gn ) AS (
  SELECT v.volume,
    v.timestamp,
    Row_Number() OVER (PARTITION BY group_number ORDER BY timestamp),
    group_number
  FROM volume_table v
), PrevRows( timestamp, prev_vol, group_number) AS (
  SELECT a.time, b.vol_diff, a.gn
  FROM Rows a
  LEFT JOIN Rows b
    ON a.rn = b.rn + 1  AND a.gn = b.gn
)
SELECT v.*, v.volume - p.prev_vol as volume_change
FROM volume_table v
LEFT JOIN PrevRows p
    ON v.timestamp = p.timestamp
    AND v.group_number = p.group_number; GO

So I use two CTE tables, one to partition and sort the data, the second to do a lag based join, then I can select the lagged based data, by matching the time and group to the current entry.It works a treat, and I will do some performance testing tomorrow once my production data has finished loading into my db.

After the results of the not discussed query I expect that the sub-select will be just as performant.