Slow sub-select maybe LAG or LEAD might help

One of the queries in our application is used to edit event entries.  Event entries have a start time and an end time, but when they are processed from the data files, they can be entered with just a start time (ie when the event started), but in cases of problems (crashes) these entries will not have matching entries with end time.

So in our editor, we are wanting to know the upper limit of when the event could have ended, which is the next start time.

ROWID TIME_START TIME_END OTHER_FIELDS
1 2008-1-1 null data
2 2008-1-2 2008-1-3 more data

So for row 1 we want 2008-1-2 as the suggested end time.

Originally we had a query that looked like this:

SELECT other_fields,
  TIME_START,
  TIME_END,
  NVL((SELECT MIN(TIME_START)           FROM some_view
         WHERE MACHINE_ID = T1.MACHINE_ID
           AND TIME_START > T1.TIME_START),
       TIME_START + 1/24) "NEXT_TIME_START"
  FROM some_view T1
  WHERE MACHINE_ID = machine_id
    AND NVL(TIME_END,
            NVL((SELECT MIN(TIME_START)
                   FROM some_view WHERE MACHINE_ID = T1.MACHINE_ID
                   AND TIME_START > T1.TIME_START),
                TIME_START + 1/24)) > start_time_range
    AND TIME_START < end_time_range ORDER BY TIME_START

 

So this is a little trickier than described above as it deals with the edge case of when there is no next entry.  In that case the current start time plus 1 hour is used. I coloured all the optional/variables yellow.

This performs fine with small data sets, but when we had over a million rows in that view, the selection of one days data (~2700 rows) took 551 seconds (>9 minutes), and this was one of many large views, so the overall effect was a >30 minute wait.

I rewrote the query with the use of LEAD, and the performance went to 4 seconds (with data retrieval) or 0.6 seconds for a count(*) operation.  Here’s that query now using LEAD.

SELECT *
    FROM(SELECT other_fields,
           TIME_START,
           TIME_END,
           NVL( LEAD(TIME_START) OVER (PARTITION BY machine_id ORDER BY time_start),
                TIME_START + 1/24) AS "NEXT_TIME_START",
           FROM some_view T1
           WHERE MACHINE_ID = machine_id )
    WHERE NEXT_TIME_START > start_time_range
      AND TIME_START < end_time_range
    ORDER BY TIME_START

 

Now that I review the code I notice I also rearranged the code to not repeat the next_time_start calculation, so I am now not sure all the performance improvements can be attributed to LEAD, but I’ll take that ~138 times improvement either way.