One of the queries in our application is used to edit event entries. Event entries have a start time and an end time, but when they are processed from the data files, they can be entered with just a start time (ie when the event started), but in cases of problems (crashes) these entries will not have matching entries with end time.
So in our editor, we are wanting to know the upper limit of when the event could have ended, which is the next start time.
ROWID TIME_START TIME_END OTHER_FIELDS
1 2008-1-1 null data
2 2008-1-2 2008-1-3 more data
So for row 1 we want 2008-1-2 as the suggested end time.
Originally we had a query that looked like this:
SELECT other_fields,
TIME_START,
TIME_END,
NVL((
SELECT MIN(TIME_START)
FROM some_view
WHERE MACHINE_ID = T1.MACHINE_ID
AND TIME_START > T1.TIME_START
), TIME_START + 1/24) "NEXT_TIME_START"
FROM some_view T1
WHERE MACHINE_ID = machine_id
AND NVL(TIME_END,
NVL((
SELECT MIN(TIME_START)
FROM some_view WHERE MACHINE_ID = T1.MACHINE_ID
AND TIME_START > T1.TIME_START),
TIME_START + 1/24)) > start_time_range
AND TIME_START < end_time_range ORDER BY TIME_START
So this is a little trickier than described above as it deals with the edge case of when there is no next entry. In that case the current start time plus 1 hour is used.
This performs fine with small data sets, but when we had over a million rows in that view, the selection of one days data (~2700 rows) took 551 seconds (>9 minutes), and this was one of many large views, so the overall effect was a >30 minute wait.
I rewrote the query with the use of LEAD, and the performance went to 4 seconds (with data retrieval) or 0.6 seconds for a count(*) operation. Here’s that query now using LEAD.
SELECT *
FROM(SELECT other_fields,
TIME_START,
TIME_END,
NVL( LEAD(TIME_START) OVER (PARTITION BY machine_id ORDER BY time_start),
TIME_START + 1/24) AS "NEXT_TIME_START",
FROM some_view T1
WHERE MACHINE_ID = machine_id )
WHERE NEXT_TIME_START > start_time_range
AND TIME_START < end_time_range
ORDER BY TIME_START
Now that I review the code I notice I also rearranged the code to not repeat the next_time_start
calculation, so I am now not sure all the performance improvements can be attributed to LEAD, but I’ll take that ~138 times improvement either way.