SQL overlapping time ranges

There was a nice stack overflow question today, on one of my favorite SQL themes of overlapping date ranges.

I was quite pleased with my “find distinct’s, join, and pick the winners per block” code:

with data (ITEM_ID, PRICE, START_DATE, END_DATE, LOADED_DATETIME) as (
    select * from values
    ('A', 1.00,	'2023-01-01'::date, '2023-01-31'::date, '2023-11-01'::date),
    ('A', 2.00,	'2023-01-10'::date, '2023-01-15'::date, '2023-11-02'::date),
    ('A', 3.00,	'2023-01-14'::date, '2023-01-17'::date, '2023-11-03'::date),
    ('A', 4.00,	'2023-01-15'::date, '2023-01-18'::date, '2023-11-04'::date),
    ('B', 4.00,	'2023-01-10'::date, '2023-01-31'::date, '2023-11-01'::date)
), dist_dates as (
    select distinct item_id, start_date from data
    union 
    select distinct item_id, end_date from data 
), id_date_ranges as (
    select 
        item_id, 
        start_date, 
        lead(start_date) over (partition by item_id 
            order by start_date) as end_date
    from dist_dates
    qualify end_date is not null
)
select 
    dr.*,
    d.price,
    d.loaded_datetime
from id_date_ranges as dr
join data as d
    on d.item_id = dr.item_id 
        and dr.start_date < d.end_date
        and d.start_date < dr.end_date
qualify row_number() over (partition by dr.item_id, dr.start_date 
    order by d.loaded_datetime desc) = 1
order by 1,2;

But if you use highly overlapping data like the above, you get results like:

Row 4 & 5 are over sliced
Row 4 & 5 are over sliced

This could be worked around with a blind merge, but that might merge rows that appear the same but where not part of the same event/streak to be merged. The workaround is to allocate a sequence value to each row, before chopping, and then after the join, only merges rows of the same source id.

so using SQE8() and CONDITIONAL_CHANGE_EVENT we get:

with data (ITEM_ID, PRICE, START_DATE, END_DATE, LOADED_DATETIME) as (
    select * from values
    ('A', 1.00,	'2023-01-01'::date, '2023-01-31'::date, '2023-11-01'::date),
    ('A', 2.00,	'2023-01-10'::date, '2023-01-15'::date, '2023-11-02'::date),
    ('A', 3.00,	'2023-01-14'::date, '2023-01-17'::date, '2023-11-03'::date),
    ('A', 4.00,	'2023-01-15'::date, '2023-01-18'::date, '2023-11-04'::date),
    ('B', 4.00,	'2023-01-10'::date, '2023-01-31'::date, '2023-11-01'::date)
), sdata as (
    select 
        *, 
        seq8() as seq_id 
    from data
), dist_dates as (
    select distinct item_id, start_date from data
    union 
    select distinct item_id, end_date from data 
), id_date_ranges as (
    select 
        item_id, 
        start_date, 
        lead(start_date) over (partition by item_id 
            order by start_date) as end_date
    from dist_dates
    qualify end_date is not null
), over_sliced as (
    select 
        dr.*,
        d.price,
        d.loaded_datetime
        ,d.seq_id
    from id_date_ranges as dr
    join sdata as d
        on d.item_id = dr.item_id 
            and dr.start_date < d.end_date
            and d.start_date < dr.end_date
    qualify row_number() over (partition by dr.item_id, dr.start_date 
        order by d.loaded_datetime desc) = 1
)
select
    *
    ,conditional_change_event(seq_id) over (partition by item_id 
        order by start_date) as cce
from over_sliced
order by 1,2;
Row 4 & 5 identified as from the same source event
Row 4 & 5 identified as from the same source event

So now we can min/max those dates, to get a single row back:

with data (ITEM_ID, PRICE, START_DATE, END_DATE, LOADED_DATETIME) as (
    select * from values
    ('A', 1.00,	'2023-01-01'::date, '2023-01-31'::date, '2023-11-01'::date),
    ('A', 2.00,	'2023-01-10'::date, '2023-01-15'::date, '2023-11-02'::date),
    ('A', 3.00,	'2023-01-14'::date, '2023-01-17'::date, '2023-11-03'::date),
    ('A', 4.00,	'2023-01-15'::date, '2023-01-18'::date, '2023-11-04'::date),
    ('B', 4.00,	'2023-01-10'::date, '2023-01-31'::date, '2023-11-01'::date)
), sdata as (
    select 
        *, 
        seq8() as seq_id 
    from data
), dist_dates as (
    select distinct item_id, start_date from data
    union 
    select distinct item_id, end_date from data 
), id_date_ranges as (
    select 
        item_id, 
        start_date, 
        lead(start_date) over (partition by item_id 
            order by start_date) as end_date
    from dist_dates
    qualify end_date is not null
), over_sliced as (
    select 
        dr.*,
        d.price,
        d.loaded_datetime
        ,d.seq_id
    from id_date_ranges as dr
    join sdata as d
        on d.item_id = dr.item_id 
            and dr.start_date < d.end_date
            and d.start_date < dr.end_date
    qualify row_number() over (partition by dr.item_id, dr.start_date 
        order by d.loaded_datetime desc) = 1
), slices_grouped as (
    select
        *
        ,conditional_change_event(seq_id) over (partition by item_id 
            order by start_date) as cce
    from over_sliced
)
select 
    item_id
    ,min(start_date) as start_date
    ,max(end_date) as end_date
    ,any_value(price) as price
    ,any_value(loaded_datetime) as loaded_datetime
from slices_grouped
group by item_id, cce
order by 1,2;
All fixed
All fixed