Thursday, October 18, 2012

RANGE BETWEEN and leap years

Answering a question on the OTN forum was a bit tricky to get an analytic sum using a RANGE BETWEEN that would handle leap years, but in the end I came up with a workaround that satisfies the requirement. Along the way I realized why there are two different INTERVAL datatypes :-)



Let's make a sales table to demo this:

create table sales (
   day   date
 , qty   number
)
/

And populate with some data for specific days in 2010, 2011 and 2012:

begin
   insert into sales values (date '2010-10-01', 1);
   insert into sales values (date '2010-10-02', 2);
   insert into sales values (date '2010-10-03', 3);
   insert into sales values (date '2010-10-04', 4);
   insert into sales values (date '2010-10-05', 5);
   insert into sales values (date '2010-10-06', 6);
   insert into sales values (date '2010-10-07', 7);
   insert into sales values (date '2011-10-01', 1);
   insert into sales values (date '2011-10-02', 2);
   insert into sales values (date '2011-10-03', 3);
   insert into sales values (date '2011-10-04', 4);
   insert into sales values (date '2011-10-05', 5);
   insert into sales values (date '2011-10-06', 6);
   insert into sales values (date '2011-10-07', 7);
   insert into sales values (date '2012-10-01', 1);
   insert into sales values (date '2012-10-02', 2);
   insert into sales values (date '2012-10-03', 3);
   insert into sales values (date '2012-10-04', 4);
   insert into sales values (date '2012-10-05', 5);
   insert into sales values (date '2012-10-06', 6);
   insert into sales values (date '2012-10-07', 7);
   commit;
end;
/

First business requirement is "for each day we want the sales of that day + the three preceding days" - no problem:

select to_char(day, 'YYYY-MM-DD') day
     , qty
     , sum(qty) over (
          order by day
          range between interval '3' day preceding
                    and current row
       ) four_days
  from sales
 order by day
/

DAY               QTY  FOUR_DAYS
---------- ---------- ----------
2010-10-01          1          1
2010-10-02          2          3
2010-10-03          3          6
2010-10-04          4         10
2010-10-05          5         14
2010-10-06          6         18
2010-10-07          7         22
2011-10-01          1          1
2011-10-02          2          3
2011-10-03          3          6
2011-10-04          4         10
2011-10-05          5         14
2011-10-06          6         18
2011-10-07          7         22
2012-10-01          1          1
2012-10-02          2          3
2012-10-03          3          6
2012-10-04          4         10
2012-10-05          5         14
2012-10-06          6         18
2012-10-07          7         22

21 rows selected.

Then is added another requirement "we also want the sales for the same 4 days of the previous year".
Well we might try this one then:

select to_char(day, 'YYYY-MM-DD') day
     , qty
     , sum(qty) over (
          order by day
          range between interval '3' day preceding
                    and current row
       ) four_days
     , sum(qty) over (
          order by day
          range between interval '1' year + interval '3' day preceding
                    and interval '1' year preceding
       ) four_lastyear
  from sales
 order by day
/

          range between interval '1' year + interval '3' day preceding
                                          *
ERROR at line 10:
ORA-30081: invalid data type for datetime/interval arithmetic

Now why won't that work? Hmmm... OK, we can try with days instead of years:

select to_char(day, 'YYYY-MM-DD') day
     , qty
     , sum(qty) over (
          order by day
          range between interval '3' day preceding
                    and current row
       ) four_days
     , sum(qty) over (
          order by day
          range between interval '368' day(3) preceding
                    and interval '365' day(3) preceding
       ) four_lastyear
  from sales
 order by day
/

DAY               QTY  FOUR_DAYS FOUR_LASTYEAR
---------- ---------- ---------- -------------
2010-10-01          1          1
2010-10-02          2          3
2010-10-03          3          6
2010-10-04          4         10
2010-10-05          5         14
2010-10-06          6         18
2010-10-07          7         22
2011-10-01          1          1             1
2011-10-02          2          3             3
2011-10-03          3          6             6
2011-10-04          4         10            10
2011-10-05          5         14            14
2011-10-06          6         18            18
2011-10-07          7         22            22
2012-10-01          1          1             3
2012-10-02          2          3             6
2012-10-03          3          6            10
2012-10-04          4         10            14
2012-10-05          5         14            18
2012-10-06          6         18            22
2012-10-07          7         22            18

21 rows selected.

Works for 2011 but not for 2012, because 2012 is a leap year. OK, we use a different set of days:

select to_char(day, 'YYYY-MM-DD') day
     , qty
     , sum(qty) over (
          order by day
          range between interval '3' day preceding
                    and current row
       ) four_days
     , sum(qty) over (
          order by day
          range between interval '369' day(3) preceding
                    and interval '366' day(3) preceding
       ) four_lastyear
  from sales
 order by day
/

DAY               QTY  FOUR_DAYS FOUR_LASTYEAR
---------- ---------- ---------- -------------
2010-10-01          1          1
2010-10-02          2          3
2010-10-03          3          6
2010-10-04          4         10
2010-10-05          5         14
2010-10-06          6         18
2010-10-07          7         22
2011-10-01          1          1
2011-10-02          2          3             1
2011-10-03          3          6             3
2011-10-04          4         10             6
2011-10-05          5         14            10
2011-10-06          6         18            14
2011-10-07          7         22            18
2012-10-01          1          1             1
2012-10-02          2          3             3
2012-10-03          3          6             6
2012-10-04          4         10            10
2012-10-05          5         14            14
2012-10-06          6         18            18
2012-10-07          7         22            22

21 rows selected.

Well, now 2012 works but not 2011. We need somehow to identify if the year has 366 or 365 days. For that purpose we can use the year interval:

select to_char(
          day
        , 'YYYY-MM-DD'
       ) day
     , to_char(
          day - interval '1' year
        , 'YYYY-MM-DD'
       ) day_lastyear
     , day - (day - interval '1' year) days
  from sales
 order by day
/

DAY        DAY_LASTYE       DAYS
---------- ---------- ----------
2010-10-01 2009-10-01        365
2010-10-02 2009-10-02        365
2010-10-03 2009-10-03        365
2010-10-04 2009-10-04        365
2010-10-05 2009-10-05        365
2010-10-06 2009-10-06        365
2010-10-07 2009-10-07        365
2011-10-01 2010-10-01        365
2011-10-02 2010-10-02        365
2011-10-03 2010-10-03        365
2011-10-04 2010-10-04        365
2011-10-05 2010-10-05        365
2011-10-06 2010-10-06        365
2011-10-07 2010-10-07        365
2012-10-01 2011-10-01        366
2012-10-02 2011-10-02        366
2012-10-03 2011-10-03        366
2012-10-04 2011-10-04        366
2012-10-05 2011-10-05        366
2012-10-06 2011-10-06        366
2012-10-07 2011-10-07        366

21 rows selected.

We saw that the correct range was 369-366 for leap years and 368-365 for non-leap years. So let's get data for the ranges 369, 368-366 and 365:

select to_char(day, 'YYYY-MM-DD') day
     , qty
     , sum(qty) over (
          order by day
          range between interval '3' day preceding
                    and current row
       ) four_days
     , sum(qty) over (
          order by day
          range between interval '369' day(3) preceding
                    and interval '369' day(3) preceding
       ) d1
     , sum(qty) over (
          order by day
          range between interval '368' day(3) preceding
                    and interval '366' day(3) preceding
       ) d2
     , sum(qty) over (
          order by day
          range between interval '365' day(3) preceding
                    and interval '365' day(3) preceding
       ) d3
  from sales
 order by day
/

DAY               QTY  FOUR_DAYS         D1         D2         D3
---------- ---------- ---------- ---------- ---------- ----------
2010-10-01          1          1
2010-10-02          2          3
2010-10-03          3          6
2010-10-04          4         10
2010-10-05          5         14
2010-10-06          6         18
2010-10-07          7         22
2011-10-01          1          1                                1
2011-10-02          2          3                     1          2
2011-10-03          3          6                     3          3
2011-10-04          4         10                     6          4
2011-10-05          5         14          1          9          5
2011-10-06          6         18          2         12          6
2011-10-07          7         22          3         15          7
2012-10-01          1          1                     1          2
2012-10-02          2          3                     3          3
2012-10-03          3          6                     6          4
2012-10-04          4         10          1          9          5
2012-10-05          5         14          2         12          6
2012-10-06          6         18          3         15          7
2012-10-07          7         22          4         18

21 rows selected.

And then we can calculate the correct sum:

select to_char(day, 'YYYY-MM-DD') day
     , qty
     , four_days
     , case day - (day - interval '1' year)
          when 366 then nvl(d1,0) + nvl(d2,0)
                   else nvl(d2,0) + nvl(d3,0)
       end four_lastyear
  from (
   select day
        , qty
        , sum(qty) over (
             order by day
             range between interval '3' day preceding
                       and current row
          ) four_days
        , sum(qty) over (
             order by day
             range between interval '369' day(3) preceding
                       and interval '369' day(3) preceding
          ) d1
        , sum(qty) over (
             order by day
             range between interval '368' day(3) preceding
                       and interval '366' day(3) preceding
          ) d2
        , sum(qty) over (
             order by day
             range between interval '365' day(3) preceding
                       and interval '365' day(3) preceding
          ) d3
     from sales
)
 order by day
/

DAY               QTY  FOUR_DAYS FOUR_LASTYEAR
---------- ---------- ---------- -------------
2010-10-01          1          1             0
2010-10-02          2          3             0
2010-10-03          3          6             0
2010-10-04          4         10             0
2010-10-05          5         14             0
2010-10-06          6         18             0
2010-10-07          7         22             0
2011-10-01          1          1             1
2011-10-02          2          3             3
2011-10-03          3          6             6
2011-10-04          4         10            10
2011-10-05          5         14            14
2011-10-06          6         18            18
2011-10-07          7         22            22
2012-10-01          1          1             1
2012-10-02          2          3             3
2012-10-03          3          6             6
2012-10-04          4         10            10
2012-10-05          5         14            14
2012-10-06          6         18            18
2012-10-07          7         22            22

21 rows selected.

(I used the same numbers for all three years to make it easy to spot when four_lastyear is correct :-)

Let's just take a closer look at the interval calculation:

select to_char(
          day
        , 'YYYY-MM-DD'
       ) day
     , to_char(
          day - interval '1' year - interval '3' day
        , 'YYYY-MM-DD'
       ) day_lastthree
  from sales
 order by day
/

DAY        DAY_LASTTH
---------- ----------
2010-10-01 2009-09-28
2010-10-02 2009-09-29
2010-10-03 2009-09-30
2010-10-04 2009-10-01
2010-10-05 2009-10-02
2010-10-06 2009-10-03
2010-10-07 2009-10-04
2011-10-01 2010-09-28
2011-10-02 2010-09-29
2011-10-03 2010-09-30
2011-10-04 2010-10-01
2011-10-05 2010-10-02
2011-10-06 2010-10-03
2011-10-07 2010-10-04
2012-10-01 2011-09-28
2012-10-02 2011-09-29
2012-10-03 2011-09-30
2012-10-04 2011-10-01
2012-10-05 2011-10-02
2012-10-06 2011-10-03
2012-10-07 2011-10-04

21 rows selected.

No problem. That calculation is first substracting an INTERVAL YEAR TO MONTH literal from a DATE column - the result will be a DATE. Then it subtracts an INTERVAL DAY TO SECOND literal from the resulting DATE which in turn returns yet another DATE. Perfect.

Now look at this one - strictly "mathematically" this should be equivalent:

select to_char(
          day
        , 'YYYY-MM-DD'
       ) day
     , to_char(
          day - (interval '1' year + interval '3' day)
        , 'YYYY-MM-DD'
       ) day_lastthree
  from sales
 order by day
/

          day - (interval '1' year + interval '3' day)
                                   *
ERROR at line 6:
ORA-30081: invalid data type for datetime/interval arithmetic

Now we fail - because we cannot add an INTERVAL DAY TO SECOND datatype to an INTERVAL YEAR TO MONTH.

Why not? Because an interval of one year or an interval of one month cannot be expressed as a certain number of days!

Subtract 1 year from a date in October 2012 will subtract 366 days, but subtracting 1 year from a date in October 2011 will subtract 365 days.

And so it is not possible to express "one year and 3 days" in a single interval expression - it needs two interval expressions of two different datatypes.

And therefore it cannot be used in a RANGE BETWEEN expression - we need to do the cryptic workaround shown above.

No comments:

Post a Comment