Friday, September 27, 2013

Top selling items - revisited in 12c

April last year I blogged about TOP-N reporting using Top selling items as example. In Oracle 12c we now have a new FETCH FIRST syntax, so in this post I'll revisit the Top selling items example showing where and how FETCH FIRST can be used and where you still need to know the analytic function methods.


I use the same tables of items and monthly sales as in last years blog post:

create table items(
   item  varchar2(10) primary key,
   grp   varchar2(10),
   name  varchar2(20)
)
/

create table sales (
   item  varchar2(10) references items (item),
   mth   date,
   qty   number
)
/

begin
  insert into items values ('101010','AUTO','Brake disc');
  insert into items values ('102020','AUTO','Snow chain');
  insert into items values ('103030','AUTO','Sparc plug');
  insert into items values ('104040','AUTO','Oil filter');
  insert into items values ('105050','AUTO','Light bulb');
  
  insert into items values ('201010','MOBILE','Handsfree');
  insert into items values ('202020','MOBILE','Charger');
  insert into items values ('203030','MOBILE','iGloves');
  insert into items values ('204040','MOBILE','Headset');
  insert into items values ('205050','MOBILE','Cover');
  
  insert into sales values ('101010',date '2011-04-01',10);
  insert into sales values ('101010',date '2011-05-01',11);
  insert into sales values ('101010',date '2011-06-01',12);
  insert into sales values ('102020',date '2011-03-01', 7);
  insert into sales values ('102020',date '2011-07-01', 8);
  insert into sales values ('103030',date '2011-01-01', 6);
  insert into sales values ('103030',date '2011-02-01', 9);
  insert into sales values ('103030',date '2011-11-01', 4);
  insert into sales values ('103030',date '2011-12-01',14);
  insert into sales values ('104040',date '2011-08-01',22);
  insert into sales values ('105050',date '2011-09-01',13);
  insert into sales values ('105050',date '2011-10-01',15);
  
  insert into sales values ('201010',date '2011-04-01', 5);
  insert into sales values ('201010',date '2011-05-01', 6);
  insert into sales values ('201010',date '2011-06-01', 7);
  insert into sales values ('202020',date '2011-03-01',21);
  insert into sales values ('202020',date '2011-07-01',23);
  insert into sales values ('203030',date '2011-01-01', 7);
  insert into sales values ('203030',date '2011-02-01', 7);
  insert into sales values ('203030',date '2011-11-01', 6);
  insert into sales values ('203030',date '2011-12-01', 8);
  insert into sales values ('204040',date '2011-08-01',35);
  insert into sales values ('205050',date '2011-09-01',13);
  insert into sales values ('205050',date '2011-10-01',15);
  
  commit;
end;
/

And the same base query to compute total sales for 2011 by item:

select i.grp
     , i.item
     , max(i.name) name
     , sum(s.qty) qty
  from items i
  join sales s
    on s.item = i.item
 where s.mth between date '2011-01-01' and date '2011-12-01'
 group by i.grp, i.item
 order by i.grp, sum(s.qty) desc, i.item
/

GRP        ITEM       NAME                        QTY
---------- ---------- -------------------- ----------
AUTO       101010     Brake disc                   33
AUTO       103030     Sparc plug                   33
AUTO       105050     Light bulb                   28
AUTO       104040     Oil filter                   22
AUTO       102020     Snow chain                   15
MOBILE     202020     Charger                      44
MOBILE     204040     Headset                      35
MOBILE     203030     iGloves                      28
MOBILE     205050     Cover                        28
MOBILE     201010     Handsfree                    18

10 rows selected.

Pre-12c we could use analytic function ROW_NUMBER() like this:

select g.item
     , g.name
     , g.qty
  from (
    select i.item
         , max(i.name) name
         , sum(s.qty) qty
         , row_number() over (order by sum(s.qty) desc) rnk
      from items i
      join sales s
        on s.item = i.item
     where s.mth between date '2011-01-01' and date '2011-12-01'
       and i.grp = 'AUTO'
     group by i.item
) g
 where g.rnk <= 3
 order by g.rnk, g.item
/

ITEM       NAME                        QTY
---------- -------------------- ----------
103030     Sparc plug                   33
101010     Brake disc                   33
105050     Light bulb                   28

select g.item
     , g.name
     , g.qty
  from (
    select i.item
         , max(i.name) name
         , sum(s.qty) qty
         , row_number() over (order by sum(s.qty) desc) rnk
      from items i
      join sales s
        on s.item = i.item
     where s.mth between date '2011-01-01' and date '2011-12-01'
       and i.grp = 'MOBILE'
     group by i.item
) g
 where g.rnk <= 3
 order by g.rnk, g.item
/

ITEM       NAME                        QTY
---------- -------------------- ----------
202020     Charger                      44
204040     Headset                      35
203030     iGloves                      28

For the MOBILE group using ROW_NUMBER gives us just one of the two items that sold 28 pieces.

Same results can be found using 12c FETCH FIRST [number] ROWS ONLY:

select i.item
     , max(i.name) name
     , sum(s.qty) qty
  from items i
  join sales s
    on s.item = i.item
 where s.mth between date '2011-01-01' and date '2011-12-01'
   and i.grp = 'AUTO'
 group by i.item
 order by sum(s.qty) desc
 fetch first 3 rows only
/

ITEM       NAME                        QTY
---------- -------------------- ----------
103030     Sparc plug                   33
101010     Brake disc                   33
105050     Light bulb                   28

select i.item
     , max(i.name) name
     , sum(s.qty) qty
  from items i
  join sales s
    on s.item = i.item
 where s.mth between date '2011-01-01' and date '2011-12-01'
   and i.grp = 'MOBILE'
 group by i.item
 order by sum(s.qty) desc
 fetch first 3 rows only
/

ITEM       NAME                        QTY
---------- -------------------- ----------
202020     Charger                      44
204040     Headset                      35
203030     iGloves                      28

Let us look at the explain plan for using ROW_NUMBER analytic function:


And then the explain plan for using FETCH FIRST 3 ROWS ONLY:



They are rather identical. FETCH FIRST [number] ROWS ONLY is internally rewritten to use ROW_NUMBER() function - the difference being in the first plan there is an explicit inline view with filter predicate G.RNK <= 3, while the second plan has a system generated inline view filtering on rowlimit_$$_rownumber.

Using both these methods, we only get one of items iGloves and Cover. Which one will be "random" in the sense that Oracle is lazy and just returns the first one it retrieves by whatever access plan it happens to use.

In order to get consistent results we can decide how to "break the ties" if add some more to the ORDER BY to make it unique. For example we might decide that in case of ties, always return lowest item number first:

         , row_number() over (order by sum(s.qty) desc, i.item) rnk

Or similarly for 12c:

 order by sum(s.qty) desc, i.item
 fetch first 3 rows only


In order to actually retrieve the tied rows, we can use analytic function RANK() rather than ROW_NUMBER():

select g.item
     , g.name
     , g.qty
  from (
    select i.item
         , max(i.name) name
         , sum(s.qty) qty
         , rank() over (order by sum(s.qty) desc) rnk
      from items i
      join sales s
        on s.item = i.item
     where s.mth between date '2011-01-01' and date '2011-12-01'
       and i.grp = 'AUTO'
     group by i.item
) g
 where g.rnk <= 3
 order by g.rnk, g.item
/

ITEM       NAME                        QTY
---------- -------------------- ----------
101010     Brake disc                   33
103030     Sparc plug                   33
105050     Light bulb                   28

select g.item
     , g.name
     , g.qty
  from (
    select i.item
         , max(i.name) name
         , sum(s.qty) qty
         , rank() over (order by sum(s.qty) desc) rnk
      from items i
      join sales s
        on s.item = i.item
     where s.mth between date '2011-01-01' and date '2011-12-01'
       and i.grp = 'MOBILE'
     group by i.item
) g
 where g.rnk <= 3
 order by g.rnk, g.item
/

ITEM       NAME                        QTY
---------- -------------------- ----------
202020     Charger                      44
204040     Headset                      35
203030     iGloves                      28
205050     Cover                        28

In 12c the same can be accomplished using FETCH FIRST [number] ROWS WITH TIES:

select i.item
     , max(i.name) name
     , sum(s.qty) qty
  from items i
  join sales s
    on s.item = i.item
 where s.mth between date '2011-01-01' and date '2011-12-01'
   and i.grp = 'AUTO'
 group by i.item
 order by sum(s.qty) desc
 fetch first 3 rows with ties
/

ITEM       NAME                        QTY
---------- -------------------- ----------
103030     Sparc plug                   33
101010     Brake disc                   33
105050     Light bulb                   28

select i.item
     , max(i.name) name
     , sum(s.qty) qty
  from items i
  join sales s
    on s.item = i.item
 where s.mth between date '2011-01-01' and date '2011-12-01'
   and i.grp = 'MOBILE'
 group by i.item
 order by sum(s.qty) desc
 fetch first 3 rows with ties
/

ITEM       NAME                        QTY
---------- -------------------- ----------
202020     Charger                      44
204040     Headset                      35
203030     iGloves                      28
205050     Cover                        28

WITH TIES syntax works the way that once the desired number of rows has been fetched, if there are any more rows with the same value as the last one, then those rows will also be fetched.

Again we can compare the explain plan of using analytic RANK:


With the explain plan of FETCH FIRST 3 ROWS WITH TIES


Again they are very much alike - WITH TIES is rewritten to use RANK() and there is a filter predicate on rowlimit_$$_rank.

One difference here is that using analytic function RANK allows us to control the output order of the tied rows, because we have two ORDER BY clauses:

, rank() over (order by sum(s.qty) desc) rnk
 ...
 order by g.rnk, g.item

The ORDER BY within the analytic function is non-unique, therefore we get our ties. But we can guarantee consistent results by making the final ORDER BY in the select "unique."

This is not possible with FETCH FIRST ROWS WITH TIES syntax as we only have one ORDER BY clause, so the order that iGloves and Cover will be presented is non-deterministic - it can be different from execution to execution if the access plan changes. It probably won't matter, but some end users might become confused if the report suddenly changes the order of output ;-).


With the analytic functions we have a third method for a TOP-N query. If we define TOP-N as "find the top N values of QTY, return all items that have sold those quantities", then we should get four rows for group AUTO. This can be accomplished using DENSE_RANK():

select g.item
     , g.name
     , g.qty
  from (
    select i.item
         , max(i.name) name
         , sum(s.qty) qty
         , dense_rank() over (order by sum(s.qty) desc) rnk
      from items i
      join sales s
        on s.item = i.item
     where s.mth between date '2011-01-01' and date '2011-12-01'
       and i.grp = 'AUTO'
     group by i.item
) g
 where g.rnk <= 3
 order by g.rnk, g.item
/

ITEM       NAME                        QTY
---------- -------------------- ----------
101010     Brake disc                   33
103030     Sparc plug                   33
105050     Light bulb                   28
104040     Oil filter                   22

select g.item
     , g.name
     , g.qty
  from (
    select i.item
         , max(i.name) name
         , sum(s.qty) qty
         , dense_rank() over (order by sum(s.qty) desc) rnk
      from items i
      join sales s
        on s.item = i.item
     where s.mth between date '2011-01-01' and date '2011-12-01'
       and i.grp = 'MOBILE'
     group by i.item
) g
 where g.rnk <= 3
 order by g.rnk, g.item
/

ITEM       NAME                        QTY
---------- -------------------- ----------
202020     Charger                      44
204040     Headset                      35
203030     iGloves                      28
205050     Cover                        28

This cannot (yet) be accomplished by FETCH FIRST syntax. But as we have seen that FETCH FIRST is internally rewritten as using the analytic functions, then we could easily imagine a syntax similar to this:

select i.item
     , max(i.name) name
     , sum(s.qty) qty
  from items i
  join sales s
    on s.item = i.item
 where s.mth between date '2011-01-01' and date '2011-12-01'
   and i.grp = 'AUTO'
 group by i.item
 order by sum(s.qty) desc
 fetch first 3 rows dense_ranked /* hypothetical and unsupported syntax */
/

Or perhaps something like:

 fetch first 3 values /* hypothetical and unsupported syntax */

Maybe in some future release this will be possible ;-).


One thing that the analytic method can do is produce TOP-N in "partitions", like for example this one:

select g.grp
     , g.item
     , g.name
     , g.qty
  from (
    select i.grp
         , i.item
         , max(i.name) name
         , sum(s.qty) qty
         , rank() over (partition by i.grp order by sum(s.qty) desc) rnk
      from items i
      join sales s
        on s.item = i.item
     where s.mth between date '2011-01-01' and date '2011-12-01'
     group by i.grp, i.item
) g
 where g.rnk <= 3
 order by g.grp, g.rnk, g.item
/

GRP        ITEM       NAME                        QTY
---------- ---------- -------------------- ----------
AUTO       101010     Brake disc                   33
AUTO       103030     Sparc plug                   33
AUTO       105050     Light bulb                   28
MOBILE     202020     Charger                      44
MOBILE     204040     Headset                      35
MOBILE     203030     iGloves                      28
MOBILE     205050     Cover                        28

7 rows selected.

There is no PARTITION BY for the FETCH FIRST syntax, but for fun I tried doing it using another new 12c construct - the lateral inline view:

select g.grp
     , r.item
     , r.name
     , r.qty
  from (
      select 'AUTO'   grp from dual union all
      select 'MOBILE' grp from dual
  ) g
  cross join lateral (
      select i.item
           , max(i.name) name
           , sum(s.qty) qty
        from items i
        join sales s
          on s.item = i.item
       where s.mth between date '2011-01-01' and date '2011-12-01'
         and i.grp = g.grp
       group by i.item
       order by sum(s.qty) desc
       fetch first 3 rows with ties
   ) r
 order by g.grp, r.qty desc, r.item
/

GRP    ITEM       NAME                        QTY
------ ---------- -------------------- ----------
AUTO   101010     Brake disc                   33
AUTO   103030     Sparc plug                   33
AUTO   105050     Light bulb                   28
MOBILE 202020     Charger                      44
MOBILE 204040     Headset                      35
MOBILE 203030     iGloves                      28
MOBILE 205050     Cover                        28

7 rows selected.

But that uses hardcoded values for the groups, so I went further and produced this:

with data as (
   select i.grp
        , i.item
        , max(i.name) name
        , sum(s.qty) qty
     from items i
     join sales s
       on s.item = i.item
    where s.mth between date '2011-01-01' and date '2011-12-01'
    group by i.grp, i.item
)
select g.grp
     , r.item
     , r.name
     , r.qty
  from (
      select distinct
             d.grp
        from data d
  ) g
  cross join lateral (
      select d.item
           , d.name
           , d.qty
        from data d
       where d.grp = g.grp
       order by d.qty desc
       fetch first 3 rows with ties
   ) r
 order by g.grp, r.qty desc, r.item
/

GRP        ITEM       NAME                        QTY
---------- ---------- -------------------- ----------
AUTO       101010     Brake disc                   33
AUTO       103030     Sparc plug                   33
AUTO       105050     Light bulb                   28
MOBILE     202020     Charger                      44
MOBILE     204040     Headset                      35
MOBILE     203030     iGloves                      28
MOBILE     205050     Cover                        28

7 rows selected.

Sure, this eliminates the hardcoding of the groups, but at the cost of creating a temporary table that is accessed multiple times. No, this is not a good idea, this was just fun to write, but not recommended as a solution. Use the analytic function method of using PARTITION BY instead, it performs much better :-).


So the new FETCH FIRST syntax in 12c is a nice syntax to do some of the analytic TOP-N queries using simpler code more easy to read. But as yet it cannot replace the analytics in all cases, so it will do you good to know analytics even when coding for 12c :-).

The 12c documentation shows the row limiting clause with explanation as well as examples.
The script for the demos above can be downloaded here.

1 comment:

  1. This comment has been removed by a blog administrator.

    ReplyDelete