Thursday, January 3, 2013

Recursive subquery graph traversing

In December a user Silpa asked a question on AskTom on "Bi-directional hierarchical query," which inspired me to fool around with recursive subquery factoring (available from version 11.2) giving Silpa a solution which he seemed to find useful. Since then I've fooled around a little more with it, particularly concerning cycles in the graph data.

Silpa gave a table like this for testing:

create table network_table (
   origin      number
 , destination number
)
/

And some data as well:

insert into network_table values (11, 12)
/
insert into network_table values (12, 13)
/
insert into network_table values (14, 11)
/
insert into network_table values (14, 15)
/
insert into network_table values (14, 16)
/
insert into network_table values (14, 17)
/
insert into network_table values (18, 11)
/
insert into network_table values (19, 110)
/
insert into network_table values (111, 112)
/
insert into network_table values (23, 15)
/
insert into network_table values (23, 24)
/
commit
/

What I found very useful was drawing these data as nodes in a graph:


Silpas requirement was to start at a given node and select the entire network reachable from that node - no matter which direction the arrow points. If starting at for example node 11, the select should return all the blue (origin, destination) connections.

Silpa had tried using CONNECT BY hierachical queries, but it didn't quite work out. I came up with this solution using recursive subquery factoring instead:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select n.origin
        , n.destination
        , 11 origin_recur
        , case n.origin
             when 11 then destination
                     else origin
          end destination_recur
        , 1 lvl
     from network_table n
    where n.origin = 11
       or n.destination = 11
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search breadth first by origin_recur, destination_recur set ordering
select origin
     , destination
     , origin_recur
     , destination_recur
     , lvl
  from net
 order by ordering
/

Giving this output of the 9 blue arrows from the drawing:

    ORIGIN DESTINATION ORIGIN_RECUR DESTINATION_RECUR   LVL
---------- ----------- ------------ ----------------- -----
        11          12           11                12     1
        14          11           11                14     1
        18          11           11                18     1
        12          13           12                13     2
        14          15           14                15     2
        14          16           14                16     2
        14          17           14                17     2
        23          15           15                23     3
        23          24           23                24     4

The "trick" is that we create origin_recur and destination_recur. In the "starting" part of the subquery (before UNION ALL) origin_recur is always 11 (my starting point) and destination_recur is "the other node". In the recursive part of the subquery (after UNION ALL) we find the next level by joining on either origin or destination equal to "prior" destination_recur (plus we make sure we don't "double back" on ourselves.) And for that level we again make sure origin_recur is "where we came from" and destination_recur is "the other node", no matter which way the arrow originally points. In other words: origin_recur and destination_recur represents the blue arrows but they just all have the direction modified to always be the direction we traverse the graph.

This technique would have been impossible to do with a CONNECT BY, as the recursion is done on calculated columns. In a CONNECT BY clause PRIOR can only be used on table columns, not calculated column aliases. But that is possible in version 11.2 using the recursive subquery factoring.

The above output shows all level 1's first, then level 2, and so on. That is caused by the BREADTH FIRST clause. It can be changed to DEPTH FIRST:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select n.origin
        , n.destination
        , 11 origin_recur
        , case n.origin
             when 11 then destination
                     else origin
          end destination_recur
        , 1 lvl
     from network_table n
    where n.origin = 11
       or n.destination = 11
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
select origin
     , destination
     , origin_recur
     , destination_recur
     , lvl
  from net
 order by ordering
/

Which gives us output that traverses all the way down each branch first before going to the next branch (looking perhaps a little more like you would be used to from CONNECT BY):

    ORIGIN DESTINATION ORIGIN_RECUR DESTINATION_RECUR   LVL
---------- ----------- ------------ ----------------- -----
        11          12           11                12     1
        12          13           12                13     2
        14          11           11                14     1
        14          15           14                15     2
        23          15           15                23     3
        23          24           23                24     4
        14          16           14                16     2
        14          17           14                17     2
        18          11           11                18     1

So far so good, but what happens when you add another connection to the graph:

insert into network_table values (24, 11)
/

Shown on the graph (the green arrow) we see we have a cycle:


If we run the sql again, we get this error:

ERROR at line 1:
ORA-32044: cycle detected while executing recursive WITH query

Recursive subquery factoring has a CYCLE clause that can help us to avoid this error:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select n.origin
        , n.destination
        , 11 origin_recur
        , case n.origin
             when 11 then destination
                     else origin
          end destination_recur
        , 1 lvl
     from network_table n
    where n.origin = 11
       or n.destination = 11
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
   cycle destination_recur set is_cycle to 'Y' default 'N'
select origin
     , destination
     , origin_recur
     , destination_recur
     , lvl
     , is_cycle
  from net
 order by ordering
/

Giving us this output:

    ORIGIN DESTINATION ORIGIN_RECUR DESTINATION_RECUR   LVL I
---------- ----------- ------------ ----------------- ----- -
        11          12           11                12     1 N
        12          13           12                13     2 N
        14          11           11                14     1 N
        14          15           14                15     2 N
        23          15           15                23     3 N
        23          24           23                24     4 N
        24          11           24                11     5 N
        11          12           11                12     6 N
        12          13           12                13     7 N
        14          11           11                14     6 Y
        18          11           11                18     6 N
        14          16           14                16     2 N
        14          17           14                17     2 N
        18          11           11                18     1 N
        24          11           11                24     1 N
        23          24           24                23     2 N
        23          15           23                15     3 N
        14          15           15                14     4 N
        14          11           14                11     5 N
        11          12           11                12     6 N
        12          13           12                13     7 N
        18          11           11                18     6 N
        24          11           11                24     6 Y
        14          16           14                16     5 N
        14          17           14                17     5 N

The CYCLE clause creates a column IS_CYCLE that is Y when a cycle has been detected. As destination_recur is specified, it will be Y whenever a destination_recur is found that already exists as destination_recur in the same branch of the recursion. Therefore we get a lot of repeats in the output, because the new connection from 24 to 11 is not detected as a cycle, but when we continue again from 11 then we repeat a subbranch we have already visited and get a cycle.

We can improve on it by starting our recursion in a different way. We can create a fictional "level 0" which makes certain that when we go around and hit 11 again, it will be recognized as a cycle:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select 0 origin
        , 11 destination
        , 0 origin_recur
        , 11 destination_recur
        , 0 lvl
     from dual
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
   cycle destination_recur set is_cycle to 'Y' default 'N'
select origin
     , destination
     , origin_recur
     , destination_recur
     , lvl
     , is_cycle
  from net
 order by ordering
/

Giving us a somewhat better output:

    ORIGIN DESTINATION ORIGIN_RECUR DESTINATION_RECUR   LVL I
---------- ----------- ------------ ----------------- ----- -
         0          11            0                11     0 N
        11          12           11                12     1 N
        12          13           12                13     2 N
        14          11           11                14     1 N
        14          15           14                15     2 N
        23          15           15                23     3 N
        23          24           23                24     4 N
        24          11           24                11     5 Y
        14          16           14                16     2 N
        14          17           14                17     2 N
        18          11           11                18     1 N
        24          11           11                24     1 N
        23          24           24                23     2 N
        23          15           23                15     3 N
        14          15           15                14     4 N
        14          11           14                11     5 Y
        14          16           14                16     5 N
        14          17           14                17     5 N

This time when it gets to 24->11, it declares a cycle and stops. But there remains another problem: we traverse the cycle in the graph twice, once in each direction. The CYCLE clause cannot help us here.

A simple solution can be to just filter away our fictional level 0 as well as the cycles, and then a plain DISTINCT will remove our duplicates:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select 0 origin
        , 11 destination
        , 0 origin_recur
        , 11 destination_recur
        , 0 lvl
     from dual
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
   cycle destination_recur set is_cycle to 'Y' default 'N'
select distinct
       origin
     , destination
  from net
 where is_cycle != 'Y'
   and lvl != 0
/

This gives us the 10 connections of the network:

    ORIGIN DESTINATION
---------- -----------
        11          12
        12          13
        14          11
        14          15
        23          15
        23          24
        14          17
        18          11
        24          11
        14          16

That approach may be OK for many purposes, but it is not guaranteed to be in the correct order. This time it looks like we were lucky, but we cannot be certain that for example a HASH distinct will keep the order for us.

So we can instead use the analytic function ROW_NUMBER to help us:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select 0 origin
        , 11 destination
        , 0 origin_recur
        , 11 destination_recur
        , 0 lvl
     from dual
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
   cycle destination_recur set is_cycle to 'Y' default 'N'
select origin
     , destination
     , origin_recur
     , destination_recur
     , lvl
     , is_cycle
     , row_number() over (
          partition by origin, destination
          order by ordering
       ) rn
     , ordering
  from net
 order by ordering
/

The result will be that the first occurrence in order of ORDERING of a given (origin, destination) pair will be numbered 1, subsequent occurrences numbered 2,...:

    ORIGIN DESTINATION ORIGIN_RECUR DESTINATION_RECUR   LVL I  RN   ORDERING
---------- ----------- ------------ ----------------- ----- - --- ----------
         0          11            0                11     0 N   1          1
        11          12           11                12     1 N   1          2
        12          13           12                13     2 N   1          3
        14          11           11                14     1 N   1          4
        14          15           14                15     2 N   1          5
        23          15           15                23     3 N   1          6
        23          24           23                24     4 N   1          7
        24          11           24                11     5 Y   1          8
        14          16           14                16     2 N   1          9
        14          17           14                17     2 N   1         10
        18          11           11                18     1 N   1         11
        24          11           11                24     1 N   2         12
        23          24           24                23     2 N   2         13
        23          15           23                15     3 N   2         14
        14          15           15                14     4 N   2         15
        14          11           14                11     5 Y   2         16
        14          16           14                16     5 N   2         17
        14          17           14                17     5 N   2         18

With this approach the "reverse" traversal of the cycle in the graph gets RN=2, so we can filter those away:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select 0 origin
        , 11 destination
        , 0 origin_recur
        , 11 destination_recur
        , 0 lvl
     from dual
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
   cycle destination_recur set is_cycle to 'Y' default 'N'
select *
  from (
   select origin
        , destination
        , origin_recur
        , destination_recur
        , lvl
        , is_cycle
        , row_number() over (
             partition by origin, destination
             order by ordering
          ) rn
        , ordering
     from net
       )
 where rn = 1
   and lvl != 0
 order by ordering
/

Giving us an output of the desired 10 rows:

    ORIGIN DESTINATION ORIGIN_RECUR DESTINATION_RECUR   LVL I  RN   ORDERING
---------- ----------- ------------ ----------------- ----- - --- ----------
        11          12           11                12     1 N   1          2
        12          13           12                13     2 N   1          3
        14          11           11                14     1 N   1          4
        14          15           14                15     2 N   1          5
        23          15           15                23     3 N   1          6
        23          24           23                24     4 N   1          7
        24          11           24                11     5 Y   1          8
        14          16           14                16     2 N   1          9
        14          17           14                17     2 N   1         10
        18          11           11                18     1 N   1         11

Using this method we actually could use the original recursion rather than the one with a fictional level 0:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select n.origin
        , n.destination
        , 11 origin_recur
        , case n.origin
             when 11 then destination
                     else origin
          end destination_recur
        , 1 lvl
     from network_table n
    where n.origin = 11
       or n.destination = 11
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
   cycle destination_recur set is_cycle to 'Y' default 'N'
select *
  from (
   select origin
        , destination
        , origin_recur
        , destination_recur
        , lvl
        , is_cycle
        , row_number() over (
             partition by origin, destination
             order by ordering
          ) rn
        , ordering
     from net
       )
 where rn = 1
 order by ordering
/

It gives us the same rows as before:

    ORIGIN DESTINATION ORIGIN_RECUR DESTINATION_RECUR   LVL I  RN   ORDERING
---------- ----------- ------------ ----------------- ----- - --- ----------
        11          12           11                12     1 N   1          1
        12          13           12                13     2 N   1          2
        14          11           11                14     1 N   1          3
        14          15           14                15     2 N   1          4
        23          15           15                23     3 N   1          5
        23          24           23                24     4 N   1          6
        24          11           24                11     5 N   1          7
        18          11           11                18     6 N   1         11
        14          16           14                16     2 N   1         12
        14          17           14                17     2 N   1         13

But we do notice a slight problem: The 18->11 connection is shown here to level 6 where it is more correct to call it level 1.

So we stick to the solution with "level 0" giving us this final query stripped of superfluous information:

with net (origin, destination, origin_recur, destination_recur, lvl)
as (
   select 0 origin
        , 11 destination
        , 0 origin_recur
        , 11 destination_recur
        , 0 lvl
     from dual
   union all
   select n.origin
        , n.destination
        , net.destination_recur origin_recur
        , case n.origin
             when net.destination_recur then n.destination
                                        else n.origin
          end destination_recur
        , net.lvl + 1 lvl
     from net
     join network_table n
          on (n.origin = net.destination_recur and n.destination != net.origin_recur)
          or (n.destination = net.destination_recur and n.origin != net.origin_recur)
   )
   search depth first by origin_recur, destination_recur set ordering
   cycle destination_recur set is_cycle to 'Y' default 'N'
select origin
     , destination
     , lvl
  from (
   select origin
        , destination
        , lvl
        , row_number() over (
             partition by origin, destination
             order by ordering
          ) rn
        , ordering
     from net
       )
 where rn = 1
   and lvl != 0
 order by ordering
/

Giving this final output:

    ORIGIN DESTINATION   LVL
---------- ----------- -----
        11          12     1
        12          13     2
        14          11     1
        14          15     2
        23          15     3
        23          24     4
        24          11     5
        14          16     2
        14          17     2
        18          11     1

All in all a fun challenge and definitely a case for me to understand the differences of using a recursive subquery rather than a connect by hierarcical query :-) .

3 comments:

  1. Kim,

    very nice !

    We discussed a similar, but slightly simpler problem on
    http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:489772591421#5496106000346987396

    Matthias

    ReplyDelete
  2. Hi,

    any idea why I get the error "ORA-30654: missing DEFAULT keyword" error as soon as I try to use "cycle destination_recur set is_cycle to 'Y' default 'N'"?

    Thanks!

    ReplyDelete
  3. Hi, Karl

    I haven't experienced that error, but there's a fellow with similar error on Oracle Forums:
    https://community.oracle.com/thread/3516715?start=0&tstart=0

    Seems he found that the error occurs with cursor_sharing=force only.
    If that is the case for you as well (I haven't tested) then it would appear that the cursor sharing code that exchanges literals for bind variables possibly erroneously exchanges the 'N' (and perhaps the 'Y') to a bind variable, where the recursive subquery factoring cycle clause requires a literal.

    If you are using cursor sharing force and find that cursor sharing exact solves the error, then you have a case for opening an SR with support or searching MOS for a possible patch that may already exists.

    ReplyDelete