Monday, June 23, 2014

MEMBER OF comparison of PL/SQL and SQL

In the Kscope14 sunday symposium today, Steven Feuerstein explained that MEMBER OF syntax was slow in SQL and fast in PL/SQL. I challenged him that perhaps it was missing indexes on the nested table? My mistake - I got the task of testing it and see if that was the case... So I tested and was surprised at the answer.

UPDATE 2014-10-16: Based on Iudiths comment to this post, I decided to re-test this - both to delve a bit deeper into whether she has a point or not, and also to try it out on a 12.1.0.2.0 instance.

So these tests are now executed on 12.1.0.2.0:

We creat a nested table type and a table with a column of that type and populate it:

create type nested_varchar as table of varchar2(100)
/

create table father_table (
   id       integer primary key
 , name     varchar2(100)
 , children nested_varchar
)
   nested table children store as child_table
/

declare
   nested_variable   nested_varchar := nested_varchar();
begin
   nested_variable.extend(999999);
   for i in 1..999999 loop
      nested_variable(i) := 'Member number '||to_char(i,'FM099999');
   end loop;
   insert into father_table values (
      1
    , 'The father of 999999 children'
    , nested_variable
   );
   commit;
end;
/

begin
   dbms_stats.gather_table_stats(user,'FATHER_TABLE');
   dbms_stats.gather_table_stats(user,'CHILD_TABLE');
end;
/

Then we test MEMBER OF syntax in PL/SQL and SQL and compare to using WHERE on the child table in SQL:

declare
   nested_variable   nested_varchar := nested_varchar();
   time0             number;
   time1             number;
   time2             number;
   time3             number;
   time4             number;
   foundid           father_table.id%type;
begin
   time0 := dbms_utility.get_time;
   select children
     into nested_variable
     from father_table
    where id = 1;
   time1 := dbms_utility.get_time;
   if 'Member number 007500' member of nested_variable then
      dbms_output.put_line('PLSQL member found');
   end if;
   time2 := dbms_utility.get_time;
   begin
      select /* before index */ id
        into foundid
        from father_table
       where 'Member number 007500' member of children;
      dbms_output.put_line('SQL member found');
   exception
      when no_data_found then
         null;
   end;
   time3 := dbms_utility.get_time;
   begin
      select /* before index */ distinct f.id
        into foundid
        from father_table f
           , table ( f.children ) c
       where c.column_value = 'Member number 007500';
      dbms_output.put_line('SQL table found');
   exception
      when no_data_found then
         null;
   end;
   time4 := dbms_utility.get_time;
   dbms_output.put_line('Fill nested: '||(time1-time0));
   dbms_output.put_line('PLSQL hsecs: '||(time2-time1));
   dbms_output.put_line('SQL 1 hsecs: '||(time3-time2));
   dbms_output.put_line('SQL 2 hsecs: '||(time4-time3));
end;
/

DBMS_OUTPUT of the above looks like this:

PLSQL member found
SQL member found
SQL table found
Fill nested: 47
PLSQL hsecs: 0
SQL 1 hsecs: 29
SQL 2 hsecs: 4

It takes some time to populate the nested variable from the data in the tables, yes. But when PL/SQL has the nested table populated, it is very fast.

Definitely the SQL MEMBER OF looks to be the slowest implementation of MEMBER OF, but let's look at a couple of query plans:

      select /* before index */ id
        from father_table
       where 'Member number 007500' member of children;

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    20 |  1647   (1)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   TABLE ACCESS FULL| FATHER_TABLE |     1 |    20 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| CHILD_TABLE  |   999K|    36M|  1644   (1)| 00:00:01 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter('Member number 007500'MEMBER OF"CHILDREN")
   3 - filter("NESTED_TABLE_ID"=:B1)

Two full table scans, naturally, we do not have any indexes in place. Looking then at the TABLE version:

      select /* before index */ distinct f.id
        from father_table f
           , table ( f.children ) c
       where c.column_value = 'Member number 007500';

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |    58 |  1646   (1)| 00:00:01 |
|   1 |  HASH UNIQUE                  |              |     1 |    58 |  1646   (1)| 00:00:01 |
|   2 |   NESTED LOOPS                |              |     1 |    58 |  1645   (1)| 00:00:01 |
|   3 |    NESTED LOOPS               |              |     1 |    58 |  1645   (1)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL         | CHILD_TABLE  |     1 |    38 |  1644   (1)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | SYS_C0013555 |     1 |       |     0   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID| FATHER_TABLE |     1 |    20 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter("C"."COLUMN_VALUE"='Member number 007500')
   5 - access("C"."NESTED_TABLE_ID"="F"."SYS_NC0000300004$")

Again a full table scan of the CHILD_TABLE, but notice Bytes=38 for that operation in this case. The plan above had Bytes=36M for the same full table scan. Looks like Iudith has a valid point, when using MEMBER OF the SQL engine wishes to materialize the entire CHILDREN object before applying MEMBER OF, while using TABLE allows the SQL engine to full scan walk through the CHILD_TABLE but only keep the desired row.

Let us try adding an index and see if or how that helps:

create index child_table_value_ix on child_table (
   column_value, nested_table_id
)
/

begin
   dbms_stats.gather_table_stats(user,'FATHER_TABLE');
   dbms_stats.gather_table_stats(user,'CHILD_TABLE');
end;
/

And then we try the test again:

declare
   nested_variable   nested_varchar := nested_varchar();
   time0             number;
   time1             number;
   time2             number;
   time3             number;
   time4             number;
   foundid           father_table.id%type;
begin
   time0 := dbms_utility.get_time;
   select children
     into nested_variable
     from father_table
    where id = 1;
   time1 := dbms_utility.get_time;
   if 'Member number 007500' member of nested_variable then
      dbms_output.put_line('PLSQL member found');
   end if;
   time2 := dbms_utility.get_time;
   begin
      select /* after index */ id
        into foundid
        from father_table
       where 'Member number 007500' member of children;
      dbms_output.put_line('SQL member found');
   exception
      when no_data_found then
         null;
   end;
   time3 := dbms_utility.get_time;
   begin
      select /* after index */ distinct f.id
        into foundid
        from father_table f
           , table ( f.children ) c
       where c.column_value = 'Member number 007500';
      dbms_output.put_line('SQL table found');
   exception
      when no_data_found then
         null;
   end;
   time4 := dbms_utility.get_time;
   dbms_output.put_line('Fill nested: '||(time1-time0));
   dbms_output.put_line('PLSQL hsecs: '||(time2-time1));
   dbms_output.put_line('SQL 1 hsecs: '||(time3-time2));
   dbms_output.put_line('SQL 2 hsecs: '||(time4-time3));
end;
/

And the output this time is:

PLSQL member found
SQL member found
SQL table found
Fill nested: 49
PLSQL hsecs: 0
SQL 1 hsecs: 31
SQL 2 hsecs: 0

The SQL that uses TABLE operator on children and a WHERE clause now uses the index to become fast. The MEMBER OF in SQL has not improved. Let's compare the two query plans again:

      select /* after index */ id
        from father_table
       where 'Member number 007500' member of children;

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |    20 |  1647   (1)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   TABLE ACCESS FULL| FATHER_TABLE |     1 |    20 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| CHILD_TABLE  |   999K|    36M|  1644   (1)| 00:00:01 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter('Member number 007500'MEMBER OF"CHILDREN")
   3 - filter("NESTED_TABLE_ID"=:B1)

      select /* after index */ distinct f.id
        from father_table f
           , table ( f.children ) c
       where c.column_value = 'Member number 007500';

------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                      |     1 |    58 |     5  (20)| 00:00:01 |
|   1 |  HASH UNIQUE                  |                      |     1 |    58 |     5  (20)| 00:00:01 |
|   2 |   NESTED LOOPS                |                      |     1 |    58 |     4   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |                      |     1 |    58 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | CHILD_TABLE_VALUE_IX |     1 |    38 |     3   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | SYS_C0013555         |     1 |       |     0   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID| FATHER_TABLE         |     1 |    20 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("C"."COLUMN_VALUE"='Member number 007500')
   5 - access("C"."NESTED_TABLE_ID"="F"."SYS_NC0000300004$")

The plan for the MEMBER OF query is unchanged, the plan using TABLE now uses index instead of full table scan.

The conclusion seems to be that when using the object oriented approach in SQL, the SQL engine populates the object completely even if the object is not selected but only used for MEMBER OF. There is no optimizing shortcut allowing the engine to transform the MEMBER OF into a relational query.

In PL/SQL MEMBER OF seems very fast - but remember that is only when you have the object variable populated. If you are doing object variables in "pure" PL/SQL, fine, but it is not a good idea to select data from a table into a PL/SQL variable just to do the fast PL/SQL MEMBER OF as that just takes longer time in total to get the data out of the table...

Thanks to Iudith for the comment ;-)

3 comments:

  1. Hello Kim,

    Just a wild guess ... don't be angry on me if it looks stupid ...

    Maybe most of the time in the second test is spent by "materializing" the 1M elements collection,
    (allocating memory, a.s.o.), even if this happens "behind the scenes",
    and NOT with the MEMBER OF testing itself ?

    In the PL/SQL this allocation also happens, but is not included in the time measured,
    so maybe you can repeat the test with the first time value being recorded BEFORE the first SELECT.

    In the 3rd test, that uses the TABLE() operator, I think that Oracle is just reading the child table
    "row by row" (either with a full table scan or with the index), but without materializing the whole collection
    at any point ... so this may be a good reason for being much faster ...

    Thanks a lot & Best Regards,
    Iudith

    ReplyDelete
    Replies
    1. Thanks, Iudith

      It seems like your guess is not so wild ;-)

      I've updated this post with query plans that seem to support your point. At the same time I got this re-tested on 12.1.0.2.0 and it has not been improved upon. Using the object oriented syntax in SQL seems to require the object to be materialized from the relational data, and it does not look like there is an optimizer shortcut to transform MEMBER OF syntax into relational query.

      Delete
  2. Hello Kim,

    Thanks a lot for the update and for the detailed tests :):):)

    I am glad that I could contribute my two cents :)

    You are right indeed, once the collections are stored in the database using normal relational tables,
    we could expect Oracle "to go all the way" and "convert" the object-relational operators into
    "pure SQL", which still "beats" the newer (object oriented) "beast" :):)

    This approach would be similar to what is done for some XML-related syntax.

    If using PL/SQL then, of course, it doess not make much sense to fill an entire large collection
    for just performing one single MEMBER OF test, but it does make sense to fill such a collection once,
    and perform lots of tests against it, or, for cases where the collection is first created in PL/SQL from some
    application activity and only afterwards stored in the database.

    For the SQL slow case, it could be interesting to check for example, if we have several values (ex. stored in another table) and have to check each of those values against the same collection, whether Oracle is
    "clever enough" to materialize that collection only once, and not separately for each input value
    on the left side of the MEMBER OF condition.

    In general, for real life cases, probably nested tables containing the "child values for one father"
    that we would decide to store in the database using the object oriented features are supposed to be "reasonably small" so that to lower a lot the impact of this problem and still allow us to enjoy the
    "nice syntax" constructs.

    The new join types and collection related syntax added in 12c are not less beautiful, though.


    Thanks a lot & Best Regards,
    Iudith

    ReplyDelete