1 2 Previous Next 20 Replies Latest reply on Apr 10, 2017 9:56 AM by Jonathan Lewis

    How to determine Selectivity when using character arithmetic: Academic question

    perfdba

      Hi all,

       

      i am brushing through selectivity basics by going through various blogs & presentation on selectivity and have this question ,academic only,about range arithmetic with characters.

      (i didnt find any other blog on exactly same subject so posting here)

       

      DDL: i created the table and collected statistics without histograms.

       

      create table t1 as
      with v1 as (select rownum n from dual connect by level <= 1e4)
      select
      rownum id,
      mod(rownum-1,200) mod_200,
      trunc(dbms_random.value(0,300)) rand_300,
      mod(rownum-1,10000) mod_10000,
      trunc(sqrt(dbms_random.value(0,75000))) sqrt,
      trunc(sysdate) +
      trunc(dbms_random.value(0,1000)) date_1000,
      dbms_random.string('l',6) alpha_06,
      dbms_random.string('l',20) alpha_20
      from v1, v1
      where
      rownum <= 1e6;

       

       

      SQL> explain plan for select * from t1 where alpha_06 like ‘mm%’;

       

      Explained.

      SQL> @pln

      PLAN_TABLE_OUTPUT
      ————————————————————————————————————————————————————————————————————————————————————————————————————
      Plan hash value: 3617692013

      ————————————————————————–
      | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
      ————————————————————————–
      | 0 | SELECT STATEMENT | | 157 | 8792 | 1669 (1)| 00:00:21 |
      |* 1 | TABLE ACCESS FULL| T1 | 157 | 8792 | 1669 (1)| 00:00:21 |
      ————————————————————————–

      Predicate Information (identified by operation id):
      —————————————————

        1 – filter(“ALPHA_06” LIKE ‘mm%’)

       

      Questino. How is number 157 derived exactly. When i do the actual count it returns 1529.

       

        COUNT(*)
      ———-
      1529

       

      I tried various calculations but none seem to get closer to 157. Just one that 26(number of letters) *6 =156 but it is not making sense w.r.t. explain plan

       

       

      DB version:11.2.0.4.0

       

      Regds,
      Sachin

        • 1. Re: How to determine Selectivity when using character arithmetic: Academic question
          Andrew Sayer

          Never tried to calculate it but you could...

          Consider that your predicate can be rewritten

           

          alpha_06 like 'mm%'

          is equiv to

          alpha_06 >= 'mm' And alpha_06 < 'mn'

           

          Remember that any valid character could be used in this range (a-z,A-Z,0-9 all other characters..)

           

          You have a range of values that could fit in that gap, you also have a range of low and high values of your table's column stat.

           

          That might be enough to get you started, I dont have anything in front of me to work with so I can't follow through with some actual numbers, but you should get an idea of where to start.

           

          -edit

          Converting the range into raw (like what the low and high value stats would use) will probably be a good idea. You can convert the hex to decimal and do math with that.

          • 2. Re: How to determine Selectivity when using character arithmetic: Academic question
            Mustafa KALAYCI

            could it be dynamic sampling? was there a "Note - dynamic sampling used for this statement (level=2)" kind of statement in execution plan?

            when I run your code, my execution plan indicates 1774 rows and the actual one was 1465.

            • 3. Re: How to determine Selectivity when using character arithmetic: Academic question
              perfdba

              hi Andrew,

              Thanks for the reply . i did the steps you mentioned already , unfortunately numbers dont match up with what i see from the explain plan.

              • 4. Re: How to determine Selectivity when using character arithmetic: Academic question
                perfdba

                hi Mustafa,

                It is not dynamic sampling. There is no note.

                optimizer_dynamic_sampling is set to 2 already in the DB.

                when you run the code it is in which version and do you see dynamic sampling the notes section?

                 

                Just one note about the dynamic sampling in the DB where i am testing. Only when i set the optimizer_dynamic_sampling does the sampling kick in and change the estimated row to something different.

                 

                SQL> alter session set optimizer_dynamic_sampling=11;

                 

                Session altered.

                 

                SQL> explain plan for select *FROM t1 where ALPHA_06 like 'mm%';

                 

                Explained.

                 

                SQL> @pln

                 

                PLAN_TABLE_OUTPUT

                ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                Plan hash value: 3617692013

                 

                --------------------------------------------------------------------------

                | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

                --------------------------------------------------------------------------

                |   0 | SELECT STATEMENT  |      |  1750 | 98000 |  1669   (1)| 00:00:21 |

                |*  1 |  TABLE ACCESS FULL| T1   |  1750 | 98000 |  1669   (1)| 00:00:21 |

                --------------------------------------------------------------------------

                 

                Predicate Information (identified by operation id):

                ---------------------------------------------------

                 

                   1 - filter("ALPHA_06" LIKE 'mm%')

                 

                Note

                -----

                   - dynamic sampling used for this statement (level=11)

                • 5. Re: How to determine Selectivity when using character arithmetic: Academic question
                  Mustafa KALAYCI

                  hi,

                   

                  my db is 11.2.0.1

                   

                  show parameter dynamic;
                  NAME                       TYPE    VALUE 
                  -------------------------- ------- ----- 
                  optimizer_dynamic_sampling integer 2 
                  
                  create table t1 as
                  with v1 as (select rownum n from dual connect by level <= 1e4)
                  select
                  rownum id,
                  mod(rownum-1,200) mod_200,
                  trunc(dbms_random.value(0,300)) rand_300,
                  mod(rownum-1,10000) mod_10000,
                  trunc(sqrt(dbms_random.value(0,75000))) sqrt,
                  trunc(sysdate) +
                  trunc(dbms_random.value(0,1000)) date_1000,
                  dbms_random.string('l',6) alpha_06,
                  dbms_random.string('l',20) alpha_20
                  from v1, v1
                  where
                  rownum <= 1e6;
                  
                  explain plan for select * from t1 where alpha_06 like 'mm%';
                  Plan FOR succeeded.
                  
                  Plan hash value: 3617692013
                  
                  --------------------------------------------------------------------------
                  | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
                  --------------------------------------------------------------------------
                  |   0 | SELECT STATEMENT  |      |  1501 |  5977K|  2357   (2)| 00:00:29 |
                  |*  1 |  TABLE ACCESS FULL| T1   |  1501 |  5977K|  2357   (2)| 00:00:29 |
                  --------------------------------------------------------------------------
                  
                  Predicate Information (identified by operation id):
                  ---------------------------------------------------
                  
                     1 - filter("ALPHA_06" LIKE 'mm%')
                  
                  Note
                  -----
                     - dynamic sampling used for this statement (level=2)
                  
                  
                  select count(8) from t1 where alpha_06 like 'mm%';
                  
                  1476
                  

                   

                  I did not change any default settings in that db and by default level 2 dynamic sampling, quite accurate results.

                  • 6. Re: How to determine Selectivity when using character arithmetic: Academic question
                    perfdba

                    What do you see after collecting statistics without histograms?

                    • 7. Re: How to determine Selectivity when using character arithmetic: Academic question
                      Mustafa KALAYCI

                      hi,

                       

                      here:

                       

                      exec dbms_stats.gather_table_stats(user, 'T1');
                      PL/SQL procedure successfully completed.
                      
                      
                      explain plan for select * from t1 where alpha_06 like 'mm%';
                      Plan FOR succeeded.
                      
                      
                      select * from table(dbms_xplan.display);
                      Plan hash value: 3617692013
                      
                      --------------------------------------------------------------------------
                      | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
                      --------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT  |      |  1140 | 63840 |  2360   (2)| 00:00:29 |
                      |*  1 |  TABLE ACCESS FULL| T1   |  1140 | 63840 |  2360   (2)| 00:00:29 |
                      --------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         1 - filter("ALPHA_06" LIKE 'mm%')
                      
                      

                       

                      there is a change after gathering statistics by default parameter.

                       

                      so I also gather statistics by adding estimated_percent and cascade (there is no index on table) and result is more accurate again:

                       

                      exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent=>100, cascade => true);
                      PL/SQL procedure successfully completed.
                      
                      
                      explain plan for select * from t1 where alpha_06 like 'mm%';
                      Plan FOR succeeded.
                      
                      
                      select * from table(dbms_xplan.display);
                      Plan hash value: 3617692013
                      
                      --------------------------------------------------------------------------
                      | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
                      --------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT  |      |  1589 | 88984 |  2360   (2)| 00:00:29 |
                      |*  1 |  TABLE ACCESS FULL| T1   |  1589 | 88984 |  2360   (2)| 00:00:29 |
                      --------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         1 - filter("ALPHA_06" LIKE 'mm%')
                      
                      • 8. Re: How to determine Selectivity when using character arithmetic: Academic question
                        perfdba

                        the way you are gathering statistics histograms will get gathered, by default,  for reference https://blogs.oracle.com/optimizer/entry/how_does_the_method_opt

                         

                        Please user the below command and let me know what you see for explain plan results.

                         

                        exec dbms_stats.gather_table_stats(user, 'T1',method_opt=>'FOR TABLE FOR ALL COLUMNS SIZE 1');

                        • 9. Re: How to determine Selectivity when using character arithmetic: Academic question
                          Mustafa KALAYCI

                          sorry, my mistake. here it is and here 157:

                           

                          exec dbms_stats.gather_table_stats(user, 'T1', method_opt=>'FOR TABLE FOR ALL COLUMNS SIZE 1');
                          PL/SQL procedure successfully completed.
                          
                          
                          explain plan for select * from t1 where alpha_06 like 'mm%';
                          Plan FOR succeeded.
                          
                          
                          select * from table(dbms_xplan.display);
                          Plan hash value: 3617692013
                          
                          --------------------------------------------------------------------------
                          | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
                          --------------------------------------------------------------------------
                          |   0 | SELECT STATEMENT  |      |   157 |  8792 |  2360   (2)| 00:00:29 |
                          |*  1 |  TABLE ACCESS FULL| T1   |   157 |  8792 |  2360   (2)| 00:00:29 |
                          --------------------------------------------------------------------------
                          
                          Predicate Information (identified by operation id):
                          ---------------------------------------------------
                          
                             1 - filter("ALPHA_06" LIKE 'mm%')
                          

                           

                          also same result if I use 100% estimate percent.

                          • 10. Re: How to determine Selectivity when using character arithmetic: Academic question
                            mathguy

                            The optimizer sees the first two letters are between 'aa' and 'zz'. But the optimizer doesn't know there are letters only (unless it has histograms available); the first letter is between 'a' and 'z', but when we go to the first two characters, things like 'm3' and 'm*' and 'mG' are between 'lz' and 'mz', not just 'mc' and 'mm' and 'mr'.

                             

                            The count is 25 * 256 + 26 = 6400 + 26 = 6426  (it is NOT 26 * 256, because it is known that 'a3', for example, is not possible - it is not between 'aa' and 'zz' - and similarly 'z~' is not possible because it is greater than 'zz'). I am using the fact that digits come before letters, and ~ comes after letters, in ASCII and therefore in all or almost all character sets.

                             

                            1 million divided by 6426 is approx. 156.6. This is close enough; perhaps only 255 characters are valid (the "nul" - character 0 in ASCII - may not be valid) so you get 25*255 + 26 = 6401, and 1 million divided by 6401 is 156.23 - round it up and you get 157.

                             

                            I believe this is pretty much what Andrew said in Reply 1.

                            1 person found this helpful
                            • 11. Re: How to determine Selectivity when using character arithmetic: Academic question
                              Jonathan Lewis

                              The DDL looks like one of mine - was it from one of my blogs, or from one of the presentations that I've given on selectivity, and somewhere online I'm fairly sure I've described the method.

                              The selectivity calculation is the standard "range you want" divided by "total range".

                               

                              If you did a dbms_random.seed(0) that usually goes into the scripts I publish (and depending on which version you're running on) then the low and high values are probably very close to 'aaaa??' and 'zzz???' (which the '?' means unknown, not literally a question mark).

                               

                              As Andrew said, the calculation is the same as that for:  column >= 'mm' and column < 'mn' - i.e. using the next possible value for the 2nd character.

                               

                              For a very good approximation calculate the numberic values that would represent the min, the max, 'mm' and 'mn' if they were the endpoint values stored for a histogram - the algorithm is here: https://jonathanlewis.wordpress.com/2010/10/05/frequency-histogram-4/   (I've assumed you've done a 100% sample so the min and max would be stored as the low and high in user_tab_columns - if not then translate the low and high back to their ascii equivalents).

                               

                              For example, if min = 'aaaaud' and max = 'zzzypj' then you get:

                              select

                                      (568191422636698000000000000000000000 - 568171140227094000000000000000000000)  /

                                      (635944374132308000000000000000000000-505627904303159000000000000000000000)     sel0

                              from dual

                              ;

                               

                                SEL0

                              ----------

                              .00015564

                               

                               

                              Add 1/num_distinct  (998848 in my case) because one end of the range is bounded, giving: 1.5664115332863458704427500480554e-4

                              Multiply by num_rows in table (1,000,000) and round (up) -> 157 Q.E.D.

                               

                              Regards

                              Jonathan Lewis

                              • 12. Re: How to determine Selectivity when using character arithmetic: Academic question
                                Jonathan Lewis

                                This works because of the data construction, not as a general rule.

                                 

                                When I first wrote the code to generate the data I'd pick 6 characters and restricted to lower case only so that there was a good chance of getting very close to 1,000,000 distinct values in 1,000,000 rows, and a good chance of having a low value starting with lots of 'a' and a high with lots of 'z'.

                                 

                                Your approximation is adequate because of that a-z assumption and could, of course, be modified to suit any other known low/high range.

                                 

                                Regards

                                Jonathan Lewis

                                • 13. Re: How to determine Selectivity when using character arithmetic: Academic question
                                  perfdba

                                  hi Jonathan,

                                  Thanks a lot ! Yes it is from your presentation only. Actually what i  meant by the question was that the my calculation isnt correct, not that the formula/method was incorrect . Maybe i didnt put it correctly. Your presentation is very helpful!

                                   

                                  It seems that i got the a large range and the wrong values when converting from hex to decimal.

                                  i.e. i didnt get below 2 numbers and had got something else which led to a slightly higher cardinality than 157.

                                   

                                  (568191422636698000000000000000000000 - 568171140227094000000000000000000000) 

                                   

                                   

                                  Regds.

                                  • 14. Re: How to determine Selectivity when using character arithmetic: Academic question
                                    perfdba

                                    hi mathguy,

                                    i am not sure if i am following what you say correctly.

                                    25 * 256 + 26

                                    If you get a chance can you plesae give a  brief explanation of each of the numbers. Is it because we have 127 ASCII characters? even if it is i didnt understand still

                                    1 2 Previous Next