1 2 Previous Next 27 Replies Latest reply on Nov 29, 2016 11:02 AM by Jonathan Lewis

    using sys_guid() as primary key

    Mustafa KALAYCI

      hello everyone,

       

      This is not a question just wonder about your ideas and experiences. I just want to learn your opinions about the subject. I already search forum and google it, read blogs etc. is there anyone using this? many blog says sys_guid can be used but it is a little bit "slower" against sequences. so I run my tests and it is much more faster than sequences. here is my test:

      oracle 11.2.0.3 DB on an Exadata X3 (1/8)

       

       

      drop table t_Seq;
      drop table t_raw;
      
      drop sequence seq1 ;
      create sequence seq1; -- default 20 cache
      
      create table t_seq
          ( id     number(9) primary key
          , filler varchar2(1000)
          );
      
      insert into t_seq
        select seq1.nextval ,'sdfsf' from dual connect by level < 1000000; 
        -- this takes 25 seconds
      
      drop table t_seq;
      drop sequence seq1;
      
      create sequence seq1 cache 10000; 
      
      create table t_seq
          ( id     number(9) primary key
          , filler varchar2(1000)
          );
      
      insert into t_seq
        select seq1.nextval ,'sdfsf' from dual connect by level < 1000000; 
        -- this takes 13 seconds
      
      
      create table t_raw
          ( id     raw(16) primary key
          , filler varchar2(1000)
          );
          
      insert into t_raw
        select sys_guid(),'sdfsf' from dual connect by level < 1000000; 
        -- this takes 3.8 seconds    
            
      select segment_name, bytes from user_Segments where segment_name in (select index_name from dba_indexes where table_name in ('T_SEQ', 'T_RAW'));
      SYS_C0069970    28311552 --sys_guid
      SYS_C0069969    15728640 --sequence
      
      select segment_name, bytes from user_Segments where segment_name in ('T_SEQ', 'T_RAW');
      T_RAW    32505856
      T_SEQ    18874368
      
      

       

      so first create a sequence with default 20 cache, and then test it with 10.000 cache. best sequence performance is 13 seconds (with 10.000 cahce) bu sys_guid is already much more faster, 3.8 seconds. I also check the sizes of course both index and table. SYS_GUID seems bigger twice than the sequnce (but of course sequnce number is starting from 1 and table column is number(9), if this was bigger, space would be bigger).

       

      so I just want to test sequence and sys_guid:

       

      drop sequence seq1;
      create sequence seq1 ;
      
      declare
        x number(38);
        function sf return number is
        begin
          return seq1.nextval;
        end;
      begin
        for i in 1..100000 loop
          x := sf;
        end loop;
      end;
      /  
      -- this takes 6 seconds
      
      drop sequence seq1;
      create sequence seq1 cache 10000;
      declare
        x number(38);
        function sf return number is
        begin
          return seq1.nextval;
        end;
      begin
        for i in 1..100000 loop
          x := sf;
        end loop;
      end;
      / 
      -- this takes 4.8 seconds
      
      declare
        x raw(16);
        function sf return varchar2 is
        begin
          return sys_guid();
        end;
      begin
        for i in 1..100000 loop
          x := sf;
        end loop;
      end;
      /  
      -- this takes 2.9 seconds.
      

       

      times become more closer but still sys_guid is faster. also I realized, if I run sequence codes twice without dropping sequences, time is getting longer. also SYS_GUID can be a default value for a table column.

       

      so what do you think? thanks.

        • 1. Re: using sys_guid() as primary key
          JohnWatson2

          You might want to consider your primary key index, particularly as this is a RAC. Using sys_guid will spread the index inserts across the whole range of the index, whereas your sequence will insert into the right hand edge. Unless you reverse key it. This can mean playing games between efficient use of the buffer cache and block contention problems.

          • 2. Re: using sys_guid() as primary key
            Mustafa KALAYCI

            hi JohnWatson2,

             

            "block contention" yes I didn't think about it. good point. I also understand on my example that NEXTVAL is a plsql action right? my insert statements with sequences takes much more longer than plsql example but SYS_GUID has almost same performans both sql and plsql.

            • 3. Re: using sys_guid() as primary key
              Jonathan Lewis

              Good effort,

              A couple of thoughts, though:

               

              a) For this type of comparison I would do things like flush buffer_cache before eash run and and switch logfiles to make sure that the second test didn't suffer any side effects from the firsrt test (e.g. massive writes to disc as an unlucky log file is switched).

               

              b) Watch out for the side effects of the large (1M) connect by allocating lots of PGA - your figures showed a relatively small difference on the insert/select, it's possible that a large part of the difference was O/S time allocating memory for your session to use from the PGA.

               

              c) An insert on its own is only a starting point - if you think the results suggest it might be worth pursuing then run up (say) 8 concurrent processes to loop through the same total volume of data with a small pause (say 0.01 sec) between inserts.  Check the effects on workload, completion rate, size of index etc.

               

              d) Rather than starting a sequence-based test from 1 I would try to emulate what a production system would do at steady state and start with (e.g.) 10M.

               

              =============

               

              The thought I take from your experiments so far is that a sys_guid() is 16 bytes fixed, a number is variable but typically stored in about (N+1)/2 bytes where N is the number of digits, so the space usage in table and index are likely to be significantly different while the timing difference you show is only a few seconds over 1M rows.

               

              Add in my previous experience (which seems to be confirmed by the examples in the manual - e.g.https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions187.htm#SQLRF06120    (check the 6th byte / 12th digit) - the Oracle implementation of sys_guid() has a component that makes it largely sequential so it's  behaviour may not be too different from some of the behaviour seen in sequences (notwithstanding John's comments about random insertion to indexes.)

               

              I have a vague memory of reading something that described how the sys_guid() was generated and this included a timestamp and the MAC and some other bits - so you need to do some testing to see if the timestamp is the leading few bytes and whether it is dictated by (e.g.) the logon time or the time of the first call to sys_guid(). The way in which the leading edge is generated could have a dramatic effect on storage and contention in the index.

               

              Regards

              Jonathan Lewis

              • 4. Re: using sys_guid() as primary key
                JohnWatson2

                Just had another look (work is slow today - America is mostly on holiday) and I get very different timings. On my PC which is an Intel i7 2.3GHz (your X3 has faster Xeon processors) the sequence is a lot faster. Depending on the cache, it does generate a lot of redo whereas sys_guid generates none.

                 

                orclz>
                orclz> set autot trace stat
                orclz> set timing on
                orclz> select sys_guid() from dual connect by level < 1000000;
                
                
                999999 rows selected.
                
                
                Elapsed: 00:00:32.28
                
                
                Statistics
                ----------------------------------------------------------
                          0  recursive calls
                          0  db block gets
                          0  consistent gets
                          0  physical reads
                          0  redo size
                   30933832  bytes sent via SQL*Net to client
                     733877  bytes received via SQL*Net from client
                      66668  SQL*Net roundtrips to/from client
                          1  sorts (memory)
                          0  sorts (disk)
                     999999  rows processed
                
                
                orclz> create sequence s1;
                
                
                Sequence created.
                
                
                Elapsed: 00:00:00.02
                orclz> select s1.nextval from dual connect by level < 1000000;
                
                
                999999 rows selected.
                
                
                Elapsed: 00:00:15.35
                
                
                Statistics
                ----------------------------------------------------------
                      50012  recursive calls
                     151429  db block gets
                      50009  consistent gets
                          0  physical reads
                   36196692  redo size
                   18913645  bytes sent via SQL*Net to client
                     733877  bytes received via SQL*Net from client
                      66668  SQL*Net roundtrips to/from client
                          1  sorts (memory)
                          0  sorts (disk)
                     999999  rows processed
                
                
                orclz> create sequence s2 cache 10000;
                
                
                Sequence created.
                
                
                Elapsed: 00:00:00.01
                orclz> select s2.nextval from dual connect by level < 1000000;
                
                
                999999 rows selected.
                
                
                Elapsed: 00:00:04.32
                
                
                Statistics
                ----------------------------------------------------------
                        109  recursive calls
                        304  db block gets
                        109  consistent gets
                          0  physical reads
                      72720  redo size
                   18913645  bytes sent via SQL*Net to client
                     733877  bytes received via SQL*Net from client
                      66668  SQL*Net roundtrips to/from client
                          1  sorts (memory)
                          0  sorts (disk)
                     999999  rows processed
                
                
                orclz>
                

                 

                I think you really may need to think about the index. Using sys_guid, you could end up caching a complete copy of the index in both nodes. Do you want to do that? It is arguable both ways.

                • 5. Re: using sys_guid() as primary key
                  Jonathan Lewis

                  John,

                   

                  What O/S are you using ? If it's a Windows version of Oracle perhaps it behaves completely differently.

                  Here's my output on a Linux VM with 12.1.0.2 (similar on 11.2.0.4):

                   

                   

                  SQL> select sys_guid() from dual connect by level <= 10;

                   

                  SYS_GUID()

                  --------------------------------

                  4221329487460613E0530100007F4893

                  4221329487470613E0530100007F4893

                  4221329487480613E0530100007F4893

                  4221329487490613E0530100007F4893

                  42213294874A0613E0530100007F4893

                  42213294874B0613E0530100007F4893

                  42213294874C0613E0530100007F4893

                  42213294874D0613E0530100007F4893

                  42213294874E0613E0530100007F4893

                  42213294874F0613E0530100007F4893

                   

                  10 rows selected.

                   

                  For a single session the sys_guid()s are sequential (check the 12th character); and for multiple sessions that start at about the same time the values are sequential per session and start with very similar values.

                   

                  Regards

                  Jonathan Lewis

                   

                  P.S.  It's just possible, though a little unlikely given the scale of the difference that most of the time difference is about memory allocation. Are the timings virtually unchanged if you repeat run sys_guid() test twice in a row?

                  • 6. Re: using sys_guid() as primary key
                    Mustafa KALAYCI

                    wow Johnathan, great points. as you said I must re-do my tests for many things and I will. Also for your D step, I mentioned about it a little bit. when I run second time sequence examples, it takes more time and I believe when number gets higher it will take more time but of course I will create a full script about it in a few hours.

                    • 7. Re: using sys_guid() as primary key
                      JohnWatson2

                      I did get repeatable results. I'm using Windows, I had thought that it would be the chip that matters, not the OS, but perhaps not:

                       

                      orclz> set autot off

                      orclz> select sys_guid() from dual connect by level <= 10;

                       

                       

                      SYS_GUID()

                      --------------------------------

                      8E456A92F12D424FA92A34B8B675DEA8

                      87A1B48419B74CE0B05C8637D9A5C6E4

                      0321B5EE21CC42B1A291FF171F26E601

                      4F0A5EA2F33C40D1B7D0E802FBC0677A

                      5935EB51454A4D249C147CC3E1387AFD

                      23087816FE384D7BA0200ABB1AF4A751

                      65341B80F4B94B37A9F5B3A251A5BC6E

                      3481A89750604B39BA64E7ADF8F38A53

                      0B58E5FB0C594C34A46021B75ABA7304

                      8E904982B57841DC821660E18EE80EF8

                       

                       

                      10 rows selected.

                       

                       

                      Elapsed: 00:00:00.00

                      orclz> host ver

                       

                       

                      Microsoft Windows [Version 10.0.14393]

                       

                       

                      orclz> select * from v$version where rownum=1;

                       

                       

                      BANNER                                                                              CON_ID

                      -------------------------------------------------------------------------------- ----------

                      Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production              0

                       

                       

                      Elapsed: 00:00:00.01

                      orclz>

                      • 8. Re: using sys_guid() as primary key
                        Mustafa KALAYCI

                        John, thanks for showing statistics, I didn't look at them. as you present, sequence cause consistent gets (because of internal commit and data dictionary updates I guess). after I rearrange my test scripts, I will also check statistics too and of course I will be checking sys_guid data generation method so I can understand index structure more clear. as you said it is not something I want to hold whole index in memory.

                         

                        edit: my windows 10 machine creates sys_guids same as John's

                         

                        SQL> select sys_guid() from dual connect by level <= 10;

                         

                         

                        SYS_GUID()

                        --------------------------------

                        33439FC5513D4B4ABEFC8D25D4A2EE23

                        CA090D329DEB4B89AE4EB37F235725C3

                        4DBBA9E9B4D644ADB281F22E60DD56C9

                        40A8C07DAE1C45A2A41862EDE28A0DCB

                        B107C3325F034E27A73F46C984B8A2E3

                        4156B5CB15494551A0E528FEFD5B73FA

                        AEB7F05EB0E5431388C4DC5F6DABA216

                        9BF17367421C4E78B4C8CCC51E430971

                        05677DC4B9574266831B293BA9003549

                        5626B34D1098436CB57B366BFF966DCF

                         

                         

                        10 rows selected.

                         

                        edit2: same configuration with John, windows and oracle too.

                        • 9. Re: using sys_guid() as primary key
                          rp0428

                          First let me add something others haven't yet mentioned.

                           

                          There is a REAL ADVANTAGE to using GUIDs in that they should, by their nature, ALWAYS be unique. So you can just use that function to get a new key value whenever you need one.

                           

                          When you use NUMBERS for the key you always have to deal with the possibility that the number you use might already exist in the data.

                           

                          So people do things like MAX(myKeyValue) to figure out how to set/reset their sequence generators to avoid conflicts when data is moved from environment to environment. That can even cause an issue for a minor data fix like adding new value to a small lookup table.

                           


                          declare 

                          1.   x raw(16); 
                          2. function sf return varchar2 is 
                          3. begin 
                          4. return sys_guid(); 
                          5. end; 
                          6. begin 
                          7. for i in 1..100000 loop 
                          8.     x := sf; 
                          9. end loop; 
                          10. end; 
                          11. /   
                          12. -- this takes 2.9 seconds. 

                           

                          times become more closer but still sys_guid is faster

                          It could be even faster than that. Your function uses SYS_GUID to obtain a RAW but is returning a VARCHAR2 and storing in a RAW.

                           

                          That would result in two implicit conversions - one from RAW to VARCHAR2 and another from VARCHAR2 to RAW.

                           

                          Try it again using the RAW for the return datatype.

                          also SYS_GUID can be a default value for a table column.

                          Yes - but in 12c a sequence.nextval can be a default value also

                          https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#SQLRF01402

                          The DEFAULT expression can include the sequence pseudocolumns CURRVAL and NEXTVAL, as long as the sequence exists and you have the privileges necessary to access it.

                          And see this AskTom article for additional info

                          Ask Tom: On Oracle Database 12c, Part 1

                          Improved Defaults

                          The ability to create a default column value has existed in SQL for a while. The functionality has been somewhat limited, however, with various restrictions. For example, you were restricted from using a SEQUENCE object to supply a default value. Additionally, if a default value was to be inserted into or updated in a table, you had to either use the DEFAULT keyword in the SQL statement or leave the column out of the INSERT statement entirely. Furthermore, adding a new column that permits NULL values with a default value was an offline operation. In Oracle Database 12c, however, these restrictions and functionality limitations have been removed.

                          As usual, though, there are usually the practical considerations to consider when choosing a method and those often override any 'apparent' speed advantage.

                           

                          1. RAW values are not inherently 'user friendly' and even the human-readable form (using tests/varchar) isn't really something most developers would be familiar with.

                           

                          2. Any user tool (sql dev, toad, utl_file, Java, etc) that produces CSV, or similar files, will have to convert the value to a 'text' form. That will typically be VARCHAR2 taking 32 ascii bytes to represent the 16 byte raw. A typical numeric value will take far fewer bytes.

                           

                          Some of those tools don't really support exporting table data that includes LOBs (blob, clob, nclob). Haven't checked lately to see if they support RAW.

                           

                          3. Any report/query that 'displays' the data will also have to use a 'text' form. Again that means an automatic column width of 32 characters if that value is to be fully displayed.

                           

                          There is a REAL ADVANTAGE to using GUIDs that I haven't seen mentioned.

                           

                          Since primary key data is an important value if you are exporting data to files with the intent of moving it around (and in my experience a lot of people do such exports) using guids could be a deal killer.

                           

                          I'm a strong proponent of using surrogate key values rather than actual data except in specific, appropriate use cases.

                           

                          But while using guids may be a technical solution I don't consider it to be a practical solution given the penchant for exporting and reloading data that developers have shown (rightly or wrongly) over the years.

                           

                          For me there would have to be a REAL, demonstrated value that the standard NUMBER solution can't provide. I've never seen one.

                          • 10. Re: using sys_guid() as primary key
                            Jonathan Lewis

                            John,

                             

                            The reason I asked was that when you said PC I thought Windows, and I believe Microsoft invented the GUID concept.

                             

                            With those thoughts in mind I wondered whether the Windows implementation of sys_guid() is just a call to the Windows GUID function while the Unix (et. al.) implementation is an in-house routine that operates much faster.  It's possible for an in-house call to take short cuts (e.g. remember the MAC rather than requesting it every time) but perhaps a call to a "3rd party" function has to live with whatever it gets.

                             

                            Regards

                            Jonathan Lewis

                            • 11. Re: using sys_guid() as primary key
                              Mustafa KALAYCI

                              rp0428, thank you very much for the warning, I don't know how did I do that mistake about varchar2 and raw conversion. your points are really good too for me. especially export & import. I must consider this. as Johnathan mentioned how guid is generated is important, could it generate same guid at another db at different times...

                               

                              edit:

                              declare  
                                x raw(16);  
                                function sf return raw is  
                                begin  
                                  return sys_guid();  
                                end;  
                              begin  
                                for i in 1..100000 loop  
                                  x := sf;  
                                end loop;  
                              end;  
                              /    
                              

                               

                              this takes 2.6 on exadata (not 2.9 after removing raw varchar2 conversion)

                              • 12. Re: using sys_guid() as primary key
                                Mustafa KALAYCI

                                Jonathan,

                                 

                                I was just writing about that. I was doing my tests again with flushing buffer caches etc. Now I am at home and I'm doing the tests on my laptop windows 10, i7 2.4GHz cpu, oracle 12c. interestingly SYS_GUID takes too much time, for my first insert tests,

                                 

                                sequence insert with cahce 10000 takes 4.9 seconds -- less then exadata, on exadata it was 13 seconds

                                but SYS_GUID insert takes 43 seconds. -- too slow than exadata, on exadata it was 3.8 seconds

                                 

                                my laptop is really strong one, single core performance is better than my exadata (I tested this before many times), 16GB ram, samsung evo ssd etc. but sys_Guid has a huge time difference on my machine now. I believe this is because windows but I am not sure.

                                • 13. Re: using sys_guid() as primary key
                                  Mustafa KALAYCI

                                  Hello everyone,

                                   

                                  I tried to develop my tests according to your suggestions.

                                   

                                  first, about ordered and not order guids, on my exadata guids are kind of ordered so windows and unix/linux has different logic which Jonathan had mentioned before.

                                  Windows:

                                  select sys_guid() from dual connect by level < 5;
                                  B5C9C96EEB71471CBD684CB3DA96175F
                                  395076AB0FC44250956E85F4F3125B2A
                                  1F5237170FFD4410AB1A48C850E49BD0
                                  7ACF01F9B3454BA482ACA638155FC0FC
                                  

                                   

                                  Exadata:

                                  select sys_guid() from dual connect by level < 5;
                                  42276A59957ADDCEE0537901740A8332
                                  42276A59957BDDCEE0537901740A8332
                                  42276A59957CDDCEE0537901740A8332
                                  42276A59957DDDCEE0537901740A8332
                                  

                                   

                                  so my first tests again:

                                   

                                  the shortest one: 1.8 seconds and the longest one: 8.2 seconds.

                                   

                                  I know codes could be much more better but I just want to be quick. what do you think?

                                   

                                  EDIT: After Andrew's post I deleted sample code because of append hint, please see my next post for example and the results.

                                  • 14. Re: using sys_guid() as primary key
                                    Andrew Sayer

                                    Not managed to read the whole test case yet but I notice that you are trying to test an insert /*+append*/ from many sessions at once. Direct path inserts require an exclusive lock on the table so you are not doing any concurrent inserts.

                                     

                                    Demo to prove the locks (from live SQL)

                                     

                                    drop table source_Tab;

                                    drop table dest_tab;

                                    create table source_Tab as select * from all_objects where rownum <=2000;

                                    select count(*) from source_tab;

                                    create table dest_tab as select * from source_tab where 1=0;

                                    insert /*+append*/ into dest_tab select * from source_tab;

                                    declare

                                    pragma autonomous_transaction;

                                    begin

                                    insert /*+append*/ into dest_tab select * from source_tab;

                                    end;

                                    /

                                    Table dropped. Table dropped.

                                    Table created.

                                    Result Set 4

                                    COUNT(*)
                                    2000

                                    Table created. 2000 row(s) inserted.

                                    ORA-00060: deadlock detected while waiting for resource ORA-06512: at line 4 ORA-06512: at "SYS.DBMS_SQL", line 1707

                                     

                                     

                                    You'll have to remove that append hint to really test concurrent inserts.

                                    1 2 Previous Next