Forum Stats

  • 3,814,188 Users
  • 2,258,835 Discussions
  • 7,892,611 Comments

Discussions

using sys_guid() as primary key

Mustafa_KALAYCI
Mustafa_KALAYCI Member Posts: 3,383 Bronze Crown
edited Nov 29, 2016 6:02AM in General Database Discussions

hello everyone,

This is not a question just wonder about your ideas and experiences. I just want to learn your opinions about the subject. I already search forum and google it, read blogs etc. is there anyone using this? many blog says sys_guid can be used but it is a little bit "slower" against sequences. so I run my tests and it is much more faster than sequences. here is my test:

oracle 11.2.0.3 DB on an Exadata X3 (1/8)

drop table t_Seq;drop table t_raw;drop sequence seq1 ;create sequence seq1; -- default 20 cachecreate table t_seq    ( id     number(9) primary key    , filler varchar2(1000)    );insert into t_seq  select seq1.nextval ,'sdfsf' from dual connect by level < 1000000;   -- this takes 25 secondsdrop table t_seq;drop sequence seq1;create sequence seq1 cache 10000; create table t_seq    ( id     number(9) primary key    , filler varchar2(1000)    );insert into t_seq  select seq1.nextval ,'sdfsf' from dual connect by level < 1000000;   -- this takes 13 secondscreate table t_raw    ( id     raw(16) primary key    , filler varchar2(1000)    );    insert into t_raw  select sys_guid(),'sdfsf' from dual connect by level < 1000000;   -- this takes 3.8 seconds          select segment_name, bytes from user_Segments where segment_name in (select index_name from dba_indexes where table_name in ('T_SEQ', 'T_RAW'));SYS_C0069970    28311552 --sys_guidSYS_C0069969    15728640 --sequenceselect segment_name, bytes from user_Segments where segment_name in ('T_SEQ', 'T_RAW');T_RAW    32505856T_SEQ    18874368

so first create a sequence with default 20 cache, and then test it with 10.000 cache. best sequence performance is 13 seconds (with 10.000 cahce) bu sys_guid is already much more faster, 3.8 seconds. I also check the sizes of course both index and table. SYS_GUID seems bigger twice than the sequnce (but of course sequnce number is starting from 1 and table column is number(9), if this was bigger, space would be bigger).

so I just want to test sequence and sys_guid:

drop sequence seq1;create sequence seq1 ;declare  x number(38);  function sf return number is  begin    return seq1.nextval;  end;begin  for i in 1..100000 loop    x := sf;  end loop;end;/  -- this takes 6 secondsdrop sequence seq1;create sequence seq1 cache 10000;declare  x number(38);  function sf return number is  begin    return seq1.nextval;  end;begin  for i in 1..100000 loop    x := sf;  end loop;end;/ -- this takes 4.8 secondsdeclare  x raw(16);  function sf return varchar2 is  begin    return sys_guid();  end;begin  for i in 1..100000 loop    x := sf;  end loop;end;/  -- this takes 2.9 seconds.

times become more closer but still sys_guid is faster. also I realized, if I run sequence codes twice without dropping sequences, time is getting longer. also SYS_GUID can be a default value for a table column.

so what do you think? thanks.

William RobertsonDom BrooksMustafa_KALAYCIDejan T.AndrewSayer
«13

Comments

  • JohnWatson2
    JohnWatson2 Member Posts: 4,426 Silver Crown
    edited Nov 25, 2016 5:13AM

    You might want to consider your primary key index, particularly as this is a RAC. Using sys_guid will spread the index inserts across the whole range of the index, whereas your sequence will insert into the right hand edge. Unless you reverse key it. This can mean playing games between efficient use of the buffer cache and block contention problems.

    William RobertsonDom Brooks
  • Mustafa_KALAYCI
    Mustafa_KALAYCI Member Posts: 3,383 Bronze Crown
    edited Nov 25, 2016 6:22AM

    hi @JohnWatson2,

    "block contention" yes I didn't think about it. good point. I also understand on my example that NEXTVAL is a plsql action right? my insert statements with sequences takes much more longer than plsql example but SYS_GUID has almost same performans both sql and plsql.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,941 Blue Diamond
    edited Nov 25, 2016 10:28AM

    Good effort,

    A couple of thoughts, though:

    a) For this type of comparison I would do things like flush buffer_cache before eash run and and switch logfiles to make sure that the second test didn't suffer any side effects from the firsrt test (e.g. massive writes to disc as an unlucky log file is switched).

    b) Watch out for the side effects of the large (1M) connect by allocating lots of PGA - your figures showed a relatively small difference on the insert/select, it's possible that a large part of the difference was O/S time allocating memory for your session to use from the PGA.

    c) An insert on its own is only a starting point - if you think the results suggest it might be worth pursuing then run up (say) 8 concurrent processes to loop through the same total volume of data with a small pause (say 0.01 sec) between inserts.  Check the effects on workload, completion rate, size of index etc.

    d) Rather than starting a sequence-based test from 1 I would try to emulate what a production system would do at steady state and start with (e.g.) 10M.

    =============

    The thought I take from your experiments so far is that a sys_guid() is 16 bytes fixed, a number is variable but typically stored in about (N+1)/2 bytes where N is the number of digits, so the space usage in table and index are likely to be significantly different while the timing difference you show is only a few seconds over 1M rows.

    Add in my previous experience (which seems to be confirmed by the examples in the manual - e.g.https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions187.htm#SQLRF06120    (check the 6th byte / 12th digit) - the Oracle implementation of sys_guid() has a component that makes it largely sequential so it's  behaviour may not be too different from some of the behaviour seen in sequences (notwithstanding John's comments about random insertion to indexes.)

    I have a vague memory of reading something that described how the sys_guid() was generated and this included a timestamp and the MAC and some other bits - so you need to do some testing to see if the timestamp is the leading few bytes and whether it is dictated by (e.g.) the logon time or the time of the first call to sys_guid(). The way in which the leading edge is generated could have a dramatic effect on storage and contention in the index.

    Regards

    Jonathan Lewis

    Mustafa_KALAYCI
  • JohnWatson2
    JohnWatson2 Member Posts: 4,426 Silver Crown
    edited Nov 25, 2016 11:00AM

    Just had another look (work is slow today - America is mostly on holiday) and I get very different timings. On my PC which is an Intel i7 2.3GHz (your X3 has faster Xeon processors) the sequence is a lot faster. Depending on the cache, it does generate a lot of redo whereas sys_guid generates none.

    orclz>orclz> set autot trace statorclz> set timing onorclz> select sys_guid() from dual connect by level < 1000000;999999 rows selected.Elapsed: 00:00:32.28Statistics----------------------------------------------------------          0  recursive calls          0  db block gets          0  consistent gets          0  physical reads          0  redo size   30933832  bytes sent via SQL*Net to client     733877  bytes received via SQL*Net from client      66668  SQL*Net roundtrips to/from client          1  sorts (memory)          0  sorts (disk)     999999  rows processedorclz> create sequence s1;Sequence created.Elapsed: 00:00:00.02orclz> select s1.nextval from dual connect by level < 1000000;999999 rows selected.Elapsed: 00:00:15.35Statistics----------------------------------------------------------      50012  recursive calls     151429  db block gets      50009  consistent gets          0  physical reads   36196692  redo size   18913645  bytes sent via SQL*Net to client     733877  bytes received via SQL*Net from client      66668  SQL*Net roundtrips to/from client          1  sorts (memory)          0  sorts (disk)     999999  rows processedorclz> create sequence s2 cache 10000;Sequence created.Elapsed: 00:00:00.01orclz> select s2.nextval from dual connect by level < 1000000;999999 rows selected.Elapsed: 00:00:04.32Statistics----------------------------------------------------------        109  recursive calls        304  db block gets        109  consistent gets          0  physical reads      72720  redo size   18913645  bytes sent via SQL*Net to client     733877  bytes received via SQL*Net from client      66668  SQL*Net roundtrips to/from client          1  sorts (memory)          0  sorts (disk)     999999  rows processedorclz>

    I think you really may need to think about the index. Using sys_guid, you could end up caching a complete copy of the index in both nodes. Do you want to do that? It is arguable both ways.

    Mustafa_KALAYCI
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,941 Blue Diamond
    edited Nov 25, 2016 11:26AM

    John,

    What O/S are you using ? If it's a Windows version of Oracle perhaps it behaves completely differently.

    Here's my output on a Linux VM with 12.1.0.2 (similar on 11.2.0.4):

    SQL> select sys_guid() from dual connect by level <= 10;

    SYS_GUID()

    --------------------------------

    4221329487460613E0530100007F4893

    4221329487470613E0530100007F4893

    4221329487480613E0530100007F4893

    4221329487490613E0530100007F4893

    42213294874A0613E0530100007F4893

    42213294874B0613E0530100007F4893

    42213294874C0613E0530100007F4893

    42213294874D0613E0530100007F4893

    42213294874E0613E0530100007F4893

    42213294874F0613E0530100007F4893

    10 rows selected.

    For a single session the sys_guid()s are sequential (check the 12th character); and for multiple sessions that start at about the same time the values are sequential per session and start with very similar values.

    Regards

    Jonathan Lewis

    P.S.  It's just possible, though a little unlikely given the scale of the difference that most of the time difference is about memory allocation. Are the timings virtually unchanged if you repeat run sys_guid() test twice in a row?

  • Mustafa_KALAYCI
    Mustafa_KALAYCI Member Posts: 3,383 Bronze Crown
    edited Nov 25, 2016 11:30AM

    wow Johnathan, great points. as you said I must re-do my tests for many things and I will. Also for your D step, I mentioned about it a little bit. when I run second time sequence examples, it takes more time and I believe when number gets higher it will take more time but of course I will create a full script about it in a few hours.

  • JohnWatson2
    JohnWatson2 Member Posts: 4,426 Silver Crown
    edited Nov 25, 2016 11:32AM

    I did get repeatable results. I'm using Windows, I had thought that it would be the chip that matters, not the OS, but perhaps not:

    orclz> set autot off

    orclz> select sys_guid() from dual connect by level <= 10;

    SYS_GUID()

    --------------------------------

    8E456A92F12D424FA92A34B8B675DEA8

    87A1B48419B74CE0B05C8637D9A5C6E4

    0321B5EE21CC42B1A291FF171F26E601

    4F0A5EA2F33C40D1B7D0E802FBC0677A

    5935EB51454A4D249C147CC3E1387AFD

    23087816FE384D7BA0200ABB1AF4A751

    65341B80F4B94B37A9F5B3A251A5BC6E

    3481A89750604B39BA64E7ADF8F38A53

    0B58E5FB0C594C34A46021B75ABA7304

    8E904982B57841DC821660E18EE80EF8

    10 rows selected.

    Elapsed: 00:00:00.00

    orclz> host ver

    Microsoft Windows [Version 10.0.14393]

    orclz> select * from v$version where rownum=1;

    BANNER                                                                              CON_ID

    -------------------------------------------------------------------------------- ----------

    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production              0

    Elapsed: 00:00:00.01

    orclz>

  • Mustafa_KALAYCI
    Mustafa_KALAYCI Member Posts: 3,383 Bronze Crown
    edited Nov 25, 2016 11:36AM

    John, thanks for showing statistics, I didn't look at them. as you present, sequence cause consistent gets (because of internal commit and data dictionary updates I guess). after I rearrange my test scripts, I will also check statistics too and of course I will be checking sys_guid data generation method so I can understand index structure more clear. as you said it is not something I want to hold whole index in memory.

    edit: my windows 10 machine creates sys_guids same as John's

    SQL> select sys_guid() from dual connect by level <= 10;

    SYS_GUID()

    --------------------------------

    33439FC5513D4B4ABEFC8D25D4A2EE23

    CA090D329DEB4B89AE4EB37F235725C3

    4DBBA9E9B4D644ADB281F22E60DD56C9

    40A8C07DAE1C45A2A41862EDE28A0DCB

    B107C3325F034E27A73F46C984B8A2E3

    4156B5CB15494551A0E528FEFD5B73FA

    AEB7F05EB0E5431388C4DC5F6DABA216

    9BF17367421C4E78B4C8CCC51E430971

    05677DC4B9574266831B293BA9003549

    5626B34D1098436CB57B366BFF966DCF

    10 rows selected.

    edit2: same configuration with John, windows and oracle too.

  • Unknown
    edited Nov 25, 2016 12:29PM

    First let me add something others haven't yet mentioned.

    There is a REAL ADVANTAGE to using GUIDs in that they should, by their nature, ALWAYS be unique. So you can just use that function to get a new key value whenever you need one.

    When you use NUMBERS for the key you always have to deal with the possibility that the number you use might already exist in the data.

    So people do things like MAX(myKeyValue) to figure out how to set/reset their sequence generators to avoid conflicts when data is moved from environment to environment. That can even cause an issue for a minor data fix like adding new value to a small lookup table.


    declare    x raw(16);  function sf return varchar2 is  begin  return sys_guid();  end;  begin  for i in 1..100000 loop      x := sf;  end loop;  end;  /    -- this takes 2.9 seconds.  times become more closer but still sys_guid is faster

    It could be even faster than that. Your function uses SYS_GUID to obtain a RAW but is returning a VARCHAR2 and storing in a RAW.

    That would result in two implicit conversions - one from RAW to VARCHAR2 and another from VARCHAR2 to RAW.

    Try it again using the RAW for the return datatype.

    also SYS_GUID can be a default value for a table column.

    Yes - but in 12c a sequence.nextval can be a default value also

    https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#SQLRF01402

    The DEFAULT expression can include the sequence pseudocolumns CURRVAL and NEXTVAL, as long as the sequence exists and you have the privileges necessary to access it.

    And see this AskTom article for additional info

    Ask Tom: On Oracle Database 12c, Part 1

    Improved Defaults

    The ability to create a default column value has existed in SQL for a while. The functionality has been somewhat limited, however, with various restrictions. For example, you were restricted from using a SEQUENCE object to supply a default value. Additionally, if a default value was to be inserted into or updated in a table, you had to either use the DEFAULT keyword in the SQL statement or leave the column out of the INSERT statement entirely. Furthermore, adding a new column that permits NULL values with a default value was an offline operation. In Oracle Database 12c, however, these restrictions and functionality limitations have been removed. 

    As usual, though, there are usually the practical considerations to consider when choosing a method and those often override any 'apparent' speed advantage.

    1. RAW values are not inherently 'user friendly' and even the human-readable form (using tests/varchar) isn't really something most developers would be familiar with.

    2. Any user tool (sql dev, toad, utl_file, Java, etc) that produces CSV, or similar files, will have to convert the value to a 'text' form. That will typically be VARCHAR2 taking 32 ascii bytes to represent the 16 byte raw. A typical numeric value will take far fewer bytes.

    Some of those tools don't really support exporting table data that includes LOBs (blob, clob, nclob). Haven't checked lately to see if they support RAW.

    3. Any report/query that 'displays' the data will also have to use a 'text' form. Again that means an automatic column width of 32 characters if that value is to be fully displayed.

    There is a REAL ADVANTAGE to using GUIDs that I haven't seen mentioned.

    Since primary key data is an important value if you are exporting data to files with the intent of moving it around (and in my experience a lot of people do such exports) using guids could be a deal killer.

    I'm a strong proponent of using surrogate key values rather than actual data except in specific, appropriate use cases.

    But while using guids may be a technical solution I don't consider it to be a practical solution given the penchant for exporting and reloading data that developers have shown (rightly or wrongly) over the years.

    For me there would have to be a REAL, demonstrated value that the standard NUMBER solution can't provide. I've never seen one.

    Mustafa_KALAYCI
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,941 Blue Diamond
    edited Nov 25, 2016 12:34PM

    John,

    The reason I asked was that when you said PC I thought Windows, and I believe Microsoft invented the GUID concept.

    With those thoughts in mind I wondered whether the Windows implementation of sys_guid() is just a call to the Windows GUID function while the Unix (et. al.) implementation is an in-house routine that operates much faster.  It's possible for an in-house call to take short cuts (e.g. remember the MAC rather than requesting it every time) but perhaps a call to a "3rd party" function has to live with whatever it gets.

    Regards

    Jonathan Lewis

    Mustafa_KALAYCI
This discussion has been closed.