Discussions
Categories
- 196.7K All Categories
- 2.2K Data
- 234 Big Data Appliance
- 1.9K Data Science
- 449.7K Databases
- 221.5K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 549 MySQL Community Space
- 477 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3K ORDS, SODA & JSON in the Database
- 532 SQLcl
- 4K SQL Developer Data Modeler
- 186.8K SQL & PL/SQL
- 21.2K SQL Developer
- 295.3K Development
- 17 Developer Projects
- 138 Programming Languages
- 292K Development Tools
- 104 DevOps
- 3.1K QA/Testing
- 645.9K Java
- 27 Java Learning Subscription
- 37K Database Connectivity
- 153 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.1K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 17 Java Essentials
- 157 Java 8 Questions
- 85.9K Java Programming
- 79 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.2K Java SE
- 13.8K Java Security
- 203 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 387 LiveLabs
- 37 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.6K Other Languages
- 2.3K Chinese
- 170 Deutsche Oracle Community
- 1K Español
- 1.9K Japanese
- 230 Portuguese
using sys_guid() as primary key

hello everyone,
This is not a question just wonder about your ideas and experiences. I just want to learn your opinions about the subject. I already search forum and google it, read blogs etc. is there anyone using this? many blog says sys_guid can be used but it is a little bit "slower" against sequences. so I run my tests and it is much more faster than sequences. here is my test:
oracle 11.2.0.3 DB on an Exadata X3 (1/8)
drop table t_Seq;drop table t_raw;drop sequence seq1 ;create sequence seq1; -- default 20 cachecreate table t_seq ( id number(9) primary key , filler varchar2(1000) );insert into t_seq select seq1.nextval ,'sdfsf' from dual connect by level < 1000000; -- this takes 25 secondsdrop table t_seq;drop sequence seq1;create sequence seq1 cache 10000; create table t_seq ( id number(9) primary key , filler varchar2(1000) );insert into t_seq select seq1.nextval ,'sdfsf' from dual connect by level < 1000000; -- this takes 13 secondscreate table t_raw ( id raw(16) primary key , filler varchar2(1000) ); insert into t_raw select sys_guid(),'sdfsf' from dual connect by level < 1000000; -- this takes 3.8 seconds select segment_name, bytes from user_Segments where segment_name in (select index_name from dba_indexes where table_name in ('T_SEQ', 'T_RAW'));SYS_C0069970 28311552 --sys_guidSYS_C0069969 15728640 --sequenceselect segment_name, bytes from user_Segments where segment_name in ('T_SEQ', 'T_RAW');T_RAW 32505856T_SEQ 18874368
so first create a sequence with default 20 cache, and then test it with 10.000 cache. best sequence performance is 13 seconds (with 10.000 cahce) bu sys_guid is already much more faster, 3.8 seconds. I also check the sizes of course both index and table. SYS_GUID seems bigger twice than the sequnce (but of course sequnce number is starting from 1 and table column is number(9), if this was bigger, space would be bigger).
so I just want to test sequence and sys_guid:
drop sequence seq1;create sequence seq1 ;declare x number(38); function sf return number is begin return seq1.nextval; end;begin for i in 1..100000 loop x := sf; end loop;end;/ -- this takes 6 secondsdrop sequence seq1;create sequence seq1 cache 10000;declare x number(38); function sf return number is begin return seq1.nextval; end;begin for i in 1..100000 loop x := sf; end loop;end;/ -- this takes 4.8 secondsdeclare x raw(16); function sf return varchar2 is begin return sys_guid(); end;begin for i in 1..100000 loop x := sf; end loop;end;/ -- this takes 2.9 seconds.
times become more closer but still sys_guid is faster. also I realized, if I run sequence codes twice without dropping sequences, time is getting longer. also SYS_GUID can be a default value for a table column.
so what do you think? thanks.
Comments
-
You might want to consider your primary key index, particularly as this is a RAC. Using sys_guid will spread the index inserts across the whole range of the index, whereas your sequence will insert into the right hand edge. Unless you reverse key it. This can mean playing games between efficient use of the buffer cache and block contention problems.
-
hi @JohnWatson2,
"block contention" yes I didn't think about it. good point. I also understand on my example that NEXTVAL is a plsql action right? my insert statements with sequences takes much more longer than plsql example but SYS_GUID has almost same performans both sql and plsql.
-
Good effort,
A couple of thoughts, though:
a) For this type of comparison I would do things like flush buffer_cache before eash run and and switch logfiles to make sure that the second test didn't suffer any side effects from the firsrt test (e.g. massive writes to disc as an unlucky log file is switched).
b) Watch out for the side effects of the large (1M) connect by allocating lots of PGA - your figures showed a relatively small difference on the insert/select, it's possible that a large part of the difference was O/S time allocating memory for your session to use from the PGA.
c) An insert on its own is only a starting point - if you think the results suggest it might be worth pursuing then run up (say) 8 concurrent processes to loop through the same total volume of data with a small pause (say 0.01 sec) between inserts. Check the effects on workload, completion rate, size of index etc.
d) Rather than starting a sequence-based test from 1 I would try to emulate what a production system would do at steady state and start with (e.g.) 10M.
=============
The thought I take from your experiments so far is that a sys_guid() is 16 bytes fixed, a number is variable but typically stored in about (N+1)/2 bytes where N is the number of digits, so the space usage in table and index are likely to be significantly different while the timing difference you show is only a few seconds over 1M rows.
Add in my previous experience (which seems to be confirmed by the examples in the manual - e.g.https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions187.htm#SQLRF06120 (check the 6th byte / 12th digit) - the Oracle implementation of sys_guid() has a component that makes it largely sequential so it's behaviour may not be too different from some of the behaviour seen in sequences (notwithstanding John's comments about random insertion to indexes.)
I have a vague memory of reading something that described how the sys_guid() was generated and this included a timestamp and the MAC and some other bits - so you need to do some testing to see if the timestamp is the leading few bytes and whether it is dictated by (e.g.) the logon time or the time of the first call to sys_guid(). The way in which the leading edge is generated could have a dramatic effect on storage and contention in the index.
Regards
Jonathan Lewis
-
Just had another look (work is slow today - America is mostly on holiday) and I get very different timings. On my PC which is an Intel i7 2.3GHz (your X3 has faster Xeon processors) the sequence is a lot faster. Depending on the cache, it does generate a lot of redo whereas sys_guid generates none.
orclz>orclz> set autot trace statorclz> set timing onorclz> select sys_guid() from dual connect by level < 1000000;999999 rows selected.Elapsed: 00:00:32.28Statistics---------------------------------------------------------- 0 recursive calls 0 db block gets 0 consistent gets 0 physical reads 0 redo size 30933832 bytes sent via SQL*Net to client 733877 bytes received via SQL*Net from client 66668 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 999999 rows processedorclz> create sequence s1;Sequence created.Elapsed: 00:00:00.02orclz> select s1.nextval from dual connect by level < 1000000;999999 rows selected.Elapsed: 00:00:15.35Statistics---------------------------------------------------------- 50012 recursive calls 151429 db block gets 50009 consistent gets 0 physical reads 36196692 redo size 18913645 bytes sent via SQL*Net to client 733877 bytes received via SQL*Net from client 66668 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 999999 rows processedorclz> create sequence s2 cache 10000;Sequence created.Elapsed: 00:00:00.01orclz> select s2.nextval from dual connect by level < 1000000;999999 rows selected.Elapsed: 00:00:04.32Statistics---------------------------------------------------------- 109 recursive calls 304 db block gets 109 consistent gets 0 physical reads 72720 redo size 18913645 bytes sent via SQL*Net to client 733877 bytes received via SQL*Net from client 66668 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 999999 rows processedorclz>
I think you really may need to think about the index. Using sys_guid, you could end up caching a complete copy of the index in both nodes. Do you want to do that? It is arguable both ways.
-
John,
What O/S are you using ? If it's a Windows version of Oracle perhaps it behaves completely differently.
Here's my output on a Linux VM with 12.1.0.2 (similar on 11.2.0.4):
SQL> select sys_guid() from dual connect by level <= 10;
SYS_GUID()
--------------------------------
4221329487460613E0530100007F4893
4221329487470613E0530100007F4893
4221329487480613E0530100007F4893
4221329487490613E0530100007F4893
42213294874A0613E0530100007F4893
42213294874B0613E0530100007F4893
42213294874C0613E0530100007F4893
42213294874D0613E0530100007F4893
42213294874E0613E0530100007F4893
42213294874F0613E0530100007F4893
10 rows selected.
For a single session the sys_guid()s are sequential (check the 12th character); and for multiple sessions that start at about the same time the values are sequential per session and start with very similar values.
Regards
Jonathan Lewis
P.S. It's just possible, though a little unlikely given the scale of the difference that most of the time difference is about memory allocation. Are the timings virtually unchanged if you repeat run sys_guid() test twice in a row?
-
wow Johnathan, great points. as you said I must re-do my tests for many things and I will. Also for your D step, I mentioned about it a little bit. when I run second time sequence examples, it takes more time and I believe when number gets higher it will take more time but of course I will create a full script about it in a few hours.
-
I did get repeatable results. I'm using Windows, I had thought that it would be the chip that matters, not the OS, but perhaps not:
orclz> set autot off
orclz> select sys_guid() from dual connect by level <= 10;
SYS_GUID()
--------------------------------
8E456A92F12D424FA92A34B8B675DEA8
87A1B48419B74CE0B05C8637D9A5C6E4
0321B5EE21CC42B1A291FF171F26E601
4F0A5EA2F33C40D1B7D0E802FBC0677A
5935EB51454A4D249C147CC3E1387AFD
23087816FE384D7BA0200ABB1AF4A751
65341B80F4B94B37A9F5B3A251A5BC6E
3481A89750604B39BA64E7ADF8F38A53
0B58E5FB0C594C34A46021B75ABA7304
8E904982B57841DC821660E18EE80EF8
10 rows selected.
Elapsed: 00:00:00.00
orclz> host ver
Microsoft Windows [Version 10.0.14393]
orclz> select * from v$version where rownum=1;
BANNER CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0
Elapsed: 00:00:00.01
orclz>
-
John, thanks for showing statistics, I didn't look at them. as you present, sequence cause consistent gets (because of internal commit and data dictionary updates I guess). after I rearrange my test scripts, I will also check statistics too and of course I will be checking sys_guid data generation method so I can understand index structure more clear. as you said it is not something I want to hold whole index in memory.
edit: my windows 10 machine creates sys_guids same as John's
SQL> select sys_guid() from dual connect by level <= 10;
SYS_GUID()
--------------------------------
33439FC5513D4B4ABEFC8D25D4A2EE23
CA090D329DEB4B89AE4EB37F235725C3
4DBBA9E9B4D644ADB281F22E60DD56C9
40A8C07DAE1C45A2A41862EDE28A0DCB
B107C3325F034E27A73F46C984B8A2E3
4156B5CB15494551A0E528FEFD5B73FA
AEB7F05EB0E5431388C4DC5F6DABA216
9BF17367421C4E78B4C8CCC51E430971
05677DC4B9574266831B293BA9003549
5626B34D1098436CB57B366BFF966DCF
10 rows selected.
edit2: same configuration with John, windows and oracle too.
-
First let me add something others haven't yet mentioned.
There is a REAL ADVANTAGE to using GUIDs in that they should, by their nature, ALWAYS be unique. So you can just use that function to get a new key value whenever you need one.
When you use NUMBERS for the key you always have to deal with the possibility that the number you use might already exist in the data.
So people do things like MAX(myKeyValue) to figure out how to set/reset their sequence generators to avoid conflicts when data is moved from environment to environment. That can even cause an issue for a minor data fix like adding new value to a small lookup table.
declare x raw(16); function sf return varchar2 is begin return sys_guid(); end; begin for i in 1..100000 loop x := sf; end loop; end; / -- this takes 2.9 seconds. times become more closer but still sys_guid is fasterIt could be even faster than that. Your function uses SYS_GUID to obtain a RAW but is returning a VARCHAR2 and storing in a RAW.
That would result in two implicit conversions - one from RAW to VARCHAR2 and another from VARCHAR2 to RAW.
Try it again using the RAW for the return datatype.
also SYS_GUID can be a default value for a table column.
Yes - but in 12c a sequence.nextval can be a default value also
https://docs.oracle.com/database/121/SQLRF/statements_7002.htm#SQLRF01402
The
DEFAULT
expression can include the sequence pseudocolumnsCURRVAL
andNEXTVAL
, as long as the sequence exists and you have the privileges necessary to access it.And see this AskTom article for additional info
Ask Tom: On Oracle Database 12c, Part 1
Improved Defaults
The ability to create a default column value has existed in SQL for a while. The functionality has been somewhat limited, however, with various restrictions. For example, you were restricted from using a SEQUENCE object to supply a default value. Additionally, if a default value was to be inserted into or updated in a table, you had to either use the DEFAULT keyword in the SQL statement or leave the column out of the INSERT statement entirely. Furthermore, adding a new column that permits NULL values with a default value was an offline operation. In Oracle Database 12c, however, these restrictions and functionality limitations have been removed.
As usual, though, there are usually the practical considerations to consider when choosing a method and those often override any 'apparent' speed advantage.
1. RAW values are not inherently 'user friendly' and even the human-readable form (using tests/varchar) isn't really something most developers would be familiar with.
2. Any user tool (sql dev, toad, utl_file, Java, etc) that produces CSV, or similar files, will have to convert the value to a 'text' form. That will typically be VARCHAR2 taking 32 ascii bytes to represent the 16 byte raw. A typical numeric value will take far fewer bytes.
Some of those tools don't really support exporting table data that includes LOBs (blob, clob, nclob). Haven't checked lately to see if they support RAW.
3. Any report/query that 'displays' the data will also have to use a 'text' form. Again that means an automatic column width of 32 characters if that value is to be fully displayed.
There is a REAL ADVANTAGE to using GUIDs that I haven't seen mentioned.
Since primary key data is an important value if you are exporting data to files with the intent of moving it around (and in my experience a lot of people do such exports) using guids could be a deal killer.
I'm a strong proponent of using surrogate key values rather than actual data except in specific, appropriate use cases.
But while using guids may be a technical solution I don't consider it to be a practical solution given the penchant for exporting and reloading data that developers have shown (rightly or wrongly) over the years.
For me there would have to be a REAL, demonstrated value that the standard NUMBER solution can't provide. I've never seen one.
-
John,
The reason I asked was that when you said PC I thought Windows, and I believe Microsoft invented the GUID concept.
With those thoughts in mind I wondered whether the Windows implementation of sys_guid() is just a call to the Windows GUID function while the Unix (et. al.) implementation is an in-house routine that operates much faster. It's possible for an in-house call to take short cuts (e.g. remember the MAC rather than requesting it every time) but perhaps a call to a "3rd party" function has to live with whatever it gets.
Regards
Jonathan Lewis