1 2 3 Previous Next 40 Replies Latest reply: Nov 26, 2009 8:56 AM by Sven W. Go to original post
• ###### 30. Re: Generate surogate key (number) dynamically
``My dataset would be leike : A-Z of occurance thrice followed by 8 digit number followed by 8 digit caharcters A-Z.``
Will it always be in this format? The field is declared as VARCHAR2(255) but will actually always be VARCHAR2(19)?

If so, you could for example treat the character parts as base 26 numbers and convert to decimal.

For the 3 characters, 26^3 = a max value of 17576 (5 digits)
For the 8 characters, 26^8 = a max value of 208827064576 (12 digits)

With the 8 digit numeric part you now have 3 numeric parts. Lpad them all to max length with zeros and concatenate, then convert the concatenation back to a number, you should have (I think) a 25 digit unique numeric key.

EDIT: Example below:
``````SQL> create or replace function makekey (bkey in varchar2) return number is
part1 varchar2(3);
part2 varchar2(8);
part3 varchar2(8);
p1 number;
p3 number;
begin
part1 := substr(bkey, 1, 3);
part2 := substr(bkey, 4, 8);
part3 := substr(bkey, 12, 8);

p1 :=   (ascii(substr(part1, 1, 1)) - 65) * 26 * 26
+ (ascii(substr(part1, 2, 1)) - 65) * 26
+ (ascii(substr(part1, 3, 1)) - 65);

/* Seems to be a bug in my version of Oracle which gives an overflow error for this, hence the select from dual instead...

p3 :=     (ascii(substr(part3, 1, 1)) - 65) * 26 * 26 * 26 * 26 * 26 * 26 * 26;
+ (ascii(substr(part3, 2, 1)) - 65) * 26 * 26 * 26 * 26 * 26 * 26
+ (ascii(substr(part3, 3, 1)) - 65) * 26 * 26 * 26 * 26 * 26
+ (ascii(substr(part3, 4, 1)) - 65) * 26 * 26 * 26 * 26
+ (ascii(substr(part3, 5, 1)) - 65) * 26 * 26 * 26
+ (ascii(substr(part3, 6, 1)) - 65) * 26 * 26
+ (ascii(substr(part3, 7, 1)) - 65) * 26
+ (ascii(substr(part3, 8, 1)) - 65);
*/

select (ascii(substr(part3, 1, 1)) - 65) * 26 * 26 * 26 * 26 * 26 * 26 * 26
+ (ascii(substr(part3, 2, 1)) - 65) * 26 * 26 * 26 * 26 * 26 * 26
+ (ascii(substr(part3, 3, 1)) - 65) * 26 * 26 * 26 * 26 * 26
+ (ascii(substr(part3, 4, 1)) - 65) * 26 * 26 * 26 * 26
+ (ascii(substr(part3, 5, 1)) - 65) * 26 * 26 * 26
+ (ascii(substr(part3, 6, 1)) - 65) * 26 * 26
+ (ascii(substr(part3, 7, 1)) - 65) * 26
+ (ascii(substr(part3, 8, 1)) - 65)
into p3
from dual;

end;
/

Function created.

SQL> set numwidth 30
SQL> select makekey('SAB12345678PQRSTUVX') from dual
2  /

MAKEKEY('SAB12345678PQRSTUVX')
------------------------------
1216912345678125630362049``````
Note, if you are running this on a billion records, you might want to try to optimize it. Remove the select from dual and use the commented code if your version of Oracle doesn't exhibit the same bug as mine (10.2.0.3). Also might want to evaluate those *26 *26 *26... bits to a single number to avoid excess calculations.

Edited by: Cyn on Nov 25, 2009 1:24 PM
• ###### 31. Re: Generate surogate key (number) dynamically
My dataset would be like : [A-Z] of occurance thrice followed by 8 digit number followed by 8 digit caharcters[A-Z].
eg SAB12345678PQRSTUVX
So now we are from 80 to 19 with additional constraints
26^11 * 8^10 = 3941002384166598867943424 ~ 4 * 10^25 distinct values.
1. it will increase the width of the child table
Those information is not required as those can be stored by using a single surogate key.reduce teh size of teh table.
If your dataset will always have the above mentioned format you don't need broader columns than VARCHAR2(3), NUMBER, VARCHAR2(8). And the data contain not more informations than the surrogate key. I still think that this does not justify a surrogate key
Matching number is faster than character.
Oracle internaly uses a 16 byte raw populated with sys_guid() if you store nested tables in a database table. We have a system that uses a 26 Character HEX code as key for millions of rows. They never are a performance problem.
Surogate key is primary key in parent and in child it is not null. If I put index on number index table would be easily maintainable.
Oracle is maintaining the index so why bother. An index is an index.
• ###### 32. Re: Generate surogate key (number) dynamically
To give the OP an idea how to generate a unique key:
``````SELECT  ASCII('S') - 64
+ (ASCII('A') - 64) * 26
+ (ASCII('B') - 64) * 26 * 26
FROM    dual;     ``````
Beside the fact that if I really need a surrogate key I would simply concatenate the 3 values for an alphanumeric key ;-)

Regards
Marcus
• ###### 33. Re: Generate surogate key (number) dynamically
It should work with
``````p3 :=     (ascii(substr(part3, 1, 1)) - 65) * POWER(26,7)
+ (ascii(substr(part3, 2, 1)) - 65) * POWER(26,6)
+ (ascii(substr(part3, 3, 1)) - 65) * POWER(26,5)
+ (ascii(substr(part3, 4, 1)) - 65) * POWER(26,4)
+ (ascii(substr(part3, 5, 1)) - 65) * POWER(26,3)
+ (ascii(substr(part3, 6, 1)) - 65) * POWER(26,2)
+ (ascii(substr(part3, 7, 1)) - 65) * 26
+ (ascii(substr(part3, 8, 1)) - 65);``````
Marcus

Ceterum censeo: "if you are running this on a billion records, you might want to" use an alphanumeric key instead ;-)
• ###### 34. Re: Generate surogate key (number) dynamically
user608555 wrote:
Ye it is Dataware housing system. Oracle is 10.1.0.4.
Data volumen would be about 50 milion initial and daily growth would be about 10000 records.
It's going to take you 260 years to reach a billion records at that rate.

After ten years you will have 86.5 million records, so if your number key and varchar2(255) columns have an average total length of 50 bytes total then you have 4GB of data to scan. The server I'm working on will scan that in less than five seconds, and equi-partitioning your source data and the lookup table on the join key will reduce the memory requirement for a hash join to a very manageable amount.

This business of needing to lookup a synthetic key for a very large dimension table is not one that people haven't encountered before, and I'd forget about this perfect hash business and go for the simple method that works for everyone else.
• ###### 35. Re: Generate surogate key (number) dynamically
user608555 wrote:
Lets have an example.
Tgt_A : parent target table , Src_A : source table of Tgt_A. , Business key of Src_A is A_ID. I am creating surrogate key A_Key (seq number) based on unique A_ID. If it is oracle seq number , I don’t have any issue and I can put seq.nextval for each A_ID.

Now Tgt_B: child table of Tgt_A, Src_B : source of Tgt_B, Business key of Src_B is B_ID and also refernce key at source side is A_ID in Src_B.
Now if I want to load A_Key in Tgt_B against A_ID , I need to load and then scan Tgt_A to get the seq value , if I use simple oracle sequence.
If I can use unique number against business key and can ensure it always give the same unique number for a same input value, I can load Tgt_B in parallel with Tgt_A.
So, using your example, and using sequences, you'd get something like this:
``````SQL> create table src_a
2  ( a_id number primary key
3  )
4  /

Tabel is aangemaakt.

SQL> create table src_b
2  ( b_id number primary key
3  , a_id number references src_a (a_id)
4  )
5  /

Tabel is aangemaakt.

SQL> create table tgt_a
2  ( a_key number primary key
3  , a_id  number unique
4  )
5  /

Tabel is aangemaakt.

SQL> create table tgt_b
2  ( b_key number primary key
3  , a_key number references tgt_a (a_key)
4  , b_id  number unique
5  )
6  /

Tabel is aangemaakt.

SQL> insert /*+ append */ into src_a
2   select level
3     from dual
4  connect by level <= 10000
5  /

10000 rijen zijn aangemaakt.

SQL> insert /*+ append */ into src_b
2   select level
3        , ceil(level/5)
4     from dual
5  connect by level <= 50000
6  /

50000 rijen zijn aangemaakt.

SQL> create sequence seq_a
2  /

Reeks is aangemaakt.

SQL> create sequence seq_b
2  /

Reeks is aangemaakt.

SQL> begin
2    insert /*+ append */ into tgt_a (a_key,a_id)
3    select seq_a.nextval
4         , a_id
5      from src_a
6    ;
7    commit;
8    insert /*+ append */ into tgt_b (b_key,a_key,b_id)
9    select seq_b.nextval
10         , tgt_a.a_key
11         , src_b.b_id
12      from src_b
13         , tgt_a
14     where src_b.a_id = tgt_a.a_id
15    ;
16    commit;
17  end;
18  /

PL/SQL-procedure is geslaagd.``````
What's exactly the problem with that?
The join in the last insert? I doubt it with only 10,000 rows per day.

Regards,
Rob.
• ###### 36. Re: Generate surogate key (number) dynamically
Rob van Wijk wrote:
user608555 wrote:
Lets have an example.
Tgt_A : parent target table , Src_A : source table of Tgt_A. , Business key of Src_A is A_ID. I am creating surrogate key A_Key (seq number) based on unique A_ID. If it is oracle seq number , I don’t have any issue and I can put seq.nextval for each A_ID.

Now Tgt_B: child table of Tgt_A, Src_B : source of Tgt_B, Business key of Src_B is B_ID and also refernce key at source side is A_ID in Src_B.
Now if I want to load A_Key in Tgt_B against A_ID , I need to load and then scan Tgt_A to get the seq value , if I use simple oracle sequence.
If I can use unique number against business key and can ensure it always give the same unique number for a same input value, I can load Tgt_B in parallel with Tgt_A.
So, using your example, and using sequences, you'd get something like this:
``````SQL> create table src_a
2  ( a_id number primary key
3  )
4  /

Tabel is aangemaakt.

SQL> create table src_b
2  ( b_id number primary key
3  , a_id number references src_a (a_id)
4  )
5  /

Tabel is aangemaakt.

SQL> create table tgt_a
2  ( a_key number primary key
3  , a_id  number unique
4  )
5  /

Tabel is aangemaakt.

SQL> create table tgt_b
2  ( b_key number primary key
3  , a_key number references tgt_a (a_key)
4  , b_id  number unique
5  )
6  /

Tabel is aangemaakt.

SQL> insert /*+ append */ into src_a
2   select level
3     from dual
4  connect by level <= 10000
5  /

10000 rijen zijn aangemaakt.

SQL> insert /*+ append */ into src_b
2   select level
3        , ceil(level/5)
4     from dual
5  connect by level <= 50000
6  /

50000 rijen zijn aangemaakt.

SQL> create sequence seq_a
2  /

Reeks is aangemaakt.

SQL> create sequence seq_b
2  /

Reeks is aangemaakt.

SQL> begin
2    insert /*+ append */ into tgt_a (a_key,a_id)
3    select seq_a.nextval
4         , a_id
5      from src_a
6    ;
7    commit;
8    insert /*+ append */ into tgt_b (b_key,a_key,b_id)
9    select seq_b.nextval
10         , tgt_a.a_key
11         , src_b.b_id
12      from src_b
13         , tgt_a
14     where src_b.a_id = tgt_a.a_id
15    ;
16    commit;
17  end;
18  /

PL/SQL-procedure is geslaagd.``````
What's exactly the problem with that?
The join in the last insert? I doubt it with only 10,000 rows per day.

Regards,
Rob.
Why even bother with a sequence? Base the new values on (select Coalesce(Max(pk_id),0) value from my_table) plus the rownum being inserted. It would be rather rare for multiple inserts to have to run concurrently into a data warehouse table.
• ###### 37. Re: Generate surogate key (number) dynamically
Okay.great this is nothis but the 26-base number system kind of things.Thansk a lot for the information.And hopefully there shouldn't be any collision here.

For Rob :

This is what I did at this moment. But although 10K/day is requirement, but joining will be on 1 billion with 4 billion to get the key of parent.
Data on parent and child will not come same day . So anyway I have to scan the large table to get the keys.Thats whay I have asked for the dynamically generation of keys.

Not agreed with Marwim. It is really helpful of having single surogate key as FK rather composite key.Check in Ask Tom :

'if the composite primary key is not a foreign key in lots of tables -- go for the composite primary key.If it is -- give serious consideration to the surrogate key populated via a sequence..'
• ###### 38. Re: Generate surogate key (number) dynamically
Cyn wrote:
``My dataset would be leike : A-Z of occurance thrice followed by 8 digit number followed by 8 digit caharcters A-Z.``
Will it always be in this format? The field is declared as VARCHAR2(255) but will actually always be VARCHAR2(19)?
...
SQL> set numwidth 30
SQL> select makekey('SAB12345678PQRSTUVX') from dual
2 /

MAKEKEY('SAB12345678PQRSTUVX')
------------------------------
1216912345678125630362049

Well I have three huge problems with this whole approach.

Firstly, a 19 byte character string is being replaced with a 14 byte number. If a sequence was used to generate the key values and you did need a billion of them then the worst case would be a key size of VSIZE(1000000001): 6 bytes. If there is not much attraction in saving 13 bytes by using a true synthetic key then I wouldn't bother saving 5 bytes by generating a number at all. You might as well use your character string.

Secondly, you don't really have a synthetic primary key with your method at all. The value is deterministically based on the real value and you can't safely change the business value in your dimension table without updating the PK and FK values as well, so you're throwing out both of the advantages of using a synthetic primary key -- saving on storage and making it truly immutable.

Thirdly, I don't think that the standard method of generating keys in the dimension table and looking them up (from the dimension or a more compact lookup table) is non-performant enough to warrant this approach.

So I'm afraid that I think that this whole "perfect hash key" approach, while sounding all sexy and exciting and mathematical and rigourous and all, is a waste of time. It wouldn't pass my code review.
• ###### 39. Re: Generate surogate key (number) dynamically
David_Aldridge wrote:
Why even bother with a sequence? Base the new values on (select Coalesce(Max(pk_id),0) value from my_table) plus the rownum being inserted. It would be rather rare for multiple inserts to have to run concurrently into a data warehouse table.
Hi David,

If you can be sure concurrent inserts won't take place, then that's certainly an option as well. I'm used to OLTP environments where you can never be sure about that. In this case -a data warehouse- your option is even easier.

Regards,
Rob.
• ###### 40. Re: Generate surogate key (number) dynamically
David_Aldridge wrote:
So I'm afraid that I think that this whole "perfect hash key" approach, while sounding all sexy and exciting and mathematical and rigourous and all, is a waste of time. It wouldn't pass my code review.
Well said!
1 2 3 Previous Next