This content has been marked as final.
Show 26 replies

15. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
Franck Pachot Jun 28, 2012 2:27 PM (in response to JustinCave)Hi,
Justin Cave wrote:
Unfortunately the sequences are generating numbers... I would have preferred that it generates RAW: the fastests comparisions, no questions about gaps, no business meaning at all...
If you are using synthetic primary keys, you are presumably using sequences to generate the key values. Sequences generate numbers so your primary key ought to be a number.
Regards,
Franck. 
16. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
rp0428 Jun 28, 2012 4:11 PM (in response to Franck Pachot)>
Unfortunately the sequences are generating numbers... I would have preferred that it generates RAW: the fastests comparisions, no questions about gaps, no business meaning at all...
>
What do you mean 'I would have preferred'. This is Kayal's thread and Kayal's issue  not yours.
'fastests comparisons'  post your supporting evidence. How many bytes are the RAW values? How would they be generated to be unique? Numbers provide very compact storage. Numbers are easily handled by any tool that might be used. Numbers are easily exported to delimited files.
'no questions about gaps'  if the business insists on gapfree keys the same questions apply to RAW as to NUMBER. It's just that most people can't tell if there are gaps if looking at RAW values  but it doesn't make the question go away if gapfree was wanted.
'no business meaning at all'  that applies to ANY surrogate key. In fact, that is one of the main advantages of surrogate keys  they have no business meaning. You should be able to replace any surrogate key with any other nonexisting surrogate key with no issues at all save one; the restore from backup issue where you change the key after creating backups that use the old key and then restore it. Even then, using surrogate keys if ALL tables (or records) that contain the old surrogate key value are restored there is still no effect. 
17. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
Mark D Powell Jun 29, 2012 9:11 AM (in response to rp0428)It is possible Franc used raw to mean binary. Oracle stores the number data type in a form of scientific notation so library math routines have to be accessed in order to use the values in mathimatical operations and comparisons. In any case he is intitled to his opinion though there is in fact no logical reason the artificial key needs to be numeric. Since the key in this type of design is supposed to be meaningless any unique value should do. The problem is in how to efficiently generate guaranteed unique values other than numeric values.
IMHO  Mark D Powell  
18. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
Mark D Powell Jun 29, 2012 9:24 AM (in response to Pavan Kumar)Pavan, the post was on the old Oracle support forums which as far as I can tell Oracle did not preserve when the newer Oracle Community site launced a few years ago. I find the idea of creating such a test and accessing the same 100K rows in the same order but using a numeric key in one test and a character key in the other several times and averaging the times interesting, but I am not sure where I would find the time.
HTH  Mark D Powell  
19. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
Mark Malakanov (user11181920) Jun 29, 2012 11:21 AM (in response to Mark D Powell)Oracle stores the number data type in a form of scientific notation so library math routines have to be accessed in order to use the values in mathimatical operations and comparisons.
Oracle stores the number data type packed 3 digits into 2 bytes. This way distinct values are somewhat (~30%) more compact compared to plain char string distinct values. Strings can be more compact if one would use nonprintable ASCII values, like 1 or 2 or 255 etc. But I never seen this.
For equality comparison math functions are not required because binary values can be compared without "decompression". 
20. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
rp0428 Jun 29, 2012 11:40 AM (in response to Mark Malakanov (user11181920))>
Oracle stores the number data type packed 3 digits into 2 bytes. This way distinct values are somewhat (~30%) more compact compared to plain char string distinct values.
>
I've never heard of that. Please provide a citation to support that statement.
As Mark already said Oracle uses a form of scientific notation
See Internal Numeric Format in the Database Concepts doc
http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i16209
>
Internal Numeric Format
Oracle Database stores numeric data in variablelength format. Each value is stored in scientific notation, with 1 byte used to store the exponent and up to 20 bytes to store the mantissa. The resulting value is limited to 38 digits of precision. Oracle Database does not store leading and trailing zeros. For example, the number 412 is stored in a format similar to 4.12 x 102, with 1 byte used to store the exponent(2) and 2 bytes used to store the three significant digits of the mantissa(4,1,2). Negative numbers include the sign in their length.
Taking this into account, the column size in bytes for a particular numeric data value NUMBER(p), where p is the precision of a given value, can be calculated using the following formula:
ROUND((length(p)+s)/2))+1
where s equals zero if the number is positive, and s equals 1 if the number is negative.
Zero and positive and negative infinity (only generated on import from Oracle Database, Version 5) are stored using unique representations. Zero and negative infinity each require 1 byte; positive infinity requires 2 bytes. 
21. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
Mark Malakanov (user11181920) Jun 29, 2012 4:14 PM (in response to rp0428)You should read docs more careful.... and *2* bytes used to store the three significant digits of the mantissa ...
So, my statement does not contradicts docs.
this is what I meant by "compression".
Oracle's digit compression is neither storing 1 digit in 1 byte, like strings do, nor storing it like native binary numerics do, (byte, int, long, float, double etc...).
It packs 3 digits into 2 bytes.
I was not saying how exponent and negatives organized, it was irrelevant to the original topic, what was relevant that numeric distinct values are more dense then varchar ones. Usually.
Because 1234567890 as number stored in 6 bytes.
But '1234567890' as varchar2 stored in 10 bytes.
However, if one will use not only numeric but also alpha and nonprintable characters for key string values it can be as dense as binary, and 1234567890 can be stored in 4 bytes. 
22. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
rp0428 Jun 29, 2012 5:20 PM (in response to Mark Malakanov (user11181920))>
You should read docs more careful.
>
No need  I understand the doc perfectly, having worked with that, and many other formats for over 30 years.
You need to reread the doc format description more carefully and apply it to a sample of data values. Then, perhaps, you will understand the difference between what you stated and what the doc is stating.
>
Oracle's digit compression is neither storing 1 digit in 1 byte, like strings do, nor storing it like native binary numerics do, (byte, int, long, float, double etc...).
It packs 3 digits into 2 bytes.
>
The second sentence is simply not true. You have taken the one example from the doc for a three digit number and extrapolated it to a general rule and that general rule is not true. Oracle's rule is not 'pack every 3 digits into 2 bytes' as your statement implies. If that were the rule then 6 digits would get packed into 4 bytes when in fact it only takes 3. And 21 digits would be packed into 14 bytes when it actually only takes 11.
Oracle removes the leading and trailing zeros from the number. Then the mantissa of a number, excluding the sign, is stored in BCD (binary coded decimal) format with 2 decimal digits per byte. The three digit example (412) used 2 bytes because there were an odd number of digits. A four digit number would also use 2 bytes.
The formula shows that the sign will take one byte for negative numbers but zero bytes for positive numbers. For any even number of digits you can divide by two to get the number of bytes for the mantissa (not including sign). For an odd number of digits add 1 and then divide by 2.
>
This way distinct values are somewhat (~30%) more compact compared to plain char string distinct values.
>
Again, not true and misleading at to what the actual rule and savings are. The savings for even numbers with no leading or trailing zeros will be near 50% due to the BCD encoding (near because there is 1 byte for the exponent). The savings for odd numbers slightly less since one nybble of the highorder byte will be unused.
For numbers with large numbers of trailing zeroes the savings is even greater because Oracle does not store the leading or trailing zeros. So ALL of these numbers take exactly the same amount of space to store
Each of those is stored using one byte for the exponent and one byte for the mantissa. The exponent is different for each value.1 10 1000 10000000 100000000000000000000
And when you made this statement it's not clear if you are including the exponent or not.
>
1234567890 as number stored in 6 bytes.
>
Yes it is  the 10 decimal digits are stored in five bytes, not 6 and then there will one byte for the exponent. So what happened to your '3 digits into 2 byte' rule? That rule says the 10 decimal digits alone should take 7 bytes (2 each for 3 sets of 3 digits and one byte for the 10th digit).
So your statements 'misstate' the docs. You suggested a general rule that was wrong by extrapolating from a single example to the general case.
That is what I was pointing out.
Your statements about compression are correct but not for the reasons you stated. 
23. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
Mark Malakanov (user11181920) Jun 29, 2012 5:50 PM (in response to rp0428)Then the mantissa of a number, excluding the sign, is stored in BCD (binary coded decimal) format with 2 decimal digits per byte.
You are right! And to the point.
I am wrong. (ashamed)
And this way it is even better, because it is even more dense!
the sign will take one byte for negative numbers
I only do not understand why "the sign will take one byte for negative numbers".
Why to waste entire byte just for negative sigh? They could use a bit from exponent, why not? Do you have any explanation or idea? 
24. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
rp0428 Jun 29, 2012 6:28 PM (in response to Mark Malakanov (user11181920))>
You are right!
>
Not exactly! I said BCD but Oracle actually used base100 which still means, as the other doc citation states, 'each byte can represent 2 decimal digits.'
>
I only do not understand why "the sign will take one byte for negative numbers".
Why to waste entire byte just for negative sigh? They could use a bit from exponent, why not? Do you have any explanation or idea?
>
See? You always have to keep digging and run some tests. The OCI doc says a bit from the exponent IS used but the other doc said a whole byte is used.
Now see what you make of this quote from the NUMBER section of the Call Interface Programmer's Guide
http://docs.oracle.com/cd/B28359_01/appdev.111/b28395/oci03typ.htm#i423684
>
Oracle stores values of the NUMBER datatype in a variablelength format. The first byte is the exponent and is followed by 1 to 20 mantissa bytes. The highorder bit of the exponent byte is the sign bit; it is set for positive numbers and it is cleared for negative numbers. The lower 7 bits represent the exponent, which is a base100 digit with an offset of 65.
To calculate the decimal exponent, add 65 to the base100 exponent and add another 128 if the number is positive. If the number is negative, you do the same, but subsequently the bits are inverted. For example, 5 has a base100 exponent = 62 (0x3e). The decimal exponent is thus (~0x3e) 128  65 = 0xc1 128 65 = 193 128 65 = 0.
Each mantissa byte is a base100 digit, in the range 1..100. For positive numbers, the digit has 1 added to it. So, the mantissa digit for the value 5 is 6. For negative numbers, instead of adding 1, the digit is subtracted from 101. So, the mantissa digit for the number 5 is 96 (101  5). Negative numbers have a byte containing 102 appended to the data bytes. However, negative numbers that have 20 mantissa bytes do not have the trailing 102 byte. Because the mantissa digits are stored in base 100, each byte can represent 2 decimal digits. The mantissa is normalized; leading zeroes are not stored.
Up to 20 data bytes can represent the mantissa. However, only 19 are guaranteed to be accurate. The 19 data bytes, each representing a base100 digit, yield a maximum precision of 38 digits for an Oracle NUMBER.
>
Interesting content. The high order bit of the exponent is the sign bit. Each mantissa byte . . . For positive numbers, the digit has 1 added to it.
Tests show that it is a bit that is used for the sign but the sign also affects the actual number stored in the mantissa.
Notice the '13' in the DUMP(12) output? The number was positive so '1' was added to it. And the high order bit of the exponent (193) is set.select 12, dump(12), 12, dump(12) from dual 12,DUMP(12),12,DUMP(12) 12,Typ=2 Len=2: 193,13,12,Typ=2 Len=3: 62,89,102

25. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
Mark Malakanov (user11181920) Jun 29, 2012 8:54 PM (in response to rp0428)Interesting content. The high order bit of the exponent is the sign bit. Each mantissa byte . . . For positive numbers, the digit has 1 added to it.
Getting a little bit clear.
Exponent has its own sign, it can be negative too for numbers <1. Or 0.1?
And mantissa has its sign too. 
26. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
rp0428 Jun 29, 2012 10:20 PM (in response to Mark Malakanov (user11181920))>
Exponent has its own sign, it can be negative too for numbers <1. Or 0.1?
And mantissa has its sign too.
>
Nope! The high order bit of the exponent byte is actually the sign of the number itself! The sign of the exponent itself is taken into account when the exponent is encoded using base 100, possible inversion and the offset of 65.
The exponent will decode as a negative or positive number.