1 2 Previous Next 26 Replies Latest reply: Jun 29, 2012 8:54 PM by Mark Malakanov (user11181920) Go to original post RSS
      • 15. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
        Franck Pachot
        Hi,
        Justin Cave wrote:
        If you are using synthetic primary keys, you are presumably using sequences to generate the key values. Sequences generate numbers so your primary key ought to be a number.
        Unfortunately the sequences are generating numbers... I would have preferred that it generates RAW: the fastests comparisions, no questions about gaps, no business meaning at all...

        Regards,
        Franck.
        • 16. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
          rp0428
          >
          Unfortunately the sequences are generating numbers... I would have preferred that it generates RAW: the fastests comparisions, no questions about gaps, no business meaning at all...
          >
          What do you mean 'I would have preferred'. This is Kayal's thread and Kayal's issue - not yours.

          'fastests comparisons' - post your supporting evidence. How many bytes are the RAW values? How would they be generated to be unique? Numbers provide very compact storage. Numbers are easily handled by any tool that might be used. Numbers are easily exported to delimited files.

          'no questions about gaps' - if the business insists on gap-free keys the same questions apply to RAW as to NUMBER. It's just that most people can't tell if there are gaps if looking at RAW values - but it doesn't make the question go away if gap-free was wanted.

          'no business meaning at all' - that applies to ANY surrogate key. In fact, that is one of the main advantages of surrogate keys - they have no business meaning. You should be able to replace any surrogate key with any other non-existing surrogate key with no issues at all save one; the restore from backup issue where you change the key after creating backups that use the old key and then restore it. Even then, using surrogate keys if ALL tables (or records) that contain the old surrogate key value are restored there is still no effect.
          • 17. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
            Mark D Powell
            It is possible Franc used raw to mean binary. Oracle stores the number data type in a form of scientific notation so library math routines have to be accessed in order to use the values in mathimatical operations and comparisons. In any case he is intitled to his opinion though there is in fact no logical reason the artificial key needs to be numeric. Since the key in this type of design is supposed to be meaningless any unique value should do. The problem is in how to efficiently generate guaranteed unique values other than numeric values.

            IMHO -- Mark D Powell --
            • 18. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
              Mark D Powell
              Pavan, the post was on the old Oracle support forums which as far as I can tell Oracle did not preserve when the newer Oracle Community site launced a few years ago. I find the idea of creating such a test and accessing the same 100K rows in the same order but using a numeric key in one test and a character key in the other several times and averaging the times interesting, but I am not sure where I would find the time.

              HTH -- Mark D Powell --
              • 19. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                Mark Malakanov (user11181920)
                Oracle stores the number data type in a form of scientific notation so library math routines have to be accessed in order to use the values in mathimatical operations and comparisons.
                Oracle stores the number data type packed 3 digits into 2 bytes. This way distinct values are somewhat (~30%) more compact compared to plain char string distinct values. Strings can be more compact if one would use non-printable ASCII values, like 1 or 2 or 255 etc. But I never seen this.

                For equality comparison math functions are not required because binary values can be compared without "decompression".
                • 20. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                  rp0428
                  >
                  Oracle stores the number data type packed 3 digits into 2 bytes. This way distinct values are somewhat (~30%) more compact compared to plain char string distinct values.
                  >
                  I've never heard of that. Please provide a citation to support that statement.

                  As Mark already said Oracle uses a form of scientific notation

                  See Internal Numeric Format in the Database Concepts doc
                  http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i16209
                  >
                  Internal Numeric Format

                  Oracle Database stores numeric data in variable-length format. Each value is stored in scientific notation, with 1 byte used to store the exponent and up to 20 bytes to store the mantissa. The resulting value is limited to 38 digits of precision. Oracle Database does not store leading and trailing zeros. For example, the number 412 is stored in a format similar to 4.12 x 102, with 1 byte used to store the exponent(2) and 2 bytes used to store the three significant digits of the mantissa(4,1,2). Negative numbers include the sign in their length.

                  Taking this into account, the column size in bytes for a particular numeric data value NUMBER(p), where p is the precision of a given value, can be calculated using the following formula:

                  ROUND((length(p)+s)/2))+1

                  where s equals zero if the number is positive, and s equals 1 if the number is negative.

                  Zero and positive and negative infinity (only generated on import from Oracle Database, Version 5) are stored using unique representations. Zero and negative infinity each require 1 byte; positive infinity requires 2 bytes.
                  • 21. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                    Mark Malakanov (user11181920)
                    You should read docs more careful.
                    ... and *2* bytes used to store the three significant digits of the mantissa ...
                    So, my statement does not contradicts docs.

                    this is what I meant by "compression".
                    Oracle's digit compression is neither storing 1 digit in 1 byte, like strings do, nor storing it like native binary numerics do, (byte, int, long, float, double etc...).
                    It packs 3 digits into 2 bytes.

                    I was not saying how exponent and negatives organized, it was irrelevant to the original topic, what was relevant that numeric distinct values are more dense then varchar ones. Usually.
                    Because 1234567890 as number stored in 6 bytes.
                    But '1234567890' as varchar2 stored in 10 bytes.
                    However, if one will use not only numeric but also alpha and non-printable characters for key string values it can be as dense as binary, and 1234567890 can be stored in 4 bytes.
                    • 22. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                      rp0428
                      >
                      You should read docs more careful.
                      >
                      No need - I understand the doc perfectly, having worked with that, and many other formats for over 30 years.

                      You need to reread the doc format description more carefully and apply it to a sample of data values. Then, perhaps, you will understand the difference between what you stated and what the doc is stating.
                      >
                      Oracle's digit compression is neither storing 1 digit in 1 byte, like strings do, nor storing it like native binary numerics do, (byte, int, long, float, double etc...).
                      It packs 3 digits into 2 bytes.
                      >
                      The second sentence is simply not true. You have taken the one example from the doc for a three digit number and extrapolated it to a general rule and that general rule is not true. Oracle's rule is not 'pack every 3 digits into 2 bytes' as your statement implies. If that were the rule then 6 digits would get packed into 4 bytes when in fact it only takes 3. And 21 digits would be packed into 14 bytes when it actually only takes 11.

                      Oracle removes the leading and trailing zeros from the number. Then the mantissa of a number, excluding the sign, is stored in BCD (binary coded decimal) format with 2 decimal digits per byte. The three digit example (412) used 2 bytes because there were an odd number of digits. A four digit number would also use 2 bytes.

                      The formula shows that the sign will take one byte for negative numbers but zero bytes for positive numbers. For any even number of digits you can divide by two to get the number of bytes for the mantissa (not including sign). For an odd number of digits add 1 and then divide by 2.
                      >
                      This way distinct values are somewhat (~30%) more compact compared to plain char string distinct values.
                      >
                      Again, not true and misleading at to what the actual rule and savings are. The savings for even numbers with no leading or trailing zeros will be near 50% due to the BCD encoding (near because there is 1 byte for the exponent). The savings for odd numbers slightly less since one nybble of the high-order byte will be unused.

                      For numbers with large numbers of trailing zeroes the savings is even greater because Oracle does not store the leading or trailing zeros. So ALL of these numbers take exactly the same amount of space to store
                      1
                      10
                      1000
                      10000000
                      100000000000000000000
                      Each of those is stored using one byte for the exponent and one byte for the mantissa. The exponent is different for each value.

                      And when you made this statement it's not clear if you are including the exponent or not.
                      >
                      1234567890 as number stored in 6 bytes.
                      >
                      Yes it is - the 10 decimal digits are stored in five bytes, not 6 and then there will one byte for the exponent. So what happened to your '3 digits into 2 byte' rule? That rule says the 10 decimal digits alone should take 7 bytes (2 each for 3 sets of 3 digits and one byte for the 10th digit).

                      So your statements 'misstate' the docs. You suggested a general rule that was wrong by extrapolating from a single example to the general case.

                      That is what I was pointing out.

                      Your statements about compression are correct but not for the reasons you stated.
                      • 23. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                        Mark Malakanov (user11181920)
                        Then the mantissa of a number, excluding the sign, is stored in BCD (binary coded decimal) format with 2 decimal digits per byte.
                        You are right! And to the point.
                        I am wrong. (ashamed)

                        And this way it is even better, because it is even more dense!
                        the sign will take one byte for negative numbers
                        I only do not understand why "the sign will take one byte for negative numbers".
                        Why to waste entire byte just for negative sigh? They could use a bit from exponent, why not? Do you have any explanation or idea?
                        • 24. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                          rp0428
                          >
                          You are right!
                          >
                          Not exactly! I said BCD but Oracle actually used base-100 which still means, as the other doc citation states, 'each byte can represent 2 decimal digits.'
                          >
                          I only do not understand why "the sign will take one byte for negative numbers".
                          Why to waste entire byte just for negative sigh? They could use a bit from exponent, why not? Do you have any explanation or idea?
                          >
                          See? You always have to keep digging and run some tests. The OCI doc says a bit from the exponent IS used but the other doc said a whole byte is used.
                          Now see what you make of this quote from the NUMBER section of the Call Interface Programmer's Guide
                          http://docs.oracle.com/cd/B28359_01/appdev.111/b28395/oci03typ.htm#i423684
                          >
                          Oracle stores values of the NUMBER datatype in a variable-length format. The first byte is the exponent and is followed by 1 to 20 mantissa bytes. The high-order bit of the exponent byte is the sign bit; it is set for positive numbers and it is cleared for negative numbers. The lower 7 bits represent the exponent, which is a base-100 digit with an offset of 65.

                          To calculate the decimal exponent, add 65 to the base-100 exponent and add another 128 if the number is positive. If the number is negative, you do the same, but subsequently the bits are inverted. For example, -5 has a base-100 exponent = 62 (0x3e). The decimal exponent is thus (~0x3e) -128 - 65 = 0xc1 -128 -65 = 193 -128 -65 = 0.

                          Each mantissa byte is a base-100 digit, in the range 1..100. For positive numbers, the digit has 1 added to it. So, the mantissa digit for the value 5 is 6. For negative numbers, instead of adding 1, the digit is subtracted from 101. So, the mantissa digit for the number -5 is 96 (101 - 5). Negative numbers have a byte containing 102 appended to the data bytes. However, negative numbers that have 20 mantissa bytes do not have the trailing 102 byte. Because the mantissa digits are stored in base 100, each byte can represent 2 decimal digits. The mantissa is normalized; leading zeroes are not stored.

                          Up to 20 data bytes can represent the mantissa. However, only 19 are guaranteed to be accurate. The 19 data bytes, each representing a base-100 digit, yield a maximum precision of 38 digits for an Oracle NUMBER.
                          >
                          Interesting content. The high order bit of the exponent is the sign bit. Each mantissa byte . . . For positive numbers, the digit has 1 added to it.

                          Tests show that it is a bit that is used for the sign but the sign also affects the actual number stored in the mantissa.
                          select 12, dump(12), -12, dump(-12) from dual
                          
                          12,DUMP(12),-12,DUMP(-12)
                          12,Typ=2 Len=2: 193,13,-12,Typ=2 Len=3: 62,89,102
                          Notice the '13' in the DUMP(12) output? The number was positive so '1' was added to it. And the high order bit of the exponent (193) is set.
                          • 25. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                            Mark Malakanov (user11181920)
                            Interesting content. The high order bit of the exponent is the sign bit. Each mantissa byte . . . For positive numbers, the digit has 1 added to it.
                            Getting a little bit clear.
                            Exponent has its own sign, it can be negative too for numbers <1. Or 0.1?
                            And mantissa has its sign too.
                            • 26. Re: Which is better to use Varchar or Interger for Primary key in Oracle DB
                              rp0428
                              >
                              Exponent has its own sign, it can be negative too for numbers <1. Or 0.1?
                              And mantissa has its sign too.
                              >
                              Nope! The high order bit of the exponent byte is actually the sign of the number itself! The sign of the exponent itself is taken into account when the exponent is encoded using base 100, possible inversion and the offset of 65.

                              The exponent will decode as a negative or positive number.
                              1 2 Previous Next