This discussion is archived
1 2 3 4 Previous Next 51 Replies Latest reply: May 11, 2013 11:02 AM by Debbie_1004338a Go to original post RSS
  • 15. Re: Is using Natural Keys bad?
    rp0428 Guru
    Currently Being Moderated
    >
    I thought I'd get some point for this design, which is sorta the best of both worlds...
    . . .
    Doesn't that follow what most of you are saying?
    . . .
    And what about my junction table...
    >
    No - it doesn't follow ANY of what people have been saying.

    Your junction/intersect table uses actual data values rather than surrogate values. And by doing so you are now duplicating those values. That will result in possible maintenance issues.

    These were your original questions:
    >
    I would like to know if it is "bad" to use Natural/Physical Keys as the Primary Key in a table?

    How does this affect performance?
    >
    Those questions were also answered by me and others. The thing that is 'bad' is that using natural keys invites the maintenance and security issues previously mentioned.

    And there is no performance difference between natural keys and surrogate keys. Oracle has NO KNOWLEDGE of whether a key is natural or a surrogate and so can't possibly make any decisions based on that that might affect performance.

    Here is what Tom Kyte, an Oracle ACE and noted author says about "Update Cascade" in his blog from over a decade ago: it still applies:
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5773459616034
    >
    There is not "on update cascade" automagically.

    There are ways to do it,

    o deferrable constraints. defer the foreign key check until commit, update the parent,
    update the child and then commit.

    o use the technique here:

    http://asktom.oracle.com/~tkyte/update_cascade/index.html

    Personally -- I've never found a need or use for update cascade. I'm opposed to it. If
    your design requires it -- change your design now if you can.

    Primary keys are supposed to be imutable, never changing, constant. It is an excessively
    bad practice to have to update them ever. If there is a 0.00001% chance you will have to
    update a primary key -- then it is not a primary key, its a surrogate key and you need to
    find the true primary key (even if you have to make it up via a sequence)
    >
    Note the first sentence of that last paragraph: 'Primary keys are supposeds to be immutable'.

    That is what we have been telling you.

    Your design does NOT follow best practices. There isn't really much else for us to say about it.

    I, for one, already suggested that you should examine the pros and cons of following best practice for your use case, document those pros and cons, and then document your decision. As a responsible professional you need to take the pros and cons into account but once you do that it is YOUR decision as to what is best for your use case.

    Your intersect table contents might well be easier for a human to read. But humans don't need to be able to read it. The DB can easily join the other lookup tables to get the 'text' value for the number.

    No one here is trying to tell you that you MUST use surrogate keys. That is for YOU to decide based on your used case. All we want to try to do is to ensure that you make an 'informed' decision after weighing the relative benefits.
  • 16. Re: Is using Natural Keys bad?
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    1004338 wrote:
    I thought I'd get some point for this design, which is sorta the best of both worlds...

    SECTION table
    - id (PK)
    - slug (UK)
    - name (UK)
    - created_on
    - updated_on
    Surrogate keys are essentially a bad idea, but an idea for which it's easy to make a reasonable sounding case.
    You've added an id (which means an extra column and an extra index) to two tables - so from a mechanical point of view you need to have a good concrete reason to justify wasting those resources.

    >
    Doesn't that follow what most of you are saying?


    And what about my junction table...

    SECTION_DIMENSION junction
    - id (PK)
    - section_slug (UK1) *(FK1)*
    - dimension_slug (UK2) *(FK2)*
    - created_on
    - updated_on
    Another redundant id and index - when the two candidate keys are (section_slug, dimenstion_slug) and (dimension_slug, section_slug). It's probably a table that should be an index-organized table - using the PK column order that you're likely to traverse most frequently, and a secondary index on the opposite order.


    >
    (BTW, in MySQL, there is the concept of CASCADE ON UPDATE which would make it easy to change a PK and the corresponding FK's - less issues like backups mentioned earlier.)
    Oracle doesn't have "on update cascade" unfortunately The problem of changing primary key values (which is not contrary to relational theory, despite an earlier comment) may be a nuisance, but Oracle allows constraints to be declared as deferrable, so the typical strategy if you expect some change is to declare the foreign key as deferrable then following the sequence:
    set constraint deferred
    modify child end to new value
    modify parent end to new value
    set constraint immediate 
    Regards
    Jonathan Lewis
  • 17. Re: Is using Natural Keys bad?
    rp0428 Guru
    Currently Being Moderated
    >
    Surrogate keys are essentially a bad idea, but an idea for which it's easy to make a reasonable sounding case.
    >
    Say what?

    I think you owe us an explanation for that one. I have NEVER heard that sentiment expressed by anyone else.

    There have been occasions where multiple columns are needed to deternine uniqueness and the question arises as to whether to add an additional ID (sequence) column that could be used as a foreign key rather than having to use multiple columns as FKs.

    I, for one, must be misunderstanding what you mean by 'Surrogate keys are essentially a bad idea'.
  • 18. Re: Is using Natural Keys bad?
    Debbie_1004338a Newbie
    Currently Being Moderated
    rp0428 wrote:
    No one here is trying to tell you that you MUST use surrogate keys. That is for YOU to decide based on your used case. All we want to try to do is to ensure that you make an 'informed' decision after weighing the relative benefits.
    Sorry, but part of my problem here might be some confusion about what different terms mean?!

    When I say "Natural Key" or "Physical Key", I mean something occurs naturally (e.g. SSN) or physically (e.g. serial #). The advantage of using one of these is that it is human-readable and understandable (e.g. "George Washington" or "polo-m-sm-blu-007").

    And when I say "Logical Key" or "Derived Key", I mean something generated by the database (e.g. AutoNumber in MySQL, Sequence?? in Oracle) The advantage of using one of these is that it is guaranteed to be unique.

    And a "Secondary Key" is another field that uniquely identifies a record, but which might need to be changed later, and thus is not chosen to be the "Primary Key". (e.g. Tele # 202-555-1212)

    Obviously I know what a "Primary Key" and a "Foreign Key" are! :)

    However, I'm not sure I understand what a "Surrogate Key" is....

    Sincerely,


    Debbie

    Edited by: 1004338 on May 6, 2013 2:50 PM
  • 19. Re: Is using Natural Keys bad?
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    rp0428 wrote:
    >
    Surrogate keys are essentially a bad idea, but an idea for which it's easy to make a reasonable sounding case.
    >
    Say what?

    I think you owe us an explanation for that one. I have NEVER heard that sentiment expressed by anyone else.
    For which bit - the "bad idea" or the "easy to make a reasonable sounding case" ?
    For the former I'd suggest reading almost anything by Chris Date, for the latter I think you only have to read back up this thread.



    Regards
    Jonathan Lewis
  • 20. Re: Is using Natural Keys bad?
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    1004338 wrote:
    (And, obviously I know what a "Primary Key" and a "Foreign Key" are!)

    I'm not sure I understand what a "Surrogate Key" is....
    Also known as a synthetic key, or meaningless key - I think you'd probably call it a derived key, it's typically generated in Oracle through an Oracle Sequence.

    Regards
    Jonathan Lewis
  • 21. Re: Is using Natural Keys bad?
    Justin_Mungal Journeyer
    Currently Being Moderated
    Jonathan Lewis wrote:
    For the former I'd suggest reading almost anything by Chris Date, for the latter I think you only have to read back up this thread.
    Heh heh... you must really dislike us.
  • 22. Re: Is using Natural Keys bad?
    rp0428 Guru
    Currently Being Moderated
    >
    For which bit - the "bad idea" or the "easy to make a reasonable sounding case" ?
    >
    For both.
    >
    For the former I'd suggest reading almost anything by Chris Date, for the latter I think you only have to read back up this thread.
    >
    Nice try but I ask YOU to explain your comment that 'Surrogate keys are essentially a bad idea'.

    Trying to throw it back on me by suggesting that I read one or more unspecified documents doesn't explain your comment.

    You made the comment. I'm asking you to explain what you meant, not what someone else might, or might not have said. Most of what I have read by Chris Date and Joe Celko is the opposite of what you just said. That Joe Celko likes 'intelligent' keys and that Chris Date likes 'surrogate'keys.

    I don't consider the use of surrogate keys to be a 'bad idea' at all. And the two 'reasonablyk sounding cases' I mentioned are very much 'to the point'.

    A sensitive data item such as social security number (for the U.S.) should NEVER be used as a primary key value or for referential integrity. The reasons include the maintenance issues related to addressing input data errors (an incorrect value was entered and needs to be corrected), the issue of archival copies of the data containing different KEY values (which affects restore capabilities) and the security issues involved with duplicating sensitive information and the need to protect that data.
  • 23. Re: Is using Natural Keys bad?
    jgarry Guru
    Currently Being Moderated
    1004338 wrote:
    rp0428 wrote:
    No one here is trying to tell you that you MUST use surrogate keys. That is for YOU to decide based on your used case. All we want to try to do is to ensure that you make an 'informed' decision after weighing the relative benefits.
    Sorry, but part of my problem here might be some confusion about what different terms mean?!

    When I say "Natural Key" or "Physical Key", I mean something occurs naturally (e.g. SSN) or physically (e.g. serial #). The advantage of using one of these is that it is human-readable and understandable (e.g. "George Washington" or "polo-m-sm-blu-007").
    The problem is the bad mapping between such keys and the pure relational model. That is, the real world is messy - some religions don't allow SSN, serial numbers aren't supposed to be reused but are, companies buy and sell each other and so on. Sometimes with compound keys you get things like the different divisions making division plus customer number unique, then the divisions are merged.

    Personally I prefer the natural key plus app code to fix the exceptions, but that's because I tend to work with ERP systems that work that way, rather than stuff I design from scratch, even though a lot of what I do is adding to the ERP from scratch. And if you know your data well enough, sometimes you can spot problems with simple visual inspections. I can't tell you how many times I've spotted problems with a simple count(*) and group by (and check out non-unique primary keys: http://richardfoote.wordpress.com/2008/06/04/primary-keys-and-non-unique-indexes-whats-really-happening/ ).

    >
    And when I say "Logical Key" or "Derived Key", I mean something generated by the database (e.g. AutoNumber in MySQL, Sequence?? in Oracle) The advantage of using one of these is that it is guaranteed to be unique.
    And yet, even those have limitations. There was even a problem with SCN's (real fundamental Oracle internals) maxing out under obscure circumstances. Oops!

    >
    And a "Secondary Key" is another field that uniquely identifies a record, but which might need to be changed later, and thus is not chosen to be the "Primary Key". (e.g. Tele # 202-555-1212)

    Obviously I know what a "Primary Key" and a "Foreign Key" are! :)

    However, I'm not sure I understand what a "Surrogate Key" is....
    From wikipedia: A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.

    >
    Sincerely,


    Debbie

    Edited by: 1004338 on May 6, 2013 2:50 PM
  • 24. Re: Is using Natural Keys bad?
    Debbie_1004338a Newbie
    Currently Being Moderated
    rp0428 wrote:

    Your junction/intersect table uses actual data values rather than surrogate values. And by doing so you are now duplicating those values. That will result in possible maintenance issues.
    In fairness, Foreign Keys always duplicate Primary Keys. So whether a PK/FK pair is a Derived Value or a Natural Value shouldn't matter on that point.

    I will give you that a Natural Key may change, whereas a Derived Key should never change.

    Apparently it is more work in Oracle than in MySQL, but in MySQL, simply by using "CASCADE ON UPDATE" you could change a Natural Primary Key all you want and the changes would instantly propagate to the Foreign Key. (Assuming you don't have any locking issues.)

    Now, I'm not saying that is desirable, though.

    These were your original questions:
    >
    I would like to know if it is "bad" to use Natural/Physical Keys as the Primary Key in a table?

    How does this affect performance?
    >
    Those questions were also answered by me and others. The thing that is 'bad' is that using natural keys invites the maintenance and security issues previously mentioned.
    Okay.

    And there is no performance difference between natural keys and surrogate keys. Oracle has NO KNOWLEDGE of whether a key is natural or a surrogate and so can't possibly make any decisions based on that that might affect performance.
    Except that an AutoIncrement/Sequence is an Integer, and it takes less space than a VARCHAR().

    Here is what Tom Kyte, an Oracle ACE and noted author says about "Update Cascade" in his blog from over a decade ago: it still applies:
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5773459616034
    There is not "on update cascade" automagically.

    There are ways to do it,

    o deferrable constraints. defer the foreign key check until commit, update the parent,
    update the child and then commit.

    o use the technique here:

    http://asktom.oracle.com/~tkyte/update_cascade/index.html

    Personally -- I've never found a need or use for update cascade. I'm opposed to it. If
    your design requires it -- change your design now if you can.
    Fair enough.

    Primary keys are supposed to be immutable, never changing, constant. It is an excessively
    bad practice to have to update them ever. If there is a 0.00001% chance you will have to
    update a primary key -- then it is not a primary key, its a surrogate key and you need to
    find the true primary key (even if you have to make it up via a sequence)
    You lost me there.

    I thought a "Surrogate Key" and a "Derived Key" were the same? If so, a Surrogate Key being used as a Primary Key should never change...

    >
    Note the first sentence of that last paragraph: 'Primary keys are supposeds to be immutable'.

    That is what we have been telling you.

    Your design does NOT follow best practices. There isn't really much else for us to say about it.
    I would like to clarify some things...

    In my design above, I chose a "Derived Key" (i.e. "id") for the PK on all tables. That should be okay.

    I then added a "Unique Key (Constraint)" to the Section.Slug and Dimension.Slug fields. That should be okay.

    In my junction table, I then used the Section.Slug and Dimension.Slug fields - still with a "Unique Key (Constraint)" - as the "Foreign Keys". I can see where you don't like that, since those two fields - while unique - are not technically the PK's for each respective Table. Fair enough.

    But to be clear, if I dropped Section.id and Dimension.id and instead made Section.slug and Dimension.slug the PK's, and then I dropped Section_Dimension.id and made Section_Dimension.section_id and Section_Dimension.dimension_id my Foreign Keys, then that should be okay, right? (Assuming I felt the new PK's were "immutable", as you say.)

    Your intersect table contents might well be easier for a human to read. But humans don't need to be able to read it. The DB can easily join the other lookup tables to get the 'text' value for the number.
    I suppose that is true, but in another thread I hope to start, that may not necessarily be the case. (More on that later.)

    No one here is trying to tell you that you MUST use surrogate keys. That is for YOU to decide based on your used case. All we want to try to do is to ensure that you make an 'informed' decision after weighing the relative benefits.
    That's why I came here!

    Thanks,


    Debbie
  • 25. Re: Is using Natural Keys bad?
    Debbie_1004338a Newbie
    Currently Being Moderated
    Jonathan Lewis wrote:
    And what about my junction table...

    SECTION_DIMENSION junction
    - id (PK)
    - section_slug (UK1) *(FK1)*
    - dimension_slug (UK2) *(FK2)*
    - created_on
    - updated_on
    Another redundant id and index - when the two candidate keys are (section_slug, dimenstion_slug) and (dimension_slug, section_slug). It's probably a table that should be an index-organized table - using the PK column order that you're likely to traverse most frequently, and a secondary index on the opposite order.
    So, do I hear you saying that on a "Junction Table", there is no need for a Derived Primary Key, and that you can just make the Primary Key from the two Foreign Keys?


    Debbie
  • 26. Re: Is using Natural Keys bad?
    rp0428 Guru
    Currently Being Moderated
    >
    In fairness, Foreign Keys always duplicate Primary Keys. So whether a PK/FK pair is a Derived Value or a Natural Value shouldn't matter on that point.
    >
    Correct - the difference is that a surrogate is a meaningless value and doesn't ever need to be changed because the value is wrong. Surrogate values also do not have security issues. It isn't the 'duplication' that is an issue; it is the maintenance and security aspects when you use 'natural' values.
    >
    Except that an AutoIncrement/Sequence is an Integer, and it takes less space than a VARCHAR().
    >
    You always need to be careful when you make a blanket statement like that or your credibility can get called into question. I think you would agree that it depends on the length of the VARCHAR and the number of significant digits of the Integer.

    Certainly an integer 'might' take less space to store a given number of key values and that less space 'might' result in more keys being stored in an index block and 'might' result in fewer blocks needing to be read. But there is also a well-known 'hot block' issue using sequences that results in wasted space in index blocks and index contention when multiple users are performing inserts at the same time.

    'It depends' is pretty accurate for many things Oracle.
    >
    Primary keys are supposed to be immutable, never changing, constant. It is an excessively 
    bad practice to have to update them ever. If there is a 0.00001% chance you will have to 
    update a primary key -- then it is not a primary key, its a surrogate key and you need to 
    find the true primary key (even if you have to make it up via a sequence) 
    You lost me there.
    >
    The quote is saying what we have already said: maintenance issues. Don't use a value for a primary key unless it is immutable, never changing, constant.

    Tom Kyte is a well-known and well-respected expert. So is Jonathan Lewis. They seem to have opposing views about the use of surrogate keys versus natural keys. That disagreement is fairly widespread and there are many experts on each side of the issue.

    Here is a link to an Oracle Scratchpad article by Jonathan Lewis about 'Meaningless Keys' where he provides an example of the additional joins that might be needed if surrogate keys are used.

    http://jonathanlewis.wordpress.com/2006/12/29/meaningless-keys/
    >
    Oracle Scratchpad

    December 29, 2006
    Meaningless Keys
    Filed under: Infrastructure,Performance — Jonathan Lewis @ 6:07 pm UTC Dec 29,2006

    I have been known to criticize “meaningless key” syndrome from time to time. I’m not going to go through the arguments here, but I’d like to tell you about a lovely example of how badly things can go wrong when you follow that route.
    >
    Just use you favorite search engine on 'joe Celko Chris Date surrogate natural' and you will find plenty of reading material and the ongoing 'war of words' on the issue.

    'It depends' - again, that is why there is no 'right' or 'wrong' way in general. 'It depends' on your particular use case. You should not invite maintenance and security issues if you don't need to. Your 'easy to read' and 'easier to query without joins' need may override those considerations for your use case.
    >
    No one here is trying to tell you that you MUST use surrogate keys. That is for YOU to decide based on your used case. All we want to try to do is to ensure that you make an 'informed' decision after weighing the relative benefits.
    That's why I came here!
    >
    What is why you came here? To be told what you should do?

    Sorry to disillusion you but there are many times when that just isn't possible. There are cases where things are black and white and other cases where they are not.

    All I can do is reiterate something I said earlier. It is YOUR responsibility as a professional to gather and analyze your options, rank them based on your own requirements and resources and then select the one that YOU deem best. There are some things that experienced professionals will disagree on because there are ALWAYS trade-offs to be made no matter which solution you choose.
  • 27. Re: Is using Natural Keys bad?
    BillyVerreynne Oracle ACE
    Currently Being Moderated
    rp0428 wrote:

    Nice try but I ask YOU to explain your comment that 'Surrogate keys are essentially a bad idea'.
    Tend to agree with Jonathan.

    If a business entity is uniquely identified via a surrogate key and not its natural key, how do you prevent duplication of that business entity? Simple example. Social security number, or ID number as we call it down here - if a surrogate key is used, how does the data model provide consistency and prevent the invalid (and likely fraudulent) reuse of an ID number for another entity that is considered unique simply because of its surrogate key value? (@all: and please do not show ignorance by saying that a unique index should be used)

    I tend not to use surrogate keys as it is an inherently problematic approach to data consistency and integrity.

    Edited by: Billy Verreynne on May 7, 2013 8:57 AM
  • 28. Re: Is using Natural Keys bad?
    Justin_Mungal Journeyer
    Currently Being Moderated
    Marcus Rangel wrote:

    - Updates do not occur in the fields of the primary key, so there is less chance of problematic locks. Incidentally, updates on the primary key fields violate the relational model and should not happen anyway.
    If you have any sense at all your primary keys will be immutable, but I don't believe it's an actual violation of the relational model if they aren't.
  • 29. Re: Is using Natural Keys bad?
    Debbie_1004338a Newbie
    Currently Being Moderated
    rp0428 wrote:
    >
    Except that an AutoIncrement/Sequence is an Integer, and it takes less space than a VARCHAR().
    >
    You always need to be careful when you make a blanket statement like that or your credibility can get called into question. I think you would agree that it depends on the length of the VARCHAR and the number of significant digits of the Integer.
    I guess so.

    If there is a 0.00001% chance you will have to
    update a primary key -- then it is not a primary key, its a surrogate key and you need to
    find the true primary key (even if you have to make it up via a sequence)
    Sorry, but that makes no sense.

    If you have to update a Primary Key, that implies it is a "Natural Key".

    You choose a "Surrogate Key" for the Primary Key because you do NOT have to change it.

    Anyways, I get the whole "Natural" vs "Surrogate/Derived" thing.

    Tom Kyte is a well-known and well-respected expert. So is Jonathan Lewis. They seem to have opposing views about the use of surrogate keys versus natural keys. That disagreement is fairly widespread and there are many experts on each side of the issue.
    Okay.

    (Wow, I get to talk to a famous person in my thread?!) =)

    >
    No one here is trying to tell you that you MUST use surrogate keys. That is for YOU to decide based on your used case. All we want to try to do is to ensure that you make an 'informed' decision after weighing the relative benefits.
    That's why I came here!
    >
    What is why you came here? To be told what you should do?
    No, to speak with people with many more years experience than me, so I can get "Expert Opinions", and then form my own conclusions....

    Sorry to disillusion you but there are many times when that just isn't possible. There are cases where things are black and white and other cases where they are not.

    All I can do is reiterate something I said earlier. It is YOUR responsibility as a professional to gather and analyze your options, rank them based on your own requirements and resources and then select the one that YOU deem best. There are some things that experienced professionals will disagree on because there are ALWAYS trade-offs to be made no matter which solution you choose.
    Oh trust me, I never take things at face-value.

    Rather, I ask enough people that - combined with my own view of the world - I have enough "data" to make an informed decision.

    Usually, people get made at me for the opposite reason...

    Because I ask Person #1 for an opinion, and then quickly go to someone else. (It's not that I don't value Person #1's opinion, but I tend to look for trends and consensus - or lack there of - to make the most informed decisions!)

    Sincerely,


    Debbie

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points