This discussion is archived
12 Replies Latest reply: Dec 4, 2012 4:53 PM by Vite DBA RSS

How to implement Many to Many relationships?

829559 Newbie
Currently Being Moderated
I've been google searching for articles on implementing many-to-many relationships and finding so many hits I'm overwhelmed.
Does anyone have a favorite article on this subject?

In my example, I have a many to many relationship between table "user" and table "project".
Why do some articles suggest that the primary key for the link table "UserProject" contain the two foreign keys for "user" and "project"? If we want to find all the projects for a given "user.id" we won't be specifying the "project.id". Won't this degrade into a linear search then?

I was thinking we should pick the foreign key "fk_user" as the primary key and create an index of the for "fk_project" (or vice versa). Then if we specify the "user.id" we can quickly find all the corresponding "project.id" values.

Are there any issues unique to oracle that I should know about?

Thanks
Siegfried
  • 1. Re: How to implement Many to Many relationships?
    Ashu_Neo Pro
    Currently Being Moderated
     If we want to find all the projects for a given "user.id" we won't be specifying the "project.id". Won't this degrade into a linear search then?
    Then it's not a many to many relationship, is it ? So Associate table(say userproj) comes into picture where there is many to many relationship and stores FK column values separately. When you are thinking like above, then you have to think both way, like Projects can have many users. So all links will tell the same concepts of data modeling.

    Thanks!
  • 2. Re: How to implement Many to Many relationships?
    jeneesh Guru
    Currently Being Moderated
    One good place to search is the AskTom site:

    You will get wrtings like This and This

    as suggested in these links, an associating table will be a good idea..
  • 3. Re: How to implement Many to Many relationships?
    Vite DBA Pro
    Currently Being Moderated
    Hi Siegried,
    user8816970 wrote:

    I was thinking we should pick the foreign key "fk_user" as the primary key and create an index of the for "fk_project" (or vice versa). Then if we specify the "user.id" we can quickly find all the corresponding "project.id" values.
    Think about it for a second. A many to many relationship implies a user can be involved with many projects and a project can involve many users. If you put a PK on the fk_user column then a user can only appear at most once in the table, meaning that a user can only be associated with one project at most. If you have a composite PK on both the user and the project, then a user can be associated with many projects and a project can be associated with many users, but a given user and a given project can not be associated more than once, thus fullfilling the many to many relationship.

    Apart from the index that is created for the PK constraint, any other indexing would only be done for performance requirements.

    Regards
    Andre
  • 4. Re: How to implement Many to Many relationships?
    BillyVerreynne Oracle ACE
    Currently Being Moderated
    As Andre said - and consider making this table for this relationship, an index organised table and not a hash table.
  • 5. Re: How to implement Many to Many relationships?
    829559 Newbie
    Currently Being Moderated
    Andre wrote:
    Hi Siegried,
    user8816970 wrote:

    I was thinking we should pick the foreign key "fk_user" as the primary key and create an index of the for "fk_project" (or vice versa). Then if we specify the "user.id" we can quickly find all the corresponding "project.id" values.
    Think about it for a second. A many to many relationship implies a user can be involved with many projects and a project can involve many users. If you put a PK on the fk_user column then a user can only appear at most once in the table, meaning that a user can only be associated with one project at most. If you have a composite PK on both the user and the project, then a user can be associated with many projects and a project can be associated with many users, but a given user and a given project can not be associated more than once, thus fullfilling the many to many relationship.
    OK, I was thinking duplicates were allowed in primary keys when I suggested we make only one foreign key the primary key.

    I still don't understand how a composite primary key helps prevent a linear search. Let's suppose we have a composite primary key as you suggest. How can the oracle database use this primary key if we are only supplying the user.id because we want to find all projects associated with that user? We will only be supplying the first half of a composite key!

    Given a project.id, I want to be able to find all the users associated with that project. Again, I'll only be supplying part (the second half) of the composite key. How can oracle use the indexed structure of the composite primary key to avoid a linear search when it does not have a complete primary key?

    Thanks
    Siegfried

    Edited by: user8816970 on Dec 4, 2012 10:39 AM
  • 6. Re: How to implement Many to Many relationships?
    Justin Cave Oracle ACE
    Currently Being Moderated
    Assuming that the primary key constraint is defined on the (user_id, project_id) (it could just as easily be defined on the reverse in which case the analysis is reversed)

    If you supply only the first column of a composite key, user_id, Oracle can do a range scan on the index created to support the primary key to find all the project_id values for a particular user_id which will all be in a small part of the index. Within the index, all the leaf block entries for a particular user_id will be continuous.

    If you supply only the second column of a composite key, project_id, and assuming that user_id is reasonably selective so that we can ignore the possibility of an index skip scan, the primary key index will probably not be useful. That's why you would generally create a second index either just on project_id or a composite index on (project_id, user_id).

    Justin
  • 7. Re: How to implement Many to Many relationships?
    Vite DBA Pro
    Currently Being Moderated
    Hi siegfried,
    user8816970 wrote:

    OK, I was thinking duplicates were allowed in primary keys when I suggested we make only one foreign key the primary key.
    This is one the most important principles of relational design, the requirement that every row in a table is uniquely identifiable. I think you should do some basic study on Relational Database Design and ER modelling. Googling these will bring up a large number of hits for you to peruse. A good understanding of these will hopefully lead to a good database design, which I believe is the most important foundation to any database information system. Forget the web architecture, the type or flavour of database, the type of front end API, the hardware vendor. Get the database design right in the first place and all the rest comes together much easier.
    >
    I still don't understand how a composite primary key helps prevent a linear search. Let's suppose we have a composite primary key as you suggest. How can the oracle database use this primary key if we are only supplying the user.id because we want to find all projects associated with that user? We will only be supplying the first half of a composite key!
    In Oracle, a PK is implemented with the use of an index, usually unique, that matches the requirements of the PK. This means that an index will exist on your PK columns, and as Justin alludes to, whether that index will be useful to a query depends on the columns used as filters in the query, eg user, and the order of columns in the index. Oracle will then take this information as well as information on the characteristics of the table from the table statistics and come up with a plan that it considers the most efficient.
    >
    Given a project.id, I want to be able to find all the users associated with that project. Again, I'll only be supplying part (the second half) of the composite key. How can oracle use the indexed structure of the composite primary key to avoid a linear search when it does not have a complete primary key?
    Given a project_id, Oracle will decide how best to execute the query. If the table doesn't have a large number of rows, it may decide to do a full table scan (assume this is what you mean by linear search) regardless of any indexes. It does this for the simple reason that it may be more efficient to simply read the whole table and filter out the required rows, than to read the index and then do indexed lookups on the table, especially if the blocks associated with the table are likely to be in buffer memory. If there are a large number of rows and the project_id is in the leading position of the composite index, it may decide to use that index. If the project_id is not in the leading part of the composite index, then there is nothing stopping you from creating an index just for this column, which the database may choose to use.

    I hope this helps your understanding.

    Regards
    Andre
  • 8. Re: How to implement Many to Many relationships?
    829559 Newbie
    Currently Being Moderated
    OK, we are making progress! Thank you everyone!

    So it sounds like the primary reason for creating a composite primary key is to preserve uniqueness (and integrity). Clearly, every composite key needs to be unique and we want someone or something to give us an error we we try to create a duplicate.

    I assumed that the primary purpose of a primary key was temporal efficiency for select statements at the expense of spacial efficiency (disk space) and insert statement efficiency (updating the index takes time).

    Apparently this not entirely true in the case of a link table (is that the appropriate term?) for implementing a many-to-many relationship.

    What Andre is telling me is that using a primary composite key may improve temporal efficiency if we are specify the first sub-key and not the second sub-key. Correct? If we specify the second sub-key and not the first sub-key the composite key probably makes no contribution to the temporal efficiency of the select statement. Correct?

    We could maybe remedy this condition by adding an index on the second sub-key. Correct?

    So if we determine that composite primary keys do not contribute to the improved efficiency of a select statement because (see Andre's explanation), should we only use them for debugging and remove them for the production version of the database?

    What about a more general question: should referential integrity be turned off for production?

    Thanks
    Siegfried
  • 9. Re: How to implement Many to Many relationships?
    sb92075 Guru
    Currently Being Moderated
    UNIQUE INDEX assures that no duplicates will exist.
  • 10. Re: How to implement Many to Many relationships?
    KeithJamieson Expert
    Currently Being Moderated
    As you have gathered , the purpose of a Primary Key is to preserve uniqueness.
    If you have a composite primary key then if you provide columns in the predicate (The where clause) which are leading columns of the primary key, then they can help with an efficient search.

    If you have a requirement that you have aquery which always uses the 2nd and 3rd columns of the primary key but never the first column, then it may make sense to add a composite index on these two columns.


    Removing referential integrity from a production database is pretty much one of the worst things you can do.
    Think of the consequences: You will have child records which don't have a parent record or even worse have the incorrect parent record. It is better to not let a record get into the database than to allow incorrect data.

    Incorrect data is a nightmare to fix, as its very difficult to find the source and even worse, it may lie undetected for years.


    Your mantra should be to prevent bad data getting into the database. You do this by applying constraints at the database level.
    This means that any data that is entered has to conform to those rules.
  • 11. Re: How to implement Many to Many relationships?
    Justin Cave Oracle ACE
    Currently Being Moderated
    I would never* turn off referential integrity in a production environment. Code has bugs. Even well-tested code has bugs. I would much (much, much) rather get an occasional error informing me that a constraint was violated because I had a bug in my code rather than finding out after running for years that invalid data has been entered in the database and we have no reasonable way to clean it up. The overhead of enforcing relational integrity is miniscule in a moderately well-designed system.

    As both Andre and I suggested, it may well make sense to create a separate index on either project_id or (project_id, user_id) if you will frequently query the link table based only on the project_id.


    * Well, never is overly strong. If you have absolutely critical performance requirements and you have optimized everything that you can possibly optimize and you cannot hit the performance targets you need with the (very reasonable) overhead of relational integrity and you cannot solve the problem by throwing hardware at it (realizing that if you are in this situation then you're going to fail to meet the performance requirements very quickly if the load increases above what is expected) and you accept the fact that you will inevitably get invalid data in the database then, sure, it may be reasonable to disable referential integrity. Almost no one that disables referential integrity is actually in this situation, though.

    Justin
  • 12. Re: How to implement Many to Many relationships?
    Vite DBA Pro
    Currently Being Moderated
    Hi Siegfried,

    having previously worked for a large RDBMS/Applications/Software/OS/hardware company (who shall remain nameless) in a support role with a large component in database and application performance, I can confirm the majority of performance and integrity problems can be traced directly back to design and implementation decisions made previously.

    I can not stress enough the importance of getting the database design correct. By this I simply mean that the design is a true reflection of the business information requirements and that it is normalized at least to 3rd normal form. The next step is that it is implemented in your RDBMS using accepted principles. One of the more important of these principles in any commonly used RDBMS is the use of enforced relational integrity. There are instances where you will break these rules, but instances where you will need to denormalize or remove integrity are comparatively rare and require a reasonable level of expertise to recognize and do correctly. Also note that the words used are "denormalize" and "remove". This means that your first instinct should be normalize your database design and enforce integrity and only step back from this when you recognize the targeted situation.

    The problem with bad design and implementation decisions, they become more difficult to fix the further you are down the line.

    Regards
    Andre

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points