5 Replies Latest reply: May 9, 2012 8:53 AM by robvarga

    Data Affinity: Duplicating entities when associatedKey value changes?

    932905
      I'm trying to implement Data Affinity. I have a simple domain model of FootballClub and Player. A Player is owned by a FootballClub. Players and Clubs are cached in separate NamedCaches, and I'm using basic Java serialisation.

      Affinity appears to be working: I'm logging the partition using a BackingMapListener and can see that a Player goes to the same partition as the owning Club. If I start a new data node and the partition moves to the new node, the Player and Club move together.

      The problem is that if I change the owner of an existing Player and put the updated Player back into the Player cache, the existing Player entry is not updated; I get a brand new Player entry instead. Is this the expected behavior?

      If this is not expected, does anyone have any suggestions as to where I might be going wrong? I've tried with a KeyAssociator and by implementing KeyAssociation on Player.Id.
      I've also tried excluding the clubName (the associatedKey) from the Player.Id hashCode() and equals() methods, and making the clubName transient in that class.

      I'm logging MapEvents and can see that the toString() of the Binary version of the Player key changes slightly when the KeyAssociator.getAssociatedKey() value changes, but converting the key back to its original format using BackingMapManagerContext.getKeyFromInternalConverter() always returns the same value.

      I'm new to Coherence, so it's very possible I've got something basic wrong.
        • 1. Re: Data Affinity: Duplicating entities when associatedKey value changes?
          robvarga
          user6871200 wrote:
          The problem is that if I change the owner of an existing Player and put the updated Player back into the Player cache, the existing Player entry is not updated; I get a brand new Player entry instead. Is this the expected behavior?
          It is the expected behaviour: if you actually changed the key, then it is a different key, and you put the same player with a different key into the cache. Why would it affect the old entry?

          A key in the map is considered immutable and you violated that principle.

          The problem is that your modelling is faulty: the owner should either not be part of the key or be immutable. In your case it should not be part of the key (but then you can't have affinity). The question is: why do you want affinity between owner and player anyway?
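
          The point about keys can be illustrated with a plain java.util.HashMap. This is only a loose analogy under stated assumptions: PlayerKey and KeyChangeDemo are hypothetical names, the club name is deliberately made part of the key's identity, and a partitioned Coherence cache is not a HashMap (it works with the serialized form of the key). The principle is the same, though: a put with a different key adds an entry rather than updating the old one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: the club name is part of the key's identity.
final class PlayerKey {
    final String playerName;
    final String clubName;

    PlayerKey(String playerName, String clubName) {
        this.playerName = playerName;
        this.clubName = clubName;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PlayerKey)) {
            return false;
        }
        PlayerKey other = (PlayerKey) o;
        return playerName.equals(other.playerName) && clubName.equals(other.clubName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(playerName, clubName);
    }
}

final class KeyChangeDemo {
    // Stores the same player under two different owners and returns the entry count.
    static int entriesAfterTransfer() {
        Map<PlayerKey, String> players = new HashMap<>();
        players.put(new PlayerKey("Smith", "Rovers"), "Smith (Rovers)");
        // "Changing the owner" produces a different key, so this is an insert, not an update.
        players.put(new PlayerKey("Smith", "United"), "Smith (United)");
        return players.size();
    }

    public static void main(String[] args) {
        System.out.println(entriesAfterTransfer()); // prints 2
    }
}
```

          To move the player in this model, the old entry would have to be removed explicitly before (or after) the new one is put.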

          Best regards,

          Robert
          • 2. Re: Data Affinity: Duplicating entities when associatedKey value changes?
            932905
            Thanks Robert - sounds like I'm missing something fundamental.

            I wasn't aware that I was changing the key of my Player object when I added or changed the club associated with the Player.

            Are you saying that with Data Affinity the value returned by getAssociatedKey() is always part of the object's key? So changing this value changes the object's key? If that is the case then this makes sense but this wasn't clear to me from the documentation I've looked at. (I assumed that the associatedKey was just used to determine the partition that would be used to hold the child object but it wasn't part of the child key).

            > Question is: why do you want to have affinity between owner and player, anyway?

            The FootballClub/Player domain is something simple I can play around with to try and understand Data Affinity. The reason I want the players associated with their club is so that if I want to get hold of the Club and all its players, I won't need to jump around cluster nodes and aggregate results to get the player list. We think the kinds of benefits described in the developer guide (http://docs.oracle.com/cd/E24290_01/coh.371/e22837/api_dataaffinity.htm) might help in our real application, so I'm just trying to get something basic working first.
            • 3. Re: Data Affinity: Duplicating entities when associatedKey value changes?
              robvarga
              user6871200 wrote:
              Are you saying that with Data Affinity the value returned by getAssociatedKey() is always part of the object's key? So changing this value changes the object's key? If that is the case then this makes sense but this wasn't clear to me from the documentation I've looked at. (I assumed that the associatedKey was just used to determine the partition that would be used to hold the child object but it wasn't part of the child key).
              No, what I am saying is that getAssociatedKey() is called on the key object, not on the value object. So if you changed that part of the key, that would be a new key. If you did not store the club key there, then you did not in fact have data affinity between the player and its owning club.
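
              A minimal sketch of what "called on the key object" means. Assumptions: the local KeyAssociation interface below is a stand-in that mirrors the shape of Coherence's com.tangosol.net.cache.KeyAssociation so the example compiles without Coherence on the classpath; PlayerId, AffinityDemo, PARTITION_COUNT and partitionFor() are hypothetical, and the placement rule is a deliberate simplification, not Coherence's actual partitioning algorithm.

```java
import java.util.Objects;

// Stand-in for com.tangosol.net.cache.KeyAssociation (assumption: declared locally
// only to keep this sketch self-contained).
interface KeyAssociation {
    Object getAssociatedKey();
}

// Hypothetical Player key: the owning club's key lives in the key object itself,
// because that is the object getAssociatedKey() is invoked on.
final class PlayerId implements KeyAssociation {
    private final long id;
    private final String clubName;

    PlayerId(long id, String clubName) {
        this.id = id;
        this.clubName = clubName;
    }

    @Override
    public Object getAssociatedKey() {
        return clubName;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PlayerId)) {
            return false;
        }
        PlayerId other = (PlayerId) o;
        return id == other.id && clubName.equals(other.clubName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, clubName);
    }
}

final class AffinityDemo {
    static final int PARTITION_COUNT = 257; // arbitrary value for the sketch

    // Simplified placement rule (an assumption): use the associated key when the
    // key provides one, otherwise use the key itself.
    static int partitionFor(Object key) {
        Object basis = (key instanceof KeyAssociation)
                ? ((KeyAssociation) key).getAssociatedKey()
                : key;
        return Math.floorMod(basis.hashCode(), PARTITION_COUNT);
    }

    public static void main(String[] args) {
        PlayerId id = new PlayerId(7L, "Rovers");
        // The player key lands wherever the club key "Rovers" lands.
        System.out.println(partitionFor(id) == partitionFor("Rovers")); // prints true
        // Same player id with another club is a different key altogether.
        System.out.println(id.equals(new PlayerId(7L, "United"))); // prints false
    }
}
```

              In this sketch the club name has to be stored in PlayerId for the player to follow the club, which is exactly why changing the club yields a different key.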

              > Question is: why do you want to have affinity between owner and player, anyway?

              The FootballClub/Player domain is something simple I can play around with to try and understand Data Affinity. The reason I want the players associated with their club is so that if I want to get hold of the Club and all its players, I won't need to jump around cluster nodes and aggregate results to get the player list. We think the kinds of benefits described in the developer guide (http://docs.oracle.com/cd/E24290_01/coh.371/e22837/api_dataaffinity.htm) might help in our real application, so I'm just trying to get something basic working first.
              A player is not identified by its club. First, a player may be between contracts and have no club. Second, the primary key of an entity (a player in this case) should not change, so anything that can change on an entity is not part of its primary key; it is derived/associated information for that entity.

              So data affinity is not really applicable here. What you need here is an index of some sort which tells you what players are associated with a club.

              You can
              - store this index as part of the club and update it explicitly whenever you update the player,
              - or have a Coherence index on the club information within the player, so that club-to-player navigation becomes an indexed Coherence query,
              - or materialize this index into an index cache. This is about the same as storing it within the club, except that you store it in a separate cache entry keyed by the same key as the club, and you would write some listener-triggered background processing to maintain it. This is not an out-of-the-box feature and is usually only necessary for large data sets.
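
              The first option (an explicitly maintained club-to-players index) could look roughly like the sketch below. It uses plain in-memory maps in place of the two caches, and ClubIndexDemo, assign() and playersOf() are hypothetical names. In a real cluster the player update and the index update would touch different entries, so they would need to be coordinated or tolerate brief inconsistency.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ClubIndexDemo {
    // Player id -> current club; a player without a club has no entry.
    private final Map<Long, String> clubByPlayer = new HashMap<>();
    // Club -> ids of its players; this is the index kept in step with player updates.
    private final Map<String, Set<Long>> playersByClub = new HashMap<>();

    // Moves a player to newClub (null means "no club") and updates the index.
    // The player's own key, its id, never changes.
    void assign(long playerId, String newClub) {
        String oldClub = clubByPlayer.remove(playerId);
        if (oldClub != null) {
            Set<Long> oldPlayers = playersByClub.get(oldClub);
            oldPlayers.remove(playerId);
            if (oldPlayers.isEmpty()) {
                playersByClub.remove(oldClub);
            }
        }
        if (newClub != null) {
            clubByPlayer.put(playerId, newClub);
            playersByClub.computeIfAbsent(newClub, c -> new TreeSet<>()).add(playerId);
        }
    }

    // Club-to-player navigation is a single lookup in the index.
    Set<Long> playersOf(String club) {
        return playersByClub.getOrDefault(club, Collections.<Long>emptySet());
    }

    public static void main(String[] args) {
        ClubIndexDemo index = new ClubIndexDemo();
        index.assign(1L, "Rovers");
        index.assign(2L, "Rovers");
        index.assign(1L, "United"); // transfer: same player id, different club
        System.out.println(index.playersOf("Rovers")); // prints [2]
        System.out.println(index.playersOf("United")); // prints [1]
    }
}
```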

              Best regards,

              Robert
              • 4. Re: Data Affinity: Duplicating entities when associatedKey value changes?
                MagnusE
                I think you are too quick to say that affinity doesn't apply; it depends on whether you have a club-centric or player-centric view. With a club-centric view one could, for instance, have a "player/contract record" that represents the contract period of that player with the club (and would keep all information about the player's performance during that period with the club, including goals scored and the start/end dates). In this case a player can be active with more than one club, or even be without any club, without this causing any problems...

                Assuming that the most common query sent to the system is to list the current (and perhaps historical) players of a club, an implementation using a player/contract object with affinity would make perfect sense (in my humble opinion). The key for the player/contract could then be the clubid + a contract id. If one wants a normalized structure, the player/contract object should not contain "generic" information about the player (when and where he was born, etc.), so if this type of information is needed it should be stored in a separate "player" object (that, as you propose, has no connection to any club).

                Using affinity for common requests can, if done right, result in a nice "sharding" effect that can substantially reduce the total CPU load of a system (by directing common queries to only one node rather than all, and by allowing invocables etc. to handle a complete query using backing map access)!

                /Magnus
                • 5. Re: Data Affinity: Duplicating entities when associatedKey value changes?
                  robvarga
                  MagnusE wrote:
                  I think you are too quick to say that affinity doesn't apply; it depends on whether you have a club-centric or player-centric view. With a club-centric view one could, for instance, have a "player/contract record" that represents the contract period of that player with the club (and would keep all information about the player's performance during that period with the club, including goals scored and the start/end dates). In this case a player can be active with more than one club, or even be without any club, without this causing any problems...

                  Assuming that the most common query sent to the system is to list the current (and perhaps historical) players of a club, an implementation using a player/contract object with affinity would make perfect sense (in my humble opinion).
                  I totally agree with you, but even in that case the object affine to the club is the contract, not the player. The player would be a referenced but not affine entity, and the contract entity would only store a foreign key pointing to the player plus data derived from or associated with the contract itself.
                  The key for the player/contract could then be the clubid + a contract id. If one wants a normalized structure, the player/contract object should not contain "generic" information about the player (when and where he was born, etc.), so if this type of information is needed it should be stored in a separate "player" object (that, as you propose, has no connection to any club).
                  For this case, I would probably use the (club_id;player_id;start_date_of_contract) tuple as the key, with club_id being the associated key; hopefully there won't be multiple contracts between the same club and player on the same date. Alternatively, the entry key could be the (club_id;player_id) pair and the entry value could be a list of contracts...
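
                  A sketch of such a composite key, under stated assumptions: ContractKey and its field names are hypothetical, and the local KeyAssociation interface is a stand-in mirroring Coherence's com.tangosol.net.cache.KeyAssociation so the example compiles on its own. All three parts are final and take part in equals() and hashCode(); only club_id is exposed as the associated key.

```java
import java.io.Serializable;
import java.time.LocalDate;
import java.util.Objects;

// Stand-in for com.tangosol.net.cache.KeyAssociation (assumption: declared locally
// only to keep this sketch self-contained).
interface KeyAssociation {
    Object getAssociatedKey();
}

// Hypothetical immutable contract key: (club_id; player_id; start_date_of_contract).
final class ContractKey implements KeyAssociation, Serializable {
    private static final long serialVersionUID = 1L;

    private final String clubId;
    private final long playerId;
    private final LocalDate startDate;

    ContractKey(String clubId, long playerId, LocalDate startDate) {
        this.clubId = Objects.requireNonNull(clubId);
        this.playerId = playerId;
        this.startDate = Objects.requireNonNull(startDate);
    }

    // Every contract of a club reports the club's key, so contracts can be
    // co-located with the club entry.
    @Override
    public Object getAssociatedKey() {
        return clubId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ContractKey)) {
            return false;
        }
        ContractKey other = (ContractKey) o;
        return playerId == other.playerId
                && clubId.equals(other.clubId)
                && startDate.equals(other.startDate);
    }

    @Override
    public int hashCode() {
        return Objects.hash(clubId, playerId, startDate);
    }
}

final class ContractKeyDemo {
    public static void main(String[] args) {
        ContractKey a = new ContractKey("rovers", 1L, LocalDate.of(2011, 7, 1));
        ContractKey b = new ContractKey("rovers", 2L, LocalDate.of(2012, 1, 15));
        // Different contracts, same associated key.
        System.out.println(a.equals(b)); // prints false
        System.out.println(a.getAssociatedKey().equals(b.getAssociatedKey())); // prints true
    }
}
```

                  Because a contract never changes its club, the associated key stays fixed for the lifetime of the entry; a transfer is modelled as a new contract entry rather than a change to an existing key.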

                  Best regards,

                  Robert