18 Replies Latest reply: Jun 10, 2013 8:12 AM by greybird
      • 15. Re: Use case for triggers, is there a better way?
        Andrius Juozapaitis
        Hey,

        I am totally aware of this possibility, and it is definitely one of the options. I also understand the unwillingness to complicate the codebase to address the requirements of some specific solutions such as mine.

        There are a few issues with such a straightforward approach:
        1. Elasticsearch is not transactional, so I would need to compensate for transaction rollbacks somehow, if I simply mirror the actions between JE and ES
        2. Code would be either duplicated for both datastores, or aspects would need to be used, which in this case unnecessarily complicates the architecture
        3. An event-driven approach (i.e., transaction trigger commit/abort) is easier to implement, understand, and maintain
        4. ES interface is fully asynchronous, which would make the overhead reasonably small in case of 3.

        Building a scheduler-based synchronisation mechanism is also an option, but:
        1. ES has near-real-time indexing, which makes the inserted data available (almost) instantly. To match that in a scheduler-based implementation, I would need to query 30+ entities reasonably often, which means additional load on JE
        2. A separate thread would be required to handle the synchronization, which further complicates the architecture.

        I really like the DPL approach, since it very closely mirrors what I've been doing lately with CouchDB and its Ektorp Java interface, with the added benefit of supporting transactions and being able to embed it in the webapp directly.

        Another thought: if there were some kind of dependency injection mechanism, I would be able to just wire the modified components together, to do the additional work I need. But I don't think it's happening :)

        regards,
        Andrius
        • 16. Re: Use case for triggers, is there a better way?
          greybird
          There are a few issues with such a straightforward approach:
          1. Elasticsearch is not transactional, so I would need to compensate for transaction rollbacks somehow, if I simply mirror the actions between JE and ES
          You would have this problem with triggers as well. Triggers (the internal code you're using, at least) notify about individual write operations whether or not they're ultimately committed or aborted, and then notify about the commit/abort. So it is up to you to buffer the operations, and only publish them if the transaction commits.
          2. Code would be either duplicated for both datastores, or aspects would need to be used, which in this case unnecessarily complicates the architecture
          Not sure what you mean.
          3. An event-driven approach (i.e., transaction trigger commit/abort) is easier to implement, understand, and maintain
          4. ES interface is fully asynchronous, which would make the overhead reasonably small in case of 3.
          Understood.
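
To illustrate the buffering mentioned under point 1, here is a minimal sketch in plain Java. It is not JE's trigger API: the names `CommitBuffer`, `PendingOp`, `onWrite`, `onCommit` and `onAbort` are hypothetical, the transaction is represented by a plain `long` id, and keys/values are strings for brevity. The idea is only that write notifications are collected per transaction and handed to the publisher (for example, an ES indexing call) when the commit notification arrives, and dropped on abort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** One buffered write: a put (value != null) or a delete (value == null). */
final class PendingOp {
    final String key;
    final String value; // null means the key was deleted

    PendingOp(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

/** Collects writes per transaction and publishes them only on commit. */
final class CommitBuffer {
    // Hypothetical: transactions are identified by a long id in this sketch.
    private final Map<Long, List<PendingOp>> pending = new ConcurrentHashMap<>();
    private final Consumer<List<PendingOp>> publisher;

    CommitBuffer(Consumer<List<PendingOp>> publisher) {
        this.publisher = publisher;
    }

    /** Called for every write notification, whether or not it later commits. */
    void onWrite(long txnId, String key, String value) {
        // Assumes a single transaction is driven by one thread at a time.
        pending.computeIfAbsent(txnId, id -> new ArrayList<>())
               .add(new PendingOp(key, value));
    }

    /** Called on commit: hand the buffered writes to the publisher, in order. */
    void onCommit(long txnId) {
        List<PendingOp> ops = pending.remove(txnId);
        if (ops != null && !ops.isEmpty()) {
            publisher.accept(ops);
        }
    }

    /** Called on abort: forget the buffered writes. */
    void onAbort(long txnId) {
        pending.remove(txnId);
    }
}
```

Since nothing reaches the publisher before commit, an aborted transaction needs no compensation on the ES side. A failure between the JE commit and the publish can still leave the two stores out of step, so some form of re-sync is needed in any case.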

          --mark
          • 17. Re: Use case for triggers, is there a better way?
            Andrius Juozapaitis
            I gave some further thought to Elasticsearch/JE integration. The sync has two different stages:
            1. Initial sync: ES may be clean, or partially populated up to some specific timestamp/checkpoint. This happens on startup.
            2. Continuous sync, which runs afterwards, preferably pushed from JE.

            Regarding the initial sync, I had a few ideas on how to implement this without complicating the application logic too much.
            1. Add a version field to entities, using a sequence per entity. Query the entities whose version is newer than the latest recorded sequence value using a cursor, and index them appropriately.
            - The version would be updated with each update. Is there a way to assign a sequence value to an arbitrary field (non-pk)? Another annotation perhaps?
            - How to handle entity deletion in such a case? One way would be to mark the entities as deleted in JE in the application logic, and properly delete them once they've been removed from the index. Any other options?

            2. Is it reasonable to use the env.getStats(StatsConfig.DEFAULT).getLastCheckpointId() as a marker for an indexing iteration?
            -Would the checkpoint ID remain the same in case of master failover on a replica node?
            -Would it be possible to get a list of updated/deleted entities between two checkpoints somehow?
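
A rough sketch of the version-field idea from option 1, in plain Java rather than DPL, to make the moving parts concrete. Everything here is an assumption for illustration: `VersionedStore` and `VersionedRecord` are hypothetical names, the `AtomicLong` stands in for a database sequence, and the `TreeMap` stands in for a secondary index on the version field. Deletes are kept as tombstones until the indexer has seen them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** A record stamped with the version of its most recent change. */
final class VersionedRecord {
    final String id;
    final String body;      // null for tombstones
    final long version;
    final boolean deleted;  // true marks a tombstone awaiting index removal

    VersionedRecord(String id, String body, long version, boolean deleted) {
        this.id = id;
        this.body = body;
        this.version = version;
        this.deleted = deleted;
    }
}

/** In-memory stand-in for a store with a version "secondary index". */
final class VersionedStore {
    private final AtomicLong sequence = new AtomicLong();             // stand-in for a sequence
    private final Map<String, VersionedRecord> byId = new HashMap<>();
    private final TreeMap<Long, String> byVersion = new TreeMap<>();  // version -> id

    synchronized void put(String id, String body) {
        write(id, body, false);
    }

    /** Marks the record as deleted instead of removing it outright. */
    synchronized void delete(String id) {
        if (byId.containsKey(id)) {
            write(id, null, true);
        }
    }

    private void write(String id, String body, boolean deleted) {
        VersionedRecord old = byId.get(id);
        if (old != null) {
            byVersion.remove(old.version); // each record has one index entry
        }
        long version = sequence.incrementAndGet();
        byId.put(id, new VersionedRecord(id, body, version, deleted));
        byVersion.put(version, id);
    }

    /** Records changed after lastSynced, in ascending version order. */
    synchronized List<VersionedRecord> changedSince(long lastSynced) {
        List<VersionedRecord> out = new ArrayList<>();
        for (String id : byVersion.tailMap(lastSynced, false).values()) {
            out.add(byId.get(id));
        }
        return out;
    }

    /** Physically removes tombstones the indexer has already processed. */
    synchronized void purgeTombstones(long upToVersion) {
        Iterator<Map.Entry<Long, String>> it =
                byVersion.headMap(upToVersion, true).entrySet().iterator();
        while (it.hasNext()) {
            String id = it.next().getValue();
            if (byId.get(id).deleted) {
                byId.remove(id);
                it.remove();
            }
        }
    }
}
```

On startup the indexer would call `changedSince` with the last version it recorded, index or remove each returned record in ES, store the highest version seen, and then call `purgeTombstones` with that version.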


            any help would be appreciated,
            Andrius
            • 18. Re: Use case for triggers, is there a better way?
              greybird
              Regarding the initial sync, I had a few ideas on how to implement this without complicating the application logic too much.
              1. Add a version field to entities, using a sequence per entity. Query the entities whose version is newer than the latest recorded sequence value using a cursor, and index them appropriately.
              - The version would be updated with each update. Is there a way to assign a sequence value to an arbitrary field (non-pk)? Another annotation perhaps?
              - How to handle entity deletion in such a case? One way would be to mark the entities as deleted in JE in the application logic, and properly delete them once they've been removed from the index. Any other options?

              DPL doesn't have an annotation for adding a sequence to a non-primary-key field.  You'd have to use the Sequence class.

              Using a secondary index to keep track of what is to be published will probably work but will be expensive (performance-wise).  Have you considered simply re-creating the text index after a crash?

              2. Is it reasonable to use the env.getStats(StatsConfig.DEFAULT).getLastCheckpointId() as a marker for an indexing iteration? 
              -Would the checkpoint ID remain the same in case of master failover on a replica node?
              -Would it be possible to get a list of updated/deleted entities between two checkpoints somehow?

              No, the checkpoint ID is not replicated.  No, there is no way to read the JE log to find out about changes.

              --mark
