13 Replies Latest reply: Apr 5, 2010 1:15 PM by jschellSomeoneStoleMyAlias

    Server design (multithreading, serialization, and performance)

    843853
      (I'm not asking for anyone to design my software for me, I'm just looking for a response along the lines of "That's called XYZ server design, look for books on this topic" sort of thing.)

      Summary:

      I have a Server application (S) that accepts connections from many Clients (C). The clients request pieces of a large internal data structure, *"Data"* (D). The clients are totally passive with respect to Data, they only read (from the Server) and do not initiate any modification.

      D is really a structure of structures: it's hashtables of hashtables, with objects that hold other hashtables and vectors, etc. etc., down a few levels. The clients don't read the entire structure, just parts of it.

      The Server is multi-threaded, with threads handling client communications, and a very important thread that modifies Data by processing messages from an external source. I call this part of the software the Message Processor (P). These messages are what drives manipulation of the data structure.



      [**CLICK HERE FOR A DIAGRAM**|http://imgur.com/sb5ZU]



      There are a couple of design questions I'm wondering about:

      The data structure D is a shared resource between the Client threads and the Message Processor thread within the Server, with the Client threads only reading from the data structure (and writing over TCP/IP), and the Message Processor both reading and modifying it.

      Right now I am using locks to lock the structure when a client requests data, so that the processor cannot modify the data while it is being serialized.
      I also lock the data structure when a message is received and the structure has to be modified by P, to prevent the structure from being serialized while it is being modified.


      My question is, is this the only design pattern I can use in this situation? It looks like the only way to improve performance is to
      a) make sure I only lock when necessary (to prevent data corruption or inconsistency)
      b) lock the data for as short a time as possible
      c) make sure the parts of the data structure being sent to clients are serialized as fast as possible (write my own writeObject/readObject methods)

      Any insight is appreciated, the shorter and more candid, the better. Don't be afraid to say I'm in over my head and should read a few books by author so-and-so, that's a good starting point :)
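The locking scheme described in this post can be sketched with a read/write lock, so that client threads do not block each other and only the Message Processor takes the exclusive lock. This is a minimal sketch, not the poster's actual code: the post only says "locks", and the choice of `ReentrantReadWriteLock`, the class name `SharedData`, its methods, and the `Map<String, Map<String, Integer>>` shape are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the shared structure D (a map of maps). */
class SharedData {
    private final Map<String, Map<String, Integer>> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Called by the Message Processor (P): exclusive access while mutating. */
    void update(String outerKey, String innerKey, int value) {
        lock.writeLock().lock();
        try {
            data.computeIfAbsent(outerKey, k -> new HashMap<>()).put(innerKey, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Called by client threads: several readers may hold the read lock at once. */
    Map<String, Integer> readPart(String outerKey) {
        lock.readLock().lock();
        try {
            Map<String, Integer> part = data.get(outerKey);
            // Copy under the lock so the slow serialization step can run after release.
            return part == null ? new HashMap<>() : new HashMap<>(part);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Copying the requested part under the read lock keeps the lock hold time short (point b), at the cost of an extra copy per request.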
        • 1. Re: Server design (multithreading, serialization, and performance)
          800387
          jta23 wrote:
          (I'm not asking for anyone to design my software for me, I'm just looking for a response along the lines of "That's called XYZ server design, look for books on this topic" sort of thing.)

          Summary:

          I have a Server application (S) that accepts connections from many Clients (C). The clients request pieces of a large internal data structure, *"Data"* (D). The clients are totally passive with respect to Data, they only read (from the Server) and do not initiate any modification.
          Are you using Servlets? That should facilitate the development of your server immensely (e.g., maintains sessions, handles multi-threading, implements HTTP out of the box, dozens of additional frameworks available, etc.)
          D is really a structure of structures: it's hashtables of hashtables, with objects that hold other hashtables and vectors, etc. etc., down a few levels. The clients don't read the entire structure, just parts of it.
          You can get away with using Map of Map or Map of List or whatever level of nesting you want. Generally, however, it is better to implement a canonical and rich domain model. [http://www.eaipatterns.com/CanonicalDataModel.html]. [http://www.substanceofcode.com/2007/01/17/from-anemic-to-rich-domain-model/].
          The Server is multi-threaded, with threads handling client communications, and a very important thread that modifies Data by processing messages from an external source. I call this part of the software the Message Processor (P). These messages are what drives manipulation of the data structure.



          [**CLICK HERE FOR A DIAGRAM**|http://imgur.com/sb5ZU]



          There are a couple of design questions I'm wondering about:

          The data structure D is a shared resource between the Client threads and the Message Processor thread within the Server, with the Client threads only reading from the data structure (and writing over TCP/IP), and the Message Processor both reading and modifying it.

          Right now I am using locks to lock the structure when a client requests data, so that the processor cannot modify the data while it is being serialized.
          I also lock the data structure when a message is received and the structure has to be modified by P, to prevent the structure from being serialized while it is being modified.


          My question is, is this the only design pattern I can use in this situation? It looks like the only way to improve performance is to
          a) make sure I only lock when necessary (to prevent data corruption or inconsistency)
          This can easily be handled by a Servlet. I think the best way to do this would be to create a Singleton. [www.javacoffeebreak.com/articles/designpatterns/index.html]. Be careful, however: Singletons are like global variables and can easily be abused. If you do not want a Singleton, create a lock table in your database. The RDBMS will handle synchronization for you, and it is an elegant solution. You can achieve something similar with a filesystem lock that you create yourself. It is up to you whether the lock lives in the JVM, in the database, or in the filesystem.
          b) lock the data for as short a time as possible
          Write an efficient method to insert, update, or delete the data. If you are dealing with a large amount of data, consider using a native tool like Oracle's SQL*Loader or vendor-specific JDBC syntax. If you need to support multiple types of databases, use bulk JDBC operations.
          c) make sure the parts of the data structure being sent to clients are serialized as fast as possible (write my own writeObject/readObject methods)
          Take a look at JBoss serialization. It is much more compact than Java's. Or do some experimenting. JSON is normally much more compact than XML, and it can be read by a JavaScript client to facilitate any Ajax you might want to use for some flash and sizzle.
          Any insight is appreciated, the shorter and more candid, the better. Don't be afraid to say I'm in over my head and should read a few books by author so-and-so, that's a good starting point :)
          No, take it in bite sized pieces. Start with the server. Then work on the client. Play around with your locking strategy. Optimize your update of the data. Don't do everything at once.

          - Saish
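Point c), writing your own serialization methods, might look roughly like the following. This sketch uses `java.io.Externalizable`, a close relative of custom `writeObject`/`readObject`, rather than either of the libraries mentioned above. The class `Item` and its fields are invented for illustration, and `toBytes`/`fromBytes` are hypothetical helpers that exist only to make the example self-contained. Whether this is actually faster or smaller than default serialization for a given structure should be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

/** Hypothetical leaf object of D; the fields are assumptions. */
class Item implements Externalizable {
    private String name;
    private long count;

    public Item() { } // Externalizable requires a public no-arg constructor

    Item(String name, long count) {
        this.name = name;
        this.count = count;
    }

    String name() { return name; }
    long count() { return count; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Write only the raw field values, in a fixed order.
        out.writeUTF(name);
        out.writeLong(count);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Read the fields back in exactly the order they were written.
        name = in.readUTF();
        count = in.readLong();
    }

    /** Helper for the example: serialize to a byte array. */
    static byte[] toBytes(Item item) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(item);
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return bytes.toByteArray();
    }

    /** Helper for the example: deserialize from a byte array. */
    static Item fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Item) in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```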
          • 2. Re: Server design (multithreading, serialization, and performance)
            jschellSomeoneStoleMyAlias
            The clients don't read the entire structure, just parts of it.
            Then that is all you should give them.
            • 3. Re: Server design (multithreading, serialization, and performance)
              843853
              I feel I should add that the input msg rate can be pretty high, say several thousand per second.

              Also, the data structure needs to be entirely in memory--I'm using a DB for archiving the data but for performance reasons, it's all in RAM.

              Thanks for the pointers, Saish.
              • 4. Re: Server design (multithreading, serialization, and performance)
                791266
                jta23 wrote:
                I feel I should add that the input msg rate can be pretty high, say several thousand per second.
                Do all changes get propagated to all clients? What changes do you propagate? How often do you do that, and why?

                Is it important that the whole structure is in synch at all times, or is it ok that you don't update the whole tree in a single "transaction"?
                • 5. Re: Server design (multithreading, serialization, and performance)
                  843853
                  kajbj wrote:
                  Do all changes get propagated to all clients? What changes do you propagate? How often do you do that, and why?
                  Updates are sent to clients as frequently as every second (it's user configurable), so thousands of changes to the data don't mean thousands of changes sent to the client.
                  Is it important that the whole structure is in synch at all times, or is it ok that you don't update the whole tree in a single "transaction"?
                  Good point, that's something I've been thinking about...instead of locking the data structure thousands of times a second for those thousand received messages, what if I queued up all the changes that need to be made and did a batch update once a second?

                  Since every data update requires certain overall/total values to be updated, if I did batch updates, those values would be stale for about a second, but that's the maximum frequency of client updates, so it might be ok.

                  There's a big difference between showing the user an inconsistent dataset, and a dataset that has values that are calculated using other values that are 1 or 2 seconds old. As long as they are aware of these details, the latter might be ok.
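The batching idea in this post can be sketched as a queue that is drained once per interval, so the shared lock is taken once per batch instead of once per message. This is only an illustration under assumptions: the `Update` message type, the `BatchingProcessor` class, and the single `total` value are hypothetical, and the caller is assumed to invoke `applyBatch()` about once a second (for example from a scheduled task).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BatchingProcessor {
    /** Hypothetical message: add delta to the value stored under key. */
    static final class Update {
        final String key;
        final long delta;
        Update(String key, long delta) {
            this.key = key;
            this.delta = delta;
        }
    }

    private final BlockingQueue<Update> pending = new LinkedBlockingQueue<>();
    private final Map<String, Long> values = new HashMap<>();
    private long total; // derived value, refreshed once per batch
    private final Object lock = new Object();

    /** Called for every incoming message; never takes the data lock. */
    void enqueue(Update u) {
        pending.add(u);
    }

    /** Applies everything queued so far under one lock acquisition. */
    int applyBatch() {
        List<Update> batch = new ArrayList<>();
        pending.drainTo(batch); // removes all currently queued updates
        if (batch.isEmpty()) {
            return 0;
        }
        synchronized (lock) {
            for (Update u : batch) {
                values.merge(u.key, u.delta, Long::sum);
                total += u.delta;
            }
        }
        return batch.size();
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    Long get(String key) {
        synchronized (lock) {
            return values.get(key);
        }
    }
}
```

Between batches, readers see totals that are up to one interval old, which is the staleness trade-off discussed above.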
                  • 6. Re: Server design (multithreading, serialization, and performance)
                    791266
                    Hmm.. How large is the dataset? One thing that you can do is to have two versions of it: one that you are updating, and one that the clients are reading from. You create a new copy of the tree each second; the thread that performs the updates also creates the new copy and "publishes" it. You won't need any locks in that case.
                    • 7. Re: Server design (multithreading, serialization, and performance)
                      843853
                      kajbj wrote:
                      Hmm.. How large is the dataset? One thing that you can do is to have two versions of it. One that you are updating, and one that the clients are reading from.
                      2x or 3x the dataset should fit in memory (I can make it a requirement!)

                      I wondered about two copies but...
                      You create a new copy of the tree each second, and the thread that is performing the updates is also creating the new copy, and "publishes" the new copy. You won't need any locks in that case.
                      It's the "publishing" of the new copy I'm not sure about. Since it's all maps of maps I guess you can replace the stale maps but you still would need to do this in a batch so you don't have half new, half old, which means you still don't want the client to receive the data while it's being updated, even though it's a copy you are updating/sending.
                      • 8. Re: Server design (multithreading, serialization, and performance)
                        791266
                        Your processing thread performs all updates, and it is also responsible for creating a deep copy of the data structure whenever a new copy is needed.

                        This is what you have:

                        1) A tree that you are updating, only the processing thread can see this structure.
                        2) Another structure that clients are currently reading from. That structure is read-only. No one will ever write to it.

                        A second has elapsed, and this happens:

                        The processing thread creates the deep copy of the structure and replaces the read-only tree with the new structure. Clients that are still traversing the old copy can finish their traversal because they still hold a reference to it; they will see the new structure the next time they start to traverse from the root. New clients will see the new copy directly.
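The copy-and-publish scheme described in this post can be sketched as follows. This is a minimal illustration under assumptions, not a definitive implementation: the `SnapshotStore` class, its method names, and the two-level `Map<String, Map<String, Integer>>` shape are hypothetical. The snapshot is swapped through an `AtomicReference`, whose `set` and `get` have volatile semantics, so readers either see the old snapshot or the fully built new one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotStore {
    // Working copy: touched only by the processing thread, so it needs no lock.
    private final Map<String, Map<String, Integer>> working = new HashMap<>();

    // Read-only snapshot that client threads traverse.
    private final AtomicReference<Map<String, Map<String, Integer>>> published =
            new AtomicReference<>(Collections.<String, Map<String, Integer>>emptyMap());

    /** Processing thread only: apply one change to the working copy. */
    void update(String outerKey, String innerKey, int value) {
        working.computeIfAbsent(outerKey, k -> new HashMap<>()).put(innerKey, value);
    }

    /** Processing thread only: deep-copy the working tree and swap it in. */
    void publish() {
        Map<String, Map<String, Integer>> copy = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : working.entrySet()) {
            Map<String, Integer> inner = new HashMap<String, Integer>(e.getValue());
            copy.put(e.getKey(), Collections.unmodifiableMap(inner));
        }
        // A single reference swap: clients never observe a half-updated tree.
        published.set(Collections.unmodifiableMap(copy));
    }

    /** Client threads: grab the current root once, then traverse without locking. */
    Map<String, Map<String, Integer>> snapshot() {
        return published.get();
    }
}
```

A client should call `snapshot()` once per request and traverse only that reference; calling it repeatedly during one traversal could mix data from two different snapshots.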
                        • 9. Re: Server design (multithreading, serialization, and performance)
                          791266
                          Btw, implementing something like that is very easy.
                          • 10. Re: Server design (multithreading, serialization, and performance)
                            jschellSomeoneStoleMyAlias
                            kajbj wrote:
                            The processing thread creates the deep copy of the structure, and replaces the read-only tree with the new structure. Clients that are still traversing the old copy will be able to complete the traversal since they are still referencing it, and they will see the new structure the next time they start to traverse from the root. New clients will see the new copy directly.
                            And it can be implemented without locks as well.
                            • 11. Re: Server design (multithreading, serialization, and performance)
                              843853
                              What if the tree is between 100MB and 1GB in memory size?
                              • 12. Re: Server design (multithreading, serialization, and performance)
                                jschellSomeoneStoleMyAlias
                                jta23 wrote:
                                What if the tree is between 100MB and 1GB in memory size?
                                Presuming:
                                1. That is the size in Java structures and not the absolute size.
                                2. It isn't going to grow, or at least has a very slow growth rate (say 1k a month would be good).

                                Then no problem.
                                If either of those is not true then other factors apply.
                                • 13. Re: Server design (multithreading, serialization, and performance)
                                  800387
                                  As a further optimization: since you are sending the results to the client in a consistent form, keep your updateable tree in whatever form makes the most sense in memory (e.g., a map of maps as you are doing, or better yet an actual graph of domain objects). Then, rather than simply copying that form to another read-only reference, transform it once into the format that clients will use. Make that your read-only version and cache it until the next update. That way you are not transforming and/or serializing again for each request that would return the same response.

                                  - Saish
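The "transform once, cache until the next update" suggestion can be sketched like this. It is a rough illustration under assumptions: the class `SerializedSnapshotCache`, its methods, the flat `Map<String, Integer>` standing in for the real tree, and the use of plain Java serialization as the client format are all hypothetical choices. The point is only that serialization happens once per publish rather than once per request.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

class SerializedSnapshotCache {
    // Working copy: touched only by the processing thread.
    private final Map<String, Integer> working = new HashMap<>();

    // Client-ready bytes; volatile so a new array is safely visible to client threads.
    private volatile byte[] cached = new byte[0];

    /** Processing thread only. */
    void update(String key, int value) {
        working.put(key, value);
    }

    /** Processing thread only: serialize once and cache the result. */
    void publish() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<String, Integer>(working));
        } catch (IOException ex) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new IllegalStateException(ex);
        }
        cached = bytes.toByteArray();
    }

    /** Client threads: send the cached bytes without serializing again. */
    void writeTo(OutputStream clientStream) throws IOException {
        clientStream.write(cached);
    }

    /** Returns the currently cached bytes (treat as read-only). */
    byte[] current() {
        return cached;
    }
}
```

If clients request different parts of the structure, one cached byte array per part (keyed by the part's identifier) would be the natural extension of this idea.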