This discussion is archived
7 Replies Latest reply: Oct 4, 2011 1:53 AM by 892009 RSS

RMI Network Performances degradation

892009 Newbie
Currently Being Moderated
Hi,

My application is making use of (8) RMI processes in order to distribute some calculations over a cluster of computers.

I know that calculations are taking around 450ms on every single node but the whole computation is taking up to 1150ms in total.

I know that it is not a Serialization/Deserialization issue because if I kick calculations only on one node it takes exactly 500ms.

These are the timings I have when I kick simulations over the 8 node that compose my cluster:

Simulations on 1 node:
node 1 replies in 500ms

Simulation on 8 nodes:
node 1) 768ms
node 2) 80..
.....
......
node 8) 1150ms


Just for clarity I repeat that on the node itself calculations are only taking about 450ms, but the other 500ms are lost somewhere else (socket, network, ...)

Does anybody of you have experienced the same? Do you have any suggestion?

I know it's not a common practice this one, so if the question is not clear please feel free to make some questions.

Thanks,
Giovanni
  • 1. Re: RMI Network Performances degradation
    jtahlborn Expert
    Currently Being Moderated
    how is the controlling application launching the executions and receiving the replies?
  • 2. Re: RMI Network Performances degradation
    892009 Newbie
    Currently Being Moderated
    Hi,

    RMI Client and Servers are always up and running. I firstly start the 8 RMI processes on the nodes of the cluster and then I start the client.

    The client is a module in a EJB3 app and it instantiate a reference to the processes whenever the app starts.

    Once that a request of calculation comes in, the client app invokes the the RMI method on the 8 processes.

    Now, I've implemented both approaches: sync calls and callbacks. In both cases I got pretty much the same result.

    If I perform only one RMI call I cannot see any overhead or waste of time. As soon as the number of processes I use grows, the amount of time misteriosly wasted increases... till to get to a double when I make use all the 8 nodes.

    At the moment my fields of investigation are:

    1) Serialization/Deserialization. (but if it is a serialization issue it should manifest constantly and not only when I perform multiple calls)
    2) Network issues (is it possible that for transferring few bytes the network becomes the bottleneck?)
    3) Sockets (threads or buffers socket-related are not optimized for parellel uses)
    4) Can the RMI registry be somehow the bottleneck? Is it somehow involved in the data-transfer (I really think it is not... but still....)

    Regards,
    Gio
  • 3. Re: RMI Network Performances degradation
    EJP Guru
    Currently Being Moderated
    4) Can the RMI registry be somehow the bottleneck? Is it somehow involved in the data-transfer (I really think it is not... but still....)
    No it isn't.
    5. Concurrency issues. How concurrent is your code?
  • 4. Re: RMI Network Performances degradation
    jtahlborn Expert
    Currently Being Moderated
    EJP wrote:
    5. Concurrency issues. How concurrent is your code?
    yes this was what i was trying to get at with my first question. my guess (since you haven't show any code nor given nearly enough detail), is that your client code is somehow not making all the requests completely concurrent, so some requests don't get made until others finish. if you look at the timestamps on the worker nodes, do they all seem to be starting at the same time, or are some starting after others complete?
  • 5. Re: RMI Network Performances degradation
    892009 Newbie
    Currently Being Moderated
    Hi,

    here is the code.

    There are 2 methods, the Sync (multi threaded - blocking: 1 thread per rmi process) and async approach (1 thread kicking all the rmi requests but process side they are handled with callbacks).

    Yeah, anyway I'm sure that in both cases requests are triggered concurrently. In particular the asyc solution takes 50msec to fire request on all the nodes (so still serial approach but in a very fast fashion)
    private final PricingHpcOutputBean runSynchSimulations(
                   PricingHpcInputBean inputBean, int numSimulations) {
    
              PricingHpcOutputBean outputBean = null;
    
              try {
    
                   this.clean();
    
                   List<Future<PricingHpcOutputBean>> futures = new ArrayList<Future<PricingHpcOutputBean>>();
    
                   int numSimsPerProcess = 0;
    
                   for (int i = 0; i < numRmiProcesses; i++) {
    
                        numSimsPerProcess = (i == 0 ? (numSimulations / numRmiProcesses + numSimulations
                                  % numRmiProcesses)
                                  : numSimulations / numRmiProcesses);
    
                        Future<PricingHpcOutputBean> future = threadPool
                                  .submit(new MyRMIThreadClient(services, inputBean,
                                            numSimsPerProcess));

                        futures.add(future);

                   }

                   while (!allDones(dones, numRmiProcesses)) {

                        for (int j = 0; j < numRmiProcesses; j++) {

                             // Just retrieving the first one coz it's gonna be the
                             // object in which am gonna re-aggregate all teh results
                             if (!dones[j] && j == 0) {
                                  outputBean = futures.get(j).get();

                                  // Putting the source as null says to the impl taht this
                                  // is the basic result, the one we're gonan return.
                                  // Yeah, this is a trick! :-P
                                  serialReaggregation(null, outputBean);

                                  dones[j] = true;
                                  continue;
                             }

                             if (!dones[j] && futures.get(j).isDone()) {

                                  PricingHpcOutputBean source = futures.get(j).get();

                                  if (concurrentReaggregation) {

                                       log.error("runSimulations::concurrent reaggregation not implemented yet!!");

                                  } else {

                                       serialReaggregation(outputBean, source);

                                  }

                                  dones[j] = true;
                             }
                        }
                   }

              } catch (Exception e) {
                   log.error("runSimulations::cannot complete simulations", e);
              }

              return outputBean;
         }

         private final PricingHpcOutputBean runAsynchSimulations(
                   PricingHpcInputBean inputBean, int numSimulations) {

              starttime = System.nanoTime();

              try {

                   this.clean();

                   int numSimsPerProcess = 0;

                   for (int i = 0; i < numRmiProcesses; i++) {

                        numSimsPerProcess = (i == 0 ? (numSimulations / numRmiProcesses + numSimulations
                                  % numRmiProcesses)
                                  : numSimulations / numRmiProcesses);

                        services[i].runSimulations(inputBean, numSimsPerProcess);

                   }

                   long allsub = System.nanoTime();

                   System.out
                             .println("all sub in: " + (allsub - starttime) / 1000000d);

                   // This is a very nasty condition!!!!
                   while (receivedCallbacks != numRmiProcesses) {
                        if (callbackResults.isEmpty()) {
                             Thread.sleep(10);
                        } else {
                             System.out.println("callback avail after: " + (System.nanoTime() - starttime) / 1000000d);
                             long start1 = System.nanoTime();
                             serialReaggregation(null, callbackResults.remove(0));
                             System.out.println("Reagg took: " + (System.nanoTime() - start1) / 1000000d);
                             receivedCallbacks++;
                        }
                   }

                   System.out.println("Elapsed: " + (System.nanoTime() - starttime)
                             / 1000000d);

              } catch (Exception e) {
                   log.error("runSimulations::cannot complete simulations", e);
              }

              return getOutputContainer();

         }
    Gio
    
    Edited by: EJP on 4/10/2011 19:19: added {noformat}
    {noformat} tags: please use them.
  • 6. Re: RMI Network Performances degradation
    EJP Guru
    Currently Being Moderated
    How was the thread pool initialized?
  • 7. Re: RMI Network Performances degradation
    892009 Newbie
    Currently Being Moderated
    Hi,

    in case of the callback approach there was no thread pool.

    otherwise:
    threadPool = new ThreadPoolExecutor(numRmiProcesses, numRmiProcesses, 50000L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    threadPool.prestartAllCoreThreads();
    Thanks for the "code" thing, didn't know that :)

    And also thanks for answering my thread!

    Regards,
    Gio

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points