
Thread priorities - Solaris 10 on virtualbox

800341 Newbie
After I ran my RTSJ "hello world", I wrote an extended version of it (in the sense that it is also a sample program to play around with).

I am running the following code on Sun Solaris 10 Update 5, running on VirtualBox.

The problem is that the "Low_Priority" and "High_Priority" threads are sharing my single CPU fairly (which is unfortunately unfair here)!

The TSV (with the input from a drecord script) shows the following:
- the high priority thread runs for 1.XX seconds, then
- the low priority thread runs for 1.XX seconds, then
- the high priority thread runs for 1.XX seconds ... and so on, until the program comes to a halt;
- in the meantime, now and then the RTGC preempts both threads.

The memory allocated was the default 64 MB, which is not quite enough here. Nevertheless, it is understandable that the GC preempts both the "Low_Priority" and "High_Priority" threads. But once the GC is done, I would expect the processor to be given to the "High_Priority" thread and not to the "Low_Priority" thread.


My question is: am I doing something wrong, or is my understanding incorrect?

I can share the log produced by drecord but don't know how/where to upload it.

Here is the code:
import javax.realtime.AbsoluteTime;
import javax.realtime.Clock;
import javax.realtime.PeriodicParameters;
import javax.realtime.PriorityParameters;
import javax.realtime.PriorityScheduler;
import javax.realtime.RealtimeThread;
import javax.realtime.RelativeTime;
import javax.realtime.SchedulingParameters;

public class Test {

     public static void main(String[] args) {

          CallerThread callerThread = new CallerThread();
          callerThread.setName("CallerThread");

          // run the caller at the maximum real-time priority
          SchedulingParameters sp = new PriorityParameters(PriorityScheduler.instance().getMaxPriority());

          callerThread.setSchedulingParameters(sp);

          callerThread.start();

     }
}

class CallerThread extends RealtimeThread {

     public void run() {

          // 20 ms period (millis, nanos), shared by both threads
          RelativeTime relativeTime = new RelativeTime(20, 0);
          PeriodicParameters pp = new PeriodicParameters(relativeTime);

          // both priorities sit above the RTGC critical priority
          SchedulingParameters spLow = new PriorityParameters(PriorityScheduler.instance().getMaxPriority() - 20);
          SchedulingParameters spHigh = new PriorityParameters(PriorityScheduler.instance().getMaxPriority() - 1);

          MyThread low = new MyThread();
          low.setName("Low_Priority");
          low.setReleaseParameters(pp);
          low.setSchedulingParameters(spLow);
          low.start();

          try {
               Thread.sleep(100);
          } catch (InterruptedException e) {
               e.printStackTrace();
          }

          MyThread high = new MyThread();
          high.setName("High_Priority");
          high.setReleaseParameters(pp);
          high.setSchedulingParameters(spHigh);
          high.start();

     }

}

class MyThread extends RealtimeThread {

     private Clock rtClock;

     int count = 200;

     RelativeTime[] periods = new RelativeTime[count];
     boolean[] success = new boolean[count];

     public MyThread() {
          super();
          rtClock = Clock.getRealtimeClock();
     }

     public void run() {

          AbsoluteTime begin = rtClock.getTime();
          System.out.println(getName() + " starting: " + begin);

          int i = 0;

          while (i < count) {

               AbsoluteTime start = rtClock.getTime();

               // AbsoluteTime loopBegin = rtClock.getTime();

               // put some load on the thread and some work for GC
               for (int a = 0; a < 15050; a++) {
                    String abc = Integer.valueOf(a).toString();
                    abc = abc.concat(abc);
               }

               // AbsoluteTime loopEnd = rtClock.getTime();

               // System.out.println(loopEnd.subtract(loopBegin));

               // waitForNextPeriod() returns false if the deadline was missed
               success[i] = waitForNextPeriod();

               AbsoluteTime end = rtClock.getTime();

               periods[i] = end.subtract(start);

               i++;
          }

          AbsoluteTime end = rtClock.getTime();
          System.out.println(getName() + " finishing: " + end + " runtime:" + end.subtract(begin));

          /*
          RelativeTime twenty = new RelativeTime(20, 0);

          for (int j = 0; j < periods.length; j++) {
               System.out.println(getName() + " " + periods[j] + " " + success[j] + " difference is " + twenty.subtract(periods[j]));
          }
      */
     }
}
Thanks
Ramsundar
  • 1. Re: Thread priorities - Solaris 10 on virtualbox
    807557 Newbie
    Okay here's what I see happening, approximately ... and it's nothing to do with VirtualBox :)

    Your threads try to do more work per period than will fit, so waitForNextPeriod never blocks but returns false, and the threads effectively become busy loops. The HP thread runs until it exhausts memory, at which point GC becomes active. Now, due to the way the RTGC works, there are a number of RTGC threads involved, and these have to coordinate with each other as well as with the mutator threads. As far as I can see, the RTGC reaches a point where the mutators in general are no longer blocked, but the originating allocation request of the HP thread is not yet able to unblock. At that point the LP thread preempts the RTGC threads and gets to hog the CPU until it too exhausts memory and has to block. This time it is the HP thread that eventually gets to run and preempt the RTGC, until it again exhausts memory. And so the threads continue taking turns like this.

    Part of the problem here is that both threads are above the RTGC critical priority and will always preempt GC except when directly trying to allocate.

    Try winding back the workload in each thread so that they don't exhaust memory or exceed their 20 ms period (or half of that, if you want the system to be feasible on 1 CPU).
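
    For illustration, here is a minimal sketch of such a wound-back thread (the 5000-iteration bound and the miss counting are placeholders, not a prescription); the point is to keep the per-release work inside the period and to react to waitForNextPeriod() returning false instead of ignoring it:

    import javax.realtime.RealtimeThread;

    // Sketch only: a periodic thread whose per-release work fits inside its
    // period and which counts deadline misses instead of silently busy-looping.
    class FeasiblePeriodicThread extends RealtimeThread {
        public void run() {
            int misses = 0;
            for (int i = 0; i < 200; i++) {
                // keep this comfortably under the 20 ms period
                // (under ~10 ms if two such threads must share one CPU)
                for (int a = 0; a < 5000; a++) {
                    String s = Integer.toString(a);
                    s = s.concat(s);
                }
                // waitForNextPeriod() returns false when a deadline was missed
                if (!waitForNextPeriod()) {
                    misses++;
                }
            }
            System.out.println(getName() + " missed " + misses + " deadlines");
        }
    }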

    HTH

    David Holmes
  • 2. Re: Thread priorities - Solaris 10 on virtualbox
    800341 Newbie
    davidholmes wrote:
    Okay here's what I see happening, approximately ... and it's nothing to do with VirtualBox :)
    Good to know that it has nothing to do with VirtualBox ;) Thanks for your continued support, David.
    Your threads try to do more work per period than will fit, so waitForNextPeriod never blocks but returns false, and the threads effectively become busy loops. [...] And so the threads continue taking turns like this.
    - That means the HPT is waiting for the RTGC (since the memory allocated for critical threads is full).
    - The RTGC is cleaning.
    - The LPT (preempting the RTGC) comes into action and continues. BUT where does it get its necessary memory from? Since both of them are critical threads, they share the criticalMemoryArea, or is my understanding wrong?


    Part of the problem here is that both threads are above the RTGC critical priority and will always preempt GC except when directly trying to allocate.
    So when an HPT is blocked by the RTGC, an LPT can steal the processor from the running RTGC (thereby leaving the HPT still blocked)? And the scheduler allows this to happen?

    I am sorry, I couldn't understand this: "except when directly trying to allocate".
    Try winding back the workload in each thread so they don't exhaust memory or exceed their 20ms period (or half of that if you want the system to be feasible on 1 CPU).
    Yes, it was my mistake. I always use Eclipse on a 4-CPU server to try the programs there, and then run them on my VirtualBox machine (which is slower, but where I have root access to run DTrace). On the virtual machine, when I reduced the loop from
    for (int a = 0; a < 15050; a++) 
    to
    for (int a = 0; a < 5000; a++) 
    then it ran fine.

    One more question, David: why does the priority of the CallerThread change during runtime? In fact, in the code it is the highest priority thread.
    In TSV I see that during runtime its priority changed from "158 (RTSJ priority 69)" to "100 (RTSJ priority 11)".

    Thanks a lot for your time, David,
    Ramsundar
  • 3. Re: Thread priorities - Solaris 10 on virtualbox
    807557 Newbie
    I'm not an RTGC expert (I'll ping the RTGC expert to comment on this), but what I suspect is that the HP thread tries to do an allocation and can't, so it blocks while it induces a GC operation. The GC has to prevent all allocations for a time, but as soon as it stops blocking all threads, the LP thread can preempt the GC, which in turn stops the HP thread from being unblocked. Initially each thread will use a thread-local allocation buffer, so there is some allocation space available to a thread even when the heap is exhausted.

    With regard to the CallerThread dropping its priority to the real-time minimum: this happens during thread termination.

    David
  • 4. Re: Thread priorities - Solaris 10 on virtualbox
    807557 Newbie
    Hi, I am the RTGC expert but it is difficult to draw any conclusion without the GC logs :-)
    A picture of your TSV window might also help.

    In general, try running with -XX:+PrintGC. This is often enough for a preliminary analysis.

    However, I do not think there is any RTGC issue. As explained by David, the issue is probably
    in the waitForNextPeriod call.

    If you exhaust the memory, the threads will block waiting for the RTGC to complete.
    FYI, note that this happens only because they allocate faster than the RTGC recycles,
    or because it was badly configured. If the system is feasible, then you must increase
    RTGCCriticalReservedBytes so that the RTGC will start soon enough and 'protect' your
    critical threads by preventing the other ones from allocating when memory is scarce.

    When the GC has recycled enough memory, they will restart, but they are very late
    and may have missed a few deadlines. Hence, you are now trying to recover from a timing
    error.
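
    For example, a launch line along these lines (the sizes are placeholders to tune, and the usual -XX:<flag>=<size> syntax is assumed for RTGCCriticalReservedBytes):

    java -Xms64m -Xmx64m -XX:+PrintGC -XX:RTGCCriticalReservedBytes=16m Test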

    The High priority thread will run, doing a busy loop without blocking on waitForNextPeriod().
    If it can catch up (i.e. the cost per loop is smaller than the period), it will finally block
    on waitForNextPeriod.

    Once the High priority thread has caught up, the Low priority thread will be able to run, also trying to
    catch up on the missed periods without blocking. Note that, during that time, it will be periodically preempted by the
    High priority thread (to perform one of its periods).

    I think this corresponds to the behavior you describe. It should be easy to check that in TSV. Try using a script that
    shows the waitForNextPeriod probes.

    Bertrand.
  • 5. Re: Thread priorities - Solaris 10 on virtualbox
    800341 Newbie
    Bertrand.Delsart wrote:
    Hi, I am the RTGC expert but it is difficult to draw any conclusion without the GC logs :-)
    A picture of your TSV window might also help.

    In general, try running with -XX:+PrintGC. This is often enough for a preliminary analysis.
    I always ran with -XX:+PrintGC. I saw that at some point in time the RTGC blocked 2 threads (understandably my HPT and LPT). Nevertheless, what I was expecting was that the HPT would run immediately after the RTGC, not the LPT.
    However, I do not think there is any RTGC issue. As explained by David, the issue is probably
    in the waitForNextPeriod call.
    Yes, when I increased the period to 50 ms (earlier 20 ms), I see a good picture in TSV, i.e. the HPT and LPT run perfectly OK, and the RTGC runs when both the LPT and HPT are NOT running.
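    That is, the only change was the period passed to PeriodicParameters:
    RelativeTime relativeTime = new RelativeTime(50, 0); // 50 ms, 0 ns (was 20 ms)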
    If you exhaust the memory, the threads will block waiting for the RTGC to complete. [...] If the system is feasible, then you must increase RTGCCriticalReservedBytes so that the RTGC will start soon enough and 'protect' your critical threads by preventing the other ones from allocating when memory is scarce. [...]
    Yes. Using RTGCCriticalReservedBytes, the RTGC now runs now and then to keep some memory free, MOSTLY not blocking my HPT and LPT threads. At some point it blocks. I played around with different -Xmx and -Xms values; in some runs the HPT, LPT and RTGC run without blocking.

    I think this corresponds to the behavior you describe. It should be easy to check that in TSV. Try using a script that
    shows the waitForNextPeriod probes.
    I will try to write my own scripts, which I have never done before ;)
    Bertrand.
    David and Bertrand, thanks a lot for your time and suggestions.

    Thanks
    Ramsundar
  • 6. Re: Thread priorities - Solaris 10 on virtualbox
    807557 Newbie
    Hi Ramsundar,

    David has forwarded me the TSV view and I did more analysis. In fact, the RTGC was partly the culprit... but only because of your use of waitForNextPeriod :-)

    The short answer is that your application was not feasible and was saturating the CPUs with critical threads that were missing their deadlines. The system is not designed to properly handle this situation, which is already an error case :-)

    Hard real-time threads (e.g. critical threads) MUST leave enough CPU time for the RTGC to recycle what they need. If not, the system is not feasible... and the behavior does not really matter, since the system is already in very bad shape if it is indeed a hard real-time system.

    The more complete answer is that the issue came from the fact that when your LP or your HP thread was woken by the GC, it started to run at a higher priority and, since you were running with a single CPU, it prevented the GC from waking up the other thread.

    This is not supposed to be an issue for critical threads, because they should never block if the GC is properly configured (high enough 'reserved' bytes). If they block, we are already in an erroneous case. Hence, we decided to implement fairness when the GC unblocks threads: instead of waking up the highest priority thread, it wakes up the oldest first. This is not an issue for the threads that run at a lower priority (which do not preempt the GC) and, as stated before, we are already in an error recovery mode for the ones at a higher priority. Hence, this fairness avoids strange behavior (like an unexpected OutOfMemoryError for low priority threads) while not impacting the determinism of a feasible application.
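
    To make that policy concrete, here is a tiny sketch (plain Java, purely illustrative: this is NOT the RTGC implementation, and the names and priority values are invented) of "oldest first" versus "highest priority first":

    import java.util.ArrayDeque;

    // Illustrative model of the wakeup choice described above, not Java RTS
    // internals: the GC wakes the oldest blocked thread, not the highest
    // priority one.
    public class WakeupOrder {
        static class Blocked {
            final String name;
            final int priority; // illustrative priority value
            Blocked(String name, int priority) { this.name = name; this.priority = priority; }
        }

        public static void main(String[] args) {
            // Low_Priority blocked on allocation before High_Priority did
            ArrayDeque<Blocked> blockedInFifoOrder = new ArrayDeque<>();
            blockedInFifoOrder.add(new Blocked("Low_Priority", 49));
            blockedInFifoOrder.add(new Blocked("High_Priority", 68));

            // a strict priority policy would wake High_Priority here;
            // the fairness policy wakes the head of the queue instead
            System.out.println("GC wakes: " + blockedInFifoOrder.poll().name); // Low_Priority
        }
    }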

    Back to your example, at one cycle, the GC wakes up HP and is preempted before waking up LP. At the next cycle, it will wake up LP and be preempted before waking up HP. This seems to explain what I am seeing in the TSV log:
    HP GC LP GC HP GC LP...

    Regards,

    Bertrand.
  • 7. Re: Thread priorities - Solaris 10 on virtualbox
    800341 Newbie
    Bertrand.Delsart wrote:
    Hi Ramsundar,

    [...] In fact, the RTGC was partly the culprit... but only because of your use of waitForNextPeriod :-)

    The short answer is that your application was not feasible and was saturating the CPUs with critical threads that were missing their deadlines. [...]
    I agree with you. But what I wanted to see is which thread (HPT or LPT) completes first. Until now I thought that once the RTGC is done, the HPT would be given priority. Therefore I thought that the HPT would do the job first by preempting the LPT completely (i.e. if the normal job time is 5 seconds, the HPT would do it in 5 seconds and the LPT would do the same in 10 seconds, including the 5-second period preempted by the HPT; like L,H,H,H,H,H,L,L,L,L).

    The highlighted text answers my question. This was the key point I was looking for, exactly what I wanted: you say that once the RTGC is done with its job, it will kick-start the oldest blocked thread and NOT the highest priority thread. Like L,H,H,L,H,L,H,L,H,L.

    With this mistake I learned something new ;)
    [...] Hence, we decided to implement fairness when the GC unblocks threads: *instead of waking up the highest priority thread, it wakes up the oldest first.* [...]
    Let me explain what I understood.

    Please refer to the image [in this link|http://tinypic.com/r/9v9i5c/3]. This is one of the sample traces that I have.

    The one on top is the HPT, the second row is the LPT, and the third is the caller thread (CT).

    The steps are:
    1. CT starts and then initiates the LPT
    2. CT sleeps for some ms
    3. CT then initiates the HPT
    4. HPT preempts LPT
    5. HPT runs until the RTGC blocks it (*)
    5.5. LPT (tries to) use the CPU
    6. RTGC runs (first pair of bookmarks)
    7. HPT resumes
    8. RTGC runs
    9. (this time) the LPT runs (I think this is what you meant with the *highlighted text*: it is the oldest waiting thread, which was blocked in step 5.5)
    10. from now on, the oldest blocked thread is served first...

    (*) The HPT and LPT use/share the same heap (criticalMemoryArea) area, right?
    Back to your example, at one cycle, the GC wakes up HP and is preempted before waking up LP. At the next cycle, it will wake up LP and be preempted before waking up HP. This seems to explain what I am seeing in the TSV log:
    HP GC LP GC HP GC LP...

    Regards,

    Bertrand.
    Thanks, Bertrand, and thanks, David, for forwarding my query to Bertrand. :)
    Ramsundar