15 Replies. Latest reply: Sep 4, 2008 1:50 PM by 807557

Jitter

800408 Newbie
Hi all,
We communicate between four PCs and one server.
The communication threads are real-time threads at priority 59.
We ran this system for about half an hour without major problems (a few times we saw jitter of 65 msec against a normal run of 8 msec; we wonder why that happened).

But that is our minor problem; we have a bigger problem with the following case.
We know that our system will include more threads than we are testing now,
so in order to simulate that we added the following threads.

On our Sun server machine:
50 real-time threads waking every 500 msec and working for 5 msec, priority 30,
50 normal threads waking every 500 msec and working for 5 msec,
50 real-time threads waking every 4000 msec and working for 50 msec, priority 30,
50 normal threads waking every 4000 msec and working for 50 msec.
Total: 200 threads.

On our station:
10 real-time threads waking every 500 msec and working for 5 msec, priority 30,
10 normal threads waking every 500 msec and working for 5 msec,
10 real-time threads waking every 4000 msec and working for 50 msec, priority 30,
10 normal threads waking every 4000 msec and working for 50 msec.
Total: 40 threads.
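
A background load like the one listed above could be sketched roughly as follows. This is only an illustration with plain java.lang.Thread so that it runs on any JVM; the actual test presumably uses javax.realtime threads with explicit priorities, which are not shown here. The class name PeriodicLoad and its methods are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a periodic background-load generator using plain threads.
// Assumption: real-time priorities (30 / 59 in the post) are NOT modelled here.
class PeriodicLoad {
    static final AtomicLong wakeups = new AtomicLong();

    // Busy-spin for roughly workMs milliseconds to simulate CPU work.
    static void burn(long workMs) {
        long end = System.nanoTime() + workMs * 1_000_000L;
        while (System.nanoTime() < end) {
            // spin
        }
    }

    // Start `count` daemon threads that each wake every periodMs and work for workMs.
    static List<Thread> start(int count, long periodMs, long workMs) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                long next = System.nanoTime();
                while (!Thread.currentThread().isInterrupted()) {
                    wakeups.incrementAndGet();
                    burn(workMs);
                    next += periodMs * 1_000_000L; // absolute next release time
                    long sleepNs = next - System.nanoTime();
                    if (sleepNs > 0) {
                        try {
                            Thread.sleep(sleepNs / 1_000_000L, (int) (sleepNs % 1_000_000L));
                        } catch (InterruptedException e) {
                            return; // stop when interrupted
                        }
                    }
                }
            });
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        // Station-sized load from the post: 20 threads at 500/5 msec, 20 at 4000/50 msec.
        List<Thread> all = new ArrayList<>(start(20, 500, 5));
        all.addAll(start(20, 4000, 50));
        Thread.sleep(2_000); // run briefly, then stop
        all.forEach(Thread::interrupt);
        System.out.println("wakeups: " + wakeups.get());
    }
}
```

Sleeping until an absolute next release time, rather than for a fixed interval after the work, keeps the period from drifting by the length of the work.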

When we added those threads we got tremendous jitter of about 1200 to 1400 msec.

The VM parameters we are running with are:
-Xmx2g -Drtsj.precompile=nhrt.precompile -XX:RTGCCriticalReservedBytes=100m -XX:NormalMinFreeBytes=100m -XX:+PrintGC

We need to know what we did wrong.
I read the latest article, "A Practical Introduction to Achieving Determinism",
and I don't have any static initialization etc. that could make it behave so nondeterministically.

Thanks
Gabi
  • 1. Re: Jitter
    807557 Newbie
    Gabi,

    Please tell me how you measure the jitter. Exactly at what point do you start the measurement, and exactly at what point do you stop it?

    Thanks, Greg
  • 2. Re: Jitter
    800408 Newbie
    Hi Greg,
    I'm writing from home, so I'll try to describe it as well as I remember.
    It goes like this:
    1) I send a message through a socket connection.
    Each message includes the task name and the send time (the send time is taken from System.nanoTime()).
    2) The other side receives the message, parses it (it just extracts the task name and the send time),
    and sends it back to the sender.
    3) The sender receives the message, reads the current time (with System.nanoTime()), and subtracts the send time in the message from the current time.
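
The three steps above could look roughly like the following sketch, run over the loopback interface. It is not the code from the post: the class name RoundTrip, the integer task id (instead of a task name) and the wire format are assumptions made for illustration. Both timestamps are read in the sender's JVM, which matters because System.nanoTime() values are not comparable across machines.

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

class RoundTrip {
    // Step 2: accept one connection and echo every (task, sendTime) pair back.
    static Thread startEcho(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 DataInputStream in = new DataInputStream(s.getInputStream());
                 DataOutputStream out = new DataOutputStream(
                         new BufferedOutputStream(s.getOutputStream()))) {
                while (true) {
                    int task = in.readInt();
                    long sent = in.readLong();
                    out.writeInt(task);
                    out.writeLong(sent);
                    out.flush();
                }
            } catch (IOException e) {
                // peer closed the connection or the socket failed: stop echoing
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Steps 1 and 3: send `count` messages and return each round trip in nanoseconds.
    static long[] measure(int port, int count) throws IOException {
        long[] rtt = new long[count];
        try (Socket s = new Socket("localhost", port)) {
            s.setTcpNoDelay(true);
            DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(s.getOutputStream()));
            DataInputStream in = new DataInputStream(s.getInputStream());
            for (int i = 0; i < count; i++) {
                out.writeInt(i);                  // hypothetical task id
                out.writeLong(System.nanoTime()); // send time
                out.flush();
                in.readInt();
                long sent = in.readLong();
                rtt[i] = System.nanoTime() - sent; // current time - send time
            }
        }
        return rtt;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            startEcho(server);
            for (long ns : measure(server.getLocalPort(), 5)) {
                System.out.println(ns / 1_000 + " us");
            }
        }
    }
}
```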

    That's the whole principle.
    I hope it's clear.

    Thanks for your support
    Gabi
  • 3. Re: Jitter
    807557 Newbie
    Let rawSendTime[i] be the difference in times for the ith send, as you describe in the post. What do you consider jitter? Something like max(rawSendTime[over all i]) - min(rawSendTime[over all i])? Or something else?
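
For reference, the max-minus-min definition proposed here can be computed as in this small sketch. The sample values are hypothetical, and the array is assumed to be non-empty.

```java
import java.util.Arrays;

class JitterStats {
    // Spread of the samples: max(rawSendTime) - min(rawSendTime).
    // Throws NoSuchElementException if the array is empty.
    static long spread(long[] rawSendTime) {
        long min = Arrays.stream(rawSendTime).min().getAsLong();
        long max = Arrays.stream(rawSendTime).max().getAsLong();
        return max - min;
    }

    public static void main(String[] args) {
        // Hypothetical round-trip times in milliseconds.
        long[] rawSendTime = {8, 7, 10, 1400, 9};
        System.out.println("jitter = " + spread(rawSendTime) + " ms"); // 1400 - 7 = 1393
    }
}
```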
  • 4. Re: Jitter
    800408 Newbie
    Greg,

    Let me explain what I see as jitter.

    Suppose (current time) - (send time) = xxx is stable and deterministic.

    When, once in a while, I get a number a thousand times larger and then it goes back to normal, that for me is jitter.

    I'll give you an example:

    I am sending data via a socket.

    The message always takes between 7 and 10 msec,

    but when I sometimes get 1400 msec or 120 msec and then it goes back to 7-10 msec, I call it jitter.
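
This "occasional spike above a stable baseline" notion of jitter can be made concrete as in the sketch below. Using the median as the baseline and a multiplier as the threshold are assumptions chosen for illustration, not something stated in the thread; the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpikeFinder {
    // Median of the samples, used as the "normal" round-trip time.
    // For an even number of samples this picks the upper middle value.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Indices of samples larger than factor * median.
    static List<Integer> spikes(long[] samples, long factor) {
        long limit = factor * median(samples);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < samples.length; i++) {
            if (samples[i] > limit) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical samples in msec: mostly 7-10, with two spikes.
        long[] samples = {8, 7, 10, 1400, 9, 120, 8};
        System.out.println(spikes(samples, 3)); // [3, 5]
    }
}
```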



    In any case, I'll try your suggestion first thing on Sunday morning.

    I'd appreciate your help.



    Have a nice day

    Gabi
  • 5. Re: Jitter
    807557 Newbie
    Gabi, Any number of things can be going on here.

    First, note that some of your threads have periods of 4000ms. It is entirely possible that the logic for such a thread executes at different places in the period because of other activity. This is completely normal. E.g., suppose the start time for such a thread was 1000ms. The first period starts at time 1000, the second at 5000, the third at 9000, and so on. The logic might execute at time 1001 in the first period, 8000 in the second period, 10500 in the third, and so on. This is to be expected. I'm not sure of your system design so I can't ascertain whether such an execution pattern could affect your measurements.
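
The arithmetic in this example can be checked with a short sketch. The class and method names are invented for illustration; the numbers are the ones from the paragraph above.

```java
class PeriodicRelease {
    // Start of the n-th period (n = 0 is the first period).
    static long releaseTime(long startMs, long periodMs, int n) {
        return startMs + (long) n * periodMs;
    }

    // Zero-based index of the period that contains time tMs.
    static long periodIndex(long startMs, long periodMs, long tMs) {
        return (tMs - startMs) / periodMs;
    }

    // How far into its period the logic actually ran.
    static long offsetInPeriod(long startMs, long periodMs, long tMs) {
        return (tMs - startMs) % periodMs;
    }

    public static void main(String[] args) {
        long start = 1000, period = 4000;
        for (long t : new long[]{1001, 8000, 10500}) {
            System.out.println("t=" + t
                    + " period #" + (periodIndex(start, period, t) + 1)
                    + " offset=" + offsetInPeriod(start, period, t) + " ms");
        }
    }
}
```

The offsets come out as 1, 3000 and 1500 msec: each execution is within its own period, yet the position inside the period varies widely.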

    Another area to consider is the network. Are these machines on a corporate LAN? If so, this is not a good test environment. The machines should be connected via a dedicated switch with no other machines on the switch. Or, via a point-to-point Ethernet cable.

    Are any other non-JavaRTS processes running on the systems? If so, this can cause delays in the TCP stack.

    Have you correctly used the ITC compiler to pre-compile all the methods executed in the RTTs? Have you read the documentation about using the ITC system?

    Have you set the priorities of the RTTs to a higher value than the default maximum for the RTGC? If not, then the RTGC can be preempting the threads being measured.

    Have you determined the minimum reserved bytes for the time-critical threads and set the appropriate command-line arguments? If not the memory use of java.lang.Thread instances can interfere with the time-critical RTTs. Have you read the documentation for setting the minimum reserved bytes?

    Are these systems multi-core machines? If so, how many cores? Have you read the documentation for setting the number of CPUs which the RTGC should use? If not, please do so.

    It may be the case that you should talk to us about some consulting services. With access to the machines and code we can likely solve your problem quickly.

    Have you taken the RTSJ / Java RTS programming class Sun Education offers? It can help a lot.

    Greg
  • 6. Re: Jitter
    800408 Newbie
    Hi Greg,
    Let me answer your questions:
    Are these machines on a corporate LAN?
    No, they are on a dedicated LAN.

    Are any other non-JavaRTS processes running on the systems?
    No, I'm the only one running these tests.

    Have you correctly used the ITC compiler to pre-compile all the methods executed in the RTTs?
    We do it in two stages:
    1- In the first stage we run our program's JVM with -XX:+RTSJBuildCompilationList for three minutes.
    2- In the second stage we run the program with -Drtsj.precompile=nhrt.precompile
    Does that answer your question?

    Have you read the documentation about using the ITC system?
    Yes, I have.

    Have you set the priorities of the RTTs to a higher value than the default maximum for the RTGC?
    Yes, my critical threads are at priority 59 (please see my description in the first message).

    Have you determined the minimum reserved bytes for the time-critical threads and set the appropriate command-line arguments? If not the memory use of java.lang.Thread instances can interfere with the time-critical RTTs. Have you read the documentation for setting the minimum reserved bytes?

    -XX:NormalMinFreeBytes=100m
    and yes, I read it.

    Are these systems multi-core machines? If so, how many cores? Have you read the documentation for setting the number of CPUs which the RTGC should use? If not, please do so.
    One machine has 8 dual-core CPUs; the other has a single dual-core CPU.
    I am not so comfortable with this topic yet; I will read more.

    Have you taken the RTSJ / Java RTS programming class Sun Education offers?
    We are waiting for this course and need it urgently.

    Thanks for your help.
    I'll try to give more details next week.
    Thanks again
    Have a nice day
    Gabi
  • 7. Re: Jitter
    800408 Newbie
    Hi Greg,
    We focused on a small application that reproduces our problem.
    We want to give more details about it:

    We have two nodes (A, B) on a 1 Gb LAN. Node A sends 1 KB every 30 ms to node B via a TCP/IP socket (with the TCP_NODELAY attribute).
    Node B receives those messages and calculates the time delta between message arrivals. Usually the delta is 30 ms, but sometimes (after several minutes) it reaches 30+50, 30+100 or even 30+800 ms! (When this occurs, all the waiting messages are received immediately.)
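
The receiver-side calculation described above amounts to the following sketch. The arrival timestamps are hypothetical, and the 5 ms tolerance is an arbitrary choice for illustration.

```java
class ArrivalDeltas {
    // Time between consecutive arrivals; result has one element fewer than the input.
    static long[] deltas(long[] arrivalsMs) {
        long[] out = new long[arrivalsMs.length - 1];
        for (int i = 1; i < arrivalsMs.length; i++) {
            out[i - 1] = arrivalsMs[i] - arrivalsMs[i - 1];
        }
        return out;
    }

    // Number of deltas that exceed the send period by more than the tolerance.
    static int countLate(long[] deltasMs, long periodMs, long toleranceMs) {
        int late = 0;
        for (long d : deltasMs) {
            if (d > periodMs + toleranceMs) {
                late++;
            }
        }
        return late;
    }

    public static void main(String[] args) {
        // Hypothetical arrivals: messages sent at 90 and 120 ms are both delivered at 140 ms.
        long[] arrivals = {0, 30, 60, 140, 140, 150};
        long[] d = deltas(arrivals);
        System.out.println(java.util.Arrays.toString(d)); // [30, 30, 80, 0, 10]
        System.out.println("late: " + countLate(d, 30, 5)); // late: 1
    }
}
```

A long delta followed by near-zero deltas is the signature of messages queuing up somewhere and then being delivered in a burst.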

    More details:
    Both applications run in the real-time Java JVM. Node A runs only 1 critical thread, priority 59, which sends a message every 30 ms.
    Node B runs 1 critical thread, priority 59, which receives the message and calculates the delta, another 20 critical threads at priority 30 which do some calculations, and another 20 normal threads which do some calculations.
    Node B runs on a dual-core station, and the CPU load is about 30%.

    Note that if we run the same application on node B without the 40 threads, we see no jitter: the delta is always 30 ms.
    Also note that if we run the 40 threads but at normal priority, we still get the same jitter (but it starts occurring only after 40-50 minutes).

    Also note that the jitter is not caused by garbage collection, because
    a) The print GC option shows that no thread was stuck because of the GC, and
    b) In different runs, the jitter occurs after a different number of sends (sometimes after ~3000 sends, sometimes after ~9000 sends)
    c) We use the RTGCCriticalReservedBytes & NormalMinFreeBytes flags
    d) The same phenomenon occurred even when we allocated all memory ("new" statements) in the initialization phase

    (You can also rule out JIT compilation as the cause.)

    I also want to add that in an application which does not involve I/O, real-time Java is OK, meaning that the most critical thread has no jitter even if there are other threads in the JVM.

    So to summarize, the problem appears when we add threads (even normal ones) to the application, but it is not a pure real-time Java problem (and not related to the GC).


    We think the problem is that the 40 application threads prevent the operating system from handling the TCP message. We think there is a TCP kernel daemon in Solaris which cannot run because of the (40) application threads (note that there are only 2 cores). Another idea is that the JVM, for some reason, does not send an interrupt to the critical thread when a message arrives.

    1. Can you explain to us what is done in the OS when a TCP message arrives, and what is ?
    2. Is our theory right?
    3. Do you have other theories?
    4. If our theory is right, how can we solve the problem?
  • 8. Re: Jitter
    807557 Newbie
    Gabi,

    You've done a very good job isolating and recreating the problem.

    A couple more questions:
    1. Are there any other threads anywhere in the system using TCP other than the 1 thread in node A sending and the thread in node B receiving? This includes any possible threads in other non-Java processes.

    2. Let's say the jitter is 30+300. Do you receive 11 x 1KB messages? (I think this is what you mean when you say ... "all the waiting msgs are received immediately" but I want to make sure).

    3. Have you tried the test with the other threads at getMinPriority() ( == 11)?

    4. Have you used 'tsv'? If not, this would be an important next step. tsv is the 'Thread Scheduling Visualizer'. You can download it from the download site.
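
The figure of 11 in question 2 follows from dividing the observed gap by the send period: a gap of 30+300 = 330 ms spans 11 periods of 30 ms, so 11 messages sent in that window would arrive together. A tiny sketch of that arithmetic (the class and method names are made up here):

```java
class Backlog {
    // Number of whole send periods that fit in the observed gap between arrivals.
    static long messagesInGap(long gapMs, long periodMs) {
        return gapMs / periodMs;
    }

    public static void main(String[] args) {
        System.out.println(messagesInGap(30 + 300, 30)); // 11
    }
}
```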

    The 40 application threads, at the priorities you state, should not prevent a system/kernel thread from running.
    I don't know the details of TCP handling within Solaris. If we need to, later, we can bring in a Solaris TCP expert.
    Have you tried doing this with UDP? The difference is that UDP is block oriented and TCP is stream oriented. Try the test by sending 1KB datagrams over a UDP connection and send me the results.
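
A UDP version of the test could be sketched as below, sending 1 KB datagrams at a fixed period and recording the gaps between arrivals. This is only an outline under stated assumptions: it uses the loopback interface, plain threads rather than real-time threads, and a receive timeout so that a lost datagram does not block forever. The class name UdpDeltas is invented for this sketch.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

class UdpDeltas {
    // Receive `count` datagrams and return the gaps between arrivals in nanoseconds.
    static long[] receive(DatagramSocket socket, int count) throws IOException {
        byte[] buf = new byte[1024];
        long[] gaps = new long[count - 1];
        long prev = 0;
        for (int i = 0; i < count; i++) {
            socket.receive(new DatagramPacket(buf, buf.length));
            long now = System.nanoTime();
            if (i > 0) {
                gaps[i - 1] = now - prev;
            }
            prev = now;
        }
        return gaps;
    }

    // Send `count` 1 KB datagrams, one every periodMs milliseconds.
    static void send(InetAddress host, int port, int count, long periodMs)
            throws IOException, InterruptedException {
        byte[] payload = new byte[1024];
        try (DatagramSocket socket = new DatagramSocket()) {
            for (int i = 0; i < count; i++) {
                socket.send(new DatagramPacket(payload, payload.length, host, port));
                Thread.sleep(periodMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket rx = new DatagramSocket(0)) { // port 0: let the OS pick one
            rx.setSoTimeout(5_000); // give up if a datagram is lost
            int port = rx.getLocalPort();
            Thread tx = new Thread(() -> {
                try {
                    send(InetAddress.getLoopbackAddress(), port, 10, 30);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            tx.start();
            for (long gap : receive(rx, 10)) {
                System.out.println(gap / 1_000_000.0 + " ms");
            }
            tx.join();
        }
    }
}
```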
    Greg
  • 9. Re: Jitter
    800408 Newbie
    Hi Greg,

    Thanks for your reply.

    I'll try your suggestions tomorrow and send you our results.



    Thanks

    Gabi
  • 10. Re: Jitter
    800408 Newbie
    Hi Greg,

    Here are our answers to your questions:
    *1. Are there any other threads anywhere in the system using TCP other than the 1 thread in node A sending and the thread in node B receiving? This includes any possible threads in other non-Java processes.*
    As far as I know, no other threads are using TCP.

    *2. Let's say the jitter is 30+300. Do you receive 11 x 1KB messages? (I think this is what you mean when you say ... "all the waiting msgs are received immediately" but I want to make sure).*
    Yes, we received all 11 messages immediately, one after another.

    *3. Have you tried the test with the other threads at getMinPriority() ( == 11)?*
    No, but we also ran with normal threads. Does that answer your question?
    Why do you think it's important?

    *4. Have you used 'tsv'? If not, this would be an important next step. tsv is the 'Thread Scheduling Visualizer'. You can download it from the download site.*
    No.

    Have you tried doing this with UDP?
    We intend to, but have not tested it yet.

    Thanks

    Gabi/Doron
  • 11. Re: Jitter
    800408 Newbie
    Hi Greg,
    We ran this test over UDP; the jitter remains the same as with TCP.

    I ran some more tests and have some questions.
    We ran the tsv program and saw some strange behavior.
    Our machine has 8 dual-core CPUs (IDs 0-15),
    but in tsv we see that during some time periods a few threads are working on a single CPU (for example, at time x, thread 1, thread 2, thread 5 and thread 6 are working on the same CPU). We cannot understand it.
    We expect each thread, while it is working, to occupy a single CPU (actually a core).
    Are we missing something?

    Thanks

    Gabi
  • 12. Re: Jitter
    807557 Newbie
    Gabi wrote:
    We ran the tsv program and saw some strange behavior.
    Our machine has 8 dual-core CPUs (IDs 0-15),
    but in tsv we see that during some time periods a few threads are working on a single CPU (for example, at time x, thread 1, thread 2, thread 5 and thread 6 are working on the same CPU). We cannot understand it.
    We expect each thread, while it is working, to occupy a single CPU (actually a core).
    Are we missing something?
    First, TSV only shows you the threads from the JVM process (well, more accurately, the drecord script only captures information for those), so there may be "more important" stuff from other processes running on those other processors; it all depends on what threads there are, their priorities and what they were doing.

    Second, I'm assuming you don't have any processor sets defined.

    Solaris will schedule threads with some consideration of where they last ran (processor affinity) to gain potential benefit from "hot" caches, or at least "warm" ones; but I don't think the base Solaris scheduler attempts to perform load-balancing (this is generally done at a higher level). So if there are processors to spare and the load is light, there's no general expectation for threads to run on distinct processors; rather, under light load you would expect a thread to come back to the same processor when it can.

    Of course if you are seeing unexpected preemption in these threads then you need to see what is executing on those other processors at the times of interest.

    David Holmes
  • 13. Re: Jitter
    800408 Newbie
    Hi David,
    I may not have explained myself well.
    Let me try again:
    How can several threads be on the same core at the same time?
    E.g., we saw:

    Thread1 ----(core1) ----(core2) -----(core 1)

    Thread2 --(core3) ----(core3) -----(core 2)

    Thread3 ----------(core1) ---(core2) -----(core 3)

    ^
    So at this time | both Thread1 & Thread3 ran on the same core (1)
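
One way to check such an observation outside the viewer is to look for on-CPU intervals of different threads that intersect on the same core. The sketch below assumes the scheduling data has already been reduced to (thread, core, start, end) records; the Run type and the sample values are hypothetical, and nothing here reflects the actual TSV or drecord file format. Since one core cannot run two threads at once, any hit would suggest a display or timestamp artifact rather than real scheduling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CoreOverlap {
    // One on-CPU interval of a thread: [startNs, endNs) on a given core.
    static final class Run {
        final String thread;
        final int core;
        final long startNs;
        final long endNs;

        Run(String thread, int core, long startNs, long endNs) {
            this.thread = thread;
            this.core = core;
            this.startNs = startNs;
            this.endNs = endNs;
        }
    }

    // Report every pair of different threads whose intervals intersect on the same core.
    static List<String> overlaps(List<Run> runs) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            for (int j = i + 1; j < runs.size(); j++) {
                Run a = runs.get(i);
                Run b = runs.get(j);
                boolean sameCore = a.core == b.core && !a.thread.equals(b.thread);
                boolean intersect = a.startNs < b.endNs && b.startNs < a.endNs;
                if (sameCore && intersect) {
                    found.add(a.thread + "/" + b.thread + "@core" + a.core);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("Thread1", 1, 0, 10),
                new Run("Thread2", 3, 0, 10),
                new Run("Thread3", 1, 5, 15));
        System.out.println(overlaps(runs)); // [Thread1/Thread3@core1]
    }
}
```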

    Thanks
  • 14. Re: Jitter
    807557 Newbie
    Sorry Gabi I misunderstood.

    Have you zoomed right in to see the detail? There is a bug that limits the maximum zoom using the slider, but even if the slider is at 100% if you drag a rectangle in the thread view you can zoom in even further. Zoomed in enough you should see that the threads are actually going off the cpu.

    That said, if you still have the unpatched system then you might have some erroneous timestamps in the data file due to the DTrace timer bug. When I had that bug I saw things happening in an impossible order (e.g., thread B running before thread A was started, when A started B).

    David Holmes