This discussion is archived
9 Replies. Latest reply: Nov 27, 2011 6:55 AM by Mohan

Analyzing thread-core interaction

Mohan Newbie
I am planning to use the Studio on Ubuntu to analyze parallel Java programs and how effectively they use threads in a multi-core environment. Is that feasible? Is there any reference material that explains how to do this in the Studio? Herlihy and Shavit's book seems to be a good reference.

Can I run my Java programs in specialized multi-core environments like Google App Engine and analyze the execution dump on my Ubuntu desktop, which has the Studio? I hope I am not on the wrong track here.
I am also looking for some advice on how to go about this.


Thanks
  • 1. Re: Analyzing thread-core interaction
    NikMolchanov Newbie
    Mohan wrote:
    I am planning to use the Studio on Ubuntu to analyze parallel Java programs and how effectively they use threads in a multi-core environment. Is that feasible? Is there any reference material that explains how to do this in the Studio?
    Yes, we profile multi-threaded Java applications very often, and I think you will find the profiling data very useful.
    For simple profiling you can run:

    collect -j on java ...

    where "..." is what you usually pass to java to run your application.
    If you use a script to start your application, you have to create a copy of this script and insert "collect -j on " before the java invocation.

    It is important to use Oracle JDK 1.6 or 1.7, because they support the API that we use to obtain the user's call stacks.
    Here is an example that shows the call tree of the NetBeans IDE:

    % er_print -ctree test.1.er
    ...
    | +-  2.501   (13%)    org.netbeans.core.startup.TopThreadGroup.run()
    | | +-  2.501   (13%)    org.netbeans.core.startup.Main.start(java.lang.String[])
    | | +-  2.180   (11%)    org.netbeans.core.startup.Main.getModuleSystem()
    | | | +-  1.510   (8%)    org.netbeans.core.startup.ModuleSystem.restore()
    | | | | +-  1.510   (8%)    org.netbeans.core.startup.ModuleList.trigger(java.util.Set)
    | | | | +-  1.500   (8%)    org.netbeans.core.startup.ModuleList.installNew(java.util.Set)
    | | | | | +-  1.440   (7%)    org.netbeans.ModuleManager.enable(java.util.Set)
    | | | | | | +-  1.120   (6%)    org.netbeans.core.startup.NbInstaller.load(java.util.List)
    | | | | | | | +-  0.380   (2%)    org.netbeans.core.startup.NbInstaller.loadCode(org.netbeans.Module, boolean)
    | | | | | | | +-  0.380   (2%)    org.netbeans.core.startup.NbInstaller.loadLayers(java.util.List, boolean)
    | | | | | | | +-  0.290   (1%)    org.netbeans.core.startup.Main.initUICustomizations()
    | | | | | | | +-  0.050   (0%)    org.netbeans.core.startup.MainLookup.modulesClassPathInitialized()
    | | | | | | | +-  0.010   (0%)    org.netbeans.core.startup.CoreBridge.getDefault()
    | | | | | | | +-  0.010   (0%)    org.netbeans.core.startup.NbInstaller.checkForDeprecations(java.util.List)
    | | | | | | +-  0.160   (1%)    org.netbeans.StandardModule.classLoaderUp(java.util.Set)
    | | | | | | +-  0.060   (0%)    org.netbeans.core.startup.NbInstaller.classLoaderUp(java.lang.ClassLoader)
    | | | | | | +-  0.050   (0%)    org.netbeans.core.startup.NbInstaller.prepare(org.netbeans.Module)
    | | | | | | +-  0.040   (0%)    org.netbeans.ModuleManager.simulateEnable(java.util.Set)
    | | | | | | +-  0.010   (0%)    org.netbeans.ProxyClassLoader.append(java.lang.ClassLoader[])
    | | | | | +-  0.060   (0%)    org.netbeans.ModuleManager.simulateEnable(java.util.Set)
    | | | | +-  0.010   (0%)    org.netbeans.core.startup.ModuleList.flushInitial()

    You can select any function and navigate to its source, bytecode, or machine instructions (in "Machine" mode).
    There are tabs that show Threads (how long each thread worked), Lines (hot lines), and PCs (hot addresses).
    You can see all threads in the Timeline tab; it shows what each thread was doing at the moment it was profiled.
    You can also use HW counters to find cache misses and other information.
    Mohan wrote:
    Herlihy and Shavit's book seems to be a good reference.
    There is a lot of documentation. You can start from this page:
    http://download.oracle.com/docs/cd/E18659_01/html/821-1379/index.html

    There is a very good demo created by Marty Itzkowitz:
    http://webcast-west.sun.com/interactive/07D01004_01/
    Mohan wrote:
    Can I run my Java programs in specialized multi-core environments like Google App Engine and analyze the execution dump on my Ubuntu desktop, which has the Studio? I hope I am not on the wrong track here.
    I am also looking for some advice on how to go about this.
    If this environment uses many machines, then I don't know whether "collect" can handle it properly.
    The main problem is that it has to save the results in some directory. If this directory is
    accessible from all of these machines, then everything may work just fine, but we have not
    tested this case.

    Please let us know if you have any problems with Java profiling.

    Thanks.
    Nik
  • 2. Re: Analyzing thread-core interaction
    Mohan Newbie
    I was able to profile my Java program using the thread analyzer. More specifically, I was using the fork/join framework, and I was looking for data that shows how a Java thread gets allocated to a particular core. The goal is simply to understand and view how multiple cores are utilized when a work-stealing algorithm runs.

    Are there any specific recommendations for this?
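
For readers who want a workload to try this on, here is a minimal sketch of a fork/join program that could be run under the profiler. It is not code from this thread: the class name ForkJoinSum, the THRESHOLD value, and the array size are illustrative assumptions, and it needs JDK 7 or later for java.util.concurrent.ForkJoinPool.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical profiling workload: sums an array with a tree of fork/join tasks.
class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cutoff: smaller ranges are summed sequentially
    private final long[] data;
    private final int from;
    private final int to; // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                     // queue the left half; an idle worker may steal it
        long rightSum = right.compute(); // work on the right half in the current thread
        return left.join() + rightSum;
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(); // default parallelism: number of available processors
        try {
            return pool.invoke(new ForkJoinSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[4_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        long total = 0;
        // Repeat the sum so the run lasts long enough to collect profile samples.
        for (int round = 0; round < 50; round++) {
            total += parallelSum(data);
        }
        System.out.println("total = " + total);
    }
}
```

Following the command given earlier in the thread, this would be started as "collect -j on java ForkJoinSum". Tasks queued by fork() can be stolen by idle workers, so the profile would be expected to show several worker threads active at once.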
  • 3. Re: Analyzing thread-core interaction
    NikMolchanov Newbie
    You can see threads in the Timeline tab, and you can also switch from threads to CPUs in this tab.
    If you are interested in one thread (how it migrated from one CPU to another), you can open the
    Threads tab, select this thread, and set a filter to exclude other threads (using the context menu).
    Then you can see how this thread migrated from one CPU to another, and which CPUs it
    used, in the Timeline tab and in the CPUs tab.

    Thanks.
    Nik
  • 4. Re: Analyzing thread-core interaction
    Mohan Newbie
    Thanks. I think this cloud project, http://developers.sun.com/speedway/, provides the Studio tools, but it seems to be on the back burner. It looks like a way of using multi-core machines in the cloud for testing.

    Thanks,
    Mohan
  • 5. Re: Analyzing thread-core interaction
    Mohan Newbie
    Is there documentation on using the Studio to understand effects on CPU caches? I am just beginning to understand these concepts. Are there any test programs that demonstrate cache effects? I understand that these are specific to the hardware and software being executed.
  • 6. Re: Analyzing thread-core interaction
    NikMolchanov Newbie
    Mohan wrote:
    Is there documentation on using the Studio to understand effects on CPU caches?
    Yes, you can take a look at "man collect"; it has a section about HWC profiling.
    There are also some documents available on the web, for example this one:


    http://cscads.rice.edu/workshops/summer09/slides/performance-tools/DProfile.cscads.pdf

    Basically, you need to find out which HW counters are available on your system.
    You can run "collect" without arguments to find this information.
    After that you can decide which of them are important for you:
    L1 cache misses
    L2 cache misses
    L3 cache misses
    On some systems you can profile all of them during one run.
    On other systems you can profile some of them, and then rerun your application to profile the other counters.
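
As a rough illustration of a test program whose cache behavior differs between two code paths, the sketch below sums the same array in two traversal orders. It is not from this thread: the class name CacheTraversal and the size n = 4096 are assumptions chosen for illustration. On typical hardware the strided traversal is expected to incur more cache misses, which is the kind of difference the cache-miss counters are meant to expose.

```java
import java.util.Arrays;

// Hypothetical test program: identical arithmetic, two memory access orders.
class CacheTraversal {
    // Visits elements in the order they are laid out in memory.
    static long sumRowMajor(int[] grid, int n) {
        long sum = 0;
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                sum += grid[row * n + col];
            }
        }
        return sum;
    }

    // Visits elements with a stride of n ints between consecutive accesses.
    static long sumColumnMajor(int[] grid, int n) {
        long sum = 0;
        for (int col = 0; col < n; col++) {
            for (int row = 0; row < n; row++) {
                sum += grid[row * n + col];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 4096;                // assumed size: 4096 * 4096 ints is 64 MB
        int[] grid = new int[n * n];
        Arrays.fill(grid, 1);
        // Several rounds give the JIT time to compile both methods.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumRowMajor(grid, n);
            long t1 = System.nanoTime();
            long b = sumColumnMajor(grid, n);
            long t2 = System.nanoTime();
            System.out.println("row-major: " + (t1 - t0) / 1_000_000 + " ms (sum " + a + "), "
                    + "column-major: " + (t2 - t1) / 1_000_000 + " ms (sum " + b + ")");
        }
    }
}
```

Because both methods are separate functions, a profile of this program would attribute time and counter events to each traversal separately. Actual timings depend on the cache sizes and the JVM, so no expected numbers are given here.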

    I think I can easily create a test that shows some cache problems, like false sharing, but it is more interesting
    to see what kinds of problems your application has. Anyway, I'll post a simple example of a cache problem soon.

    Thanks.
    Nik

    Edited by: NikMolchanov on Nov 12, 2011 12:11 AM
  • 7. Re: Analyzing thread-core interaction
    Mohan Newbie
    Much appreciated. The PPT is good, but it is also quite advanced from the perspective of a Java developer. I am not sure whether your examples could show false sharing in a Java threaded program instead of C.

    I have read page 476 of 'The Art of Multiprocessor Programming' by Herlihy and Shavit, where there is a single paragraph on false sharing. Maybe profiling the example on page 152 using the analyzer would be good proof for a beginner?

    I am trying to demonstrate the utility of profiling on multiple cores, but I don't have any real application.

    Mohan

    Edited by: Mohan on Nov 14, 2011 7:46 AM

    Edited by: Mohan on Nov 14, 2011 7:48 AM
  • 8. Re: Analyzing thread-core interaction
    NikMolchanov Newbie
    Hi Mohan,

    Yes, you are right, I was going to post a simple C application with 3-4 POSIX threads to illustrate a false sharing problem.
    But it is interesting to illustrate it in a Java program as well. I'll try and let you know.

    Thanks.
    Nik
  • 9. Re: Analyzing thread-core interaction
    Mohan Newbie
    I think I have an idea for Java code that induces false sharing. But the thread analyzer seems to require code instrumentation. Can I view thread IDs, VAs, and cache lines as shown on pages 35 and 36 of http://cscads.rice.edu/workshops/summer09/slides/performance-tools/DProfile.cscads.pdf? An example will surely help.

    Thanks.
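
A minimal Java sketch of one way to provoke false sharing follows. It is an illustrative assumption, not the example promised earlier in the thread: two threads each increment their own counter, first in adjacent array slots (likely on the same cache line) and then in slots 128 bytes apart. The class name FalseSharingDemo, the PAD value, and the iteration count are hypothetical, and the JVM does not guarantee array alignment, so results vary by machine.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical demo: two threads update different counters that may share a cache line.
class FalseSharingDemo {
    static final int PAD = 16; // 16 longs = 128 bytes; assumes cache lines no larger than that

    // Builds a thread that increments only its own slot of the shared array.
    static Thread worker(final AtomicLongArray counters, final int slot, final long iterations) {
        return new Thread(new Runnable() {
            @Override
            public void run() {
                for (long i = 0; i < iterations; i++) {
                    counters.incrementAndGet(slot);
                }
            }
        });
    }

    // Runs two workers on the given slots.
    // Returns {final count in slotA, final count in slotB, elapsed nanoseconds}.
    static long[] runPair(int slotA, int slotB, long iterations) {
        AtomicLongArray counters = new AtomicLongArray(2 * PAD);
        Thread a = worker(counters, slotA, iterations);
        Thread b = worker(counters, slotB, iterations);
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        long elapsed = System.nanoTime() - start;
        return new long[] { counters.get(slotA), counters.get(slotB), elapsed };
    }

    public static void main(String[] args) {
        long iterations = 50_000_000L;
        long[] adjacent = runPair(0, 1, iterations);  // neighbouring slots: likely one cache line
        long[] padded = runPair(0, PAD, iterations);  // 128 bytes apart: likely separate lines
        System.out.println("adjacent slots: " + adjacent[2] / 1_000_000 + " ms");
        System.out.println("padded slots:   " + padded[2] / 1_000_000 + " ms");
    }
}
```

The threads never touch each other's counter, so any slowdown in the adjacent case would come from the cache line moving between cores rather than from contention on shared data. The program needs no source instrumentation; whether the analyzer can attribute the misses to individual cache lines for Java code is not something this sketch establishes.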
