This discussion is archived
15 Replies. Latest reply: Feb 9, 2012 5:19 AM by gimbal2

ThreadLocal and False Sharing, are they assigned to the Stack or Heap

916264 Newbie
I've been running into quite a few issues that I've isolated down to False Sharing.

To help rule out variables that could be subject to False Sharing, I'm considering defining some of them as ThreadLocal. However, I have doubts about whether this will help: if a ThreadLocal variable is allocated on the heap, what stops it from being placed next to a variable that another thread is using? If ThreadLocal variables were always on the stack, then maybe it could help. Fighting the slowdowns I'm getting from false sharing feels like fighting the GC over where it has put an object or set of objects in memory. The next step seems to be to add object padding, and that seems unpredictable, since the JVM could optimize the padding away to nothing.
  • 1. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    EJP Guru
    ThreadLocal variables are object references. Object references go on the heap if they are class/instance members; on the stack if they are method-local.

    If you're referring to the ThreadLocal objects themselves, they are objects so they are always in the heap.
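
A minimal sketch of the point above, assuming nothing beyond the standard `ThreadLocal` API. The class and method names (`ThreadLocalHeapDemo`, `eachThreadGetsItsOwnValue`) are made up for illustration. It shows that each thread gets its own value, but that each value is still an ordinary heap allocation, so nothing in the API says where those values sit relative to each other.

```java
// Hypothetical demo class; names are illustrative and not from the thread.
class ThreadLocalHeapDemo {
    // The ThreadLocal object is referenced from a static field and is itself a heap object.
    static final ThreadLocal<long[]> COUNTER = new ThreadLocal<long[]>() {
        @Override
        protected long[] initialValue() {
            // An ordinary heap allocation, performed once per thread on first get().
            return new long[1];
        }
    };

    // Runs two threads and reports whether each one saw its own, distinct long[].
    static boolean eachThreadGetsItsOwnValue() {
        final long[][] seen = new long[2][];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                long[] mine = COUNTER.get(); // the local variable 'mine' is on this thread's stack
                mine[0] += id + 1;           // the array it refers to is on the heap
                seen[id] = mine;
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join() makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return seen[0] != seen[1] && seen[0][0] == 1 && seen[1][0] == 2;
    }
}
```

The per-thread isolation here is about reachability, not about memory placement: the two `long[]` objects may or may not end up on the same cache line.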
  • 2. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    916264 Newbie
    So there's no way to stop the GC or heap from putting two variables next to each other on the same cache line, and so stop the false sharing?
  • 3. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    796440 Guru
    You have no control over the physical memory layout of your objects. And if you think you need it--if you think you'll be better at it than the team of engineers at Sun/Oracle that have worked to refine and improve it over the last 15 or so years, then you'll need to use a lower-level language. And even then...
  • 4. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    EJP Guru
    So there's no way to stop the GC or heap from putting two variables next to each other on the same cache line, and so stop the false sharing?
    Variables are put on the stack. Objects are put in the heap. Which are you talking about?
  • 5. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    916264 Newbie
    Actually, objects can be put on the stack in 1.6 and 1.7, but it requires that Escape Analysis determine that the Object is actually eligible to be allocated on the Stack instead of the Heap.
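
As a rough illustration of what "eligible" means here, the following sketch shows an object that never escapes the method that creates it. The class name `EscapeAnalysisSketch` and its helper `Pair` are hypothetical. Whether the JIT actually avoids the heap allocation is not guaranteed and cannot be observed from the code itself; as far as I know, HotSpot usually does it by replacing the object with its fields rather than literally placing it on the stack, and the relevant HotSpot option is `-XX:+DoEscapeAnalysis`.

```java
// Hypothetical example; names are illustrative only.
class EscapeAnalysisSketch {
    // A small value holder that never leaves sumOfSquares().
    static final class Pair {
        final long a;
        final long b;

        Pair(long a, long b) {
            this.a = a;
            this.b = b;
        }

        long product() {
            return a * b;
        }
    }

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            // 'p' is not stored in a field, not returned, and not handed to another
            // thread, so it does not escape and is a candidate for the optimization.
            Pair p = new Pair(i, i);
            sum += p.product();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

Anything that is published to another thread, such as an element of a shared array, escapes by definition, so this optimization does not apply to the shared counters discussed in this thread.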

    Java does give some access for controlling memory layout. ByteBuffer.allocateDirect(x) is a fine example of just such a thing. Heaven forbid even considering Unsafe, or how else do you make a custom Atomic?

    Situations where False Sharing occurs are just the kind of situation where I need a little more control. An example would be an array whose elements I'm just going to increment. Because I have more than one core, I can start as many threads as I have cores and do the increments in parallel. However, since the elements are in an array and are most likely right next to each other in memory, I will get a heavy amount of L1 cache thrashing. This happens even if the objects in the array use atomic methods to do the incrementing, and even if I have placed memory barriers to prevent one thread from accessing the same element as another thread. The way you get around this is by padding the objects so that they don't share a cache line. I don't like padding, since a compiler or the JVM can remove it if it sees it as unused.
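
One way to space counters apart without relying on padding fields is to leave gaps inside a single array, since the JVM does not drop or reorder array elements. The sketch below is only an illustration under stated assumptions: it assumes 64-byte cache lines (8 longs per line), and the names `StridedCounters`, `STRIDE` and `countInParallel` are made up. It does not stop the first slot from sharing a line with the array header or a neighbouring object.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch; names are illustrative only.
class StridedCounters {
    // Assumption: 64-byte cache lines, so 8 longs span one line.
    static final int STRIDE = 8;

    // Each thread increments only its own slot; used slots are STRIDE elements apart.
    static long[] countInParallel(int threads, final long incrementsPerThread) {
        final AtomicLongArray slots = new AtomicLongArray(threads * STRIDE);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int index = t * STRIDE;
            workers[t] = new Thread(() -> {
                for (long i = 0; i < incrementsPerThread; i++) {
                    slots.incrementAndGet(index);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        // Collect the per-thread totals from the spaced-out slots.
        long[] totals = new long[threads];
        for (int t = 0; t < threads; t++) {
            totals[t] = slots.get(t * STRIDE);
        }
        return totals;
    }
}
```

The cost is memory: seven of every eight elements are unused, which is the same trade-off that field padding makes.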

    Intel's Advice on tracking down and reducing the causes of False Sharing is to use Thread-Local storage on variables that don't need to be shared so they can be eliminated as a potential cause of the thrashing. ThreadLocal gives the correct description of what is needed in the JavaDoc, but if it is allocated on the Heap then even a ThreadLocal variable cannot be discounted as a cause of the False Sharing.

    My question is about reducing False Sharing by whatever means Oracle intends me to use. That, however, isn't very apparent, which is why I'm asking.
  • 6. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    EJP Guru
    Java does give some access for controlling memory layout. ByteBuffer.allocateDirect(x) is a fine example of just such a thing.
    No it isn't. You don't have any control over where it is laid out.
    I don't like padding since a compiler can remove it if it sees it as unused or the JVM can remove it if it sees it as unused.
    The JVM can reduce the size of an array?
    Intel's Advice on tracking down and reducing the causes of False Sharing is to use Thread-Local storage on variables that don't need to be shared so they can be eliminated as a potential cause of the thrashing.
    I think you are here confusing what Intel means by thread-local with what Java means by ThreadLocal. Intel's thread-local appears to mean 'on the stack', much as I was saying above. Java's ThreadLocal means 'reachable only via the current thread', via a special data member in the Thread object.
    ThreadLocal gives the correct description of what is needed in the JavaDoc
    It gives a correct description of its own specification. It doesn't say anything about False Sharing, or about itself as a solution.
    My question is dealing with reducing False Sharing by whatever means Oracle intends me to. That, however, isn't very apparent, and is the reason why I'm asking.
    It isn't apparent at all. I've never seen the term anywhere in the Java documentation, and I've been reading it for 15 years.
  • 7. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    916264 Newbie
    Yes, Java can reduce the size of an array by reducing the size of the object inside of it. If it sees a variable as unused or unreachable, it will strip it out. That is a lot of what optimization is all about, which is why padding that works in 1.6 fails in 1.7.

    If you don't know what False Sharing is, why are you responding without at least doing a Google search for "False Sharing Java"?
    First link
    http://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html
    This might help you too
    http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/

    There could be numerous reasons why you're not aware of it. You probably don't write any code that suffers noticeably from it, or you've never bothered to check whether your parallel algorithm is actually faster than the single-threaded version in all testing. The problem really only occurs on multi-core processors with shared cache, which haven't been around for 15 years; before they were introduced, multi-CPU servers typically didn't have shared cache and would go as far as having separate RAM banks to prevent this kind of memory thrashing. You won't even see it with single-core Hyper-Threading if the OS knows what it's doing with thread scheduling. You only really start to see it when you have multiple cores that can invalidate cache lines on the other cores, and even some dual cores won't invalidate each other's cache because they are on the same memory module, like how the AMD Bulldozer is set up. This problem isn't "new", but it isn't old either. Until you could safely bet that people had dual-core or better systems, it also wasn't very important to look at.

    If it wasn't important for someone to look at the cache misses caused by Java code, why does Oracle Solaris Studio allow you to do hardware profiling so that you can get a good estimate of which line of code in your Java app is causing the largest number of cache misses?
  • 8. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    796440 Guru
    Java does give some access for controlling memory layout. ByteBuffer.allocateDirect(x) is a fine example of just such a thing.
    Okay, I stand corrected, although that's a different situation than what I was talking about.
    Heaven forbid even considering Unsafe,
    Bluddy right.
    or how else do you make a custom Atomic?
    You use the low level concurrency features, or build it on top of what's in java.util.concurrent.
    Situations where False Sharing occurs are just the kind of situation where I need a little more control.
    Then Java probably does not suit your needs. It was simply never intended to give the programmer that kind of low-level control.
    An example would be where I have an Array where I'm just going to Increment elements. Because I have more than 1 core I can just start up as many threads as I have cores and do the increments. However, since they are in an Array and are most likely right next to each other in memory I will have a heavy amount of L1 cache Thrashing.
    I thought that v7 was supposed to provide new finer-grained concurrency features, including some geared toward array handling--not direct control over memory layout, but some sort of higher level abstraction. I don't recall details though, or whether it would help with this particular situation.
    Even if the Objects in the array use Atomic methods to do the incrementing and even if I have placed memory barriers to prevent one thread from accessing the same element as another thread. The way you get around this is by padding the objects so that they don't overlap the same cache line. I don't like padding since a compiler can remove it if it sees it as unused or the JVM can remove it if it sees it as unused.
    More to the point, Java is not intended for that kind of thing, and it would probably make for a very brittle solution to try to jam Java's square peg into that round hole.
    Intel's Advice on tracking down and reducing the causes of False Sharing is to use Thread-Local storage on variables that don't need to be shared so they can be eliminated as a potential cause of the thrashing. ThreadLocal gives the correct description of what is needed in the JavaDoc, but if it is allocated on the Heap then even a ThreadLocal variable cannot be discounted as a cause of the False Sharing.
    I don't see TL helping here, at least not by design. In effect it's just a map lookup (in the standard implementation, each Thread holds a map keyed by the ThreadLocal, and the value is that thread's "unshared shared" value), so it will all be on the heap. But maybe the authors of that doc know that the implementation is such that different threads' values are unlikely to share the same cache line. You can look at the implementation yourself and see if that's the case.

    End of the day, though, if you really need to control this kind of stuff directly, Java is probably not a good tool for your job.
  • 9. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    796440 Guru
    medv4380 wrote:
    Yes Java can reduce the size of an array by reducing the size of the object inside of it. If it sees a variable as unused or unreachable it will strip it out.
    What are you talking about? Can you provide a reference?

    First, Java arrays don't hold objects, they hold references (or primitives).

    Now, if I have a 1,000,000-element array with null at every element but the first and last, I suppose it's hypothetically possible that the JVM is allowed to implement that by holding only enough memory for those two references and some notation that the other 999,998 are null (because I can't recall anything in the JVM spec that says it can't), but I highly doubt that it would do so, for two reasons: 1) I believe that arrays are intended to be fast to access, and as such are probably implemented as contiguous memory, and 2) I don't think it's allowed, because I don't think a[n] = some_non_null_reference is allowed to throw OutOfMemoryError, which it could do if the JVM were compressing arrays and then we tried to assign a value to an "eliminated" index.
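
To make the references-versus-objects distinction concrete, here is a small sketch using only core language features. The names `ReferenceArrayDemo`, `Counter` and `demo` are hypothetical. It shows that creating an object array allocates reference slots only, and that two slots can point at the same object.

```java
// Hypothetical example; names are illustrative only.
class ReferenceArrayDemo {
    static final class Counter {
        long value;
    }

    static boolean demo() {
        // Allocates 4 reference slots, all null; no Counter objects exist yet.
        Counter[] counters = new Counter[4];
        boolean allNull = true;
        for (Counter c : counters) {
            allNull &= (c == null);
        }

        // Assigning to a slot stores a reference; it does not copy the object.
        Counter shared = new Counter();
        counters[0] = shared;
        counters[3] = shared; // two slots, one object

        counters[0].value = 42;
        return allNull
                && counters.length == 4
                && counters[1] == null
                && counters[3].value == 42; // seen through the other slot
    }
}
```

The objects themselves are allocated separately from the array, so the array's layout says nothing certain about where the objects it refers to are placed.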
  • 10. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    EJP Guru
    Yes Java can reduce the size of an array by reducing the size of the object inside of it.
    I do not believe this claim. Please provide some evidence. It doesn't even make sense. There is no 'object inside' a Java array. It is an array of primitives, or an array of object references. You are repeating the same confusion I referred to above.
    If it sees a variable as unused or unreachable it will strip it out.
    A variable isn't the same thing as an array member.
    If you don't know what False Sharing is why are you responding without at least doing a google search for "False Sharing Java"
    I didn't say I didn't know what it is. I said I have never seen any Java documentation concerning it.
    The reason you're not aware of it could be numerous.
    I didn't say I wasn't aware of it either.

    As you aren't even citing me accurately, it is difficult to place any reliance on your other claims in this thread.
    If it wasn't important for someone to look at the cache misses caused by java code
    I didn't say that either.

    You are just fabricating statements and attributing them to me. Don't do that. It's called a 'straw man' argument, and it is fallacious.
  • 11. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    796440 Guru
    EJP wrote:
    If it sees a variable as unused or unreachable it will strip it out.
    A variable isn't the same thing as an array member.
    And of course neither is the same as an object. And variables don't have the property of being reachable or not; objects do. The whole idea of "stripping something out of an array" makes no sense.
  • 12. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    916264 Newbie
    jverd wrote:
    medv4380 wrote:
    Yes Java can reduce the size of an array by reducing the size of the object inside of it. If it sees a variable as unused or unreachable it will strip it out.
    What are you talking about? Can you provide a reference?
    The best example of it can be found here.

    In that case, if you change the order of the variables defined in the object, Java 7 will actually run as if there were no padding at all.
    import java.util.concurrent.atomic.AtomicLong;
    // Six padding longs follow AtomicLong's own value field; only p6 is initialized to 7L.
    class PaddedAtomicLong extends AtomicLong {
        public volatile long p1, p2, p3, p4, p5, p6 = 7L;
    }

    I've run this through extensive tests to see how predictable or unpredictable the results are.
    The False Sharing tests were run with an object like
    class NonPaddedAtomicLong extends AtomicLong {}
    whereas the Fixed Padding tests were run with an object like PaddedAtomicLong.
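
For readers who want to try something similar, a test of this kind might be structured roughly as follows. This is a sketch under assumptions, not the harness that produced the numbers in this post: the class `FalseSharingHarness`, its methods, and the iteration count are made up, and the two counter classes are repeated as nested classes only so that the example is self-contained.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness; not the code used for the measurements in this thread.
class FalseSharingHarness {
    static class NonPaddedAtomicLong extends AtomicLong {}

    static class PaddedAtomicLong extends AtomicLong {
        public volatile long p1, p2, p3, p4, p5, p6 = 7L;
    }

    // Creates n counters of the requested kind, allocated back to back.
    static AtomicLong[] newCounters(int n, boolean padded) {
        AtomicLong[] counters = new AtomicLong[n];
        for (int i = 0; i < n; i++) {
            counters[i] = padded ? new PaddedAtomicLong() : new NonPaddedAtomicLong();
        }
        return counters;
    }

    // Starts one thread per counter; each thread increments only its own counter.
    // Returns the elapsed wall-clock time in nanoseconds.
    static long timeRun(final AtomicLong[] counters, final long iterations) {
        Thread[] workers = new Thread[counters.length];
        for (int t = 0; t < workers.length; t++) {
            final AtomicLong mine = counters[t];
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iterations; i++) {
                    mine.incrementAndGet();
                }
            });
        }
        long start = System.nanoTime();
        for (Thread w : workers) {
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long iterations = 20_000_000L; // arbitrary; tune for the machine under test
        for (int threads = 1; threads <= 4; threads++) {
            long plain = timeRun(newCounters(threads, false), iterations);
            long padded = timeRun(newCounters(threads, true), iterations);
            System.out.printf("threads=%d plain=%d ms padded=%d ms%n",
                    threads, plain / 1_000_000, padded / 1_000_000);
        }
    }
}
```

A single timed run like this is noisy (JIT warm-up, GC, thread scheduling), so it should be repeated many times and summarized before drawing conclusions.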

    On Windows 7 on a dual-core i3 with Java 1.7, the False Sharing results look like this
    False Sharing: Thread 1     100.04%
    False Sharing: Thread 2     194.51%
    False Sharing: Thread 3     439.52%
    False Sharing: Thread 4     355.27%

    On Ubuntu 11.10 on a Phenom II X4, the False Sharing results look like this
    False Sharing: Thread 1     100.00%
    False Sharing: Thread 2     595.31%
    False Sharing: Thread 3     1490.54%
    False Sharing: Thread 4     1129.10%

    As you can see, the issue on the dual core with Hyper-Threading doesn't really start to hurt until 3 threads are running, forcing the two real cores to start thrashing each other's cache lines.

    With padding, the results are a bit more acceptable.

    Windows 7 on a dual-core i3, Java 1.7
    False Sharing Fixed Padding - T:1     100.00%
    False Sharing Fixed Padding - T:2     176.76%
    False Sharing Fixed Padding - T:3     167.61%
    False Sharing Fixed Padding - T:4     169.98%

    Ubuntu 11.10 Phenom II X4 Java 1.7
    False Sharing Fixed Padding - T:1     103.83%
    False Sharing Fixed Padding - T:2     128.14%
    False Sharing Fixed Padding - T:3     137.04%
    False Sharing Fixed Padding - T:4     765.33%

    Ubuntu 11.10 Phenom II X4 Java OpenJDK 1.6
    False Sharing Fixed Padding - T:1     105.53%
    False Sharing Fixed Padding - T:2     429.51%
    False Sharing Fixed Padding - T:3     491.93%
    False Sharing Fixed Padding - T:4     135.95%

    Ubuntu 11.10 Phenom II X4 Java 1.6
    False Sharing Fixed Padding - T:1     100.03%
    False Sharing Fixed Padding - T:2     162.48%
    False Sharing Fixed Padding - T:3     154.05%
    False Sharing Fixed Padding - T:4     130.41%

    Even though the padding helped, there were still weird differences. Java 7 didn't like my quad core with 4 threads, but Java 6 did. OpenJDK 6 didn't like 2 and 3 threads but was fine with 4. These also weren't small single-run tests, but rather 1000 runs that were then put through some analysis to identify any outliers that may have been caused by the GC deciding that that moment was a good time to run. Outliers were removed, and then the median of each test was compared to the median of the single-thread False Sharing test for that configuration.
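
The normalization described above could be computed along these lines. This is only a sketch: the post does not say which outlier rule was used, so the 1.5 x IQR fence and the simple index-based quartiles below are assumptions, and the names `RelativeTiming`, `dropOutliers` and `relativePercent` are made up.

```java
import java.util.Arrays;

// Hypothetical helper; the outlier rule is an assumption, not the original analysis.
class RelativeTiming {
    // Median of the values; the input array is not modified.
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Drops samples outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    // Assumption: quartiles are taken by simple index on the sorted data.
    static double[] dropOutliers(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] s = values.clone();
        Arrays.sort(s);
        double q1 = s[s.length / 4];
        double q3 = s[(3 * s.length) / 4];
        double iqr = q3 - q1;
        double lo = q1 - 1.5 * iqr;
        double hi = q3 + 1.5 * iqr;
        return Arrays.stream(s).filter(v -> v >= lo && v <= hi).toArray();
    }

    // Median of the cleaned samples as a percentage of the cleaned baseline median.
    static double relativePercent(double[] samples, double[] baseline) {
        return 100.0 * median(dropOutliers(samples)) / median(dropOutliers(baseline));
    }
}
```

With this reading, a value of 100% means "as fast as the single-thread False Sharing run", and larger values mean proportionally more time.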

    So what should be done in a situation like this, when the heap has two variables clearly sitting on the same cache line and more than one thread happens to be using them? Getting objects that are only used by one thread allocated on that thread's local stack would help reduce the possible causes, but clearly wouldn't help when they have to be allocated on the shared heap, which was probably one of the many arguments for escape analysis.

    I don't exactly like seeing an app jump to over 10x the time just because it happened to be on a system with multiple processors.
  • 13. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    EJP Guru
    Yes Java can reduce the size of an array ...
    In that case if you change the order of the variables defined in the object Java7 will actually run as if there was no padding at all.
    class PaddedAtomicLong extends AtomicLong {
    public volatile long p1, p2, p3, p4, p5, p6 = 7L;
    }
    There is no array here.

    In any case I am unable to make sense of all this. If Java 'reduces the size' of this object, why does it behave differently to your NonPaddedAtomicLong?
  • 14. Re: ThreadLocal and False Sharing, are they assigned to the Stack or Heap
    796440 Guru
    medv4380 wrote:
    jverd wrote:
    medv4380 wrote:
    Yes Java can reduce the size of an array by reducing the size of the object inside of it. If it sees a variable as unused or unreachable it will strip it out.
    What are you talking about? Can you provide a reference?
    The best example of it can be found here.
    Nothing there suggests that elements are "stripped out of" an array.

    And, not to sound like a broken record, but the feature the author is arguing for seems to go against 2 of Java's core goals: Simplicity and platform-agnosticism. While cache-line alignment might be a very useful thing to have for performance tuning, if it were to be added, I would hope it would be in the form of a higher-level abstraction. Else it would be like adding direct pointer manipulation, and if we want that, we've got C.
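
As one example of what such a higher-level abstraction can look like, `java.util.concurrent.atomic.LongAdder` (added in Java 8, so after this thread was written) is documented as trading extra space for better throughput under contention. The sketch below assumes Java 8 or later; the class and method names `LongAdderSketch` and `countWith` are made up, and it is not a claim about how any particular layout problem in this thread would be solved.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example; requires Java 8 or later.
class LongAdderSketch {
    // All threads add to one logical counter; the caller never deals with layout.
    static long countWith(int threads, final long incrementsPerThread) {
        final LongAdder total = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (long i = 0; i < incrementsPerThread; i++) {
                    total.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        // sum() is exact here because all writers have finished.
        return total.sum();
    }
}
```

The design point is the one made above: the class hides how the contended state is spread out in memory, so the calling code stays simple and platform-agnostic.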