45 Second Periodic Performance Disruption

Answers

  • Henk Vandenbergh-Oracle
    Henk Vandenbergh-Oracle Member Posts: 813
    edited Mar 27, 2017 5:36PM

    So what IS a loopback device?

  • Bryce Guinta
    Bryce Guinta Member Posts: 22
    edited Mar 27, 2017 6:04PM

    It's a pseudo block device that I have used to mount a large empty file so that it acts as an actual device that I/O can be done on. It uses the block layer just like any other drive on Linux. The graphs above, however, used an Intel SSD DC S3500 480GB.
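    A minimal sketch of the "large empty file" part, in Java (the language vdbench is written in). The class and method names are hypothetical. Attaching the file as a loop device is a separate root step outside this sketch, for example `losetup --find --show <file>`, which prints the /dev/loopN name it assigned.

    ```java
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class BackingFile {
        /** Creates (or resizes) a file of the given size without writing its contents. */
        static Path createBackingFile(Path path, long sizeBytes) throws IOException {
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
                // setLength extends the file. Java leaves the new region's contents
                // undefined; on typical Linux filesystems it stays unallocated
                // (a sparse file) and reads back as zeros.
                raf.setLength(sizeBytes);
            }
            return path;
        }

        public static void main(String[] args) throws IOException {
            Path file = Files.createTempFile("loopback-", ".img");
            createBackingFile(file, 64L << 20); // 64 MiB
            System.out.println(file + " size=" + Files.size(file));
        }
    }
    ```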

  • Bryce Guinta
    Bryce Guinta Member Posts: 22
    edited Mar 27, 2017 6:39PM

    The 30-second interval ping occurs with vdbench 5.03.

    The 45-second interval ping occurs with vdbench 5.04.06.

  • Henk Vandenbergh-Oracle
    Henk Vandenbergh-Oracle Member Posts: 813
    edited Mar 27, 2017 7:04PM

    A pseudo device, a fake device.

    Gotcha.

    I ran an iorate=75k against /tmp, and I am seeing the same behavior on my OEL system (not Solaris).

    Do you have historic data running against Solaris? And if you do, do you have the same problem there?

    We're getting closer.

    BTW, you can aggravate the problem by running with -d28.

    thx

    Henk.

  • Henk Vandenbergh-Oracle
    Henk Vandenbergh-Oracle Member Posts: 813
    edited Mar 27, 2017 7:59PM

    A 'fake' device!

    Knowing how to recreate the problem makes a heck of a difference!

    See this link:

    What does the Socket method setTcpNoDelay() do and when should I use it?

    Nagle's algorithm tries to send full data segments by waiting, if necessary, for enough writes to come through to fill up the segment.

    This looks pretty straightforward, even for someone who has no networking knowledge whatsoever.
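    To make the quoted rule concrete, here is a simplified, hypothetical model of Nagle's algorithm (class and method names are illustrative; real TCP also involves send windows, timers and delayed ACKs, all ignored here). A full segment may always go out, but a partial segment is held back while earlier data is still unacknowledged, which is presumably how small messages ended up being delayed in this case.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    /** Toy model of Nagle's buffering rule; not real TCP code. */
    public class NagleModel {
        private final int mss;            // maximum segment size in bytes
        private int buffered = 0;         // bytes written by the app but not yet sent
        private boolean unacked = false;  // is sent data still waiting for an ACK?
        private final List<Integer> sentSegments = new ArrayList<>();

        public NagleModel(int mss) { this.mss = mss; }

        /** The application writes n bytes to the socket. */
        public void write(int n) {
            buffered += n;
            flush();
        }

        /** An ACK arrives covering everything in flight. */
        public void ack() {
            unacked = false;
            flush();
        }

        private void flush() {
            // Full-sized segments are always allowed out.
            while (buffered >= mss) {
                send(mss);
            }
            // A partial segment goes out only when nothing is in flight.
            if (buffered > 0 && !unacked) {
                send(buffered);
            }
        }

        private void send(int n) {
            sentSegments.add(n);
            buffered -= n;
            unacked = true;
        }

        public List<Integer> sentSegments() { return sentSegments; }
        public int buffered() { return buffered; }

        public static void main(String[] args) {
            NagleModel tcp = new NagleModel(1460);
            tcp.write(100); // nothing in flight: sent immediately
            tcp.write(100); // first 100 bytes still unacked: held back
            tcp.write(100); // still held back
            System.out.println(tcp.sentSegments() + " buffered=" + tcp.buffered()); // [100] buffered=200
            tcp.ack();      // ACK arrives: the 200 waiting bytes go out
            System.out.println(tcp.sentSegments() + " buffered=" + tcp.buffered()); // [100, 200] buffered=0
        }
    }
    ```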

    It appears there is indeed a difference between Solaris and Linux, which is why the problem does not happen on Solaris.

    TCP performance differences between RH Linux and Solaris in java? - Server Fault

    I added this call to the code, and the problem on my Linux system is now gone.
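    The call in question is `java.net.Socket.setTcpNoDelay(true)`, which disables Nagle's algorithm for that socket. A minimal sketch of applying it, assuming a plain `Socket` connection; the helper name `connectNoDelay` and the local loopback server are hypothetical stand-ins, not vdbench's actual code:

    ```java
    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class NoDelayDemo {
        /** Connects to host:port with Nagle's algorithm disabled. */
        static Socket connectNoDelay(InetAddress host, int port) throws IOException {
            Socket socket = new Socket(host, port);
            // Send small writes immediately instead of waiting to fill a segment.
            socket.setTcpNoDelay(true);
            return socket;
        }

        public static void main(String[] args) throws IOException {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            // Port 0 asks the OS for any free port; this server stands in for the real peer.
            try (ServerSocket server = new ServerSocket(0, 1, loopback);
                 Socket client = connectNoDelay(loopback, server.getLocalPort());
                 Socket accepted = server.accept()) {
                // The accepting side sends replies too, so disable Nagle there as well.
                accepted.setTcpNoDelay(true);
                System.out.println("client TCP_NODELAY=" + client.getTcpNoDelay());
            }
        }
    }
    ```

    Setting the option on both ends matters when both sides send small messages, since each direction is buffered independently.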

    Let me mull this over a bit more, and unless I change my mind, I'll send you the new code tomorrow morning.

    Henk.

  • Henk Vandenbergh-Oracle
    Henk Vandenbergh-Oracle Member Posts: 813
    edited Mar 28, 2017 9:58AM Answer ✓

    Bryce,

    Here is the fix. It is currently accessible only to you; as soon as I get an OK from you, I'll open it to the world. The fix is only needed for Linux.

    This exact code has been around for 16 years; Chas probably still remembers when it started.

    It is understandable that it was not noticed: a 40-millisecond delay in each one-second interval costs 1/25th of the total IOPS, which goes unnoticed when doing only a few IOPS. In your case, 1/25th of 75,000 IOPS equals 3,000 IOPS, and that does make a difference.
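    The arithmetic above as a small Java helper (the class and method names are hypothetical). It assumes a steady I/O rate and a single 40 ms stall inside a one-second reporting interval, so the stall costs 40/1000 = 1/25 of that interval's I/Os:

    ```java
    public class StallCost {
        /** I/Os that would have completed during a stall of stallMillis at a steady rate of iops. */
        static double iosLostPerStall(double iops, double stallMillis) {
            return iops * stallMillis / 1000.0;
        }

        public static void main(String[] args) {
            System.out.println(iosLostPerStall(75_000, 40)); // 3000.0
            System.out.println(iosLostPerStall(100, 40));    // 4.0, easy to miss
        }
    }
    ```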

    Thank you so much for bringing this to my attention and providing me with all the data needed to get this fixed.

    Henk.

  • Bryce Guinta
    Bryce Guinta Member Posts: 22
    edited Mar 28, 2017 5:50PM

    Hi Henk,

    I ran a string search for SunOS/Solaris on archive data overnight, but unfortunately it didn't turn up anything useful. That of course doesn't matter anymore, since your patch fixed the issue for both the loopback device and the solid-state drive. Thanks!

    I did read about TCP Fusion, but didn't get to the buffering thread. I'm glad you figured it out.

    When do you think this fix will go into a release?

    Here's what the same graph now looks like:

    [Images: loop-with-fix.png, SSD-with-fix.png]

    For comparison here's what it looks like without the fix:

    [Images: loop-without-fix.png, SSD-without-fix.png]

    P.S. I may have mixed up the loopback device and the SSD in the initial graph in this thread.

  • Henk Vandenbergh-Oracle
    Henk Vandenbergh-Oracle Member Posts: 813
    edited Mar 29, 2017 9:59AM

    When will it be in a new release? I have no short-term plan to come out with a new version.

    Since it took 16 years to run into this, I don't think there is a real rush to get it out, but my users will be able to download the fix.

    Thank you again for your help.

    Henk.
