6 Replies Latest reply: Jul 7, 2014 2:10 PM by Henk Vandenbergh-Oracle

    EOFException on vdbench 5.03

    e20f5b36-bf17-40b3-af2a-9bd65e6204ad

      Hello Henk,

       

      I get the following error on vdbench 5.03 while running I/O, and I can't figure out why. I don't see this on the earlier version (5.02). Is this a known issue? Thanks a lot for your time and help!

       

       

      11:52:18.851

      11:52:18.851 Waiting for synchronization of all slaves

      11:52:48.985 Waiting for slave synchronization: localhost-4

      11:53:18.987 Waiting for slave synchronization: localhost-7

      11:53:44.144

      11:53:44.178 Receiving unexpected EOFException from slave: localhost-2

      11:53:44.178 This means that this slave terminated prematurely.

      11:53:44.178 This thread will go to sleep for 5 seconds to allow

      11:53:44.178 slave termination to be properly recognized.

      11:53:44.178

      11:53:49.084 Waiting for slave synchronization: localhost-7

      11:53:58.678

      11:53:59.852 killCommand: /root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570

      11:54:01.604 killCommand: /root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-14-140301-10.52.10.364 -l localhost-4 -p 5570

      11:54:02.259 killCommand: /root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570

      11:54:02.259

      11:54:02.259 common.failure():

      java.io.EOFException

      11:54:07.155 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570' completion

      11:54:07.232 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

      11:54:08.207 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570' completion

      11:54:08.261 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

      11:54:09.207 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570' completion

      11:54:09.261 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

      11:54:10.207 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570' completion

      11:54:10.261 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

      11:54:10.851 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-14-140301-10.52.10.364 -l localhost-4 -p 5570' completion

      java.io.EOFException

      11:54:14.742 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2571)

      11:54:16.628 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570' completion

      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)

       

      Thanks,

      Priyanka

      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)

      11:54:16.628 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-14-140301-10.52.10.364 -l localhost-4 -p 5570' completion

      at Vdb.SlaveSocket.getMessage(SlaveSocket.java:148)

      at Vdb.SlaveOnMaster.processSlave(SlaveOnMaster.java:118)

      at Vdb.SlaveOnMaster.run(SlaveOnMaster.java:60)

      11:54:16.691 killCommand: /root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-11-140301-10.52.10.364 -l localhost-1 -p 5570

      11:54:16.842 killCommand: /root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-17-140301-10.52.10.364 -l localhost-7 -p 5570

      11:54:17.628 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

      11:54:17.628 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-14-140301-10.52.10.364 -l localhost-4 -p 5570' completion

      11:54:17.628 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570' completion

      11:54:18.628 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-14-140301-10.52.10.364 -l localhost-4 -p 5570' completion

      11:54:18.628 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-10-140301-10.52.10.364 -l localhost-0 -p 5570' completion

      11:54:18.679 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

      11:54:36.191 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-14-140301-10.52.10.364 -l localhost-4 -p 5570' completion

      11:54:55.836 Waiting for command1 '/root/.local/lib/python2.7/site-packages/validator/binaries/vdbench503/vdbench SlaveJvm -m localhost -n localhost-16-140301-10.52.10.364 -l localhost-6 -p 5570' completion

        • 1. Re: EOFException on vdbench 5.03
          Henk Vandenbergh-Oracle

          I doubt it will make a difference, but please use Vdbench50401: Vdbench Downloads

           

          This problem typically happens when you do not have enough memory to start all of the Vdbench slaves (8 in your case).

           

          "Receiving unexpected EOFException from slave: localhost-2"

          You may also want to look at the file localhost-2.stdout.html, which may have some more information. Please include that in your response.

           

          Henk.

          • 2. Re: EOFException on vdbench 5.03
            e20f5b36-bf17-40b3-af2a-9bd65e6204ad

            Thanks a lot for your response Henk! I can definitely try vdbench 5.04.

             

            A couple more questions:

            1. When you say there is not enough memory to run 8 slaves, would it be better to reduce the number of slaves below 8 and try again, or would you suggest increasing the RAM on my server? (I currently have 16 GB.)

            2. Which file does the EOFException refer to? Could you please help me understand that better?

             

            Also, here's what I find in localhost-2.stdout.html:

             

            10:52:18.006 Starting RD=rd1; I/O rate: Uncontrolled MAX; elapsed=3600; For loops: threads=4.0

             

            10:52:18.009 10:52:18.009 task_run_all(): 40 tasks

            10:52:18.025 10:52:18.020 Starting WG_task for sd19

            10:52:18.025 10:52:18.021 Starting WG_task for sd51

            10:52:18.025 10:52:18.023 Starting WG_task for sd35

            10:52:18.026 10:52:18.025 Starting WG_task for sd27

            10:52:18.026 10:52:18.026 Starting WG_task for sd3

            10:52:18.031 10:52:18.031 Starting WG_task for sd11

            10:52:18.092 10:52:18.091 Starting WG_task for sd43

            10:52:18.096 10:52:18.096 Starting WG_task for sd59

            11:52:18.312 11:52:18.311 Ending WG_task 2 for sd11

            11:52:18.336 11:52:18.312 Ending WG_task 2 for sd3

            11:52:18.336 11:52:18.312 Ending WG_task 2 for sd43

            11:52:18.336 11:52:18.312 Ending WG_task 2 for sd35

            11:52:18.336 11:52:18.312 Ending WG_task 2 for sd27

            11:52:18.336 11:52:18.312 Ending WG_task 2 for sd19

            11:52:18.336 11:52:18.312 Ending WG_task 2 for sd51

            11:52:18.336 11:52:18.312 Ending WG_task 2 for sd59

            11:52:18.412 11:52:18.412 Sent 32/32 interrupts to waiting IO_task threads

            11:52:18.427 11:52:18.426 Memory total Java heap: 254.750 MB; Free: 238.316 MB; Used: 16.434 MB;

            11:52:18.498 11:52:18.498 Maximum native memory allocation: 134217728; Current allocation: 0

            11:52:18.499 11:52:18.498 End of run

            11:52:18.499 11:52:18.498 **********

            11:52:18.499

            11:52:18.499

            11:52:18.839 11:52:18.839 Beginning of run setup

            11:52:18.839 11:52:18.839 **********************

            11:52:18.839

            11:52:18.839

            11:52:18.840 11:52:18.840 Opening sd=sd3,lun=/dev/sdbg; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.840 11:52:18.840 Opening sd=sd11,lun=/dev/sdbk; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.841 11:52:18.840 Opening sd=sd19,lun=/dev/sdaj; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.841 11:52:18.841 Opening sd=sd27,lun=/dev/sdac; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.841 11:52:18.841 Opening sd=sd35,lun=/dev/sdap; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.841 11:52:18.841 Opening sd=sd43,lun=/dev/sdw; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.841 11:52:18.841 Opening sd=sd51,lun=/dev/sdo; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.842 11:52:18.842 Opening sd=sd59,lun=/dev/sdg; write: true; OpenFlags: 0x00004000 OtherFlags: 0x00000000

            11:52:18.844 11:52:18.844 Started 8 Workload Generator threads.

            11:52:18.844 11:52:18.844 work.use_waiter: false

            11:52:18.845 11:52:18.845 createCompressionPattern() seed: 0 comp_ratio: 1.00 limit: 0.00

            11:53:09.927 11:53:08.677 Started 256 i/o threads for sd35

            11:53:38.952 Killed

             

            Thanks again for your time Henk!

            • 3. Re: EOFException on vdbench 5.03
              Henk Vandenbergh-Oracle

              It appears from the little I am seeing here that you have at least 60 SDs, and have asked for 256 threads per SD.

              That adds up to 60*256=15,360 threads.

              Each thread needs a minimum of one xfersize= for a READ buffer, and extra for a WRITE buffer if writes have been requested.

              So, if for instance you use xfersize=1m, the data buffer requirement alone will be 60*256*2*1m=30gb.

               

              Of course, I do not know what your requested xfersize is, but I am sure you can imagine that you may be asking too much from your system.
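The arithmetic above can be sketched as a small calculation. This is only an illustrative estimate, not anything Vdbench itself provides: the function name `buffer_bytes` and its parameters are hypothetical, and it assumes one read buffer plus one write buffer of `xfersize` bytes per thread, as described above.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def buffer_bytes(sd_count, threads_per_sd, xfersize_bytes, writes=True):
    """Estimate data-buffer memory: one READ buffer per thread,
    plus one WRITE buffer per thread when writes are requested."""
    buffers_per_thread = 2 if writes else 1
    return sd_count * threads_per_sd * buffers_per_thread * xfersize_bytes


# Values taken from the post: 60 SDs, 256 threads per SD, xfersize=1m.
total_threads = 60 * 256
needed = buffer_bytes(60, 256, 1 * MIB)

print(total_threads)   # 15360
print(needed / GIB)    # 30.0
```

Plugging in a smaller xfersize, SD count, or thread count shows how quickly the requirement drops; for example, the same configuration with xfersize=4k needs only 120 MiB of data buffers.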

              With 16gb of memory, running 8 slaves should not be a problem, but you may want to run some experiments:

              - lower SD count

              - lower thread count

              - lower xfersize

              - and as last experiment, if needed, lower slave count, adding 'hd=default,jvms=1' to the top of your parameter file.

               

              'EOFException' only indicates that a slave socket connection has disappeared, and has no further meaning.
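To illustrate that point by analogy (this is not Vdbench code, which is Java): the sketch below uses Python's `pickle` over a local socket pair, where reading an object from a connection whose peer has gone away raises `EOFError`, much like `ObjectInputStream.readObject` throws `EOFException` in the stack trace above. The names `read_message`, `master_end`, and `slave_end` are hypothetical.

```python
import pickle
import socket


def read_message(conn_file):
    """Return the next pickled message, or None if the peer has gone away."""
    try:
        return pickle.load(conn_file)
    except EOFError:
        # The stream ended: the other side closed or was killed.
        # The exception says nothing about *why* it disappeared.
        return None


# A connected pair of sockets standing in for master and slave.
master_end, slave_end = socket.socketpair()
reader = master_end.makefile("rb")
slave_file = slave_end.makefile("wb")

# The "slave" sends one message while it is still alive.
pickle.dump({"status": "running"}, slave_file)
slave_file.flush()
first = read_message(reader)

# The "slave" disappears (here: a plain close; in the thread: "Killed").
slave_file.close()
slave_end.close()
second = read_message(reader)

print(first)    # {'status': 'running'}
print(second)   # None
```

The reader only learns that the connection is gone; the actual cause has to be found elsewhere, such as in the slave's own stdout file.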

               

              Henk.

              • 4. Re: EOFException on vdbench 5.03
                e20f5b36-bf17-40b3-af2a-9bd65e6204ad

                Sure, I'll try the experiments suggested. Thanks again for your help!

                • 5. Re: EOFException on vdbench 5.03
                  e20f5b36-bf17-40b3-af2a-9bd65e6204ad

                  Hi Henk,

                   

                  I have another situation, related to my previous question, that I'd like your opinion on.

                   

                  I have a server with, say, 24 drives, of which 12 are configured as volumes that vdbench recognizes as sd*. In the parameter file for running I/O on those volumes, I define which of the sd* devices to run I/O on. The server obviously has more drives that I am not running any I/O to. Would vdbench care about those? I have run into a situation where vdbench seems to terminate when these extra drives are present. On the other hand, when I pull out all drives except the ones behind the volumes I am writing to, vdbench runs without any issues.

                   

                  Could you please share your thoughts about this? It would really help me understand the situation better. Thanks for your time and help!

                   

                  Regards,

                  Priyanka

                  • 6. Re: EOFException on vdbench 5.03
                    Henk Vandenbergh-Oracle

                    Please move this to a new thread, and then include your parameter file.

                    What do you mean by "I seem to have an issue"?

                     

                    Henk.