14 Replies Latest reply: Sep 7, 2014 2:34 AM by MichR RSS

    "Eye catchers still visible after successful read. Returning error 797"

    MichR

      Test was run with vdbench 50401.

      test run for approx 40 hours before one of the vms produced this error:

       

      What does "Eye catchers still visible after successful read" mean?

      Google never heard of it as well unfortunately.

       

      error 797 is "797 BAD_READ_RETURN Read was successful, but data buffer contents not changed"

      Can you explain what does vdbench mean by data buffer?

       

       

      This is the parmfile:

      ---------------------------------------------------------------------------------------------

      data_errors=1

      debug=25

       

      hd=default,vdbench=/opt/vdbench,user=root

      shell=ssh

       

      hd=vm-vdbench-01,system=vm-vdbench-01

      hd=vm-vdbench-02,system=vm-vdbench-02

      hd=vm-vdbench-03,system=vm-vdbench-03

      hd=vm-vdbench-04,system=vm-vdbench-04

      hd=vm-vdbench-05,system=vm-vdbench-05

      hd=vm-vdbench-06,system=vm-vdbench-06

      hd=vm-vdbench-07,system=vm-vdbench-07

      hd=vm-vdbench-08,system=vm-vdbench-08

      hd=vm-vdbench-09,system=vm-vdbench-09

      hd=vm-vdbench-10,system=vm-vdbench-10

      hd=vm-vdbench-11,system=vm-vdbench-11

      hd=vm-vdbench-12,system=vm-vdbench-12

      hd=vm-vdbench-13,system=vm-vdbench-13

      hd=vm-vdbench-14,system=vm-vdbench-14

      hd=vm-vdbench-15,system=vm-vdbench-15

      hd=vm-vdbench-16,system=vm-vdbench-16

       

      sd=sd1,host=*,lun=/dev/sdb,openflags=o_direct

       

      wd=wd1,sd=sd*,seekpct=random,rdpct=50,xfersize=64k

       

      rd=rd1,wd=wd1,elapsed=200000,interval=2,iorate=max,threads=(80)

      ----------------------------------------------------------------------------------------------

       

      Thank you in advance.

        • 1. Re: "Eye catchers still visible after successful read. Returning error 797"
          Henk Vandenbergh-Oracle

          Before a read is requested Vdbench places a fixed value at the beginning and end of the data buffer:

          0xdeadbbbb 0xdeadbbbb ........ 0xdeadeeee 0xdeadeeee

           

          After the OS returns 'successful' from the read request Vdbench checks to make sure that these fixed values are gone.

          If they are still there it means that somewhere in the i/o path you have been lied to: the i/o did NOT complete successfully.

           

          Henk.

          • 2. Re: "Eye catchers still visible after successful read. Returning error 797"
            MichR

            Thank you for the response.

             

            Is there any way to trace the problem with the vdbench logs?

            To at least get a general idea of where the lying begun?

            • 3. Re: "Eye catchers still visible after successful read. Returning error 797"
              Henk Vandenbergh-Oracle

              Here at Oracle in lab settings, and also at an other storage company where we had the 797 problem it happened during/after some kind of failover testing.

              Vdbench then of course shows some huge drop in iops during failover. Other than that, Vdbench has no information about what happens behind the (Wizard of Ozz ) curtain. I understand that both these cases were due to problems in a device driver.

              One suggestion I would give in the Solaris world is to use Speculative Tracing using Dtrace, I do not know if Linux has tools like this.

              This is the best I can offer.

               

              Henk.

              • 5. Re: "Eye catchers still visible after successful read. Returning error 797"
                2680354

                Henk,

                Glad to see we already have this error analyzed here. This was helpful.

                 

                I remember that in case we catch data validation errors, the tool would print details of 'Expected' vs 'Read data'. Would it be useful to print these values for 797 error too ?

                 

                Thanks,

                Pranav

                • 6. Re: "Eye catchers still visible after successful read. Returning error 797"
                  Henk Vandenbergh-Oracle

                  The 797 check is actually not a Data Validation check, it is done for every single read operation, with or without DV being active.

                  The code where this check is done is not even aware of DV, so it did not appear worth the effort to make the code changes needed.

                  Reporting 'read' versus 'expected' data caused serious confusion, letting users think that "we read the wrong data". We did not read the wrong data, we did not read at all, with the emphasis being "the OS and/or the storage lied to you", not "we read the wrong data".

                  THAT by the way is the main reason I introduced the 797 check, and the fact that the 797 also was done and worked without DV being active was an accidental and very lucky side effect.

                   

                  Henk.

                  • 7. Re: "Eye catchers still visible after successful read. Returning error 797"
                    2680354

                    Henk,

                    Appreciate your response.

                     

                    I see that the vdbench tool still complaints about 797 error:

                    --CUT--

                    user@ubuntu-vm:~/vdbench$ sudo ./vdbench print /dev/sdb 50039488512 65536

                    Device: /dev/sdb; lba: 0xba6960000

                    18:23:13.625 Eye catchers still visible after successful read. Returning error 797

                    18:23:13.631

                    18:23:13.631 File_handles.findHandle(): unknown handle: 9

                    18:23:13.631

                    java.lang.RuntimeException: File_handles.findHandle(): unknown handle: 9

                      at Vdb.common.failure(common.java:308)

                      at Vdb.File_handles.findHandle(File_handles.java:69)

                      at Vdb.IO_task.io_error_report(IO_task.java:174)

                      at Vdb.Native.read(Native Method)

                      at Vdb.Native.readFile(Native.java:195)

                      at Vdb.Native.readFile(Native.java:186)

                      at Vdb.PrintBlock.printit(PrintBlock.java:119)

                      at Vdb.PrintBlock.print(PrintBlock.java:92)

                      at Vdb.Vdbmain.anySpecialTricks(Vdbmain.java:926)

                      at Vdb.Vdbmain.main(Vdbmain.java:520)

                    user@ubuntu-vm:~/vdbench$

                    --CUT--

                     

                     

                     

                     

                     

                    I see that 'dd' reads same lba and returns "eeee dead" pattern at the end, which seems to cause the 797 error.

                     

                     

                    --CUT--

                    user@ubuntu-vm:~/vdbench$ sudo dd if=/dev/sdb of=./l bs=65536 count=1 skip=763542

                    1+0 records in

                    1+0 records out

                    65536 bytes (66 kB) copied, 0.00103593 s, 63.3 MB/s

                    user@ubuntu-vm:~/vdbench$ ls -l l

                    -rw-r--r-- 1 root root 65536 2014-05-27 18:23 l

                    user@ubuntu-vm:~/vdbench$ hexdump l | tail

                    000ff70 4c0d 3f38 f6de 8bfa 7b6f 45fd 6d6f b698

                    000ff80 e66f cf2a a3ef f3f3 012f ed9f 504f e2a9

                    000ff90 78ff e532 eca7 e6ff 268b e719 439d e7ea

                    000ffa0 f116 e793 f88b 73c9 2c9d ad82 c696 c2a7

                    000ffb0 e34b 6153 217d a4cf 4066 c601 a033 6300

                    000ffc0 80c1 a5e6 90b8 c695 c85c 634a 642e 31a5

                    000ffd0 b217 18d2 89d3 980f 1431 d861 5ac0 f856

                    000ffe0 2d60 7c2b 96b0 3e15 cb58 1f0a 65ac 0f85

                    000fff0 b2d6 07c2 596b 03e1 7c6d 9596 eeee dead

                    0010000

                    user@ubuntu-vm:~/vdbench$ hexdump l | head

                    0000000 000b 0000 0000 a696 fa23 0004 259d 13ab

                    0000010 0000 1ca1 6473 2031 2020 2020 0000 0000

                    0000020 5858 15e0 2c2c 0af0 1616 0578 0b0b 02bc

                    0000030 d55d 9538 ba76 defa 5d3b 6f7d 7e45 a3d8

                    0000040 effa c58a 77fd 62c5 6b26 a504 3593 5282

                    0000050 ca11 bd27 35d0 caf5 9ae8 657a 4d74 32bd

                    0000060 a6ba 195e 535d 0caf 7976 9231 bcbb 4918

                    0000070 8e85 b0ea 979a cc13 cbcd 6609 353e a762

                    0000080 1a9f 53b1 5d97 bdbe fe13 cab9 2fd1 f13a

                    0000090 c730 ecfb e398 767d f1cc 3b3e 78e6 1d9f

                    user@ubuntu-vm:~/vdbench$

                    --CUT--

                     

                     

                     

                    Just curious to know if the read failure here could be a false alarm as 0xdeadeeee is actually the last pattern in this LBA and that is what read buffers gets populated with?

                    • 8. Re: "Eye catchers still visible after successful read. Returning error 797"
                      Henk Vandenbergh-Oracle

                      Interesting: there indeed appears to be a REAL 0xdeadeeee at the end of the data pattern, something that I did not think was statistically likely to happen. WRONG!

                      (it took several billion i/o's to get this to happen).

                      My mistake.

                       

                      I now may have to EXTEND the amount of fixed values or, preferably, use DIFFERENT values each time instead of fixed values.

                      Good find.

                       

                      I'll get back to the code tomorrow.

                       

                      Henk.

                      • 9. Re: "Eye catchers still visible after successful read. Returning error 797"
                        2680354

                        Wow... looks like we got (un)lucky... what are the chances for this to happen... May be we should never write "dead" pattern, even when it get generated randomly...

                         

                        When you do fix it, let us know the files and changes, so I can put them in and recompile a build from source.

                         

                        Thanks Henk.

                        • 10. Re: "Eye catchers still visible after successful read. Returning error 797"
                          2680354

                          Hi Henk,

                           

                          From the vdbench user guide, we see that the way data is generated is very random and it looks like there is no pattern to it.

                          "

                          This is how the data pattern is generated:

                          1. A one-megabyte minimum data pattern buffer is filled with random numbers. This is done

                          only once.

                          2. A number of random 32-bit words will be replaced with zeroes, this depending on the

                          compratio= parameter used. The number of zeros is based on experiments done using the

                          LZJB compression algorithm; compression ratios 1:1 through 25:1 are the only ones

                          implemented; any ratio larger than 25:1 will be set to 25.

                          3. Data starting at the pattern buffer + (remainder of the Logical Byte Address (lba) divided by

                          the length of the pattern buffer) is copied into the data buffer. The copy wraps around to the

                          beginning of the pattern buffer if needed. This takes care of the requested compression ratio.

                          4. The first 8 bytes of each 4k portion of the data buffer is overlaid by a combination of lba, 4k

                          offset, file handle, and the lower four bytes of the system's High Resolution Timer (HRT).

                          This is done to make sure that the data can not be deduped. See also the dedupratio=

                          parameter.

                          "

                           

                          Considering that the probability of find a pattern of 'eeee dead' is very (very) rare, I just needed to check if during write operations do we ever make use of this pattern ? If no, then how did we end up with this pattern.

                           

                          Any clues to this will be helpful.

                           

                          Thanks.

                          • 11. Re: "Eye catchers still visible after successful read. Returning error 797"
                            Henk Vandenbergh-Oracle

                            Vdbench uses among others ''Linear Feedback Shift Register' logic to generate data patterns using as a seed:

                            - the SD name

                            - The Data Validation Key

                            - the lba

                            - and some polynomial coefficients given to me by someone who is a heck of a lot smarter than I am.

                            See source file 'vdb_dv.c'

                             

                            It is therefore technically possible to come up with ANY data pattern, as we found with this false 797.

                             

                            The next version of Vdbench will have this deadbbbb and deadeeee expanded from 32bits to 64bits, with the first 32 bits containing the CURRENT buffer address. This should dramatically decrease the chance of ever getting a false 797 again, though of course, it still won't be absolute zero.

                            • 12. Re: "Eye catchers still visible after successful read. Returning error 797"
                              MichR

                              Im guessing next version is the one after 50401 right ?

                              We are getting this error quite often ,after runs of 2 hours as well as runs of several days with VDBench.

                               

                              Any idea when you will be releasing the version?

                              • 13. Re: "Eye catchers still visible after successful read. Returning error 797"
                                Henk Vandenbergh-Oracle

                                Thanks for reminding me. I built the 'final' vdbench50402 two weeks ago and then of course priorities changed.

                                 

                                Let me double check things a bit and then I'll pass it on to those responsible for putting it on OTN. I can not promise you an arrival time.

                                 

                                Henk.

                                • 14. Re: "Eye catchers still visible after successful read. Returning error 797"
                                  MichR

                                  Understood.

                                   

                                  As long as ive reminded you , ive done all i can do