8 Replies Latest reply: Mar 4, 2012 2:30 PM by 900070

    Incredibly poor zfs recv performance

    900070
      While doing my backups, I ran into a case of zfs recv performance not meeting my expectations. This is on Solaris 11, ZFS pool version 33.

      On my primary server I had a set of snapshots, taken after periods of intense activity and updating:


      NAME                           USED  AVAIL  REFER  MOUNTPOINT
      storage/avmedia@201006140624      0      -  8.81G  -
      storage/avmedia@201006191633      0      -  8.81G  -
      storage/avmedia@201007201232      0      -  8.81G  -
      storage/avmedia@201112111350    33K      -  1.33T  -
      storage/avmedia@201112150024    33K      -  1.33T  -
      storage/home@200903241123     21.7M      -   425G  -
      storage/home@200903241355     5.96M      -   425G  -
      storage/home@200906292018      498M      -   407G  -
      storage/home@201004171421     72.0M      -   421G  -
      storage/home@201004190940     74.5M      -   421G  -
      storage/home@201006140624     1.30G      -  1.05T  -
      storage/home@201006191633     1.22G      -  1.05T  -
      storage/home@201007201232     8.70G      -  1.08T  -
      storage/home@201112111350       47K      -  1.29T  -
      storage/home@201112150024       47K      -  1.29T  -


      Copying all the storage/avmedia snapshots over the network to the backup server took just under a day. I use:

      zfs send -i <snap1> <snap2> | ssh <user>@<host> zfs recv -d <zfs name>

      to perform the backup.

      The next set of snapshots, from storage/home, took less than a day until storage/home@201007201232 completed. Since then, it has been working for three days to copy the storage/home@201112111350 snapshot, with no end in sight. In fact, due to the poor performance, I actually did a local dump:

      zfs send -i <snap1> <snap2> | compress - > snap1_2.Z

      This local dump completed in 14 hours. Then I copied this 488 GB file to the backup host and proceeded to restore it using: zcat snap1_2.Z | zfs recv -d <zfs name>

      Since then, the typical zfs iostat behavior shows:


      root@snapshot:/storage/mbuffer# zpool iostat 60
                   capacity     operations    bandwidth
      pool        alloc   free   read  write   read  write
      rpool       22.7G   349G      0      1  26.4K   939K
      storage     3.18T  2.26T     23    530  1.09M  9.71M
      rpool       22.7G   349G      0      0  1.62K      0
      storage     3.18T  2.26T    292    285  14.2M  4.42M
      rpool       22.7G   349G      0      0  1.62K      0
      storage     3.18T  2.26T    287    282  14.2M  4.10M
      rpool       22.7G   349G      0      0  1.62K      0
      storage     3.18T  2.26T    279    278  13.2M  4.06M


      This has been going on for hours now. If it is indeed moving forward at a rate of 13.2M read and 4.06M written per second, I'm in for a long wait.

      edit: it just completed:

      ls -% all

      zfs-increment.Z
      timestamp: atime Dec 20 17:17:44 2011
      timestamp: ctime Dec 18 23:24:09 2011
      timestamp: mtime Dec 18 23:24:09 2011
      timestamp: crtime Dec 18 09:58:39 2011

      The zfs send started Dec 18th at 09:58, and the zcat stopped on Dec 20th at 17:17 - 2 days, 7 hours, 19 minutes for 488GB of backup data -> 2.5 MB/s, substantially below the capacity of the drives and controller.
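
      For what it's worth, a rough sanity check of that figure (treating the 488 GB as GiB):

      # elapsed: Dec 18 09:58:39 -> Dec 20 17:17:44 = 2 days, 7 h, 19 min, 5 s
      echo "scale=2; 488 * 1024 / (2*86400 + 7*3600 + 19*60 + 5)" | bc
      # => 2.50, i.e. roughly 2.5 MiB/s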


      Can anybody shed some light on this behavior? Could it be this slow because the volume has lots of little files that all need to be checked? Is there a way to monitor the progress of zfs recv?

      Kind regards.

      B.

      Edited by: Bones on Dec 19, 2011 1:54 PM

      Edited by: Bones on Dec 20, 2011 8:40 AM
        • 1. Re: Incredibly poor zfs recv performance
          budachst
          There are a number of possible causes for zfs recv becoming slow; the first ones that come to mind are:

          - the receiving zpool is more than 80% full
          - dedupe

          Can you post the layout of your zpools? Btw, for the initial zfs send/recv I always use mbuffer, which outperforms zfs send over ssh by at least 100%.
          You would first have to check whether you can read fast enough from your source zpool, maybe by using something like this:

          zfs send <fs@snapshot> | pv | dd of=/dev/null bs=1024k

          This should give you at least a feeling about how your source zpool performs.
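
          For reference, the kind of mbuffer pipeline I mean looks roughly like this (the port number and buffer sizes are only examples, tune them for your memory and network):

          # on the receiving host: listen on a TCP port and feed the stream straight into zfs recv
          mbuffer -s 128k -m 1G -I 9090 | zfs recv -d <zfs name>

          # on the sending host: push the send stream through a local buffer to that port
          zfs send -i <snap1> <snap2> | mbuffer -s 128k -m 1G -O <backuphost>:9090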
          • 2. Re: Incredibly poor zfs recv performance
            Mgerdts-Oracle
            You may be seeing a manifestation of:

            7060894 zfs recv is excruciatingly slow

            When the recv is running slowly, what does the following say:

            # dtrace -n 'BEGIN {i=0} fbt::txg_wait_synced:entry {i++} tick-1s {printf("%d", i); i=0}'
            • 3. Re: Incredibly poor zfs recv performance
              900070
              sudo dtrace -n 'BEGIN {i=0} fbt::txg_wait_synced:entry {i++} tick-1s {printf("%d", i); i=0}'
              Password:
              dtrace: description 'BEGIN ' matched 3 probes
              CPU ID FUNCTION:NAME
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 0
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1
              3 69280 :tick-1s 1


              Could you enlighten me a bit more about what these DTrace probes report on?
              • 4. Re: Incredibly poor zfs recv performance
                900070
                Hi,

                My apologies for the layout; I also had to fight a bad disk.

                The zpool layout for the sending system:


                zpool status
                  pool: rpool
                 state: ONLINE
                  scan: scrub repaired 6.50K in 0h5m with 0 errors on Mon Dec 19 21:45:56 2011
                config:

                        NAME         STATE     READ WRITE CKSUM
                        rpool        ONLINE       0     0     0
                          mirror-0   ONLINE       0     0     0
                            c10d1s0  ONLINE       0     0     0
                            c10d0s0  ONLINE       0     0     0

                errors: No known data errors

                  pool: storage
                 state: ONLINE
                  scan: resilvered 1.46T in 15h23m with 0 errors on Sun Jan 29 05:21:33 2012
                config:

                        NAME           STATE     READ WRITE CKSUM
                        storage        ONLINE       0     0     0
                          mirror-0     ONLINE       0     0     0
                            c11t2d0p0  ONLINE       0     0     0
                            c11t3d0p0  ONLINE       0     0     0
                          mirror-1     ONLINE       0     0     0
                            c11t0d0p0  ONLINE       0     0     0
                            c11t1d0p0  ONLINE       0     0     0


                And the layout for the receiving system:


                zpool status -v
                  pool: rpool
                 state: ONLINE
                  scan: resilvered 22.7G in 0h12m with 0 errors on Sat Jan 21 23:29:50 2012
                config:

                        NAME          STATE     READ WRITE CKSUM
                        rpool         ONLINE       0     0     0
                          mirror-0    ONLINE       0     0     0
                            c8t0d0s0  ONLINE       0     0     0
                            c8t1d0s0  ONLINE       0     0     0

                errors: No known data errors

                  pool: storage
                 state: ONLINE
                  scan: none requested
                config:

                        NAME        STATE     READ WRITE CKSUM
                        storage     ONLINE       0     0     0
                          mirror-0  ONLINE       0     0     0
                            c8t2d0  ONLINE       0     0     0
                            c8t4d0  ONLINE       0     0     0
                          mirror-1  ONLINE       0     0     0
                            c8t3d0  ONLINE       0     0     0
                            c8t5d0  ONLINE       0     0     0

                errors: No known data errors
                • 5. Re: Incredibly poor zfs recv performance
                  Mgerdts-Oracle
                  The dtrace output is showing that every 1 - 2 seconds, the pool is being forced to commit a transaction group. When a transaction group is committed, any data that has not yet been flushed to disk is flushed, and then a flurry of other activity happens to update pool metadata so that the on-disk state reflects that this committed data is in place. Because pool metadata is intentionally stored in physically distant parts of the disk, the metadata sync introduces more random I/O than receiving a stream would normally require. The frequent syncing of transaction groups means you are spending more time updating pool metadata than actually receiving data from the stream. Ideally, you would be receiving for many seconds between transaction groups and you would be seeing many tens of megabytes of write throughput to each disk.

                  I generally find the output of 'zpool iostat' to be useless in understanding whether zfs is working well. Instead, I use "iostat -xzn". If you find that your disks are doing 200+ reads + writes and only pushing a few megabytes of data, the I/O pattern is limiting the throughput. Typical disks can do about 200 random I/O operations per second (iops = reads + writes). They can often do much better than that for sequential iops, as the head movement takes far less time.
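
                  As a rough sanity check (the 8 KB average write size below is only an assumed figure, not something measured on your system), at ~200 random iops per disk the write ceiling is tiny:

                  # upper bound for purely random writes: ~200 iops/disk * ~8 KB per write
                  echo "scale=2; 200 * 8 / 1024" | bc
                  # => 1.56 MB/s per disk, the same ballpark as what you are seeing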

                  The cause of:

                  7060894 zfs recv is excruciatingly slow

                  is that the receive process is forcing transaction groups to be committed too frequently. This forces the receive process into a random I/O pattern with lots of small writes.

                  If getting a fix for this somewhat soon is important to you, I'd highly suggest opening a support call. In the context of that call, you need to ask that an escalation be filed so that the fix can be made available as an IDR and/or in an upcoming SRU to Solaris 11.
                  • 6. Re: Incredibly poor zfs recv performance
                    900070
                    Your description looks right on the mark, as demonstrated by:

                    iostat -xnz 5

                    <snip>
                    extended device statistics
                    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                    124.2 33.2 1087.5 156.7 1.4 3.6 8.9 22.6 47 54 c8t2d0
                    122.8 39.2 1333.0 242.8 6.9 0.8 42.8 4.9 80 80 c8t3d0
                    131.8 32.6 784.9 156.7 1.3 3.4 7.9 20.4 44 51 c8t4d0
                    115.2 42.2 759.9 234.7 1.0 3.2 6.1 20.1 38 46 c8t5d0
                    extended device statistics
                    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                    130.2 58.2 1377.6 537.7 2.4 2.7 12.7 14.4 50 56 c8t2d0
                    124.0 51.2 1410.4 527.9 6.6 0.8 38.0 4.5 77 79 c8t3d0
                    124.0 56.0 868.2 537.7 2.2 2.6 12.4 14.4 47 54 c8t4d0
                    113.6 53.6 779.7 528.2 1.6 2.8 9.4 16.9 41 51 c8t5d0
                    extended device statistics
                    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                    116.8 78.4 1035.9 1026.5 2.2 2.4 11.3 12.4 45 50 c8t2d0
                    116.2 69.8 1259.6 973.8 6.7 0.8 36.1 4.2 77 78 c8t3d0
                    121.0 78.2 839.2 1026.5 2.3 2.5 11.8 12.8 48 53 c8t4d0
                    105.4 77.4 692.4 975.9 1.6 2.7 8.7 14.7 41 47 c8t5d0
                    extended device statistics
                    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                    136.0 41.4 1210.2 451.8 3.3 2.5 18.7 14.0 58 62 c8t2d0
                    138.4 39.0 1459.9 379.5 4.8 2.9 26.9 16.3 55 80 c8t3d0
                    128.2 42.6 911.0 481.5 3.0 2.5 17.4 14.9 54 60 c8t4d0
                    128.8 43.2 923.3 377.1 2.4 3.2 13.9 18.8 54 65 c8t5d0
                    extended device statistics
                    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                    177.6 61.2 1762.9 1038.9 2.4 5.0 10.0 21.0 62 74 c8t2d0
                    175.2 69.4 1750.3 969.7 1.5 5.4 6.1 21.9 50 72 c8t3d0
                    160.8 63.0 1203.5 1009.2 2.3 5.1 10.1 23.0 63 73 c8t4d0
                    138.6 70.4 1071.8 969.7 1.7 4.7 8.2 22.3 53 64 c8t5d0

                    I'm not ready to open a support call for what is my home backup server. I was wondering if using part of my SSD boot drives as an intent log could help ease the problem?
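
                    Something along these lines, I imagine (the slice names are placeholders; I have not tried this yet):

                    # mirror two SSD slices as a separate intent log (slog) for the receiving pool
                    zpool add storage log mirror <ssd-slice-1> <ssd-slice-2>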
                    • 7. Re: Incredibly poor zfs recv performance
                      900070
                      Some further weirdness. While doing the incremental zfs send/recv update, I am seeing unequal loading on the four storage devices that comprise my zpool.
                      In contrast to my previous posts, I have since destroyed and recreated my zpool on a RAID controller (Adaptec 5405) with battery-backed cache, plus two SSD log devices:


                      pool: backup
                     state: ONLINE
                      scan: none requested
                    config:

                            NAME          STATE     READ WRITE CKSUM
                            backup        ONLINE       0     0     0
                              mirror-0    ONLINE       0     0     0
                                c11t64d0  ONLINE       0     0     0
                                c11t66d0  ONLINE       0     0     0
                              mirror-1    ONLINE       0     0     0
                                c11t65d0  ONLINE       0     0     0
                                c11t67d0  ONLINE       0     0     0
                            logs
                              mirror-2    ONLINE       0     0     0
                                c8t2d0p1  ONLINE       0     0     0
                                c8t3d0p1  ONLINE       0     0     0

                    errors: No known data errors

                      c11 is the Adaptec RAID controller, cache set to write-back, battery working and fully charged. All drives are brand-new Hitachi 3 TB drives (5200 rpm).

                      When looking at the iostat I see the following:

                      iostat -xnz 5
                      extended device statistics
                      r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                      0.0 0.0 1.6 0.4 0.0 0.0 2.8 1.8 0 0 c8t0d0
                      0.0 0.0 0.8 0.4 0.0 0.0 3.0 2.0 0 0 c8t1d0
                      0.3 90.3 1.1 2702.7 0.0 0.9 0.0 9.8 0 15 c11t64d0
                      0.3 88.7 1.1 2705.4 0.0 0.5 0.0 5.1 0 7 c11t65d0
                      0.3 90.2 1.1 2702.7 0.0 0.8 0.0 9.3 0 14 c11t66d0
                      0.3 88.2 1.1 2705.4 0.0 2.3 0.0 25.9 0 40 c11t67d0
                      extended device statistics
                      r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                      0.0 65.3 0.0 421.5 0.0 0.1 0.0 1.1 0 1 c11t64d0
                      0.4 53.5 0.7 311.4 0.0 0.1 0.0 1.2 0 2 c11t65d0
                      0.0 65.9 0.0 421.5 0.0 0.1 0.0 1.1 0 1 c11t66d0
                      0.0 50.9 0.0 341.7 0.0 4.1 0.0 81.1 0 98 c11t67d0
                      extended device statistics
                      r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                      0.0 45.2 0.0 177.4 0.0 0.0 0.0 1.0 0 1 c11t64d0
                      0.0 48.6 0.0 270.4 0.0 0.0 0.0 1.0 0 1 c11t65d0
                      0.0 45.4 0.0 177.4 0.0 0.0 0.0 1.0 0 1 c11t66d0
                      0.2 48.2 0.4 273.3 0.0 4.6 0.0 94.1 0 100 c11t67d0
                      extended device statistics
                      r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                      0.2 62.0 0.4 299.7 0.0 0.1 0.0 1.1 0 2 c11t64d0
                      0.2 53.8 0.4 195.0 0.0 0.1 0.0 1.0 0 2 c11t65d0
                      0.0 62.0 0.0 299.7 0.0 0.1 0.0 1.0 0 1 c11t66d0
                      0.0 51.4 0.0 192.5 0.0 3.9 0.0 76.3 0 98 c11t67d0
                      extended device statistics
                      r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                      0.2 39.0 0.3 293.0 0.0 0.0 0.0 1.2 0 2 c11t64d0
                      0.0 47.6 0.0 362.3 0.0 0.1 0.0 1.1 0 1 c11t65d0
                      0.0 38.8 0.0 293.0 0.0 0.0 0.0 1.1 0 1 c11t66d0
                      0.0 46.8 0.0 367.1 0.0 4.9 0.0 105.5 0 98 c11t67d0
                      extended device statistics
                      r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                      0.0 45.8 0.0 305.8 0.0 0.0 0.0 1.0 0 1 c11t64d0
                      0.2 49.8 0.3 328.5 0.0 0.1 0.0 1.2 0 2 c11t65d0
                      0.0 45.2 0.0 305.8 0.0 0.0 0.0 1.0 0 1 c11t66d0
                      0.2 47.6 0.4 318.3 0.0 4.5 0.0 94.6 0 99 c11t67d0
                      extended device statistics
                      r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
                      0.0 52.6 0.0 315.5 0.0 0.1 0.0 1.1 0 1 c11t64d0
                      0.2 46.2 0.3 218.7 0.0 0.1 0.0 1.2 0 2 c11t65d0
                      0.0 52.6 0.0 315.5 0.0 0.1 0.0 1.1 0 1 c11t66d0
                      0.0 47.6 0.0 228.5 0.0 4.5 0.0 94.6 0 99 c11t67d0


                      At first I thought this was a drive problem, but everything checks out. Even swapping the disks around between the previous and current creation of the zpool does not change the behavior; it seems to be inherent to ZFS. How is this possible? I would assume that under mirror/stripe conditions, disk usage would be pretty much equal across all devices.
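
                      (By "checks out" I mean, among other things, that the per-device error counters look clean; along the lines of:)

                      # soft/hard/transport error counters for the four data drives
                      iostat -En c11t64d0 c11t65d0 c11t66d0 c11t67d0 | grep -i errors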
                      • 8. Re: Incredibly poor zfs recv performance
                        900070
                        Some more statistics:

                        summary: 413 GiByte in 114 h 31 min 1051 KiB/s, using a Sun StorageTek internal SAS RAID HBA with 256 MB cache, no battery backup.

                        summary: 377 GiByte in 159 h 53 min 687 KiB/s, idem - this is the next snapshot

                        Edited by: Bones on Mar 4, 2012 12:30 PM