Skip to Main Content

Infrastructure Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Solaris 11, 11.1, 11.2 and 11.3: Possible Bug in Time-Slider/ZFS Auto-Snapshot

Steffen MoserFeb 28 2016 — edited Jun 22 2016

Hi all,

I think that I encountered buggy behavior of the ZFS auto-snapshot feature in Solaris 11, 11.1, 11.2 and 11.3. Let's assume, we have the following ZFS datasets on a freshly installed test machine (running Solaris 11.3):

steffen@solaris:~$ zfs list

NAME                                      USED  AVAIL  REFER  MOUNTPOINT

rpool                                    9,48G  21,5G  4,85M  /rpool

rpool/ROOT                               5,67G  21,5G    31K  legacy

rpool/ROOT/solaris                       5,67G  21,5G  4,70G  /

rpool/ROOT/solaris/var                    854M  21,5G   710M  /var

rpool/VARSHARE                            246K  21,5G    47K  /var/share

rpool/VARSHARE/pkg                         81K  21,5G    32K  /var/share/pkg

rpool/VARSHARE/pkg/repositories            31K  21,5G    31K  /var/share/pkg/repositories

rpool/VARSHARE/zones                       31K  21,5G    31K  /system/zones

rpool/dump                               2,06G  21,6G  2,00G  -

rpool/export                             5,77M  21,5G    32K  /export

rpool/export/home                        5,72M  21,5G    32K  /export/home

rpool/export/home/steffen                5,67M  21,5G  5,33M  /export/home/steffen

rpool/swap                               1,03G  21,6G  1,00G  -

rpool/zones                               724M  21,5G    32K  /zones

rpool/zones/test                          724M  21,5G    32K  /zones/test

rpool/zones/test/rpool                    724M  21,5G    31K  /rpool

rpool/zones/test/rpool/ROOT               724M  21,5G    31K  legacy

rpool/zones/test/rpool/ROOT/solaris       724M  21,5G   667M  /zones/test/root

rpool/zones/test/rpool/ROOT/solaris/var  49,7M  21,5G  48,7M  /zones/test/root/var

rpool/zones/test/rpool/VARSHARE            38K  21,5G    38K  /var/share

rpool/zones/test/rpool/export              62K  21,5G    31K  /export

rpool/zones/test/rpool/export/home         31K  21,5G    31K  /export/home

When activating the auto-snapshot feature, we will find that the following datasets will get auto-snapshotted correctly:

steffen@solaris:~$ zfs list -t snapshot|grep zfs-auto-snap

rpool@zfs-auto-snap_frequent-2016-02-29-03h01                                          0      -  4,85M  -

rpool/ROOT/solaris/var@zfs-auto-snap_frequent-2016-02-29-03h01                         0      -  1,07G  -

rpool/VARSHARE/pkg/repositories@zfs-auto-snap_frequent-2016-02-29-03h01                0      -    31K  -

rpool/VARSHARE/zones@zfs-auto-snap_frequent-2016-02-29-03h01                           0      -    31K  -

rpool/export/home/steffen@zfs-auto-snap_frequent-2016-02-29-03h01                      0      -  5,33M  -

rpool/zones/test/rpool/ROOT/solaris/var@zfs-auto-snap_frequent-2016-02-29-03h01        0      -   239M  -

rpool/zones/test/rpool/VARSHARE@zfs-auto-snap_frequent-2016-02-29-03h01                0      -    38K  -

rpool/zones/test/rpool/export/home@zfs-auto-snap_frequent-2016-02-29-03h01             0      -    31K  -

But there are other datasets which will never get an auto-snapshot - completely independent from what is modified on that dataset:

rpool/ROOT

rpool/ROOT/solaris

rpool/VARSHARE

rpool/dump

rpool/export

rpool/swap

rpool/zones

rpool/zones/test

rpool/zones/test/rpool

rpool/zones/test/rpool/ROOT

rpool/zones/test/rpool/ROOT/solaris

rpool/zones/test/rpool/export

For example, if I added, deleted or modified files or directories at "rpool/ROOT/solaris" (which is mounted as "/" in the global zone) or at "rpool/zones/test/rpool/ROOT/solaris" (which is mounted as "/" in the zone "test"), I would expect that there will be snapshots for these datasets, too -- but they are never created. While it is, of course, obvious for "rpool/dump" and "rpool/swap", such a behavior shouldn't occur for the other datasets (e.g. "rpool/zones/test/rpool/ROOT/solaris") as the ZFS properties are set to enable auto-snapshots:


NAME                             PROPERTY           VALUE  SOURCE

rpool/zones/test/rpool/ROOT/solaris  com.sun:auto-snapshot  true   inherited from rpool

There is neither a problem to snapshot them manually nor to get them snapshotted when updating the system and creating a new boot environment. So it seems that it is not a problem of ZFS or the ZFS snapshot feature in general, but has rather to do with the auto-snapshot feature of time-slider.

The problem occurs on all servers I am working with and it occurs in Solaris 11, 11.1, 11.2 and 11.3.

To track this down, I made the python script

/usr/share/time-slider/lib/time_slider/zfs.py

a bit more verbose by adding several debug outputs. It shows the following behavior which presumably explains the whole problem:

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test/rpool/export/home

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test/rpool/export/home

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/export/home@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test/rpool/export

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test/rpool/export

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/export@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test/rpool/VARSHARE

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test/rpool/VARSHARE

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/VARSHARE@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test/rpool/ROOT/solaris/var

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test/rpool/ROOT/solaris/var

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/ROOT/solaris/var@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test/rpool/ROOT/solaris

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test/rpool/ROOT/solaris

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/ROOT/solaris@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test/rpool/ROOT

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test/rpool/ROOT

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/ROOT@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test/rpool

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test/rpool

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones/test

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones/test

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/zones

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/zones

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/export/home/steffen

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/export/home/steffen

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/export/home/steffen@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/export/home

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/export/home

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/export/home@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/export

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/export

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/export@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/VARSHARE/zones

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/VARSHARE/zones

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/VARSHARE/zones@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/VARSHARE/pkg/repositories

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/VARSHARE/pkg/repositories

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/VARSHARE/pkg/repositories@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/VARSHARE/pkg

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/VARSHARE/pkg

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/VARSHARE/pkg@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/VARSHARE

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/VARSHARE

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/VARSHARE@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/ROOT/solaris/var

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/ROOT/solaris/var

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/ROOT/solaris/var@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] in finalrecursive: rpool/ROOT

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool/ROOT

Feb 28 03:13:10 solaris time-sliderd: [ID 702911 daemon.notice] util.run_command():  /usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/ROOT@zfs-auto-snap_daily-2016-02-28-03h13

Feb 28 03:13:11 solaris time-sliderd: [ID 702911 daemon.notice] in single: rpool

Feb 28 03:13:11 solaris time-sliderd: [ID 702911 daemon.notice] create_snapshot: rpool

The lines with "util.run_command()" show the commands which are fired by time-slider to create the auto-snapshots. To make it more clearly, I'll pick the two bold-highlighted lines of the above log.

What happens is: At first, a snapshot is created this way:

/usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/ROOT/solaris/var@zfs-auto-snap_daily-2016-02-28-03h13


Immediately after this snapshot command, another command is executed:


/usr/bin/pfexec /usr/sbin/zfs snapshot -r rpool/zones/test/rpool/ROOT/solaris@zfs-auto-snap_daily-2016-02-28-03h13


The second command fails because "-r" tells "zfs snapshot" to create snapshots for all sub-datasets of "rpool/zones/test/rpool/ROOT/solaris". As "rpool/zones/test/rpool/ROOT/solaris/var" is a sub-dataset of "rpool/zones/test/rpool/ROOT/solaris" and a snapshot for "rpool/zones/test/rpool/ROOT/solaris/var" has already been created before using the same snapshot tag, the "zfs" tool quits with the message:

cannot create snapshot 'rpool/zones/test/rpool/ROOT/solaris/var@zfs-auto-snap_daily-2016-02-28-03h13': snapshot already exists

no snapshots were created

For this reason, auto-snapshots only seem to work for ZFS datasets which don't have child datasets. Only the leaves of the tree (and the root) actually get auto-snapshotted, but branches in-between won't.

In my opinion, this behavior is not documented this way and therefore should not occur. People who rely on auto-snapshots might get disappointed when they see that snapshots they have been expecting have actually never created. Especially in times were cryptographic ransom-ware viruses and Trojan horses are in the wild which attack Windows systems (and encrypt even possibly their server-stored backups), it is important to have a reliably working auto-snapshot feature on the server (presuming that Solaris is often used to provide data storage services to Windows clients). Independently from the usage scenario, a bug like this should be fixed as soon as possible.

My question to you is: Is this problem known? Does anybody else observe the problem as well? Did I do or understand anything in a wrong way?

Thank you very much in advance for any helpful replies!

Best regards,

Steffen

Comments

Dave Miner-Oracle

This is in fact a known bug, 15775093.  You may wish to open a support request regarding it.

3262458

Hello Steffen,

Since Solaris 11.1, the auto-snapshot feature is buggy, the only thing that is working correctly is the "frequent auto-snapshot" service.

If we enable hourly, daily, ..., the plugin zfs-send crashes after some snapshots and snapshots get stay in "hold state".

But time-slider continues working, so if we don't have a look, we think everything is ok. And it's not.

I found that in the zfs.py script, on line 117 the command is not correct:

the command is

recursive.append (datasetname)

and should be

recursive.insert (0,datasetname)

Hope it helps.

Regards,

1 - 2
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Jul 19 2016
Added on Feb 28 2016
2 comments
2,047 views