This discussion is archived
5 Replies Latest reply: Oct 15, 2012 6:14 PM by Dude!

Trying to understand BtrFS snapshot feature

Dude! Guru
I'm trying to understand how copy-on-write and Btrfs snapshots work.

Consider the following simple test:

<pre>
# cd /
# touch testfile
# ls --full-time testfile
-rw-r--r-- 1 root root 0 2012-10-15 12:04:43.629620401 +0200 testfile

Test 1:
# btrfs subvol snapshot / /snap1
# touch testfile
# ls --full-time testfile /snap1/testfile
-rw-r--r-- 1 root root 0 2012-10-15 12:04:43.629620401 +0200 /snap1/testfile
-rw-r--r-- 1 root root 0 2012-10-15 12:07:38.348932127 +0200 testfile

Test 2:
# btrfs subvol snapshot / /snap2
# touch testfile
# ls --full-time testfile /snap1/testfile /snap2/testfile
-rw-r--r-- 1 root root 0 2012-10-15 12:04:43.629620401 +0200 /snap1/testfile
-rw-r--r-- 1 root root 0 2012-10-15 12:07:38.348932127 +0200 /snap2/testfile
-rw-r--r-- 1 root root 0 2012-10-15 12:09:21.769606369 +0200 testfile
</pre>

Based on the tests above, I've come to the following conclusion, which raises some questions:

1) Btrfs determines which snapshot maintains a logical copy and physically copies the file to the appropriate snapshot before it is modified.
a) Does it copy the complete file or work on the block level?
b) What happens if the file is very large, e.g. 100 GB and there is not enough space on disk to copy the file to the snapshot directory?
c) Doesn't it have a huge negative impact on performance when a file needs to be copied before it can be altered?
  • 1. Re: Trying to understand BtrFS snapshot feature
    Avi Miller Guru
    Dude wrote:
    1) Btrfs determines which snapshot maintains a logical copy and physically copies the file to the appropriate snapshot before it is modified.
    To btrfs there is no such thing as a logical copy, nor indeed even a file: the copy-on-write happens at the block/inode level. Hence, after the snapshot, there is no way to tell which of the two files was the source and which was the target.
    a) Does it copy the complete file or work on the block level?
    Block.
    b) What happens if the file is very large, e.g. 100 GB and there is not enough space on disk to copy the file to the snapshot directory?
    It doesn't do any copying: a snapshot is zero-cost, i.e. it consumes no storage until there is a change to the file in the snapshot.
    c) Doesn't it have a huge negative impact on performance when a file needs to be copied before it can be altered?
    It's not copied before it can be altered: it's copied on write, i.e. the blocks that change during the fsync process are written out elsewhere on the disk. The entire file is not copied. Have you watched my btrfs presentation on YouTube? It does cover this. :)
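The block-level behaviour described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions, not how btrfs is implemented: `BlockStore`, `ToyFile`, and their methods are hypothetical names invented for illustration, and a "file" here is nothing more than a list of block ids with reference counts.

```python
class BlockStore:
    """Toy block allocator (hypothetical): block id -> data, plus reference counts."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes
        self.refs = {}     # block id -> number of files pointing at the block
        self.next_id = 0

    def alloc(self, data):
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        return bid


class ToyFile:
    """A 'file' is only a list of block ids; a snapshot shares the same ids."""

    def __init__(self, store, chunks):
        self.store = store
        self.block_ids = [store.alloc(c) for c in chunks]

    def snapshot(self):
        # No data is copied: the snapshot just references the existing blocks.
        snap = ToyFile(self.store, [])
        snap.block_ids = list(self.block_ids)
        for bid in snap.block_ids:
            self.store.refs[bid] += 1
        return snap

    def write_block(self, index, data):
        # Copy-on-write: new data always goes to a newly allocated block.
        old = self.block_ids[index]
        self.block_ids[index] = self.store.alloc(data)
        self.store.refs[old] -= 1
        if self.store.refs[old] == 0:
            # Nobody references the old block any more, so it can be freed.
            del self.store.blocks[old]
            del self.store.refs[old]

    def read(self):
        return b"".join(self.store.blocks[b] for b in self.block_ids)


store = BlockStore()
live = ToyFile(store, [b"AAAA", b"BBBB", b"CCCC"])
snap = live.snapshot()
blocks_after_snapshot = len(store.blocks)   # 3: the snapshot added no blocks

live.write_block(1, b"bbbb")
blocks_after_write = len(store.blocks)      # 4: only the changed block is new

print(blocks_after_snapshot, blocks_after_write)
print(snap.read(), live.read())
```

In this model the snapshot costs no data blocks, and a write to one block consumes exactly one additional block while the other two stay shared. Rewriting every block of a snapshotted file would therefore need as much free space as the file itself, which is the limiting case raised in question b).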
  • 2. Re: Trying to understand BtrFS snapshot feature
    Dude! Guru
    Hi, thanks for the answers!

    I guess calling it "logical copy" was a bad choice. Would calling the initial snapshot a "hard link of a file system" be more appropriate?

    Ok, so BTRFS works on the block level. I've done some tests and can confirm what you said (see below).

    I find it interesting that although the snapshot maintains the "hard link" to the original copy - I guess "before block image" (?) - there really is no negative performance impact.

    How does this work? Perhaps it is not overwriting the existing file, but rather creating a new file? So the snapshot still has the "hard link" to the original file, hence nothing changed for the snapshot? Simply a new file was created, and that's showing in the current file system?

    It actually reminds me of the old VMS ODS filesystem, which did file versioning by appending a semicolon and a version number, e.g. text.txt;1. Modifying the file would produce text.txt;2 and so on. Listing or using the file without a version would simply show and use the latest version. You could purge old versions if necessary. The file system was actually structured by records (RMS), similar to a database.


    <pre>
    [root@vm004 /]# df -h /
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda3 16G 2.3G 12G 17% /

    # time dd if=/dev/zero of=/testfile bs=8k count=1M
    1048576+0 records in
    1048576+0 records out
    8589934592 bytes (8.6 GB) copied, 45.3253 s, 190 MB/s

    Let's create a snapshot and overwrite the testfile

    # btrfs subvolume snapshot / /snap1
    # time dd if=/dev/zero of=/testfile bs=8k count=1M
    dd: writing `/testfile': No space left on device
    491105+0 records in
    491104+0 records out
    4023123968 bytes (4.0 GB) copied, 21.2399 s, 189 MB/s

    real     0m21.613s
    user     0m0.021s
    sys     0m3.325s
    </pre>

    So obviously there is not enough space to hold both the original file and the version still referenced by the snapshot.
    Since I'm creating a completely new file, I guess that's to be expected.

    Let's try with a smaller file, and also check performance:

    <pre>
    # btrfs subvol delete /snap1
    Delete subvolume '//snap1'

    # time dd if=/dev/zero of=/testfile bs=8k count=500k
    512000+0 records in
    512000+0 records out
    4194304000 bytes (4.2 GB) copied, 21.7176 s, 193 MB/s

    real     0m21.726s
    user     0m0.024s
    sys     0m2.977s

    # time echo "This is a test to test the test" >> /testfile

    real     0m0.000s
    user     0m0.000s
    sys     0m0.000s

    # btrfs subvol snapshot / /snap1
    Create a snapshot of '/' in '//snap1'

    # df -k /
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda3 16611328 6505736 8221432 45% /

    # time echo "This is a test to test the test" >> /testfile

    real     0m0.000s
    user     0m0.000s
    sys     0m0.000s

    # df -k /
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda3 16611328 6505780 8221428 45% /

    # btrfs subvol delete /snap1
    Delete subvolume '//snap1'

    # df -k /
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda3 16611328 6505740 8221428 45% /

    The snapshot occupied 40k

    # btrfs subvol snapshot / /snap1
    Create a snapshot of '/' in '//snap1'

    # time dd if=/dev/zero of=/testfile bs=8k count=500k
    512000+0 records in
    512000+0 records out
    4194304000 bytes (4.2 GB) copied, 21.3818 s, 196 MB/s

    real     0m21.754s
    user     0m0.019s
    sys     0m3.322s

    # df -k /
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda3 16611328 10612756 4125428 73% /

    There was no performance impact, although the space occupied doubled.

    </pre>
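The space figures in the transcript above can be worked through explicitly. The following is a minimal sketch that only does arithmetic on the "Used" values exactly as df -k printed them; the variable names are made up for illustration, and the numbers are taken at face value even though df output on btrfs should be treated with caution.

```python
KIB = 1024

# "Used" column (1K-blocks) copied from the df -k output in the transcript.
used_after_snapshot = 6505736    # right after creating /snap1
used_after_append = 6505780      # after appending one line to /testfile
used_after_delete = 6505740      # after deleting /snap1
used_after_rewrite = 10612756    # after re-creating /snap1 and rewriting /testfile

# Size of /testfile as reported by dd (4194304000 bytes), in KiB.
file_size_kib = 4194304000 // KIB

append_cost = used_after_append - used_after_snapshot     # growth caused by the append
freed_by_delete = used_after_append - used_after_delete   # the "40k" quoted in the post
rewrite_cost = used_after_rewrite - used_after_delete     # growth caused by the full rewrite

print(append_cost, freed_by_delete, rewrite_cost, file_size_kib)
print(round(rewrite_cost / file_size_kib, 3))
```

Read this way, appending a line to a snapshotted 4.2 GB file cost a few tens of KiB, not another 4.2 GB, while rewriting every block cost roughly one full file size because the snapshot kept all of the old blocks referenced.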
  • 3. Re: Trying to understand BtrFS snapshot feature
    Avi Miller Guru
    Dude wrote:
    I guess calling it "logical copy" was a bad choice. Would calling the initial snapshot a "hard link of a file system" be more appropriate?
    Nope.
    I find it interesting that although the snapshot maintains the "hard link" to the original copy - I guess "before block image" (?) - there really is no negative performance impact.
    There is no such thing as an original copy: the snapshot is just a copy-on-write marker to the filesystem.
    How does this work? Perhaps it is not overwriting the existing file, but rather creating a new file? So the snapshot still has the "hard link" to the original file, hence nothing changed for the snapshot? Simply a new file was created, and that's showing in the current file system?
    There is no file. There is no new file. There are only b-trees and blocks that, when they are changed (anywhere) are copy-on-write'd to elsewhere on the disk. There is no new file: there is a new inode and some new metadata to create a new file entry so that you can see it in the output of ls, but the actual data at this stage is the same as the old stuff.
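The "only b-trees and blocks" point can be illustrated with a tiny path-copying tree. This is a rough sketch under heavy simplification, not the btrfs on-disk format: `Leaf`, `Node`, `cow_update`, and `read` are hypothetical names, the tree is a plain nested tuple structure rather than a real b-tree, and a "snapshot" is modelled as nothing more than a second reference to the same root.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    data: bytes


@dataclass(frozen=True)
class Node:
    children: Tuple[Union["Node", Leaf], ...]


def cow_update(node, path, data):
    """Return a new root with the leaf at `path` replaced.

    Only the nodes along the path are re-created; every other subtree
    is shared with the old root instead of being copied.
    """
    if not path:
        return Leaf(data)
    i = path[0]
    new_child = cow_update(node.children[i], path[1:], data)
    return Node(node.children[:i] + (new_child,) + node.children[i + 1:])


def read(node, path):
    """Follow `path` (a tuple of child indexes) down to a leaf and return its data."""
    for i in path:
        node = node.children[i]
    return node.data


root = Node((
    Node((Leaf(b"a"), Leaf(b"b"))),
    Node((Leaf(b"c"), Leaf(b"d"))),
))

snapshot_root = root                          # the "snapshot": same tree, no copying
live_root = cow_update(root, (0, 1), b"B")    # modify one leaf in the live tree

print(read(snapshot_root, (0, 1)), read(live_root, (0, 1)))
print(live_root.children[1] is snapshot_root.children[1])   # untouched subtree is shared
```

After the update, the two roots differ only along the path to the modified leaf; the untouched subtree is the very same object in both. That is the sense in which neither tree is "the original" and no whole-file copy ever takes place.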
    <pre>
    [root@vm004 /]# df -h /
    </pre>
    df lies, particularly under btrfs. Don't use it. Don't trust it. Ignore it completely. Again, this is covered in my btrfs talk: you should use btrfs fi df / instead. There is also work in the mainline btrfs-progs development to make the output of btrfs fi df / easier to understand.

    Watch the btrfs presentation: http://www.youtube.com/watch?v=hxWuaozpe2I
  • 4. Re: Trying to understand BtrFS snapshot feature
    Dude! Guru
    Ok. I'm heading back into my hole to do more research. :-)

    Again, thanks for taking the time to answer!
  • 5. Re: Trying to understand BtrFS snapshot feature
    Avi Miller Guru
    Dude wrote:
    Ok. I'm heading back into my hole to do more research. :-)
    btrfs can take some time to understand - I know I spent a few hours on the phone to Chris Mason (the original btrfs author) when preparing for that talk to make sure I got my head around exactly how btrfs actually works on disk. Use the btrfs wiki at http://btrfs.wiki.kernel.org and the documentation there. Also, there are specific btrfs mailing lists that are probably a better place to get detailed responses to questions than the Oracle forum. :)
