Ext4 VM Images

(djwong is still cleaning this up; readers beware)

Managing ext4 VM images

Apparently, various ext4 users have a particular use-case for provisioning where they create an FS image, populate it with whatever files they want, and then use resize2fs -M to shrink the image down to its minimum size. At deploy time the minimal image is copied to a disk and resize2fs'd to fill the whole disk.
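
A rough sketch of that provisioning flow, assuming a raw image file and a loop mount (the file names, size, and mount point here are illustrative, not taken from any particular tool):

      truncate -s 10G image.ext4          # create a sparse image file
      mkfs.ext4 -F image.ext4
      mount -o loop image.ext4 /mnt       # populate with the desired files
      cp -a /path/to/payload/. /mnt/
      umount /mnt
      e2fsck -f image.ext4                # resize2fs requires a clean fsck first
      resize2fs -M image.ext4             # shrink to the minimum size for shipping
      # at deploy time, after copying the image onto the target partition:
      #     resize2fs /dev/<target partition>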

Jon Bernard wrote,

In order to support very large partitions, the filesystem is created
with an abnormally large inode table so that large resizes would be
possible.  I traced it to this commit as best I can tell:

    https://github.com/openstack/diskimage-builder/commit/fb246a02eb2ed330d3cc37f5795b3ed026aabe07

I assumed that additional inodes would be allocated along with block
groups during an online resize, but that commit contradicts my current
understanding. 

Ted Ts'o replied (and djwong has cleaned up somewhat),

Additional inodes *are* allocated as the file system is grown. Whoever thought otherwise was wrong. What happens is that there is a fixed number of inodes per block group; when the file system is resized, either by growing or shrinking it, inodes are added or removed along with the block groups.
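
You can see this relationship in the superblock of any ext2/3/4 image, for example (the image name below is just an assumption):

      dumpe2fs -h image.ext4 | grep -E 'Inode count|Block count|Inodes per group|Blocks per group'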

What causes the least optimal data block layout is copying files into a large file system and then shrinking the file system to its minimum size with resize2fs -M. resize2fs's block migration algorithm is pretty stupid -- all blocks that require moving are moved, one by one, to the lowest available block, without any regard to file fragmentation.

From a fragmentation standpoint it is better to create a file system that is slightly larger than the data you're trying to copy into it. There is also some non-optimality that occurs as the file system gets filled beyond about 90% full, but it's not nearly as bad as shrinking the file system -- which you should avoid at all costs.
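
If you want to see how much fragmentation a shrink (or an over-full filesystem) has caused, filefrag and e4defrag can report it; a quick check might look like this (the mount point and file name are assumptions):

      filefrag -v /mnt/some-large-file    # per-file extent map
      e4defrag -c /mnt                    # whole-filesystem fragmentation score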

From a performance point of view, the only time you should try to do an off-line resize2fs shrink is if you are shrinking the file system by a handful of blocks as part of converting a file system in place to use LVM or LUKS encryption, and you need to make room for some metadata blocks at the end of the partition.

The other thing to note is that if you are using a format such as qcow2, or something like device-mapper's thin-provisioning (thinp) scheme, or if you are willing to deal with sparse files, one approach is to not resize the file system at all. You could just use a tool like zerofree[1] to zero out all of the unused blocks in the file system, and then use "/bin/cp --sparse=always" to cause all zero blocks to be treated as sparse blocks on the destination file.

[1] http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/util/zerofree.c
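
Put together, the zerofree plus sparse-copy approach might look something like this (the loop device and file names are assumptions; the filesystem must be unmounted, or mounted read-only, while zerofree runs):

      losetup /dev/loop0 image.ext4
      zerofree /dev/loop0                           # write zeroes over all unused blocks
      losetup -d /dev/loop0
      cp --sparse=always image.ext4 deploy.ext4     # runs of zeroes become holes in the copy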

This is part of how Ted maintains the root filesystem that he uses in a VM for testing ext4 changes upstream. After pulling in the latest Debian unstable package updates and installing the latest updates from the xfstests and e2fsprogs git repositories, he runs the following script, which uses the zerofree.c program to compress the qcow2 root file system image that he uses with kvm:

http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/compress-rootfs

Also, starting with e2fsprogs 1.42.10, there's another way to efficiently deploy a large file system image by copying only the blocks that are in use, using a command like this:

      e2image -rap src_fs dest_fs

(See also the -c flag described in e2image's man page if you want to use this technique to do incremental image-based backups onto a flash-based backup medium; Ted was using this for a while to keep the root filesystems on two laptop SSDs in sync with one another.)
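
Assuming -c composes with the flags shown above as the man page describes, refreshing an existing raw backup image might look like this (the source device and destination path are assumptions):

      e2image -rapc /dev/<source partition> /backup/root_fs.img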

So there are lots of ways that you can do what you need, all without playing games with resize2fs. Perhaps some of them would actually be better for your use case.

Calvin Walton notes that with a sufficiently recent QEMU (1.5 or newer), if one configures an FS image as a virtual SCSI disk, it is possible to use fstrim inside the VM to make the backing file sparse.
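
One way to wire that up on the qemu command line might be something like the following (the image path, device IDs, and discard=unmap setting are assumptions about a typical virtio-scsi setup, not a tested recipe):

      # other qemu options (CPU, memory, display, ...) omitted
      qemu-system-x86_64 \
          -device virtio-scsi-pci,id=scsi0 \
          -drive file=rootfs.img,format=raw,if=none,id=hd0,discard=unmap \
          -device scsi-hd,drive=hd0,bus=scsi0.0

      # then, inside the guest, release the unused blocks back to the host:
      fstrim -v /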

NOTE: It is not a good idea to "zero" the filesystem image with "cat /dev/zero > /mnt/zerofile; rm -rf /mnt/zerofile"! While this does fill most of the filesystem's free blocks with zeroes, the filesystem still has to allocate a block map or extent tree for that huge temporary file, and those metadata blocks will not be zeroed. It is much more efficient to zero unused blocks offline or discard/trim unused blocks online, since there's no need to waste time invoking the block allocator on a huge temporary file.
