Design For 1st Class Quota in Ext4
Revision as of 22:03, 20 June 2010
DRAFT Design Specification, v0.60
This proposal promotes quota to being a first class supported feature in ext4. The primary difference is that user.quota and group.quota files will be hidden inodes, and will be managed directly by e2fsprogs, and quota will be enabled automatically as soon as the file system is mounted. The repquota program will not function initially, until a new QUOTASCAN_OPEN interface is implemented by Jan Kara.
New Superblock Fields
We add the following two new fields to the superblock. These fields are valid only if a new COMPAT feature, EXT4_FEATURE_COMPAT_QUOTA, is set.
- A superblock field containing the inode number for the user quota file. This inode number will be 3 if the user quota file is hidden. If this field is zero, then user quotas will not be tracked.
- A superblock field containing the inode number for the group quota file. This inode number will be 4 if the group quota file is hidden. If this field is zero, then group quotas will not be tracked.
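The rules for the two fields can be sketched as a small decision function. This is only an illustration: the field names, the Superblock class, and the feature bit value below are placeholders chosen for the example, since this proposal does not fix any of them.

```python
from dataclasses import dataclass

# Placeholder bit value for illustration only; the proposal does not assign one.
FEATURE_COMPAT_QUOTA = 0x0100


@dataclass
class Superblock:
    """Hypothetical in-memory view of the relevant superblock fields."""
    feature_compat: int
    usr_quota_inum: int  # 0 means user quotas are not tracked
    grp_quota_inum: int  # 0 means group quotas are not tracked


def tracked_quota_types(sb):
    """Return the list of quota types that should be tracked."""
    # The two inode fields are only meaningful when the feature flag is set.
    if not sb.feature_compat & FEATURE_COMPAT_QUOTA:
        return []
    types = []
    if sb.usr_quota_inum != 0:
        types.append("user")
    if sb.grp_quota_inum != 0:
        types.append("group")
    return types
```

For example, a superblock with the feature set, a user quota inode of 3, and a group quota inode of 0 would track user quotas only.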
Quota File Format
The quota files will use the v2r1 (see struct v2r1_disk_dqblk in /usr/src/linux/fs/quota/quotaio_v2.h) format, and updates to the quota files will be protected with the journal if the journal is present.
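As a rough guide to what one v2r1 entry looks like on disk, the sketch below packs and unpacks a record with Python's struct module. The field order and sizes are an assumption based on struct v2r1_disk_dqblk (two 32-bit fields followed by eight 64-bit fields, all little endian); the header file remains the authority and should be checked before relying on this.

```python
import struct

# Assumed layout of struct v2r1_disk_dqblk: little endian, no implicit padding.
V2R1_DQBLK = struct.Struct("<II8Q")

FIELDS = (
    "dqb_id",          # id this quota applies to
    "dqb_pad",         # currently unused
    "dqb_ihardlimit",  # hard limit on inodes
    "dqb_isoftlimit",  # soft limit on inodes
    "dqb_curinodes",   # inodes currently in use
    "dqb_bhardlimit",  # hard limit on disk space
    "dqb_bsoftlimit",  # soft limit on disk space
    "dqb_curspace",    # space currently in use
    "dqb_btime",       # time limit for excessive disk use
    "dqb_itime",       # time limit for excessive inode use
)


def unpack_dqblk(raw):
    """Decode one on-disk entry into a dict keyed by field name."""
    return dict(zip(FIELDS, V2R1_DQBLK.unpack(raw)))


def pack_dqblk(entry):
    """Encode a dict of fields into one on-disk entry; missing fields are 0."""
    return V2R1_DQBLK.pack(*(entry.get(name, 0) for name in FIELDS))
```

Under this assumed layout each entry occupies V2R1_DQBLK.size bytes.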
Changes to e2fsck
If e2fsck needs to do a full file system consistency check, it will keep track of the disk space used by each user and/or group id, and update the user and/or group quota files at the end of the e2fsck run.
Older versions of e2fsck will refuse to touch a filesystem which has the EXT4_FEATURE_COMPAT_QUOTA feature flag set. This is by design, since we want e2fsck to check the quota files for consistency and to flag warnings if they do not match. This will help us flush out bugs.
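The per-id accounting described above amounts to a single pass over the in-use inodes. The sketch below is not e2fsck code; it assumes the caller can supply (uid, gid, bytes used) for each in-use inode, and the function name is made up for the example.

```python
from collections import defaultdict


def accumulate_usage(inodes):
    """Sum usage per user id and per group id.

    inodes: iterable of (uid, gid, bytes_used) tuples, one per in-use inode.
    Returns two dicts mapping id -> [space in bytes, inode count].
    """
    user = defaultdict(lambda: [0, 0])
    group = defaultdict(lambda: [0, 0])
    for uid, gid, nbytes in inodes:
        user[uid][0] += nbytes
        user[uid][1] += 1
        group[gid][0] += nbytes
        group[gid][1] += 1
    return dict(user), dict(group)
```

The resulting totals are what would be written into the user and group quota files at the end of the run.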
Use of mtime fields
If the filesystem appears consistent, but the user and/or group quota files' mtime fields are not equal to the last superblock write time, e2fsck will do a partial file system consistency check. This will consist of e2fsck pass #1, and if no errors were detected, e2fsck will update the user and group quota files and exit. If any errors were detected during pass #1, then e2fsck will continue to do pass numbers 2-5, and thus do a full file system consistency check before updating the quota files.
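The mtime rule can be written out as a short decision procedure. This is a sketch with hypothetical function names; in particular, the "none" outcome only means that the quota files give no reason for a check, and e2fsck may still decide to check for unrelated reasons.

```python
def plan_check(fs_looks_consistent, quota_mtimes, sb_write_time):
    """Decide which kind of check the quota rules call for.

    quota_mtimes: mtime of each quota file that is present.
    Returns "full", "partial", or "none".
    """
    if not fs_looks_consistent:
        return "full"
    if any(mtime != sb_write_time for mtime in quota_mtimes):
        return "partial"
    return "none"  # assumption: matching mtimes require no quota-driven check


def passes_to_run(plan, pass1_found_errors=False):
    """Return the e2fsck pass numbers implied by the plan."""
    if plan == "full":
        return [1, 2, 3, 4, 5]
    if plan == "partial":
        # Pass 1 alone suffices unless it turned up errors.
        return [1, 2, 3, 4, 5] if pass1_found_errors else [1]
    return []
```

In both the partial and the full case the quota files are updated only after the last pass that runs.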
Changes to mke2fs
Mke2fs will take an extended option (quota=user,group) which, if present, will force the initialization of the quota inodes. Using the /etc/mke2fs.conf file, the system administrator can also specify a quota option in the [defaults] and [fs_types] sections, so that quota files can be enabled by default.
Changes to tune2fs
Tune2fs will have a facility for adding and removing user and group quota inodes while the file system is mounted. The quota usage will not be correct after the quota inodes are newly added, however, so quota will not be enabled by default. If the quota inodes are removed, quota will be disabled first.
Bulk quota export
There will be a new interface so that bulk quota information can be fetched from the file system. This needs to be negotiated with Jan Kara. Jan has proposed a QUOTASCAN_OPEN quotactl which will return an fd to the caller:
"Reading from the fd would give quota information as if_dqblk structures. The read would return only complete if_dqblk structures. Internally I'd use f_pos to store the id (uid / gid) of the next structure to return to maintain scan state. This should be reasonably flexible, efficient, and clean interface."
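To make the proposed read semantics concrete, here is a sketch of how a consumer might parse such a stream. QUOTASCAN_OPEN does not exist yet, so the example reads from any file-like object and uses an in-memory buffer as a stand-in for the fd. The record layout is an assumption: struct if_dqblk as laid out on a little-endian 64-bit system (eight 64-bit fields, a 32-bit dqb_valid, and 4 bytes of trailing padding).

```python
import io
import struct

# Assumed layout of struct if_dqblk on a little-endian 64-bit system.
IF_DQBLK = struct.Struct("<8QI4x")


def scan_quota(stream, batch=64):
    """Yield one tuple of if_dqblk fields per record read from stream."""
    while True:
        chunk = stream.read(IF_DQBLK.size * batch)
        if not chunk:
            return  # end of scan
        # The proposal says reads return only complete structures.
        if len(chunk) % IF_DQBLK.size:
            raise ValueError("partial if_dqblk record")
        yield from IF_DQBLK.iter_unpack(chunk)


# Stand-in for the fd: three records held in memory.
demo = io.BytesIO(b"".join(IF_DQBLK.pack(*([i] * 8), 0) for i in range(3)))
records = list(scan_quota(demo, batch=2))
```

Because each read returns whole records, the caller never has to buffer a partial structure between reads.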
Legacy Quota Support
Traditional style quota will still be supported; that is, the appropriate magic flags will be passed through to /proc/mounts so that the old-style init scripts will still function correctly. This support will be deprecated over an 18 month period after the new-style kernel code and userspace tools have been released.
Quota Design FAQ
How can userspace tell if quota is enabled?
Since we will no longer be using mount flags, userspace programs will not be able to determine if a filesystem has quotas enabled by checking /proc/mounts. Instead, they will have to use the quotaon -p command, or use the Q_GETFMT subcommand to quotactl(2).
How can userspace convert back to traditional quota files?
This can be done using the commands debugfs -R "dump <3> user.aquota" /dev/sdXX and debugfs -R "dump <4> group.aquota" /dev/sdXX.
We could make this an enhancement to tune2fs, so that when the advanced quota feature is disabled via "tune2fs -O ^quota", instead of simply deleting the quota files, it could copy the contents of the quota files to *.aquota files, either by default or optionally if these files do not already exist. (If they do exist, tune2fs would have to pick alternate names, such as "user (1).aquota" à la Firefox.)
How much extra time/space will this add to e2fsck?
This feature will not change e2fsck run time in any significant way, since e2fsck already collects the required information during pass 1 of the file system check anyway. Collecting this information will cost approximately 32 bytes of memory usage by e2fsck for each user or group id in use on the system.
Optional Quota Checksum Feature
The struct v2r1_disk_dqblk structure has a 32-bit dqb_pad field which is currently unused. This could potentially be used to include a CRC of the user or group id (in 32-bit little endian format) followed by the contents of the v2r1_disk_dqblk data structure. This would allow the quota subsystem to detect corrupted quota entries. If a quota entry is detected to be corrupted, a warning should be logged on the system console, and the quota entry should be treated as non-existent. Alternatively, the file system could provide a method function to be called when a corrupted quota entry is found, which would allow the filesystem to follow whatever the file system's error handling behaviour might be. (For example, ext4 might continue, remount the file system read-only, or panic the system.)
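A minimal sketch of such a checksum follows. Several details are assumptions that the text above leaves open: the CRC variant (zlib's CRC-32 is used here purely for convenience), the position of dqb_pad at byte offset 4 of the entry, and the choice to treat dqb_pad as zero while computing the CRC so that the stored value does not feed into itself. The function names are made up for the example.

```python
import struct
import zlib

PAD_OFFSET = 4  # assumed byte offset of dqb_pad within v2r1_disk_dqblk


def dqblk_crc(qid, raw_entry):
    """CRC over the 32-bit little-endian id followed by the entry contents."""
    # Assumption: dqb_pad is treated as zero while the CRC is computed.
    body = raw_entry[:PAD_OFFSET] + b"\0\0\0\0" + raw_entry[PAD_OFFSET + 4:]
    return zlib.crc32(struct.pack("<I", qid) + body) & 0xFFFFFFFF


def seal(qid, raw_entry):
    """Return a copy of the entry with the CRC stored in dqb_pad."""
    crc = struct.pack("<I", dqblk_crc(qid, raw_entry))
    return raw_entry[:PAD_OFFSET] + crc + raw_entry[PAD_OFFSET + 4:]


def verify(qid, raw_entry):
    """Return True if the CRC stored in dqb_pad matches the entry."""
    (stored,) = struct.unpack_from("<I", raw_entry, PAD_OFFSET)
    return stored == dqblk_crc(qid, raw_entry)
```

A caller that finds verify() returning False would log the warning and treat the entry as non-existent, or hand off to the file system's error handler, as described above.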