Ext4 Howto

(Difference between revisions)
m (s/ftp/http/)
(Getting Ext4 code: remove outdated information about ext4 support)
 
(44 intermediate revisions by 26 users not shown)
Line 10: Line 10:
 
== Compatibility ==
 
== Compatibility ==
  
Any existing Ext3 filesystem can be migrated to Ext4 with an easy procedure which consists in running a couple of commands in read-only mode (described in the next section). This means that you can improve the performance, storage limits and features of your current filesystems without reformatting and/or reinstalling your OS and software environment. If you need the advantages of Ext4 on a production system, you can upgrade the filesystem. The procedure is safe and doesn't risk your data (obviously, backup of critical data is recommended, even if you aren't updating your filesystem :). Ext4 will use the new data structures only on new data, the old structures will remain untouched and it will be possible to read/modify them when needed. This means, that, of course, that once you convert your filesystem to Ext4 you won't be able to go back to Ext3 again (although there's a possibility, described in the next section, of mounting a Ext3 filesystem with Ext4 without using the new disk format and you'll be able to mount it with Ext3 again, but you lose many of the advantages of Ext4).
+
Any existing Ext3 filesystem can be mounted as Ext4 without requiring any on-disk format changes.  However, it is possible to upgrade an Ext3 filesystem to take advantage of some Ext4 features by running a couple of commands in read-only mode (described in the next section). This means that you can improve the performance, storage limits and features of your current filesystems without reformatting and/or reinstalling your OS and software environment. If you need the advantages of Ext4 on a production system, you can upgrade the filesystem. The procedure is safe and doesn't risk your data (obviously, backup of critical data is recommended, even if you aren't updating your filesystem :). Ext4 will use the new data structures only on new data, the old structures will remain untouched and it will be possible to read/modify them when needed. This means that if you convert your filesystem to Ext4 you won't be able to go back to Ext3 again.
  
== Bigger filesystem/file sizes ==  
+
== Bigger File System and File Sizes ==  
  
Currently, Ext3 support 16 TB of maximum filesystem size, and 2 TB of maximum file size. Ext4 adds 48-bit block addressing, so it will have 1 EB of maximum filesystem size and 16 TB of maximum file size. 1 EB = 1,048,576 TB (1 EB = 1024 PB, 1 PB = 1024 TB, 1 TB = 1024 GB). Why 48-bit and not 64-bit? There are some limitations that would need to be fixed before making Ext4 fully 64-bit capable, which have not been addressed in Ext4. The Ext4 data structures have been designed keeping this in mind, so a future update to Ext4 will implement full 64-bit support at some point. 1 EB will be enough (really :)) until that happens. (Note: The code to create filesystems bigger than 16 TB is -at the time of writing this article- not in any stable release of e2fsprogs. It will be in future releases.)
+
Currently, Ext3 support 16 TiB of maximum file system size and 2 TiB of maximum file size for 4 KiB block size. Ext4 adds 48-bit block addressing, so it will have 1 '''EiB'''{{footnote|1}} of maximum file system size and 16 TiB of maximum file size for 4 KiB block size. Why 48-bit and not 64-bit? There are some limitations that would need to be fixed before making Ext4 fully 64-bit capable, which have not been addressed in Ext4, and with 4 KiB block size (2<sup>12</sup> bytes) the filesystem size limit is already 2<sup>60</sup> bytes.  With 64 KiB block size the limit is 2<sup>64</sup> bytes so there isn't really a pressing need to have full 64-bit block addresses. The Ext4 data structures have been designed in case this is ever required, so a future update to Ext4 may implement full 64-bit support at some point. 1 EiB will be enough (really :)) until that happens.
 +
 
 +
{{footnotes|
 +
#An '''EiB''' or [http://en.wikipedia.org/wiki/Exbibyte Exbibyte] is 2<sup>60</sup> bytes or 1,048,576 TiB.
 +
 
 +
{{bootparm|1EiB|1024PiB}}
 +
 
 +
{{bootparm|1PiB|1024TiB}}
 +
 
 +
{{bootparm|1TiB|1024GiB}}
 +
}}
 +
 
 +
In order to map blocks beyond 2^32 to a file, extents must be enabled since block maps only know about 32-bit block numbers.  As of e2fsprogs 1.42.9, this requirement is enforced by mke2fs.
  
 
== Sub directory scalability ==
 
== Sub directory scalability ==
  
Right now the maximum possible number of sub directories contained in a single directory in Ext3 is 32000. Ext4 breaks that limit and allows a unlimited number of sub directories.
+
Right now the maximum possible number of sub directories contained in a single directory in Ext3 is 32000. Ext4 doubles that limit and allows 64000 sub directories.
  
 
== Extents ==
 
== Extents ==
  
The traditionally Unix-derived file systems like Ext3 use a indirect block mapping scheme to keep track of each block used for the blocks corresponding to the data of a file. This is inefficient for large files, specially on large file delete and truncate operations, because the mapping keeps a entry for every single block, and big files have many blocks -> huge mappings, slow to handle. Modern file systems use a different approach called "extents". An extent is basically a bunch of contiguous physical blocks. It basically says "The data is in the next n blocks". For example, a 100 MB file can be allocated into a single extent of that size, instead of needing to create the indirect mapping for 25600 blocks (4 KB per block). Huge files are split in several extents. Extents improve the performance and also help to reduce the fragmentation, since an extent encourages continuous layouts on the disk.
+
Traditional, Unix-derived, file systems, like Ext3, use a indirect block mapping scheme to keep track of each block used for the blocks corresponding to the data of a file. This is inefficient for large files, especially during large file delete and truncate operations, because the mapping keeps an entry for every single block, and big files have many blocks -> huge mappings, slow to handle. Modern file systems use a different approach called "extents". An extent is basically a bunch of contiguous physical blocks. It basically says "The data is in the next n blocks". For example, a 100 MiB file can be allocated into a single extent of that size, instead of needing to create the indirect mapping for 25600 blocks (4 KiB per block). Huge files are split in several extents. Extents improve the performance and also help to reduce the fragmentation, since an extent encourages continuous layouts on the disk.
  
 
== Multiblock allocation ==
 
== Multiblock allocation ==
  
When Ext3 needs to write new data to the disk, there's a block allocator that decides which free blocks will be used to write the data. But the Ext3 block allocator only allocates one block (4KB) at a time. That means that if the system needs to write the 100 MB data mentioned in the previous point, it will need to call the block allocator 25600 times (and it was just 100 MB!). Not only this is inefficient, it doesn't allow the block allocator to optimize the allocation policy because it doesn't knows how many total data is being allocated, it only knows about a single block. Ext4 uses a "multiblock allocator" (mballoc) which allocates many blocks in a single call, instead of a single block per call, avoiding a lot of overhead. This improves the performance, and it's particularly useful with delayed allocation and extents. This feature doesn't affect the disk format. Also, note that the Ext4 block/inode allocator has other improvements, described in detail in this paper.
+
When Ext3 needs to write new data to the disk, there's a block allocator that decides which free blocks will be used to write the data. But the Ext3 block allocator only allocates one block (4KiB) at a time. That means that if the system needs to write the 100 MiB data mentioned in the previous point, it will need to call the block allocator 25600 times (and it was just 100 MiB!). Not only this is inefficient, it doesn't allow the block allocator to optimize the allocation policy because it doesn't know how many total data is being allocated, it only knows about a single block. Ext4 uses a "multiblock allocator" (mballoc) which allocates many blocks in a single call, instead of a single block per call, avoiding a lot of overhead. This improves the performance, and it's particularly useful with delayed allocation and extents. This feature doesn't affect the disk format. Also, note that the Ext4 block/inode allocator has other improvements, described in details in this paper.
  
 
== Delayed allocation ==
 
== Delayed allocation ==
Line 34: Line 46:
 
== Fast fsck ==
 
== Fast fsck ==
  
Fsck is a very slow operation, especially the first step: checking all the inodes in the file system. In Ext4, at the end of each group's inode table will be stored a list of unused inodes (with a checksum, for safety), so fsck will not check those inodes. The result is that total fsck time improves from 2 to 20 times, depending on the number of used inodes (http://kerneltrap.org/Linux/Improving_fsck_Speeds_in_Ext4). It must be noticed that it's fsck, and not Ext4, who will build the list of unused inodes. This means that you must run fsck to get the list of unused inodes built, and only the next fsck run will be faster (you need to pass a fsck in order to convert a Ext3 filesystem to Ext4 anyway). There's also a feature that takes part in this fsck speed up - "flexible block groups" - that also speeds up fil esystem operations.
+
Fsck is a very slow operation, especially the first step: checking all the inodes in the file system. In Ext4, at the end of each group's inode table will be stored a list of unused inodes (with a checksum, for safety), so fsck will not check those inodes. The result is that total fsck time improves from 2 to 20 times, depending on the number of used inodes (http://kerneltrap.org/Linux/Improving_fsck_Speeds_in_Ext4). It must be noticed that it's fsck, and not Ext4, who will build the list of unused inodes. This means that you must run fsck to get the list of unused inodes built, and only the next fsck run will be faster (you need to pass a fsck in order to convert a Ext3 filesystem to Ext4 anyway). There's also a feature that takes part in this fsck speed up - "flexible block groups" - that also speeds up file system operations.
  
 
== Journal checksumming ==
 
== Journal checksumming ==
Line 42: Line 54:
 
== "No Journaling" mode ==
 
== "No Journaling" mode ==
  
Journaling ensures the integrity of the filesystem by keeping a log of the ongoing disk changes. However, it is know to have a small overhead. Some people with special requirements and workloads can run without a journal and its integrity advantages. In Ext4 the journaling feature can be disabled, which provides a small performance improvement.
+
Journaling ensures the integrity of the filesystem by keeping a log of the ongoing disk changes. However, it is known to have a small overhead. Some people with special requirements and workloads can run without a journal and its integrity advantages. In Ext4 the journaling feature can be disabled, which provides a small performance improvement.
 
+
  
 
== Online defragmentation ==
 
== Online defragmentation ==
  
(This feature is being developed and will be included in future releases). While delayed allocation, extents and multiblock allocation help to reduce the fragmentation, with usage filesystems can still fragment. For example: You write three files in a directory and continually on the disk. Some day you need to update the file of the middle, but the updated file has grown a bit, so there's not enough room for it. You have no option but fragment the excess of data to another place of the disk, which will cause a seek, or allocate the updated file continually in another place, far from the other two files, resulting in seeks if an application needs to read all the files on a directory (say, a file manager doing thumbnails on a directory full of images). Besides, the filesystem can only care about certain types of fragmentation, it can't know, for example, that it must keep all the boot-related files contiguous, because it doesn't know which files are boot-related. To solve this issue, Ext4 will support online fragmentation, and there's a e4defrag tool which can defragment individual files or the whole filesystem.
+
(This feature is being developed and will be included in future releases). While delayed allocation, extents and multiblock allocation help to reduce the fragmentation, with usage filesystems can still fragment. For example: You write three files in a directory and continually on the disk. Some day you need to update the file of the middle, but the updated file has grown a bit, so there's not enough room for it. You have no option but fragment the excess of data to another place of the disk, which will cause a seek, or allocate the updated file continually in another place, far from the other two files, resulting in seeks if an application needs to read all the files on a directory (say, a file manager doing thumbnails on a directory full of images). Besides, the filesystem can only care about certain types of fragmentation, it can't know, for example, that it must keep all the boot-related files contiguous, because it doesn't know which files are boot-related. To solve this issue, Ext4 will support online defragmentation, and there's a [http://manpages.ubuntu.com/manpages/quantal/en/man8/e4defrag.8.html e4defrag] tool which can defragment individual files or the whole filesystem.
  
 
== Inode-related features ==
 
== Inode-related features ==
Line 66: Line 77:
  
 
This is an option that improves the integrity of the filesystem at the cost of some performance (you can disable it with "mount -o barrier=0", recommended trying it if you're benchmarking). [http://lwn.net/Articles/283161/ From this LWN article]: "The filesystem code must, before writing the [journaling] commit record, be absolutely sure that all of the transaction's information has made it to the journal. Just doing the writes in the proper order is insufficient; contemporary drives maintain large internal caches and will reorder operations for better performance. So the filesystem must explicitly instruct the disk to get all of the journal data onto the media before writing the commit record; if the commit record gets written first, the journal may be corrupted. The kernel's block I/O subsystem makes this capability available through the use of barriers; in essence, a barrier forbids the writing of any blocks after the barrier until all blocks written before the barrier are committed to the media. By using barriers, filesystems can make sure that their on-disk structures remain consistent at all times."  
 
This is an option that improves the integrity of the filesystem at the cost of some performance (you can disable it with "mount -o barrier=0", recommended trying it if you're benchmarking). [http://lwn.net/Articles/283161/ From this LWN article]: "The filesystem code must, before writing the [journaling] commit record, be absolutely sure that all of the transaction's information has made it to the journal. Just doing the writes in the proper order is insufficient; contemporary drives maintain large internal caches and will reorder operations for better performance. So the filesystem must explicitly instruct the disk to get all of the journal data onto the media before writing the commit record; if the commit record gets written first, the journal may be corrupted. The kernel's block I/O subsystem makes this capability available through the use of barriers; in essence, a barrier forbids the writing of any blocks after the barrier until all blocks written before the barrier are committed to the media. By using barriers, filesystems can make sure that their on-disk structures remain consistent at all times."  
 +
 +
== Ext4 code implements discard/TRIM ==
 +
 +
Requires mounting with "discard" flag. See [http://sites.google.com/site/lightrush/random-1/howtoconfigureext4toenabletrimforssdsonubuntu howto] and [http://forums.gentoo.org/viewtopic-p-6187612.html Verifying TRIM].
  
 
= Getting Ext4 code =
 
= Getting Ext4 code =
  
== For people who build their own kernel ==
+
For the vast majority of users today, ext4 will be included as a default part of any Linux installation, since it has been included in the mainline kernel since 2009.
 +
 
 +
== For people who build their own kernel and utilities ==
  
 
1. Start with a 2.6.29 or later kernel.  It is highly recommended that you apply the latest [http://www.kernel.org/pub/linux/kernel/people/tytso/ext4-patches/ patchset] (if available) to get the latest bug fixes.  In your kernel's <tt>.config</tt> file, enable <tt>EXT4_FS</tt> (along with <tt>EXT4_FS_XATTR</tt> and <tt>EXT4_FS_POSIX_ACL</tt> if you like).
 
1. Start with a 2.6.29 or later kernel.  It is highly recommended that you apply the latest [http://www.kernel.org/pub/linux/kernel/people/tytso/ext4-patches/ patchset] (if available) to get the latest bug fixes.  In your kernel's <tt>.config</tt> file, enable <tt>EXT4_FS</tt> (along with <tt>EXT4_FS_XATTR</tt> and <tt>EXT4_FS_POSIX_ACL</tt> if you like).
  
2.  Compile the latest version of e2fsprogs (as of this writing 1.41.5) from [http://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ kernel.org] or from [http://sourceforge.net/project/showfiles.php?group_id=2406 Sourceforge].  Note that it is highly important to install the mke2fs.conf file that comes with the e2fsprogs 1.41.x sources in <tt>/etc/mke2fs.conf</tt>.  If you have edited the <tt>/etc/mke2fs.conf</tt> file, you will need to merge your changes with the version from e2fsprogs 1.41.x.
+
2.  Compile the latest version of e2fsprogs (as of this writing 1.44.5) from [https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git kernel.org].  Note that it may be desirable to install the <tt>mke2fs.conf</tt> file that comes with the e2fsprogs sources in <tt>/etc/mke2fs.conf</tt> to get the latest features enabled by default, but if you also need to boot with an older kernel it is prudent to stick with the older <tt>/etc/mke2fs.conf</tt> features and only enable new features on the <tt>mke2fs</tt> command-line as needed.
  
== For people who are running Fedora ==
+
== For people who are running Fedora and RHEL/CentOS ==
  
 
Recent Fedora is generally very up to date with respect to ext4 code in kernelspace and userspace.
 
Recent Fedora is generally very up to date with respect to ext4 code in kernelspace and userspace.
  
Fedora 11 uses the ext4 filesystem as the default root filesystem, and as such should generally contain the most uptodate code, features, and fixes.  It will initially be based on the 2.6.29.x kernel series.
+
== For people who are running RHEL/CentOS ==
 
+
Fedora 10 currently has a kernel based on 2.6.27 that has working ext4 support.  It is missing some of the latest fixes and performance optimizations.  A 2.6.29 kernel update should be available soon, with more up-to-date ext4 code.
+
 
+
Fedora 10 and later has all of the basic infrastructure needed to be able to run ext4, including updated udev, blkid, and other bits needed for ext4 to be a transparently recognized filesystem.
+
 
+
Fedora 9 has only rudimentary ext4 support, and no significant ext4 updates are planned in that release.
+
 
+
No version of Fedora at this time (including F11) has support in grub for booting from ext4, so /boot must be ext3 or some other supported filesystem.  The anaconda installer enforces this restriction.
+
 
+
== For people who are running RHEL ==
+
 
+
ext4 is currently included as a Technology Preview in [http://www.redhat.com/rhel/ RHEL 5.3]. To use it, update to 5.3 or later, boot into 2.6.18-128.el5 or higher, and yum install e4fsprogs. 
+
 
+
The e4fsprogs package is designed to install alongside stock e2fsprogs without overlapping, so some utilities are renamed; for example e4fsck, debuge4fs, e4image, etc.  rpm -q e4fsprogs | grep bin will give a good summary of what is included.
+
  
The ext4 code in RHEL5.3 does not support delayed allocation, as it is a backport of an older ext4 snapshot (2.7.27).  The RHEL5.4 kernel is expected to have an update, based on the 2.6.29 ext4 codebase with several post-2.6.29 fixes.
+
ext4 is included as the default filesystem in RHEL6, and is available in RHEL7 and later.
  
 
== For people who are running openSuSE ==
 
== For people who are running openSuSE ==
  
Although [http://lists.opensuse.org/opensuse-project/2009-03/msg00029.html planned], openSuSE still installs with ext3 as a default. But the [http://software.opensuse.org/developer openSuSE 11.2 Milestone 1] (Javascript needed) is already in place and comes with a 2.6.29 kernel. Here's how I converted my ext3 rootfs into ext4:
+
Ext4 [http://en.opensuse.org/OpenSUSE_11.2#Under_the_Hood ext4 is the default filesystem for openSuSE 11.2]. Converting to ext4 from [http://lists.opensuse.org/opensuse-project/2009-03/msg00029.html earlier versions] should be done just like on any other <tt>initrd</tt>-based system:
  
 
# Run <tt>tune2fs -O extents,uninit_bg /dev/ROOT</tt>
 
# Run <tt>tune2fs -O extents,uninit_bg /dev/ROOT</tt>
Line 105: Line 108:
 
# Add "ext4" to <tt>INITRD_MODULES</tt> in <tt>/etc/sysconfig/kernel</tt>
 
# Add "ext4" to <tt>INITRD_MODULES</tt> in <tt>/etc/sysconfig/kernel</tt>
 
## Run <tt>sudo mkinitrd</tt> (takes about 2 min here, dunno why)
 
## Run <tt>sudo mkinitrd</tt> (takes about 2 min here, dunno why)
# Run <tt>e2fsck</tt> on the ROOT fs. As this is currently mounted, you might want to change into single user mode (<tt>init 1</tt>) and remount read-only (<tt>mount -o remount,ro /</tt>) or just reboot and the bootprocess will complain a bit and run <tt>e2fsck</tt> anyway - don't know if this is The Right Thing To Do though...
+
# Run <tt>e2fsck</tt> on the ROOT fs. As this is currently mounted, you might want to change into single user mode (<tt>init 1</tt>) and remount read-only (<tt>mount -o remount,ro /</tt>) or just reboot and the bootprocess will complain a bit and run <tt>e2fsck</tt> anyway - don't know if this is The Right Thing To Do though.
 
+
'''Note''': Booting off an ext4 partition is still [https://features.opensuse.org/305162 unsupported] with the GrUB version currently available for openSuSE, so be sure to have <tt>/boot</tt> as an extra partition ''before'' converting your rootfs to ext4!
+
  
 
== For people who are running Ubuntu ==
 
== For people who are running Ubuntu ==
  
 
Ubuntu 9.04 and later include ext4 as a manual partitioning option at installation time, including support for ext4 as the root filesystem.
 
Ubuntu 9.04 and later include ext4 as a manual partitioning option at installation time, including support for ext4 as the root filesystem.
 
For people who are running Ubuntu, it is *highly* recommended that you download a set of modified util-linux packages and install them.  Packages for Ubuntu Hardy are available [ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ubuntu-fixed-util-linux here].  These packages revert a [http://changelogs.ubuntu.com/changelogs/pool/main/u/util-linux/util-linux_2.12r-19ubuntu1/changelog change made by Ubuntu] to use the volid library instead of the blkid library.  The volid library has a number of shortcomings, including that they don't work on freshly created filesystems or swap devices until after you reboot (since it is tied to udev probing) and the volid library doesn't understand ext4dev filesystems.  The blkid library is much better, and Debian uses the blkid library for util-linux.  Unfortunately, Ubuntu chose to make this decision for some unknown reason.  For other versions of Ubuntu, the patch that was applied can be found [ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ubuntu-fixed-util-linux/util-linux-patch here].
 
  
 
== For people who are running Debian ==
 
== For people who are running Debian ==
  
In Debian Lenny (Testing), the following current packages provide ext4 support:
+
Ext4 has been available since Debian Lenny and later releases.
* ext4dev module in the linux-image package (2.6.26-10)
+
* e2fsprogs (1.41.3-1)
+
 
+
It should be noted that the stock 2.6.26 ext4 has problems with [[DelayedAllocation|delayed allocation]] and with filesystems with non-extent based files.  So until Debian starts shipping a 2.6.27 based kernel or a 2.6.26 kernel with at least the 2.6.26-ext4-7 patchset, you should mount ext4dev filesystems using -o nodelalloc and only use freshly created filesystems using "<tt>mke2fs -t ext4dev</tt>".  (Without these fixes, if you try to use an ext3 filesystem which was converted using "<tt>tune2fs -E test_fs -o extents /dev/DEV</tt>", you will probably hit a kernel '''BUG''' the moment you try to delete or truncate an old non-extent based file.)
+
 
+
== For people who are running Ark Linux ==
+
Ark Linux dockyard-devel has ext4 support, including ext4 support during installation. grub is also patched to boot from ext4 /boot partitions.
+
To install an Ark Linux system with ext4 as main filesystem, get the dockyard-devel image (or, once released, any 2009.x release), and type "fs=ext4" at the CD boot prompt to override the default setting.
+
Depending the outcome of (pending) performance tests, fs=ext4 may be made the default for 2009.1.
+
 
+
If you're on the stable branch, install the kernel, kernel-filesystem-ext4 and e2fsprogs packages from dockyard-devel.
+
  
 
= Creating ext4 filesystems =
 
= Creating ext4 filesystems =
Line 146: Line 134:
 
If you have a sufficiently new system, the "-t ext4" should not be needed.
 
If you have a sufficiently new system, the "-t ext4" should not be needed.
  
NOTE: Although very large fileystems are on ext4's feature list, current e2fsprogs currently still limits the filesystem size to 2^32 blocks (16T for a 4k block filesystem).  Filesystems larger than 16T is one of the very next high-priority features to complete for ext4.
+
NOTE: Although very large fileystems are on ext4's feature list, current e2fsprogs currently still limits the filesystem size to 2^32 blocks (16TiB for a 4KiB block filesystem).  Allowing filesystems larger than 16T is one of the very next high-priority features to complete for ext4.
  
 
= Booting from an ext4 filesystem =
 
= Booting from an ext4 filesystem =
  
Right now there's not a stable version of grub that supports booting a kernel from a ext4 partition. It's recommended that you keep /boot in a ext3 partition.
+
Ext4 support has been [http://lists.gnu.org/archive/html/grub-devel/2009-10/msg00373.html added] in the 1.97 version of GRUB2
  
Preliminary ext4 support seems to have been [http://svn.savannah.gnu.org/viewvc?view=rev&root=grub&revision=1699 added] to the 1.97 version of the GRUB2 development branch.
+
There's also a Google Summer of Code [http://code.google.com/soc/2008/suse/appinfo.html?csaid=91DC4C762E7EE6D7 project] (from opensuse) which seem to have [http://code.google.com/p/grub4ext4/ developed] ext4 support for grub legacy (0.97). Both projects&mdash;GRUB2 and the GSoC projects&mdash;seem (sadly) to be different efforts.
 
+
There's also a Google Summer of Code [http://code.google.com/soc/2008/suse/appinfo.html?csaid=91DC4C762E7EE6D7 project] (from opensuse) which seem to have [http://code.google.com/p/grub4ext4/ developed] ext4 grub support. Both projects -GRUB2 and the GSoC projects- seem (sadly) to be different efforts.
+
  
 
The grub package in Ubuntu 9.04 and later includes a patch to support booting from ext4 filesystems (see [https://bugs.edge.launchpad.net/bugs/314350 bug 314350]).
 
The grub package in Ubuntu 9.04 and later includes a patch to support booting from ext4 filesystems (see [https://bugs.edge.launchpad.net/bugs/314350 bug 314350]).
 +
 +
Syslinux 4.00 and higher (currently in beta) supports ext4 in extlinux.  See [http://syslinux.zytor.com/].
  
 
= Converting an ext3 filesystem to ext4 =
 
= Converting an ext3 filesystem to ext4 =
  
It is possible to mount both ext3 (and ext2, in kernels 2.6.28 and later) filesystems directly using the ext4 filesystem driver.  This will allow you to use many of the in-core performance enhancements such as delayed allocation (delalloc) and multi-block allocation (mballoc), and large inodes if your ext3 filesystem have been formatted with large inodes as is the default with newer versions of e2fsprogs.  Simply mounting an ext3 (or ext2) filesystem with a modern (2.6.27+) version of ext4 will not change the on-disk structures, and it is possible to revert to the ext3 (or ext2) driver should there be any problem with ext4.  If you plan to use the ext4 driver to boot from an ext2/3 partition, and you compile your kernel without the ext2/3 drivers, you may need to add rootfstype=ext4 to the
+
This section is now a [[UpgradeToExt4 | separate page]].
kernel command line.
+
 
+
In addition to the in-core performance enhancements, there are additional features which modify the on-disk format from what ext3 understands, such as extents, which can significantly improve the ext4 filesystem performance, but mean the filesystem cannot be mounted by kernels that do not support ext4.  There are additional ext4 features, such as flex_bg and > 16TB filesystem support that can only be enabled at format time via mke2fs.
+
 
+
To change an ext2 filesystem (should you still have one) to ext3 (enabling the journal feature), use the command:
+
 
+
{{cmdroot|tune2fs -j /dev/''DEV''}}
+
 
+
To enable the ext4 features on an existing ext3 filesystem, use the command:
+
 
+
{{cmdroot|tune2fs -O extents,uninit_bg,dir_index /dev/''DEV''}}
+
 
+
WARNING: Once you run this command, the filesystem will no longer be mountable using the ext3 filesystem!
+
 
+
After running this command, you MUST run fsck to fix up some on-disk structures that tune2fs has modified:
+
 
+
{{cmdroot|e2fsck -fpDC0 /dev/''DEV''}}
+
 
+
Notes:
+
* by enabling the '''extents''' feature new files will be created in extents format, but this will not convert existing files to use extents.  Non-extent files can be transparently read and written by Ext4.
+
* If you convert your root filesystem ("/") to ext4, and you use the GRUB boot loader, you will need to install a version of GRUB which understands ext4.  Your system may boot OK the first time, but when your kernel is upgraded, it will become unbootable.
+
* WARNING: It is NOT recommended to resize the inodes using <tt>resize2fs</tt> with e2fsprogs 1.41.0 or later, as this is known to corrupt some filesystems.
+
  
 
= Acknowledgements =
 
= Acknowledgements =
  
Portions of this article have been taken from the [http://kernelnewbies.org/Ext4 Ext4 article ] from [http://kernelnewbiews.org the Kernel Newbies website].  The Kernel Newbies article was written by [http://kernelnewbies.org/MichaelBlizek Michael Blizek] and was released under [http://creativecommons.org/licenses/by/2.5 Creative Commons Attribution-2.5-Generic license].
+
Portions of this article have been taken from the [http://kernelnewbies.org/Ext4 Ext4 article] from [http://kernelnewbies.org the Kernel Newbies website].  The Kernel Newbies article was written by [http://kernelnewbies.org/diegocalleja diegocalleja] and was released under [http://creativecommons.org/licenses/by/2.5 Creative Commons Attribution-2.5-Generic license].

Latest revision as of 23:30, 11 February 2019


General Information

Ext4 was released as a functionally complete and stable filesystem in Linux 2.6.28, and it's getting included in all the modern distros (in some cases as the default fs), so if you are using a modern distro, it's possible that you already have Ext4 support and you don't need to modify your system to run Ext4.


It's safe to use it in production environments, but as any piece of software, it has bugs (which are more likely to be hit in the first stable versions). Any known critical bug will be quickly fixed. If you find one, you can contact the Ext4 developers at the ext4 mailing list. They sometimes also can be found on IRC.

EXT4 features

Compatibility

Any existing Ext3 filesystem can be mounted as Ext4 without requiring any on-disk format changes. However, it is possible to upgrade an Ext3 filesystem to take advantage of some Ext4 features by running a couple of commands in read-only mode (described in the next section). This means that you can improve the performance, storage limits and features of your current filesystems without reformatting and/or reinstalling your OS and software environment. If you need the advantages of Ext4 on a production system, you can upgrade the filesystem. The procedure is safe and doesn't risk your data (obviously, backup of critical data is recommended, even if you aren't updating your filesystem :). Ext4 will use the new data structures only on new data; the old structures will remain untouched, and it will still be possible to read and modify them when needed. Note that once you convert a filesystem to use the new Ext4 features, you won't be able to mount it as Ext3 again.

Bigger File System and File Sizes

Currently, Ext3 supports a maximum file system size of 16 TiB and a maximum file size of 2 TiB with a 4 KiB block size. Ext4 adds 48-bit block addressing, which gives a maximum file system size of 1 EiB (footnote 1) and a maximum file size of 16 TiB with a 4 KiB block size. Why 48-bit and not 64-bit? There are some limitations that would need to be fixed before making Ext4 fully 64-bit capable, and these have not been addressed in Ext4. With a 4 KiB block size (2^12 bytes) the filesystem size limit is already 2^60 bytes, and with a 64 KiB block size the limit is 2^64 bytes, so there isn't really a pressing need for full 64-bit block addresses. The Ext4 data structures have been designed with this in mind, so a future update to Ext4 may implement full 64-bit support at some point. 1 EiB will be enough (really :)) until that happens.

Footnotes
  1. An EiB or exbibyte is 2^60 bytes or 1,048,576 TiB.

1 EiB = 1024 PiB
1 PiB = 1024 TiB
1 TiB = 1024 GiB

In order to map blocks beyond 2^32 to a file, extents must be enabled since block maps only know about 32-bit block numbers. As of e2fsprogs 1.42.9, this requirement is enforced by mke2fs.
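
The limits quoted in this section come from multiplying the number of addressable blocks by the block size. A minimal sketch of that arithmetic in shell follows. It works with exponents so that 2^64 does not overflow shell integers, and fs_limit_log2 is a hypothetical helper introduced here for illustration, not part of any Ext4 tool:

```shell
# fs_limit_log2 ADDR_BITS BLOCK_SIZE_LOG2
# Hypothetical helper: prints log2 of the maximum filesystem size in bytes,
# i.e. (bits in a block number) + (log2 of the block size).
fs_limit_log2() {
    echo $(( $1 + $2 ))
}

# 4 KiB blocks are 2^12 bytes, 64 KiB blocks are 2^16 bytes.
echo "Ext3, 32-bit block numbers, 4 KiB blocks:  2^$(fs_limit_log2 32 12) bytes"  # 2^44 = 16 TiB
echo "Ext4, 48-bit block numbers, 4 KiB blocks:  2^$(fs_limit_log2 48 12) bytes"  # 2^60 = 1 EiB
echo "Ext4, 48-bit block numbers, 64 KiB blocks: 2^$(fs_limit_log2 48 16) bytes"  # 2^64
```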

Sub directory scalability

Right now the maximum possible number of subdirectories contained in a single directory in Ext3 is 32000. Ext4 doubles that limit and allows 64000 subdirectories.

Extents

Traditional Unix-derived file systems, like Ext3, use an indirect block mapping scheme to keep track of each block that holds the data of a file. This is inefficient for large files, especially during large file delete and truncate operations, because the mapping keeps an entry for every single block, and big files have many blocks, which means huge mappings that are slow to handle. Modern file systems use a different approach called "extents". An extent is basically a run of contiguous physical blocks. It says, in effect, "the data is in the next n blocks". For example, a 100 MiB file can be allocated as a single extent of that size, instead of needing an indirect mapping for 25600 blocks (4 KiB per block). Huge files are split into several extents. Extents improve performance and also help to reduce fragmentation, since an extent encourages contiguous layouts on the disk.
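
The 25600 figure in the example is simply the file size divided by the block size. A small sketch of that calculation, where blocks_needed is a hypothetical helper name used only for illustration:

```shell
# blocks_needed SIZE_BYTES BLOCK_BYTES
# Hypothetical helper: number of blocks needed to hold SIZE_BYTES, rounding up.
# With indirect block mapping, each of these blocks needs its own mapping entry.
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# A 100 MiB file on a filesystem with 4 KiB blocks:
blocks_needed $((100 * 1024 * 1024)) 4096
```

With extents, the same file can be described by one (start block, length) pair rather than one entry per block.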

Multiblock allocation

When Ext3 needs to write new data to the disk, there's a block allocator that decides which free blocks will be used to write the data. But the Ext3 block allocator only allocates one block (4 KiB) at a time. That means that if the system needs to write the 100 MiB of data mentioned in the previous point, it will need to call the block allocator 25600 times (and it was just 100 MiB!). Not only is this inefficient, it also doesn't allow the block allocator to optimize the allocation policy, because it doesn't know how much data in total is being allocated; it only knows about a single block. Ext4 uses a "multiblock allocator" (mballoc) which allocates many blocks in a single call, instead of a single block per call, avoiding a lot of overhead. This improves performance, and it's particularly useful with delayed allocation and extents. This feature doesn't affect the disk format. Also, note that the Ext4 block/inode allocator has other improvements, described in detail in this paper.

Delayed allocation

Delayed allocation is a performance feature (it doesn't change the disk format) found in a few modern filesystems such as XFS, ZFS, btrfs or Reiser 4. It consists of delaying the allocation of blocks as much as possible, contrary to what traditional filesystems (such as Ext3, reiser3, etc.) do: allocate the blocks as soon as possible. For example, if a process write()s, the filesystem code will immediately allocate the blocks where the data will be placed, even if the data is not being written to the disk right now and is going to be kept in the cache for some time. This approach has disadvantages. For example, when a process is writing continually to a file that grows, successive write()s allocate blocks for the data, but they don't know if the file will keep growing. Delayed allocation, on the other hand, does not allocate the blocks immediately when the process write()s. Instead, it delays the allocation of the blocks while the file is kept in cache, until it is really going to be written to the disk. This gives the block allocator the opportunity to optimize the allocation in situations where the old system couldn't. Delayed allocation plays very nicely with the two previous features, extents and multiblock allocation, because in many workloads when the file is finally written to the disk it will be allocated in extents whose block allocation is done with the mballoc allocator. The performance is much better, and fragmentation is much improved in some workloads.

Fast fsck

Fsck is a very slow operation, especially the first step: checking all the inodes in the file system. In Ext4, a list of unused inodes is stored at the end of each group's inode table (with a checksum, for safety), so fsck does not check those inodes. The result is that total fsck time improves by a factor of 2 to 20, depending on the number of used inodes (http://kerneltrap.org/Linux/Improving_fsck_Speeds_in_Ext4). Note that it is fsck, not Ext4, that builds the list of unused inodes. This means that you must run fsck to get the list built, and only the next fsck run will be faster (you need to run fsck in order to convert an Ext3 filesystem to Ext4 anyway). Another feature contributes to this fsck speed-up: "flexible block groups", which also speed up file system operations.

Journal checksumming

The journal is the most used part of the disk, making the blocks that form part of it more prone to hardware failure. And recovering from a corrupted journal can lead to massive corruption. Ext4 checksums the journal data to know if the journal blocks are failing or corrupted. But journal checksumming has a bonus: it allows the two-phase commit system of Ext3's journaling to be converted to a single phase, speeding up filesystem operations by up to 20% in some cases, so reliability and performance are improved at the same time. (Note: the part of the feature that improves performance, asynchronous logging, is turned off by default for now, and will be enabled in future releases when its reliability improves.)

"No Journaling" mode

Journaling ensures the integrity of the filesystem by keeping a log of the ongoing disk changes. However, it is known to have a small overhead. Some people with special requirements and workloads can run without a journal and its integrity advantages. In Ext4 the journaling feature can be disabled, which provides a small performance improvement.

Online defragmentation

(This feature is being developed and will be included in future releases.) While delayed allocation, extents and multiblock allocation help to reduce fragmentation, filesystems can still fragment with use. For example: you write three files in a directory, laid out contiguously on the disk. Some day you need to update the middle file, but the updated file has grown a bit, so there's not enough room for it. You have no option but to place the excess data in another part of the disk, which will cause a seek, or to allocate the updated file contiguously in another place, far from the other two files, resulting in seeks if an application needs to read all the files in a directory (say, a file manager generating thumbnails for a directory full of images). Besides, the filesystem can only care about certain types of fragmentation; it can't know, for example, that it must keep all the boot-related files contiguous, because it doesn't know which files are boot-related. To solve this issue, Ext4 will support online defragmentation, and there's an e4defrag tool which can defragment individual files or the whole filesystem.

Inode-related features

Larger inodes, nanosecond timestamps, fast extended attributes, inode reservation...

  • Larger inodes: Ext3 supports configurable inode sizes (via the -I mkfs parameter), but the default inode size is 128 bytes. Ext4 will default to 256 bytes. This is needed to accommodate some extra fields (like nanosecond timestamps or inode versioning), and the remaining space of the inode will be used to store extended attributes that are small enough to fit in that space. This makes access to those attributes much faster, and improves the performance of applications that use extended attributes by a factor of 3 to 7.
  • Inode reservation consists of reserving several inodes when a directory is created, expecting that they will be used in the future. This improves performance, because when new files are created in that directory they'll be able to use the reserved inodes. File creation and deletion are hence more efficient.
  • Nanosecond timestamps mean that inode fields like "modified time" will be able to use nanosecond resolution instead of the second resolution of Ext3.
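
As a quick way to see the timestamp resolution of a filesystem, the sketch below prints a file's modification time with its fractional part. It assumes the stat utility from GNU coreutils. The scratch file from mktemp is arbitrary, so point f at a file on the filesystem you actually want to inspect:

```shell
f=$(mktemp)                  # scratch file; replace with a file on the filesystem of interest
mtime=$(stat -c '%y' "$f")   # %y: time of last modification, human-readable
echo "$mtime"                # the digits after the seconds are the nanosecond field
rm -f "$f"
```

On a filesystem that only stores second resolution, the fractional part is all zeros.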

Persistent preallocation

This feature, available in Ext4 in the latest kernel versions and emulated by glibc on filesystems that don't support it, allows applications to preallocate disk space: applications tell the filesystem to preallocate the space, and the filesystem preallocates the necessary blocks and data structures, but there's no data in them until the application really needs to write the data in the future. This is what P2P applications do on their own when they "preallocate" the necessary space for a download that will last hours or days, but implemented much more efficiently by the filesystem and with a generic API. This has several uses: first, to avoid applications (like P2P apps) doing it themselves inefficiently by filling a file with zeros. Second, to reduce fragmentation, since the blocks will be allocated at one time, as contiguously as possible. Third, to ensure that applications always have the space they know they will need, which is important for RT-ish applications, since without preallocation the filesystem could get full in the middle of an important operation. The feature is available via the libc posix_fallocate() interface.
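
Preallocation can also be tried from the command line. The sketch below assumes the fallocate utility from util-linux is installed (it is not mentioned in the text above, which describes the posix_fallocate() C interface) and that the filesystem holding the scratch file supports preallocation:

```shell
f=$(mktemp)                  # scratch file, used only for this demonstration
fallocate -l 16M "$f"        # reserve 16 MiB of space; no data is written
size=$(stat -c '%s' "$f")    # apparent file size in bytes
echo "size: $size bytes"
rm -f "$f"
```

The file reaches its full size immediately, without the application having to write 16 MiB of zeros.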

Barriers on by default

This is an option that improves the integrity of the filesystem at the cost of some performance. You can disable it with "mount -o barrier=0", which is worth trying if you're benchmarking. From this LWN article: "The filesystem code must, before writing the [journaling] commit record, be absolutely sure that all of the transaction's information has made it to the journal. Just doing the writes in the proper order is insufficient; contemporary drives maintain large internal caches and will reorder operations for better performance. So the filesystem must explicitly instruct the disk to get all of the journal data onto the media before writing the commit record; if the commit record gets written first, the journal may be corrupted. The kernel's block I/O subsystem makes this capability available through the use of barriers; in essence, a barrier forbids the writing of any blocks after the barrier until all blocks written before the barrier are committed to the media. By using barriers, filesystems can make sure that their on-disk structures remain consistent at all times."
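
As a sketch, the barrier=0 option can be applied on every mount during a benchmarking session with an /etc/fstab entry like the one below. /dev/DEV and /wherever are the placeholders used elsewhere on this page, and the remaining fields are ordinary fstab defaults chosen for illustration:

```
# <device>  <mount point>  <type>  <options>           <dump>  <pass>
/dev/DEV    /wherever      ext4    defaults,barrier=0  0       2
```

Removing barrier=0 from the options restores the default behaviour.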

Ext4 code implements discard/TRIM

This requires mounting with the "discard" option. See howto and Verifying TRIM.
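
For illustration, a mount command that passes this option could look like the following, using the same /dev/DEV and /wherever placeholders as the rest of this page (run as root):

```
# mount -t ext4 -o discard /dev/DEV /wherever
```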

Getting Ext4 code

For the vast majority of users today, ext4 will be included as a default part of any Linux installation, since it has been included in the mainline kernel since 2009.

For people who build their own kernel and utilities

1. Start with a 2.6.29 or later kernel. It is highly recommended that you apply the latest patchset (if available) to get the latest bug fixes. In your kernel's .config file, enable EXT4_FS (along with EXT4_FS_XATTR and EXT4_FS_POSIX_ACL if you like).

2. Compile the latest version of e2fsprogs (as of this writing 1.44.5) from kernel.org. Note that it may be desirable to install the mke2fs.conf file that comes with the e2fsprogs sources in /etc/mke2fs.conf to get the latest features enabled by default, but if you also need to boot with an older kernel it is prudent to stick with the older /etc/mke2fs.conf features and only enable new features on the mke2fs command-line as needed.
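
For step 1, the relevant part of a kernel .config might look like the fragment below. The option names are the ones listed in that step, written with the kernel's usual CONFIG_ prefix. Whether each option is present depends on the kernel version, so treat this as a sketch to check against your own source tree:

```
CONFIG_EXT4_FS=y
# Optional, as mentioned in step 1:
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
```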

For people who are running Fedora

Recent Fedora is generally very up to date with respect to ext4 code in kernelspace and userspace.

For people who are running RHEL/CentOS

ext4 is included as the default filesystem in RHEL6, and is available in RHEL7 and later.

For people who are running openSuSE

Ext4 is the default filesystem for openSuSE 11.2. Converting to ext4 from earlier versions should be done just like on any other initrd-based system:

  1. Run tune2fs -O extents,uninit_bg /dev/ROOT
  2. Change ext3 to ext4 in /etc/fstab
  3. Add "ext4" to INITRD_MODULES in /etc/sysconfig/kernel
    1. Run sudo mkinitrd (this can take a couple of minutes)
  4. Run e2fsck on the ROOT filesystem. As it is currently mounted, you might want to change into single user mode (init 1) and remount it read-only (mount -o remount,ro /). Alternatively, just reboot: the boot process will complain a bit and run e2fsck anyway, although it is not clear whether this is the recommended approach.

For people who are running Ubuntu

Ubuntu 9.04 and later include ext4 as a manual partitioning option at installation time, including support for ext4 as the root filesystem.

For people who are running Debian

Ext4 has been available in Debian since the Lenny release.

Creating ext4 filesystems

Creating a new ext4 filesystem is very easy once you have upgraded to e2fsprogs 1.41 or later. Simply type:

# mke2fs -t ext4 /dev/DEV

or

# mkfs.ext4 /dev/DEV

Once the filesystem is created, it can be mounted as follows:

# mount -t ext4 /dev/DEV /wherever

If you have a sufficiently new system, the "-t ext4" should not be needed.

NOTE: Although very large filesystems are on ext4's feature list, e2fsprogs releases before 1.42 limit the filesystem size to 2^32 blocks (16 TiB for a 4 KiB block filesystem). Creating larger filesystems requires e2fsprogs 1.42 or later.
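
To experiment without a spare partition or root privileges, the same mke2fs command can be pointed at a regular file instead of /dev/DEV. The following is a minimal sketch under some assumptions: e2fsprogs and GNU coreutils are installed, the tools may live in /sbin or /usr/sbin, and make_ext4_image is a hypothetical helper name introduced here for illustration:

```shell
PATH=$PATH:/sbin:/usr/sbin    # e2fsprogs tools are often outside a normal user's PATH

# make_ext4_image FILE SIZE
# Hypothetical helper: builds an ext4 filesystem inside a regular file.
make_ext4_image() {
    truncate -s "$2" "$1" &&      # size the file (sparse) to hold the filesystem
    mke2fs -q -F -t ext4 "$1"     # -q: quiet, -F: proceed although FILE is not a block device
}

img=$(mktemp)
make_ext4_image "$img" 64M
# Show which on-disk features mke2fs enabled for the new filesystem.
dumpe2fs -h "$img" 2>/dev/null | grep -i 'features'
rm -f "$img"
```

The feature list printed by dumpe2fs depends on the installed e2fsprogs version and on /etc/mke2fs.conf.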

Booting from an ext4 filesystem

Ext4 support has been added in the 1.97 version of GRUB2.

There's also a Google Summer of Code project (from openSUSE) which seems to have developed ext4 support for GRUB legacy (0.97). The two projects, GRUB2 and the GSoC project, seem (sadly) to be separate efforts.

The grub package in Ubuntu 9.04 and later includes a patch to support booting from ext4 filesystems (see bug 314350).

Syslinux 4.00 and higher (currently in beta) supports ext4 in extlinux. See [1].

Converting an ext3 filesystem to ext4

This section is now a separate page.

Acknowledgements

Portions of this article have been taken from the Ext4 article from the Kernel Newbies website. The Kernel Newbies article was written by diegocalleja and was released under Creative Commons Attribution-2.5-Generic license.
