Frequently Asked Questions

From Ext4
(Difference between revisions)
(What are the key on-disk format differences between jbd and jbd2?)
(Undo revision 5881 by RobertFloyd (talk) / SPAM)
 
(47 intermediate revisions by 10 users not shown)
Line 1: Line 1:
== Getting Started ==

=== How do I get started using ext4? ===

Please see the [[Ext4 Howto]] page for information on getting started using ext4.

=== Where do I get the latest version of e2fsprogs? ===

The latest version of e2fsprogs can be found at [http://sourceforge.net/project/showfiles.php?group_id=2406&package_id=2374 SourceForge] or at [ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs kernel.org].  Recently released versions of e2fsprogs support most of the ext4 features (excluding > 16TiB support, as of 2010-11-28), so you do not need to build e2fsprogs yourself in order to use ext4.

=== How do I build e2fsprogs? ===
Line 10: Line 13:
 
The <tt>INSTALL</tt> file at the top of the source tree gives more detailed information, but e2fsprogs uses a standard configure script, so the usual "<tt>./configure; make</tt>" will build the e2fsprogs binaries.  If you wish to build the ELF shared libraries, add the "<tt>--enable-elf-shlibs</tt>" option to the configure invocation.
  
=== How do I create and mount a new ext4 filesystem? ===

First, make sure that you have e2fsprogs 1.41.0 or later installed on your system.  This is required for ext4 support.  If the new partition where you would like to create the ext4 filesystem is /dev/sdb1, then all you have to type is:

'''/sbin/mke2fs -t ext4 /dev/sdb1<br>'''

Then to mount this new filesystem, all you need to do is:

'''mkdir /mnt/test<br>'''
'''mount -t ext4 /dev/sdb1 /mnt/test<br>'''

For more information, please see the [[Ext4 Howto]] document.

=== How do I mount a fresh new storage device as ext4? ===

For example, if the new device has been detected as /dev/sdb1 (check "dmesg" output):

Use the newly installed e2fsprogs (the OS default mke2fs will most likely fail; compare the message it returns with the following output):

'''cd misc/<br>'''
'''./mke2fs -E test_fs /dev/sdb1<br>'''
<pre>
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
126976 inodes, 506044 blocks
25302 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
62 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
</pre>

'''cd ../debugfs/<br>'''
'''./debugfs -w /dev/sdb1<br>'''
debugfs 1.40.5 (27-Jan-2008)<br>
debugfs: '''set_super_value s_flags 4<br>'''
debugfs: '''quit<br>'''

'''cd ../misc/<br>'''
'''./tune2fs -j /dev/sdb1<br>'''
'''./blkid /dev/sdb1<br>'''
/dev/sdb1: UUID="3aa983ef-0dcd-474d-a9d3-2660ff6ef3e7" TYPE="ext4dev"<br>

'''mkdir /mnt/test<br>'''
'''mount -t ext4dev /dev/sdb1 /mnt/test<br>'''

To verify the mounted filesystem:

'''df<br>'''
 
  
=== Why do I get two different outputs from blkid? ===

When executing:

<pre>
blkid /dev/sdb2
/dev/sdb2: UUID="5b8706af-208c-4e4c-a3bf-16f1c0864c82" TYPE="ext2"
</pre>

But when the latest e2fsprogs' version of blkid is executed:

<pre>
/download/e2fsprogs/e2fsprogs/misc>./blkid /dev/sdb2
/dev/sdb2: UUID="5b8706af-208c-4e4c-a3bf-16f1c0864c82" SEC_TYPE="ext2" TYPE="ext4dev"
</pre>

The two versions of blkid identify the filesystem differently, which explains the different output.

=== Why do I get "EXT4-fs: sdb1: not marked OK to use with test code." in my dmesg? ===

This error arises when "mke2fs -E test_fs /dev/sdb1" has not been run; that command marks the partition as test_fs.  The ext4 development source code currently checks for this flag.

Searching for this symbol in the e2fsprogs package:

<pre>
/download/e2fsprogs/e2fsprogs/misc>grep test_fs *.c
mke2fs.c:              } else if (!strcmp(token, "test_fs")) {
mke2fs.c:                      "\ttest_fs\n"),
tune2fs.c:              if (!strcmp(token, "test_fs")) {
tune2fs.c:              } else if (!strcmp(token, "^test_fs")) {
tune2fs.c:                      "\ttest_fs\n"
tune2fs.c:                      "\t^test_fs\n"));
</pre>

To add the test_fs flag to an already formatted partition, execute: "tune2fs -E test_fs /dev/sdb1".

<!-- This question is no longer relevant, is it?

=== Why do I get "EXT4-fs: sdb1: not marked OK to use with test code." in my dmesg? ===

Ext4 is currently in development, and as a safety measure, it requires that filesystems that it mounts have a flag indicating that it is OK for in-development code to mount the filesystem.  This requirement will be dropped relatively soon, once the ext4 developers are confident that ext4 is stable.

The test_fs flag can be set by using the command "tune2fs -E test_fs /dev/sdb1".  Filesystems which are created using the command "mke2fs -t ext4dev /dev/sdb1" will automatically have the test_fs flag set.  When the ext4 filesystem has become stable, the command "mke2fs -t ext4 /dev/sdb1" will create a filesystem with all of the appropriate filesystem options suitable for the ext4 filesystem, but without the test_fs flag.

-->

== History of ext2, ext3, and ext4 ==

=== What is the difference between ext2, ext3, and ext4? ===

The ext2, ext3, and ext4 file systems are a family of file systems with strong backwards and forwards compatibility.  In fact, they can be considered a single filesystem format with a number of feature extensions, and ext2, ext3, and ext4 are merely the names of the implementations found in the Linux kernel.  This view is supported by the fact that they share the same userspace utilities (e2fsprogs), and that many filesystems can be mounted by more than one of these implementations.  For example, a filesystem which is created for use with ext3 can be mounted using either ext2 or ext4.  However, a filesystem with ext4-specific extensions cannot be mounted using ext2 or ext3, and the ext3 file system code in the kernel requires the presence of a journal, which is generally not present in partitions formatted for use by the ext2 file system.  The ext4 code has the ability to mount and use a filesystem without a journal.

=== Why was ext2 created? ===

In April 1992, the ext filesystem was written by Remy Card to address two key limitations with the Minix filesystem, which had previously been the only filesystem available to Linux: filenames could be only 14 characters, and the maximum file system size supported by Minix was 64MiB.  The ext filesystem supported block devices up to 2GiB, and file names up to 255 characters, but (like Minix) it only had a single timestamp for last modification time, last access time, and inode change time.  It also used linked lists to store free blocks, which meant that files tended to get fragmented very easily.  In January 1993, the ext2 filesystem was released, which further increased the maximum filesystem size to 4TiB, added POSIX timestamps, and supported variable block sizes.  More importantly, it added support for extensibility so that new features could be added to the filesystem.
  
== File System Features ==

=== What features are supported by the ext2 filesystem? ===

As of this writing, the ext2 filesystem supports the following features:

* Hash-indexed directories (EXT2_FEATURE_COMPAT_DIR_INDEX) (note: the ext2 filesystem only understands indexed directories in that it knows how to clear the indexed directory flag when it modifies such a directory)
* Extended attribute blocks (EXT2_FEATURE_COMPAT_EXT_ATTR)
* File type in directory entries (EXT2_FEATURE_INCOMPAT_FILETYPE)
* Reduced block group backups (EXT2_FEATURE_INCOMPAT_META_BG)
* Reduced superblock backups (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER)
* Files larger than 2GiB in size (EXT2_FEATURE_RO_COMPAT_LARGE_FILE)

=== What features are supported by the ext3 file system? ===

As of this writing, the ext3 file system supports the following features:

* Extended attribute blocks and large inodes (EXT3_FEATURE_COMPAT_EXT_ATTR)
* Online filesystem resize reservations (EXT3_FEATURE_COMPAT_RESIZE_INODE)
* Hash-indexed directories (EXT3_FEATURE_COMPAT_DIR_INDEX)
* Journal file/device present (EXT3_FEATURE_COMPAT_HAS_JOURNAL) (note: this feature *must* be set for ext3 to mount the filesystem)
* File type in directory entries (EXT3_FEATURE_INCOMPAT_FILETYPE)
* Journal recovery required (EXT3_FEATURE_INCOMPAT_RECOVER)
* Reduced block group backups (EXT3_FEATURE_INCOMPAT_META_BG)
* Reduced superblock backups (EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)
* Files larger than 2GiB in size (EXT3_FEATURE_RO_COMPAT_LARGE_FILE)

=== What features are supported by the ext4 file system? ===

As of this writing, the ext4 file system supports the following features:

* Extended attribute blocks and large inodes (EXT3_FEATURE_COMPAT_EXT_ATTR)
* Online filesystem resize reservations (EXT3_FEATURE_COMPAT_RESIZE_INODE)
* Hash-indexed directories (EXT3_FEATURE_COMPAT_DIR_INDEX)
* Journal file/device present (EXT3_FEATURE_COMPAT_HAS_JOURNAL) (not required for ext4 to mount the filesystem)
* File type in directory entries (EXT3_FEATURE_INCOMPAT_FILETYPE)
* Journal recovery required (EXT3_FEATURE_INCOMPAT_RECOVER)
* Files allocated with extent format (EXT4_FEATURE_INCOMPAT_EXTENTS)
* Support for more than 2^32 filesystem blocks (EXT4_FEATURE_INCOMPAT_64BIT)
* Flexible block group metadata location (EXT4_FEATURE_INCOMPAT_FLEX_BG)
* Reduced block group backups (EXT3_FEATURE_INCOMPAT_META_BG)
* Reduced superblock backups (EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)
* Files larger than 2GiB in size (EXT3_FEATURE_RO_COMPAT_LARGE_FILE)
* Group descriptor checksums and sparse inode table (EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
* Over 32000 subdirectories (EXT4_FEATURE_RO_COMPAT_DIR_NLINK)
* Nanosecond timestamps and creation time (EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)
* Files larger than 2TiB in size (EXT4_FEATURE_RO_COMPAT_HUGE_FILE)

=== How do I turn off the mballoc feature? ===

By default "mount" will enable the mballoc feature when mounting.  To turn it off:

mount -t ext4dev -o data=writeback,delalloc,extents,nomballoc /dev/sdb1 /mnt/test

=== How do I turn off the extent feature? ===

By default "mount" will enable the extent feature when mounting.  To turn it off:

mount -t ext4dev -o data=writeback,delalloc,noextents,mballoc /dev/sdb1 /mnt/test

CAUTION: Once the extent feature has been turned on, it is not possible to mount the filesystem as ext3 anymore.

=== Why do I get "EXT4-fs: Unrecognized mount option "delalloc" or missing value"? ===

Please use the ext4 code from the git patchset instead of the stock kernel source.

=== What is mballoc feature? After I mounted a partition as mballoc, can I remount it as nomballoc? ===

Yes, it is possible: unmount the partition first, then execute "mount" with the nomballoc option.
  
 
== Understanding how it works ==
Line 120: Line 100:
 
=== What are the new features in Ext4 (vs Ext2/3)? ===

Have a look at the [[New_ext4_features|Ext4 features page]].

=== How do I test the features in Ext4? ===
  
=== What external tools are available for testing Ext4 FS? ===

One commonly known simple tool is fsfuzzer (http://www.digitaldwarf.be/products/mangle.c); a lightly adapted copy of its source is shown here:

<pre>
/*
  trivial binary file fuzzer by Ilja van Sprundel.
  Its usage is very simple, it takes a filename and headersize
  as input. it will then change approximately between 0 and 10% of
  the header with random bytes (biased towards the highest bit set)

  obviously you need a bash script or something as a wrapper !

  so far this broke: - libmagic (used file)
                     - preview (osX pdf viewer)
                     - xpdf (hang, not a crash ...)
                     - mach-o loading (osX 10.3.7, seems to be fixed later)
                     - qnx elf loader (panics almost instantly, yikes !)
                     - FreeBSD elf loading
                     - openoffice
                     - amp
                     - osX image loading (.dmg)
                     - libbfd (used objdump)
                     - libtiff (used tiff2pdf)
                     - xine (division by 0, took 20 minutes of fuzzing)
                     - OpenBSD elf loading (3.7 on a sparc)
                     - unixware 713 elf loading
                     - DragonFlyBSD elf loading
                     - solaris 10 elf loading
                     - cistron-radiusd
                     - linux ext2fs (2.4.29) image loading (division by 0)
                     - linux reiserfs (2.4.29) image loading (instant panic !!!)
                     - linux jfs (2.4.29) image loading (long (uninterruptible) loop, 2 oopses)
                     - linux xfs (2.4.29) image loading (instant panic)
                     - windows macromedia flash .swf loading (obviously the windows version of mangle needs a few tweaks to work ...)
                     - Quicktime player 7.0.1 for MacOS X
                     - totem
                     - gnumeric
                     - vlc
                     - mplayer
                     - python bytecode interpreter
                     - realplayer 10.0.6.776 (GOLD)
                     - dvips
*/
#include <stdio.h>
#include <stdlib.h>     /* exit, atoi, rand, srand */
#include <unistd.h>     /* read, close */
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>

#define DEFAULT_HEADER_SIZE 1024
#define DEFAULT_NAME "test2"

int getseed(void) {
    int fd = open("/dev/urandom", O_RDONLY);
    int r;
    if (fd < 0) {
        perror("open");
        exit(0);
    }
    read(fd, &r, sizeof(r));
    close(fd);
    return(r);
}

int main(int argc, char **argv) {

    int fd;
    char *p, *name;
    unsigned char c;
    unsigned int count, i, off, hsize;

    if (argc < 2) {
        hsize = DEFAULT_HEADER_SIZE;
        name = DEFAULT_NAME;
    } else if (argc < 3) {
        hsize = DEFAULT_HEADER_SIZE;
        name = argv[1];
    } else {
        hsize = atoi(argv[2]);
        name = argv[1];
    }
    fd = open(name, O_RDWR);
    if (fd < 0) {
        perror("open");
        exit(0);
    }
    p = mmap(0, hsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        exit(0);
    }
    srand(getseed());
    count = (unsigned) rand() % (hsize / 10);
    for (i = 0; i < count; i++) {
        off = rand() % hsize;
        c = rand() % 256;
        /* we want the highest bit set more often, in case of signedness issues */
        if ( (rand() % 2) && c < 128) c |= 0x80;
        p[off] = c;
    }
    close(fd);
    munmap(p, hsize);
    return 0;
}
</pre>

Its fuller content is packaged here (with shell scripts to run the compiled programs, but customization for ext4 will be needed): [http://projects.info-pull.com/mokb/fsfuzzer-0.6-lmh.tgz]

=== How do I benchmark the performance of Ext4 as against other FS? What are the tools available? ===

For any filesystem and hardware platform, the best benchmarks are the actual applications that will be running on the system.
Benchmarks are approximations of real world applications, but they may not reflect the IO load of your applications.

* There exists a wide variety of tools and comparisons in the [http://www.google.com/search?q=file+system+performance+testing&btnG=Google+Search intertubes].
* Wikipedia has a good article on [http://en.wikipedia.org/wiki/Comparison_of_file_systems Comparison of file systems].

=== Can I undelete files in Ext4? ===

No, in the same way that the ext3 journal requirements to be consistent after a crash prevent [[Undeletion|undelete of ext3 files]], it isn't possible to undelete ext4 files.

=== Can I mount existing Ext3 as Ext4? And vice versa? Similarly from Ext2 to Ext4 and its reverse? ===

With recent versions of ext4 (2.6.29 and later), you can mount any ext2 or ext3 filesystem as ext4 without any changes.  To enable the new ext4 features on such a filesystem, you must use tune2fs:

{{cmdroot|tune2fs -O extents,uninit_bg,huge_file /dev/''DEV''}}<br/>
{{cmdroot|e2fsck -f /dev/''DEV''}}<br/>

If you want to create a journal on an ext2 filesystem that you have mounted as ext4, you must also issue the command:

{{cmdroot|tune2fs -j /dev/''DEV''}}

Once you have enabled extents on a former ext2 or ext3 filesystem, it is an ext4 filesystem and cannot be reverted to the old format.

If you have created a journal on a former ext2 filesystem, it can be removed if the filesystem needs to be reverted to ext2:

{{cmdroot|tune2fs -O ^has_journal /dev/''DEV''}}

Some ext4 features cannot be enabled on an existing ext3 filesystem.

See the [[Ext4 Howto]] for more details.
  
=== What is the information provided by /proc/fs/jbd2/''partition''/history? ===

Executing {{cmduser|cat /proc/fs/jbd2/''partition''/history}} gives:

<pre>
R/C  tid   wait  run   lock  flush  log  hndls  block  inlog  ctime  write  drop  close
R    7102  0     5000  0     1424   4    68681  5      6
R    7103  0     5000  0     1644   4    64579  9      10
R    7104  0     5000  0     856    32   38719  11     12
R    7105  0     5000  0     1052   0    47142  12     13
R    7106  0     5000  0     1172   16   56028  11     12
R    7107  0     5000  0     1416   4    71047  11     12
R    7108  0     5000  0     1640   4    81125  5      6
R    7109  0     5000  0     1616   4    77314  6      7
R    7110  0     5000  0     1640   0    76111  5      6
:
:
</pre>
  
The purpose of this history is to provide information on the behaviour of the ext4 journaling layer (JBD2).
One line is added to the {{path|history}} file for each recently committed journal transaction.  The fields are:

; R/C : whether the transaction is Running or Committed
{{NOTE|the current JBD2 statistics only show results for the Running transaction and do not show the Commit statistics.}}
; tid : transaction ID is an internal identifier given to every JBD2 transaction
; wait : number of milliseconds spent waiting for the transaction to start.  This may happen if the journal is too small and previous transactions have not checkpointed yet.
; run : number of milliseconds the transaction was running (default 5000ms = 5s). May be shorter if the transaction contains the maximum number of blocks (1/4 of the journal size) or if the application is doing synchronous operations.
; lock : number of milliseconds spent waiting for the transaction to be locked
; flush : number of milliseconds flushing blocks to the filesystem for ordered mode before the transaction can be committed
; log : number of milliseconds to write the blocks to the journal
; hndls : number of filesystem transaction handles for this journal transaction
; block : number of filesystem blocks in the transaction
; inlog : total number of blocks written to the journal for this transaction, including journal overhead
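
As a minimal sketch of how these fields might be read programmatically, the following Python snippet splits one line of the history output into named values. The function name <tt>parse_history_line</tt> and the key names are illustrative choices, not part of any kernel interface; the column order follows the field list above, and only the first ten columns are read because the sample rows carry no more than that.

```python
# Names for the first ten columns of /proc/fs/jbd2/<partition>/history,
# in the order given in the field list above (the names are our own choice).
FIELDS = ["state", "tid", "wait", "run", "lock", "flush", "log",
          "hndls", "block", "inlog"]


def parse_history_line(line):
    """Turn one history row into a dict; return None for non-data rows."""
    parts = line.split()
    # Data rows start with R (running) or C (committed) followed by numbers.
    if len(parts) < len(FIELDS) or parts[0] not in ("R", "C"):
        return None
    values = parts[1:len(FIELDS)]
    if not all(v.isdigit() for v in values):
        return None
    row = {"state": parts[0]}
    for name, value in zip(FIELDS[1:], values):
        row[name] = int(value)
    return row


sample = "R    7102  0     5000  0    1424  4    68681  5    6"
row = parse_history_line(sample)
print(row["tid"], row["run"], row["inlog"])  # prints: 7102 5000 6
```

The header line and the trailing ":" lines are rejected by the same checks, so the function can be applied to every line of the file without special-casing them.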
  
 
=== What is the information provided by /proc/fs/jbd2/<partition>/info? ===

Executing {{cmduser|cat /proc/fs/jbd2/''partition''/info}} gives:

{{cmdresult|
56 transaction, each upto 2048 blocks
average:
  0ms waiting for transaction
  57671ms running transaction
Line 269: Line 189:
 
  6 blocks per transaction
  7 logged blocks per transaction
}}

This file shows the average statistics from the {{path|/proc/fs/jbd2/''partition''/history}} file since the filesystem was first mounted.
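
To illustrate the relationship between the two files, here is a rough Python sketch that averages a few per-transaction values in the way the info file summarises them. The function name <tt>average_stats</tt>, the dictionary keys, and the use of integer division are all assumptions made for this example; the kernel computes its averages over every transaction since mount, not over a handful of rows.

```python
def average_stats(rows, fields=("wait", "run", "block", "inlog")):
    """Integer average of each field over a list of per-transaction dicts."""
    if not rows:
        return {name: 0 for name in fields}
    return {name: sum(r[name] for r in rows) // len(rows) for name in fields}


# Three transactions with values taken from the sample history listing.
rows = [
    {"wait": 0, "run": 5000, "block": 5, "inlog": 6},
    {"wait": 0, "run": 5000, "block": 9, "inlog": 10},
    {"wait": 0, "run": 5000, "block": 11, "inlog": 12},
]
print(average_stats(rows))
# prints: {'wait': 0, 'run': 5000, 'block': 8, 'inlog': 9}
```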
  
 
=== How to online resize the Ext4 filesystem? ===

Online resizing of ext4 works in much the same way as for ext3, using either resize2fs or ext2resize, but there is currently a limit (around 4TiB or so) to the maximum filesystem size.  Implementing online resize with the META_BG feature would allow this limit to be exceeded.
  
 
=== What is the difference between extents mapping and traditional indirect block mapping? ===

To quote from the [http://ext2.sourceforge.net/2005-ols/2005-ext3-paper.pdf 2005-OLS-ext3 paper]:

<pre>
Currently, the ext2/ext3 filesystem, like other traditional UNIX filesystems, uses a direct,
indirect, double indirect, and triple indirect blocks to map file offsets to on-disk blocks.
This scheme, sometimes simply called an indirect block mapping scheme, is not efficient for
large files, especially large file deletion. In order to address this problem, many modern
filesystems (including XFS and JFS on Linux) use some form of extent maps instead of the
traditional indirect block mapping scheme.

Since most filesystems try to allocate blocks in a contiguous fashion for performance reasons,
extent maps are a more efficient way to represent the mapping between logical and physical
blocks for large files. An extent is a single descriptor for a range of contiguous blocks,
instead of using hundreds of entries to describe hundreds of blocks individually.
</pre>
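
The idea in the quoted passage can be sketched in a few lines of Python: a per-block mapping, as an indirect block scheme would store it, collapses into a short list of extents when the blocks are contiguous. The function name <tt>blocks_to_extents</tt> and the block numbers are made up for illustration, and the sketch ignores on-disk limits such as the maximum extent length.

```python
def blocks_to_extents(physical_blocks):
    """Collapse a per-block mapping into (logical_start, physical_start, length) runs.

    physical_blocks[i] is the on-disk block that holds logical block i of a file.
    """
    extents = []
    for logical, physical in enumerate(physical_blocks):
        if extents:
            start_logical, start_physical, length = extents[-1]
            # Extend the current run when this block directly follows it on disk.
            if physical == start_physical + length:
                extents[-1] = (start_logical, start_physical, length + 1)
                continue
        extents.append((logical, physical, 1))
    return extents


# A hypothetical 300-block file stored in two contiguous pieces.
mapping = list(range(1000, 1200)) + list(range(5000, 5100))
print(blocks_to_extents(mapping))
# prints: [(0, 1000, 200), (200, 5000, 100)]
```

Three hundred individual block pointers become two descriptors here, which is the saving the paper describes for large, mostly contiguous files.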
  
=== What is delayed allocation (delalloc)? What are its advantages in Ext4? ===

[[DelayedAllocation|Delayed allocation]] works by deferring the mapping of newly-written file data blocks to disk blocks in the filesystem until writeback time. This helps in several ways:

# Reduced filesystem fragmentation, because all (or a large number) of the blocks for a single file can be allocated at the same time. Knowing the total number of blocks in each file allows the block allocator (mballoc) to find a suitable chunk of free space for each file instead of picking a free chunk that is too large or too small.
# Reduced CPU cycles spent in block allocation (get_block() calls), because the block allocator can allocate many or all of the blocks for the file at one time, instead of searching and locking for each block as it is written, which is what happens without delayed allocation.
# It may avoid the need for disk updates for metadata creation for short-lived files, which in turn reduces fragmentation.

This is the default allocation mode for ext4.
  
 
=== What is multiblock allocation (mballoc)? ===
Line 308: Line 232:
 
The mballoc code is active when using the O_DIRECT flag for writes, or if the delayed allocation (delalloc) feature is being used.  This allows the file to have many dirty blocks submitted for writes at the same time, unlike the existing kernel mechanism of submitting each block to the filesystem separately for allocation.

=== What is the bitmap allocator? ===

The allocator used in ext2 and ext3 would scan the free blocks bitmap for every new block written to a file, which was inefficient.  The block allocator in ext4 (mballoc) replaced the bitmap allocator and is one of the reasons ext4 is much faster than ext3.
  
 
=== Can you say something about the history of Ext4? ===
Line 316: Line 242:
 
=== When was Ext4 first announced to the LKML? ===

The ext4 filesystem was [http://lkml.org/lkml/2006/6/28/454 announced] on [http://kerneltrap.org/node/6776 28 June 2006].
  
 
=== What are the key differences between ext3 and ext4? ===
Line 328: Line 254:
 
* removed 32000 subdirectory limit (DIR_NLINK)
* nanosecond inode timestamps (EXTRA_ISIZE)

In addition, use of a journal is optional and may be enabled or disabled with tune2fs and the "has_journal" option.
  
 
=== What are the key differences between jbd and jbd2? ===

The jbd and jbd2 code is largely the same, but jbd2 adds a few new features in a compatible way:
* support for 64-bit filesystems (64_BIT)
* checksumming of journal transactions (CHECKSUM)
* asynchronous transaction commit block write (ASYNC_COMMIT)

In addition, jbd2 implements a new ''ordered'' mode for flushing data blocks to the filesystem that works in conjunction with delayed allocation to avoid blocking journal commits when there is a lot of data being written to the filesystem.  This avoids long delays for fsync() operations when another thread is doing heavy writes to the filesystem.

=== What are the key on-disk format differences between ext3 and ext4? ===
<pre>
/*
 * This is the extent on-disk structure.
 * It's used at the bottom of the tree.
 */
struct ext4_extent {
        __le32  ee_block;       /* first logical block extent covers */
        __le16  ee_len;         /* number of blocks covered by extent */
        __le16  ee_start_hi;    /* high 16 bits of physical block */
        __le32  ee_start_lo;    /* low 32 bits of physical block */
};

/*
 * This is index on-disk structure.
 * It's used at all the levels except the bottom.
 */
struct ext4_extent_idx {
        __le32  ei_block;       /* index covers logical blocks from 'block' */
        __le32  ei_leaf_lo;     /* pointer to the physical block of the next *
                                 * level. leaf or next index could be there */
        __le16  ei_leaf_hi;     /* high 16 bits of physical block */
        __u16   ei_unused;
};
</pre>
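
As an illustration of how the split physical block number in <tt>struct ext4_extent</tt> fits together, the following Python sketch unpacks a 12-byte extent record and rebuilds the 48-bit block number from its high and low parts. The function name <tt>decode_extent</tt> and the sample values are invented for this example; the field order and widths are taken from the structure listing above.

```python
import struct

# Little-endian layout of struct ext4_extent as listed above:
# __le32 ee_block, __le16 ee_len, __le16 ee_start_hi, __le32 ee_start_lo
EXTENT_FORMAT = "<IHHI"


def decode_extent(raw):
    """Unpack a 12-byte extent record and rebuild the 48-bit physical block."""
    ee_block, ee_len, ee_start_hi, ee_start_lo = struct.unpack(EXTENT_FORMAT, raw)
    physical_start = (ee_start_hi << 32) | ee_start_lo
    return {"logical_start": ee_block, "length": ee_len,
            "physical_start": physical_start}


# Made-up record: logical block 0, 8 blocks long, high part 1, low part 0x2000.
raw = struct.pack(EXTENT_FORMAT, 0, 8, 0x0001, 0x00002000)
print(decode_extent(raw))
# prints: {'logical_start': 0, 'length': 8, 'physical_start': 4294975488}
```

The same shift-and-or applies to <tt>ei_leaf_hi</tt> and <tt>ei_leaf_lo</tt> in the index structure, although its fields are stored in a different order.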
+
  
=== What are the key on-disk format differences between jbd and jbd2? ===

<pre>
struct commit_header
{
       __be32          h_magic;
       __be32          h_blocktype;
       __be32          h_sequence;
       unsigned char   h_chksum_type;
       unsigned char   h_chksum_size;
       unsigned char   h_padding[2];
       __be32          h_chksum[JFS_CHECKSUM_BYTES];
};
</pre>

=== Can I use ext4 on Solid-state drives (SSD)? ===

Yes. To ext4, an SSD is generally no different from any other block device. With modern solid-state disks, [http://marc.info/?l=linux-ext4&m=125803982214652&w=2 you can even put the journal on the SSD as well].

Latest revision as of 19:49, 1 February 2013

Getting Started

How do I get started using ext4?

Please see the Ext4 Howto page for information on getting started using ext4.

Where do I get the latest version of e2fsprogs?

The latest version of e2fsprogs can be found at SourceForge or at kernel.org. Recently released versions of e2fsprogs support most of the ext4 features (excluding > 16TiB support, as of 2010-11-28), so you do not need to build e2fsprogs yourself in order to use ext4.

How do I build e2fsprogs?

The INSTALL file in the top of the source tree gives more detailed information, but e2fsprogs uses a standard configure script, so the standard "./configure; make" will build the e2fsprogs binaries. Note that if you wish to build the ELF shared libraries, you need to add the "--enable-elf-shlibs" option to the configure invocation.

How do I create and mount a new ext4 filesystem?

First, make sure that you have e2fsprogs 1.41.0 or later installed on your system. This is required for ext4 support. If the new partition where you would like to create the ext4 filesystem is /dev/sdb1, then all you have to type is:

/sbin/mke2fs -t ext4 /dev/sdb1

Then to mount this new filesystem, all you need to do is:

mkdir /mnt/test
mount -t ext4 /dev/sdb1 /mnt/test

For more information, please see the Ext4 Howto document.



History of ext2, ext3, and ext4

What is the difference between ext2, ext3, and ext4?

The ext2, ext3, and ext4 file systems are a family of file systems with a high degree of backward and forward compatibility. In fact, they can be considered a single filesystem format with a number of feature extensions, and ext2, ext3, and ext4 are merely the names of the implementations found in the Linux kernel. This way of looking at things is supported by the fact that they share the same userspace utilities (e2fsprogs), and that many filesystems can be mounted by more than one of these implementations. For example, a filesystem which is created for use with ext3 can be mounted using either ext2 or ext4. However, a filesystem with ext4-specific extensions cannot be mounted using ext2 or ext3, and the ext3 file system code in the kernel requires the presence of a journal, which is generally not present in partitions formatted for use by the ext2 file system. The ext4 code has the ability to mount and use a filesystem without a journal.

Why was ext2 created?

In April 1992, Remy Card wrote the ext filesystem to address two key limitations of the Minix filesystem, which had previously been the only filesystem available to Linux: filenames could be at most 14 characters long, and the maximum filesystem size was 64MiB. The ext filesystem supported block devices up to 2GiB and file names up to 255 characters, but (like Minix) it had only a single timestamp for last modification time, last access time, and inode change time. It also used linked lists to store free blocks, which meant that files tended to become fragmented very easily. In January 1993, the ext2 filesystem was released; it further increased the maximum filesystem size to 4TiB, added POSIX timestamps, and supported variable block sizes. More importantly, it added support for extensibility, so that new features could be added to the filesystem.

File System Features

What features are supported by the ext2 filesystem?

As of this writing, the ext2 filesystem supports the following features:

  • Hash-indexed directories (EXT2_FEATURE_COMPAT_DIR_INDEX) (note: the ext2 filesystem only understands indexed directories in that it knows how to clear the indexed directory flag when it modifies such a directory)
  • Extended attribute blocks (EXT2_FEATURE_COMPAT_EXT_ATTR)
  • File type in directory entries (EXT2_FEATURE_INCOMPAT_FILETYPE)
  • Reduced block group backups (EXT2_FEATURE_INCOMPAT_META_BG)
  • Reduced superblock backups (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER)
  • Files larger than 2GiB in size (EXT2_FEATURE_RO_COMPAT_LARGE_FILE)

What features are supported by the ext3 file system?

As of this writing, the ext3 file system supports the following features:

  • Extended attribute blocks and large inodes (EXT3_FEATURE_COMPAT_EXT_ATTR)
  • Online filesystem resize reservations (EXT3_FEATURE_COMPAT_RESIZE_INODE)
  • Hash-indexed directories (EXT3_FEATURE_COMPAT_DIR_INDEX)
  • Journal file/device present (EXT3_FEATURE_COMPAT_HAS_JOURNAL) (note: this feature *must* be set for ext3 to mount the filesystem)
  • File type in directory entries (EXT3_FEATURE_INCOMPAT_FILETYPE)
  • Journal recovery required (EXT3_FEATURE_INCOMPAT_RECOVER)
  • Reduced block group backups (EXT3_FEATURE_INCOMPAT_META_BG)
  • Reduced superblock backups (EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)
  • Files larger than 2GiB in size (EXT3_FEATURE_RO_COMPAT_LARGE_FILE)

What features are supported by the ext4 file system?

As of this writing, the ext4 file system supports the following features:

  • Extended attribute blocks and large inodes (EXT3_FEATURE_COMPAT_EXT_ATTR)
  • Online filesystem resize reservations (EXT3_FEATURE_COMPAT_RESIZE_INODE)
  • Hash-indexed directories (EXT3_FEATURE_COMPAT_DIR_INDEX)
  • Journal file/device present (EXT3_FEATURE_COMPAT_HAS_JOURNAL) (not required for ext4 to mount the filesystem)
  • File type in directory entries (EXT3_FEATURE_INCOMPAT_FILETYPE)
  • Journal recovery required (EXT3_FEATURE_INCOMPAT_RECOVER)
  • Files allocated with extent format (EXT4_FEATURE_INCOMPAT_EXTENTS)
  • Support for more than 2^32 filesystem blocks (EXT4_FEATURE_INCOMPAT_64BIT)
  • Flexible block group metadata location (EXT4_FEATURE_INCOMPAT_FLEX_BG)
  • Reduced block group backups (EXT3_FEATURE_INCOMPAT_META_BG)
  • Reduced superblock backups (EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)
  • Files larger than 2GiB in size (EXT3_FEATURE_RO_COMPAT_LARGE_FILE)
  • Group descriptor checksums and sparse inode table (EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
  • Over 32000 subdirectories (EXT4_FEATURE_RO_COMPAT_DIR_NLINK)
  • Nanosecond timestamps and creation time (EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)
  • Files larger than 2TiB in size (EXT4_FEATURE_RO_COMPAT_HUGE_FILE)
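
The COMPAT, INCOMPAT, and RO_COMPAT prefixes in the three lists above determine what a driver does when it meets a feature it does not know. The following is a minimal sketch of that rule, not the kernel's code. The bit values are the ones I understand the kernel headers to define and should be checked against fs/ext4/ext4.h; the `SUPPORTED` table is transcribed from the lists above; `mount_mode` is a hypothetical helper name. The sketch ignores COMPAT flags, and therefore also ignores the rule that ext3 refuses to mount without HAS_JOURNAL.

```python
# Feature bits (assumed values; verify against the kernel headers).
INCOMPAT = {
    "FILETYPE": 0x0002, "RECOVER": 0x0004, "META_BG": 0x0010,
    "EXTENTS": 0x0040, "64BIT": 0x0080, "FLEX_BG": 0x0200,
}
RO_COMPAT = {
    "SPARSE_SUPER": 0x0001, "LARGE_FILE": 0x0002, "HUGE_FILE": 0x0008,
    "GDT_CSUM": 0x0010, "DIR_NLINK": 0x0020, "EXTRA_ISIZE": 0x0040,
}

# (INCOMPAT names, RO_COMPAT names) each driver understands, per the lists above.
SUPPORTED = {
    "ext2": ({"FILETYPE", "META_BG"}, {"SPARSE_SUPER", "LARGE_FILE"}),
    "ext3": ({"FILETYPE", "RECOVER", "META_BG"}, {"SPARSE_SUPER", "LARGE_FILE"}),
    "ext4": (set(INCOMPAT), set(RO_COMPAT)),
}


def mask_of(names, table):
    """OR together the bits for the given feature names."""
    mask = 0
    for name in names:
        mask |= table[name]
    return mask


def mount_mode(driver, incompat_mask, ro_compat_mask):
    """Return 'rw', 'ro', or 'refuse' for a filesystem with these feature masks."""
    ok_incompat, ok_ro = SUPPORTED[driver]
    if incompat_mask & ~mask_of(ok_incompat, INCOMPAT):
        return "refuse"  # unknown INCOMPAT bit: the driver must not mount at all
    if ro_compat_mask & ~mask_of(ok_ro, RO_COMPAT):
        return "ro"      # unknown RO_COMPAT bit: only a read-only mount is safe
    return "rw"
```

For example, a filesystem with only FILETYPE, SPARSE_SUPER, and LARGE_FILE set gives `mount_mode("ext2", 0x0002, 0x0003) == "rw"`, while adding EXTENTS gives `mount_mode("ext3", 0x0042, 0x0003) == "refuse"`.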

Understanding how it works

What are the new features in Ext4 (vs Ext2/3)?

See the Ext4 features page.

How do I test the features in Ext4?

How do I benchmark the performance of Ext4 against other filesystems? What tools are available?

For any filesystem and hardware platform, the best benchmark is the set of applications that will actually run on the system. Benchmarks are approximations of real-world applications, and they may not reflect the IO load of your applications.

  • A wide variety of benchmarking tools and published comparisons can be found online.

Can I undelete files in Ext4?

No. As with ext3, the journal's requirement that the filesystem be consistent after a crash prevents files from being undeleted, so it is not possible to undelete ext4 files.

Can I mount an existing Ext3 filesystem as Ext4, and vice versa? What about Ext2 and Ext4?

With recent versions of ext4 (kernel 2.6.29 and later), you can mount any ext2 or ext3 filesystem as ext4 without any changes. To use the new ext4 features, however, you must enable them with tune2fs and then check the filesystem with e2fsck:

# tune2fs -O extents,uninit_bg,huge_file /dev/DEV
# e2fsck -f /dev/DEV

If you want to create a journal on an ext2 filesystem that you have mounted as ext4, you must also issue the command:

# tune2fs -j /dev/DEV

Once you have enabled extents on a former ext2 or ext3 filesystem, it is an ext4 filesystem and cannot be reverted to the old format.

If you created a journal on a former ext2 filesystem and need to revert it to ext2, you can remove the journal:

# tune2fs -O ^has_journal /dev/DEV

Some ext4 features cannot be enabled on an existing ext3 filesystem.

See the Ext4 Howto for more details.

What information does /proc/fs/jbd2/<partition>/history provide?

Running cat /proc/fs/jbd2/<partition>/history prints output such as:

R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
R    7102  0     5000  0     1424  4     68681  5     6    
R    7103  0     5000  0     1644  4     64579  9     10   
R    7104  0     5000  0     856   32    38719  11    12   
R    7105  0     5000  0     1052  0     47142  12    13   
R    7106  0     5000  0     1172  16    56028  11    12   
R    7107  0     5000  0     1416  4     71047  11    12   
R    7108  0     5000  0     1640  4     81125  5     6    
R    7109  0     5000  0     1616  4     77314  6     7    
R    7110  0     5000  0     1640  0     76111  5     6    
:
:

The purpose of this history is to provide information on the behaviour of the ext4 journaling layer (JBD2). A line is added to the history file for each recently committed journal transaction. The fields are:

R/C 
whether the transaction is Running or Committed. Note that the current JBD2 statistics only show results for the Running transaction and do not show the Commit statistics.
tid 
transaction ID is an internal identifier given to every JBD2 transaction
wait 
number of milliseconds spent waiting for the transaction to start. This may happen if the journal is too small and previous transactions have not checkpointed yet.
run 
number of milliseconds the transaction was running (default 5000ms = 5s). May be shorter if the transaction contains the maximum number of blocks (1/4 of the journal size) or if the application is doing synchronous operations.
lock 
number of milliseconds spent waiting for the transaction to be locked
flush 
number of milliseconds flushing blocks to the filesystem for ordered mode before the transaction can be committed
log 
number of milliseconds to write the blocks to the journal
hndls 
number of filesystem transaction handles for this journal transaction
block 
number of filesystem blocks in the transaction
inlog 
total number of blocks written to the journal for this transaction, including journal overhead
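
As an illustration of the fields described above, here is a minimal sketch that parses history text into one dictionary per transaction. It works on a pasted sample rather than reading /proc, and `parse_history` is a hypothetical helper name. The sample rows carry fewer values than the header has columns; the sketch assumes the missing trailing columns (ctime, write, drop, close) are simply not reported for these rows and leaves them out.

```python
def parse_history(text):
    """Parse jbd2 history text into a list of dicts keyed by column name."""
    # Drop blank lines and the ":" continuation markers used in the sample.
    lines = [ln for ln in text.splitlines() if ln.strip() and ln.strip() != ":"]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        values = ln.split()
        # zip() stops at the shorter sequence, so columns with no value
        # on this row are omitted from the resulting dict.
        rows.append({
            key: (val if key == "R/C" else int(val))
            for key, val in zip(header, values)
        })
    return rows


SAMPLE = """\
R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
R    7102  0     5000  0     1424  4     68681  5     6
R    7103  0     5000  0     1644  4     64579  9     10
"""

rows = parse_history(SAMPLE)
# Journal overhead per transaction: blocks logged minus filesystem blocks.
overhead = [r["inlog"] - r["block"] for r in rows]
```

With the two sample rows, `overhead` is `[1, 1]`: each transaction wrote one journal block beyond the filesystem blocks it contained.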

What information does /proc/fs/jbd2/<partition>/info provide?

Running cat /proc/fs/jbd2/<partition>/info prints output such as:


56 transaction, each upto 2048 blocks
average: 
 0ms waiting for transaction
 57671ms running transaction
 0ms transaction was being locked
 28ms flushing data (in ordered mode)
 14ms logging transaction
 2383 handles per transaction
 6 blocks per transaction
 7 logged blocks per transaction


This file shows averages of the statistics in the /proc/fs/jbd2/<partition>/history file, taken over all transactions since the filesystem was mounted.
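
The averaging can be sketched as follows, assuming the per-transaction figures are already available as dictionaries. The three rows are copied from the history sample earlier on this page, so the results will not match the info sample, which covers every transaction since mount. `summarize` is a hypothetical helper, not a kernel interface.

```python
FIELDS = ("wait", "run", "lock", "flush", "log", "hndls", "block", "inlog")


def summarize(rows):
    """Return the arithmetic mean of each statistics field over all rows."""
    return {f: sum(r[f] for r in rows) / len(rows) for f in FIELDS}


# Transactions 7102-7104 from the history sample.
rows = [
    dict(wait=0, run=5000, lock=0, flush=1424, log=4, hndls=68681, block=5, inlog=6),
    dict(wait=0, run=5000, lock=0, flush=1644, log=4, hndls=64579, block=9, inlog=10),
    dict(wait=0, run=5000, lock=0, flush=856, log=32, hndls=38719, block=11, inlog=12),
]

avg = summarize(rows)
print(f"{avg['run']:.0f}ms running transaction")
print(f"{avg['flush']:.0f}ms flushing data (in ordered mode)")
```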

How do I resize an Ext4 filesystem online?

Online resizing of ext4 works much as it does for ext3, using either resize2fs or ext2resize, but there is currently a limit (around 4TiB or so) on the maximum filesystem size. Implementing online resize with the META_BG feature would allow this limit to be exceeded.

What is the difference between extents mapping and traditional indirect block mapping?

To quote from the 2005-OLS-ext3 paper:

Currently, the ext2/ext3 filesystem, like other traditional UNIX filesystems, uses a direct,
indirect, double indirect, and triple indirect blocks to map file offsets to on-disk blocks.
This scheme, sometimes simply called an indirect block mapping scheme, is not efficient for
large files, especially large file deletion. In order to address this problem, many modern
filesystems (including XFS and JFS on Linux) use some form of extent maps instead of the
traditional indirect block mapping scheme.

Since most filesystems try to allocate blocks in a contiguous fashion for performance reasons,
extent maps are a more efficient way to represent the mapping between logical and physical
blocks for large files.  An extent is a single descriptor for a range of contiguous blocks,
instead of using hundreds of entries to describe hundreds of blocks individually.
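
To make the last sentence of the quotation concrete, the sketch below collapses a per-block mapping (one physical block number per logical block) into extents. It is an illustration only: `to_extents` is a hypothetical helper, and real ext4 extents additionally have a maximum length and are stored in a tree, both of which are ignored here.

```python
def to_extents(physical_blocks):
    """Collapse a per-logical-block list of physical block numbers into
    (logical_start, physical_start, length) tuples."""
    extents = []
    for logical, phys in enumerate(physical_blocks):
        if extents:
            l0, p0, n = extents[-1]
            if phys == p0 + n:  # physically contiguous with the previous run
                extents[-1] = (l0, p0, n + 1)
                continue
        extents.append((logical, phys, 1))
    return extents


# 300 contiguous blocks: 300 per-block entries become one descriptor.
contiguous = to_extents(list(range(1000, 1300)))
# A file stored in two separate runs needs two descriptors.
two_runs = to_extents([100, 101, 102, 103, 500, 501])
```

Here `contiguous` is `[(0, 1000, 300)]` and `two_runs` is `[(0, 100, 4), (4, 500, 2)]`.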

What is delayed allocation (delalloc)? What are its advantages in Ext4?

Delayed allocation works by deferring the mapping of newly-written file data blocks to disk blocks in the filesystem until writeback time. This helps in several ways:

  1. Reduced filesystem fragmentation, because all (or a large number) of blocks for a single file can be allocated at the same time. Knowing the total number of blocks in each file allows the block allocator (mballoc) to find a suitable chunk of free space for each file instead of picking a free chunk that is too large or too small.
  2. Reduced CPU cycles spent in block allocation, because the block allocator can allocate many or all of the blocks for the file at one time, instead of searching and locking separately for each block as it is written, as happens without delayed allocation.
  3. Short-lived files may never need their metadata written to disk at all, which in turn reduces fragmentation.

This is the default allocation mode for ext4.
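
The first advantage listed above can be illustrated with a toy model. This is not the mballoc algorithm: it assumes a simple best-fit policy over a list of free chunks, and `allocate` and `fragments` are hypothetical names. It compares allocating a 12-block file as three separate 4-block writes against one deferred 12-block request.

```python
def allocate(free_chunks, requests):
    """Serve each request (a block count) from the smallest free chunk that
    fits. Returns the (start, length) pieces in allocation order."""
    free = list(free_chunks)
    placed = []
    for n in requests:
        fits = [c for c in free if c[1] >= n]
        start, length = min(fits, key=lambda c: c[1])  # ValueError if nothing fits
        free.remove((start, length))
        if length > n:
            free.append((start + n, length - n))  # return the unused tail
        placed.append((start, n))
    return placed


def fragments(placed):
    """Count the discontiguous runs in a list of (start, length) pieces."""
    runs = 1
    for (s0, n0), (s1, _) in zip(placed, placed[1:]):
        if s1 != s0 + n0:
            runs += 1
    return runs


FREE = [(0, 4), (100, 16), (200, 64)]  # (start block, length) of free chunks

immediate = fragments(allocate(FREE, [4, 4, 4]))  # size unknown at first write
delayed = fragments(allocate(FREE, [12]))         # total size known at writeback
```

In this model the immediate case fills the 4-block chunk first and then has to continue elsewhere, giving 2 fragments, while the delayed case places all 12 blocks in the 16-block chunk as a single run.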

What is multiblock allocation (mballoc)?

mballoc is a mechanism that allows many blocks to be allocated to a file in a single operation, which dramatically reduces the CPU time spent searching for free blocks in the filesystem. Also, because many file blocks are allocated at the same time, the allocator can make a much better decision about where to find a chunk of free space in which all of the blocks will fit.

The mballoc code is active when using the O_DIRECT flag for writes, or if the delayed allocation (delalloc) feature is being used. This allows the file to have many dirty blocks submitted for writes at the same time, unlike the existing kernel mechanism of submitting each block to the filesystem separately for allocation.

What is the bitmap allocator?

The bitmap allocator is the block allocator used in ext2 and ext3. It scans the free-block bitmap for every new block written to a file, which is inefficient. In ext4 it has been replaced by mballoc, which is one of the reasons ext4 is much faster than ext3.
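
The cost difference can be sketched with a deliberately naive model in which every allocation scans the bitmap from the beginning. The real ext2/ext3 allocator is smarter than this (it does not restart from block 0 each time), so the numbers only illustrate the trend. `alloc_one` and `alloc_run` are hypothetical names.

```python
def alloc_one(bitmap):
    """Find and claim the first free bit. Returns (index, bits_examined)."""
    for i, used in enumerate(bitmap):
        if not used:
            bitmap[i] = True
            return i, i + 1
    raise RuntimeError("no free block")


def alloc_run(bitmap, n):
    """Find and claim n consecutive free bits in a single scan.
    Returns (start_index, bits_examined)."""
    run = 0
    for i, used in enumerate(bitmap):
        run = 0 if used else run + 1
        if run == n:
            start = i - n + 1
            bitmap[start:i + 1] = [True] * n
            return start, i + 1
    raise RuntimeError("no free run of that length")


def fresh_bitmap():
    """100 used blocks followed by 8 free ones."""
    return [True] * 100 + [False] * 8


# One scan per block, eight times over.
bm = fresh_bitmap()
per_block_cost = sum(alloc_one(bm)[1] for _ in range(8))

# One scan for all eight blocks.
start, multi_block_cost = alloc_run(fresh_bitmap(), 8)
```

In this model the per-block approach examines 836 bits in total, while the single multi-block scan examines 108.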

Can you say something about the history of Ext4?

See the Wikipedia article: http://en.wikipedia.org/wiki/Ext4

When was Ext4 first announced to the LKML?

The ext4 filesystem was announced on 28 June 2006.

What are the key differences between ext3 and ext4?

The main new features in ext4 are listed below and are described more fully on the New_ext4_features page:

  • extent-mapped files for more efficient storage of file metadata (EXTENTS)
  • multi-block and delayed allocation for faster/better file allocations
  • support for larger filesystems (up to 2^48 blocks, currently 2^60 bytes) (64BIT)
  • optimized storage of filesystem metadata like bitmaps and inode table (FLEX_BG)
  • less overhead for e2fsck, on-disk checksum of group descriptors (GDT_CSUM)
  • removed 32000 subdirectory limit (DIR_NLINK)
  • nanosecond inode timestamps (EXTRA_ISIZE)

In addition, use of a journal is optional and may be enabled or disabled with tune2fs and the "has_journal" option.
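
The size figures in the feature list follow from simple arithmetic. The sketch below assumes a 4KiB block size, which the list does not state explicitly; `max_fs_bytes` is a hypothetical helper.

```python
BLOCK_SIZE = 4096  # bytes; an assumed block size of 4KiB


def max_fs_bytes(block_number_bits, block_size=BLOCK_SIZE):
    """Largest addressable filesystem size for a given block-number width."""
    return (1 << block_number_bits) * block_size


TIB = 2 ** 40
limit_32bit = max_fs_bytes(32)  # without the 64BIT feature
limit_48bit = max_fs_bytes(48)  # with 48-bit block numbers

print(limit_32bit // TIB, "TiB")     # 16 TiB
print(limit_48bit == 2 ** 60)        # True: 2^48 blocks * 2^12 bytes = 2^60 bytes
```

Under that assumption, 32-bit block numbers give the 16TiB boundary mentioned earlier for e2fsprogs, and 2^48 blocks give the 2^60 bytes quoted in the list.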

What are the key differences between jbd and jbd2?

The jbd and jbd2 code is largely the same, but jbd2 adds a few new features in a compatible way:

  • support for 64-bit filesystems (64BIT)
  • checksumming of journal transactions (CHECKSUM)
  • asynchronous transaction commit block write (ASYNC_COMMIT)

In addition, jbd2 implements a new ordered mode for flushing data blocks to the filesystem that works in conjunction with delayed allocation to avoid blocking journal commits when there is a lot of data being written to the filesystem. This avoids long delays for fsync() operations when another thread is doing heavy writes to the filesystem.

Can I use ext4 on Solid-state drives (SSD)?

Yes. To ext4, an SSD is generally no different from any other block device. With modern solid-state disks, you can even put the journal on the SSD as well.
