Frequently Asked Questions

From Ext4
Revision as of 06:23, 29 November 2010 by Adilger (Talk | contribs)

Getting Started

How do I get started using ext4?

Please see the Ext4 Howto page for information on getting started using ext4.

Where do I get the latest version of e2fsprogs?

The latest version of e2fsprogs can be found at SourceForge or at kernel.org. Recently released versions of e2fsprogs support most of the ext4 features (excluding > 16TB support, as of 2010-11-28), so there is no need to build e2fsprogs from source in order to use ext4.

How do I build e2fsprogs?

The INSTALL file at the top of the source tree gives more detailed information, but e2fsprogs uses a standard configure script, so the standard "./configure; make" will build the e2fsprogs binaries. Note that if you wish to build the ELF shared libraries, you need to add the "--enable-elf-shlibs" option to the configure invocation.

How do I create and mount a new ext4 filesystem?

First, make sure that you have e2fsprogs 1.41.0 or later installed on your system. This is required for ext4 support. If the new partition where you would like to create the ext4 filesystem is /dev/sdb1, then all you have to type is:

/sbin/mke2fs -t ext4 /dev/sdb1

Then to mount this new filesystem, all you need to do is:

mkdir /mnt/test
mount -t ext4 /dev/sdb1 /mnt/test

For more information, please see the Ext4 Howto document.



History of ext2, ext3, and ext4

What is the difference between ext2, ext3, and ext4?

The ext2, ext3, and ext4 file systems are a family of file systems with a high degree of backward and forward compatibility. In fact, they can be considered a single filesystem format with a number of feature extensions, and ext2, ext3, and ext4 are merely the names of the implementations found in the Linux kernel. This way of looking at things is supported by the fact that they share the same userspace utilities (e2fsprogs), and that many filesystems can be mounted by more than one of these implementations. For example, a filesystem which is created for use with ext3 can be mounted using either ext2 or ext4. However, a filesystem with ext4-specific extensions cannot be mounted using ext2 or ext3, and the ext3 file system code in the kernel requires the presence of a journal, which is generally not present in partitions formatted for use by the ext2 file system. The ext4 code has the ability to mount and use a filesystem without a journal.

Why was ext2 created?

In April 1992, the ext filesystem was written by Remy Card to address two key limitations of the Minix filesystem, which had previously been the only filesystem available to Linux: filenames could be only 14 characters long, and the maximum file system size supported by Minix was 64MB. The ext filesystem supported block devices up to 2GB and file names up to 255 characters, but (like Minix) it had only a single timestamp for last modification time, last access time, and inode change time. It also used linked lists to store free blocks, which meant that files tended to get fragmented very easily. In January 1993, the ext2 filesystem was released, which further increased the maximum filesystem size to 4TB, added POSIX timestamps, and supported variable block sizes. More importantly, it added support for extensibility so that new features could be added to the filesystem.

File System Features

What features are supported by the ext2 filesystem?

As of this writing, the ext2 filesystem supports the following features:

  • Hash-indexed directories (EXT2_FEATURE_COMPAT_DIR_INDEX) (note: the ext2 filesystem only understands indexed directories in that it knows how to clear the indexed directory flag when it modifies such a directory)
  • Extended attribute blocks (EXT2_FEATURE_COMPAT_EXT_ATTR)
  • File type in directory entries (EXT2_FEATURE_INCOMPAT_FILETYPE)
  • Reduced block group backups (EXT2_FEATURE_INCOMPAT_META_BG)
  • Reduced superblock backups (EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER)
  • Files larger than 2GB in size (EXT2_FEATURE_RO_COMPAT_LARGE_FILE)

What features are supported by the ext3 file system?

As of this writing, the ext3 file system supports the following features:

  • Extended attribute blocks and large inodes (EXT3_FEATURE_COMPAT_EXT_ATTR)
  • Online filesystem resize reservations (EXT3_FEATURE_COMPAT_RESIZE_INODE)
  • Hash-indexed directories (EXT3_FEATURE_COMPAT_DIR_INDEX)
  • Journal file/device present (EXT3_FEATURE_COMPAT_HAS_JOURNAL) (note: this feature *must* be set for ext3 to mount the filesystem)
  • File type in directory entries (EXT3_FEATURE_INCOMPAT_FILETYPE)
  • Journal recovery required (EXT3_FEATURE_INCOMPAT_RECOVER)
  • Reduced block group backups (EXT3_FEATURE_INCOMPAT_META_BG)
  • Reduced superblock backups (EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)
  • Files larger than 2GB in size (EXT3_FEATURE_RO_COMPAT_LARGE_FILE)

What features are supported by the ext4 file system?

As of this writing, the ext4 file system supports the following features:

  • Extended attribute blocks and large inodes (EXT3_FEATURE_COMPAT_EXT_ATTR)
  • Online filesystem resize reservations (EXT3_FEATURE_COMPAT_RESIZE_INODE)
  • Hash-indexed directories (EXT3_FEATURE_COMPAT_DIR_INDEX)
  • Journal file/device present (EXT3_FEATURE_COMPAT_HAS_JOURNAL) (not required for ext4 to mount the filesystem)
  • File type in directory entries (EXT3_FEATURE_INCOMPAT_FILETYPE)
  • Journal recovery required (EXT3_FEATURE_INCOMPAT_RECOVER)
  • Files allocated with extent format (EXT4_FEATURE_INCOMPAT_EXTENTS)
  • Support for more than 2^32 filesystem blocks (EXT4_FEATURE_INCOMPAT_64BIT)
  • Flexible block group metadata location (EXT4_FEATURE_INCOMPAT_FLEX_BG)
  • Reduced block group backups (EXT3_FEATURE_INCOMPAT_META_BG)
  • Reduced superblock backups (EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)
  • Files larger than 2GB in size (EXT3_FEATURE_RO_COMPAT_LARGE_FILE)
  • Group descriptor checksums and sparse inode table (EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
  • Over 32000 subdirectories (EXT4_FEATURE_RO_COMPAT_DIR_NLINK)
  • Nanosecond timestamps and creation time (EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)
  • Files larger than 2TB in size (EXT4_FEATURE_RO_COMPAT_HUGE_FILE)
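
The three lists above can be read as a compatibility table. The following is a minimal sketch, not kernel code: it assumes that a driver refuses to mount a filesystem carrying any INCOMPAT feature it does not list, and it takes each driver's supported set straight from the lists in this FAQ. The function name can_mount and its argument format are hypothetical, and ext3's additional requirement for a journal (a COMPAT feature) is not modelled.

```shell
# Sketch: can a given kernel driver mount a filesystem carrying these
# INCOMPAT features? Usage: can_mount DRIVER FEATURE...
# Supported sets are copied from the feature lists in this FAQ.
can_mount() (
    driver=$1
    shift
    case $driver in
        ext2) supported="FILETYPE META_BG" ;;
        ext3) supported="FILETYPE RECOVER META_BG" ;;
        ext4) supported="FILETYPE RECOVER META_BG EXTENTS 64BIT FLEX_BG" ;;
        *) echo "unknown driver: $driver" >&2; exit 2 ;;
    esac
    for feature in "$@"; do
        case " $supported " in
            *" $feature "*) ;;   # the driver understands this feature
            *) exit 1 ;;         # unknown INCOMPAT feature: refuse to mount
        esac
    done
    exit 0
)

# Example: an extent-mapped filesystem is mountable by ext4 only.
for d in ext2 ext3 ext4; do
    if can_mount "$d" FILETYPE EXTENTS; then
        echo "$d: mountable"
    else
        echo "$d: not mountable"
    fi
done
```

Because the body runs in a subshell, the exit calls only set the function's return status and do not terminate the calling shell.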

Understanding how it works

What are the new features in Ext4 (vs Ext2/3)?

See the Ext4 features page.

How do I test the features in Ext4?

How do I benchmark the performance of ext4 against other filesystems? What tools are available?

For any filesystem and hardware platform, the best benchmarks are the actual applications that will be running on the system. Benchmarks are approximations of real-world applications, but they may not reflect the IO load of your applications.

  • A wide variety of benchmarking tools and comparisons can be found online.

Can I undelete files in Ext4?

No. Just as in ext3, where the journal's requirement to remain consistent after a crash prevents files from being undeleted, it isn't possible to undelete ext4 files.

Can I mount existing Ext3 as Ext4? And vice versa? Similarly from Ext2 to Ext4 and its reverse?

With recent versions of ext4 (kernel 2.6.29 and later), you can mount any ext2 or ext3 filesystem as ext4 without any changes. To use the new ext4 features on such a filesystem, however, you must enable them with tune2fs and then check the filesystem with e2fsck:

# tune2fs -O extents,uninit_bg,huge_file /dev/DEV
# e2fsck -f /dev/DEV

If you want to create a journal on an ext2 filesystem that you have mounted as ext4, you must also issue the command:

# tune2fs -j /dev/DEV

Once you have enabled extents on a former ext2 or ext3 filesystem, it is an ext4 filesystem and cannot be reverted to the old format.

If you have created a journal on a former ext2 filesystem, the journal can be removed if the filesystem needs to be reverted to ext2:

# tune2fs -O ^has_journal /dev/DEV

Some ext4 features cannot be enabled on an existing ext3 filesystem.

See the Ext4 Howto for more details.
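
Before or after a conversion it can be useful to check which features a filesystem already has, since enabling extents is a one-way step. The following is a minimal sketch that assumes the "Filesystem features:" line printed by dumpe2fs -h, where the extents feature appears under the name "extent". The helper name has_feature is hypothetical.

```shell
# Succeeds if the feature named $1 appears on the "Filesystem features:"
# line of `dumpe2fs -h` output read from stdin.
has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # fields 1 and 2 are the label; features start at field 3
            for (i = 3; i <= NF; i++)
                if ($i == want) found = 1
        }
        END { exit !found }'
}

# Example with canned output; on a real system the input would come from
#   dumpe2fs -h /dev/DEV 2>/dev/null
printf '%s\n' \
    'Filesystem features:      has_journal ext_attr dir_index filetype extent' |
    has_feature extent && echo "extents enabled: cannot revert to ext2/ext3"
```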

What is the information provided by /proc/fs/jbd2/partition/history?

Executing $ cat /proc/fs/jbd2/partition/history gives:

R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
R    7102  0     5000  0     1424  4     68681  5     6    
R    7103  0     5000  0     1644  4     64579  9     10   
R    7104  0     5000  0     856   32    38719  11    12   
R    7105  0     5000  0     1052  0     47142  12    13   
R    7106  0     5000  0     1172  16    56028  11    12   
R    7107  0     5000  0     1416  4     71047  11    12   
R    7108  0     5000  0     1640  4     81125  5     6    
R    7109  0     5000  0     1616  4     77314  6     7    
R    7110  0     5000  0     1640  0     76111  5     6    
:
:

The purpose of this history is to provide information on the behaviour of the ext4 journaling layer (JBD2). There is a line added to the history file for each journal transaction committed. The fields are:

R/C 
whether transaction is Running or Committed
Note: the current JBD2 statistics only show results for the Running transaction and do not show the Commit statistics.
tid 
transaction ID is an internal identifier given to every JBD2 transaction
wait 
number of milliseconds spent waiting for the transaction to start. This may happen if the journal is too small and previous transactions have not checkpointed yet.
run 
number of milliseconds the transaction was running (default 5000ms = 5s). May be shorter if the transaction contains the maximum number of blocks (1/4 of the journal size) or if the application is doing synchronous operations.
lock 
number of milliseconds spent waiting for the transaction to be locked
flush 
number of milliseconds flushing blocks to the filesystem for ordered mode before the transaction can be committed
log 
number of milliseconds to write the blocks to the journal
hndls 
number of filesystem transaction handles for this journal transaction
block 
number of filesystem blocks in the transaction
inlog 
total number of blocks written to the journal for this transaction, including journal overhead
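
The "1/4 of the journal size" rule mentioned for the run field can be worked through with shell arithmetic. This is only a sketch of the arithmetic: the function names are hypothetical, and the 8192-block journal and 4096-byte block size in the example are assumed values, not figures taken from this FAQ.

```shell
# Maximum number of blocks in one transaction, using the 1/4 rule above.
# $1 = journal size in blocks
max_transaction_blocks() { echo $(( $1 / 4 )); }

# Journal size in MiB. $1 = journal size in blocks, $2 = block size in bytes
journal_mib() { echo $(( $1 * $2 / 1048576 )); }

# Example (assumed values): an 8192-block journal with 4096-byte blocks.
echo "journal size:    $(journal_mib 8192 4096) MiB"
echo "max transaction: $(max_transaction_blocks 8192) blocks"
```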

What is the information provided by /proc/fs/jbd2/<partition>/info?

Executing $ cat /proc/fs/jbd2/partition/info gives:


56 transaction, each upto 2048 blocks
average: 
 0ms waiting for transaction
 57671ms running transaction
 0ms transaction was being locked
 28ms flushing data (in ordered mode)
 14ms logging transaction
 2383 handles per transaction
 6 blocks per transaction
 7 logged blocks per transaction


This file shows the average statistics from the /proc/fs/jbd2/partition/history file since the filesystem was first mounted.
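
To see how such averages relate to the per-transaction lines, the same kind of summary can be computed from history output by hand. The following is a minimal sketch, assuming the column order documented for the history file (R/C, tid, wait, run, lock, flush, log, hndls, block, inlog). The function name jbd2_averages is hypothetical, and averages are truncated to whole numbers.

```shell
# Average selected columns of jbd2 "history" output read from stdin.
# Only lines whose first field is exactly "R" are counted, so the header
# line and the ":" continuation lines are skipped.
jbd2_averages() {
    awk '$1 == "R" {
             n++
             run += $4; flush += $6; logms += $7
             hndls += $8; block += $9; inlog += $10
         }
         END {
             if (n == 0) { print "no transactions"; exit 1 }
             printf "%d transactions\n", n
             printf "%dms running transaction\n", run / n
             printf "%dms flushing data\n", flush / n
             printf "%dms logging transaction\n", logms / n
             printf "%d handles per transaction\n", hndls / n
             printf "%d blocks per transaction\n", block / n
             printf "%d logged blocks per transaction\n", inlog / n
         }'
}
```

It would be used as jbd2_averages < /proc/fs/jbd2/partition/history, with partition replaced by the journal device's directory name. The result need not match the info file exactly if the history file holds fewer transactions than have run since the filesystem was mounted.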

How do I resize an ext4 filesystem online?

Online resizing of ext4 works in a similar manner to ext3, using either resize2fs or ext2resize, but there is currently a limit (around 4TB or so) on the maximum filesystem size. Implementing online resize with the META_BG feature would allow this limit to be exceeded.

What is the difference between extents mapping and traditional indirect block mapping?

To quote from the 2005-OLS-ext3 paper:

Currently, the ext2/ext3 filesystem, like other traditional UNIX filesystems, uses a direct,
indirect, double indirect, and triple indirect blocks to map file offsets to on-disk blocks. This
scheme, sometimes simply called an indirect block mapping scheme, is not efficient for large
files, especially large file deletion. In order to address this problem, many modern filesystems
(including XFS and JFS on Linux) use some form of extent maps instead of the traditional
indirect block mapping scheme.

Since most filesystems try to allocate blocks in a contiguous fashion, extent maps are a more efficient
way to represent the mapping between logical and physical blocks for large files. An extent is a single
descriptor for a range of contiguous blocks, instead of using, say, hundreds of entries to describe
each block individually.
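
The difference can be made concrete with some arithmetic. The following is a rough sketch that rests on assumptions not stated in this FAQ: 4KiB blocks, 4-byte block pointers (1024 pointers per block), 12 direct pointers in the inode, and at most 32768 blocks per extent. It counts the mapping blocks the indirect scheme needs and the smallest number of extents that could describe the same file if it were laid out contiguously. Both function names are hypothetical.

```shell
# Mapping (metadata) blocks needed by the indirect scheme for a file of
# $1 data blocks. Files large enough to need the triple indirect block
# are not modelled.
indirect_map_blocks() (
    ptrs=1024                                # pointers per 4KiB block
    meta=0
    remaining=$(( $1 - 12 ))                 # 12 direct pointers are free
    if [ "$remaining" -gt 0 ]; then
        meta=$(( meta + 1 ))                 # the single indirect block
        remaining=$(( remaining - ptrs ))
    fi
    if [ "$remaining" -gt 0 ]; then
        if [ "$remaining" -gt $(( ptrs * ptrs )) ]; then
            echo "needs triple indirect blocks; not modelled" >&2
            exit 1
        fi
        # one double indirect block, plus one indirect block for every
        # 1024 data blocks that remain (rounded up)
        meta=$(( meta + 1 + (remaining + ptrs - 1) / ptrs ))
    fi
    echo "$meta"
)

# Fewest extents that can describe $1 contiguous data blocks.
min_extents() { echo $(( ($1 + 32767) / 32768 )); }

# Example: a 1GiB file is 262144 blocks of 4KiB.
echo "indirect mapping blocks: $(indirect_map_blocks 262144)"
echo "extents (best case):     $(min_extents 262144)"
```

The extent count is a best case; a fragmented file needs more extents, and real allocations are also constrained by block group layout.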

What is delayed allocation (delalloc)? What are its advantages in Ext4?

Delayed allocation works by deferring the mapping of newly-written file data blocks to disk blocks in the filesystem until writeback time. This helps in several ways:

  1. Reduced filesystem fragmentation, because all (or a large number) of blocks for a single file can be allocated at the same time. Knowing the total number of blocks in each file allows the block allocator (mballoc) to find a suitable chunk of free space for each file instead of picking a free chunk that is too large or too small.
  2. Reduced CPU cycles spent in block allocation, because the block allocator can allocate many or all of the blocks for the file at one time, instead of doing searching and locking for each block in the file as it is written without delayed allocation.
  3. It may avoid the need for disk updates for metadata creation for short-lived files, which in turn reduces fragmentation.

What is multiblock allocation (mballoc)?

mballoc is a mechanism to allow many blocks to be allocated to a file in a single operation, in order to dramatically reduce the amount of CPU usage searching for many free blocks in the filesystem. Also, because many file blocks are allocated at the same time, a much better decision can be made to find a chunk of free space where all of the blocks will fit.

The mballoc code is active when using the O_DIRECT flag for writes, or if the delayed allocation (delalloc) feature is being used. This allows the file to have many dirty blocks submitted for writes at the same time, unlike the existing kernel mechanism of submitting each block to the filesystem separately for allocation.

What is the bitmap allocator?

The allocator used in ext2 and ext3 would scan the free blocks bitmap for every new block written to a file. This was inefficient, and the block allocator in ext4 (mballoc) replaced the bitmap allocator and is one of the reasons ext4 is much faster than ext3.

Can you say something about the history of Ext4?

Check here: http://en.wikipedia.org/wiki/Ext4.

When was Ext4 first announced to the LKML?

The ext4 filesystem was announced on 28 June 2006.

What are the key differences between ext3 and ext4?

The main new features in ext4 are below, and are described more fully in New_ext4_features:

  • extent-mapped files for more efficient storage of file metadata (EXTENTS)
  • multi-block and delayed allocation for faster/better file allocations
  • support for larger filesystems (up to 2^48 blocks, currently 2^60 bytes) (64BIT)
  • optimized storage of filesystem metadata like bitmaps and inode table (FLEX_BG)
  • less overhead for e2fsck, on-disk checksum of group descriptors (GDT_CSUM)
  • removed 32000 subdirectory limit (DIR_NLINK)
  • nanosecond inode timestamps (EXTRA_ISIZE)

In addition, use of a journal is optional and may be enabled or disabled with tune2fs and the "has_journal" option.
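
The size figures quoted in this FAQ follow from the width of the block number and the block size. As a small worked example, assuming 4096-byte blocks and a shell with 64-bit arithmetic (the function name max_fs_bytes is hypothetical):

```shell
# Largest filesystem size in bytes that can be addressed.
# $1 = number of bits in a block number, $2 = block size in bytes
max_fs_bytes() { echo $(( (1 << $1) * $2 )); }

# 32-bit block numbers: 2^32 * 4096 = 2^44 bytes, the 16TB limit.
echo "32-bit: $(max_fs_bytes 32 4096) bytes"
# 48-bit block numbers: 2^48 * 4096 = 2^60 bytes.
echo "48-bit: $(max_fs_bytes 48 4096) bytes"
```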

What are the key differences between jbd and jbd2?

The jbd and jbd2 code is largely the same, but jbd2 adds a few new features in a compatible way:

  • support for 64-bit filesystems (64BIT)
  • checksumming of journal transactions (CHECKSUM)
  • asynchronous transaction commit block write (ASYNC_COMMIT)

In addition, jbd2 implements a new ordered mode for flushing data blocks to the filesystem that works in conjunction with delayed allocation to avoid blocking journal commits when there is a lot of data being written to the filesystem. This avoids long delays for fsync() operations when another thread is doing heavy writes to the filesystem.

Can I use ext4 on Solid-state drives (SSD)?

Yes. To ext4, an SSD is generally no different from any other block device. With modern solid-state disks, you can even put the journal on the SSD as well.
