Life of an ext4 write request

Revision as of 23:40, 3 May 2011

Introduction

This article describes how the different kinds of ext4 write requests are handled in Linux 2.6.34 through 2.6.39. It pays special attention to how ext4_map_blocks() is called and with which EXT4_GET_BLOCKS_* flags. It also discusses how quota is reserved, claimed, and released.

Flags passed to ext4_map_blocks()

The primary function of the ext4_map_blocks() function is to translate an inode and logical block number to a physical block number. If a mapping between a particular logical block number and physical block number does not exist, ext4_map_blocks() may create such a mapping, allocating blocks as necessary. However, there are a number of other things which ext4_map_blocks() can do, based on a flags bitmap which is passed to it. In some cases, a particular flag to ext4_map_blocks() can radically change its behavior. So it's important to document the current ext4_map_blocks flags and what they do.
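
The lookup-or-create behavior described above can be illustrated with a small userspace model. This is only a sketch: the names toy_inode, toy_map_block() and TOY_GET_BLOCKS_CREATE are hypothetical, the flag value is arbitrary, and the real ext4_map_blocks() works on extents or indirect blocks, takes locks, and interacts with the journal and quota, none of which is modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag for this toy model only; not the kernel's value. */
#define TOY_GET_BLOCKS_CREATE 0x1u

#define TOY_MAX_LBLK 16u

/* Toy inode: one physical block number per logical block, 0 means "hole". */
struct toy_inode {
    uint64_t map[TOY_MAX_LBLK];
    uint64_t next_free;          /* next physical block to hand out */
};

/*
 * Translate logical block lblk to a physical block.
 * Returns 1 and stores the physical block in *pblk if a mapping exists
 * or was created, 0 if the block is a hole and no creation was requested,
 * and -1 if lblk is out of range for this toy inode.
 */
static int toy_map_block(struct toy_inode *inode, uint32_t lblk,
                         unsigned int flags, uint64_t *pblk)
{
    if (lblk >= TOY_MAX_LBLK)
        return -1;

    if (inode->map[lblk] == 0) {
        if (!(flags & TOY_GET_BLOCKS_CREATE))
            return 0;                          /* plain lookup: report the hole */
        inode->map[lblk] = inode->next_free++; /* allocate a new block */
    }

    *pblk = inode->map[lblk];
    return 1;
}
```

With a toy inode whose next_free is 100, calling toy_map_block() on an unmapped block with flags 0 reports a hole, while the same call with TOY_GET_BLOCKS_CREATE allocates block 100 and returns it. A single flag bit is enough to turn a read-only lookup into an allocating call, which is why the flags deserve careful documentation.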

ext4_map_blocks() was previously called ext4_get_blocks(), which is the reason for the naming convention of these flags:

EXT4_GET_BLOCKS_CREATE 
to be written
EXT4_GET_BLOCKS_UNINIT_EXT 
to be written
EXT4_GET_BLOCKS_CREATE_UNINIT_EXT 
to be written
EXT4_GET_BLOCKS_DELALLOC_RESERVE 
to be written
EXT4_GET_BLOCKS_PRE_IO 
to be written
EXT4_GET_BLOCKS_CONVERT 
to be written
EXT4_GET_BLOCKS_IO_CREATE_EXT 
to be written
EXT4_GET_BLOCKS_IO_CONVERT_EXT 
to be written

Life of a nodelalloc buffered write

The write request

Description of how a write request happens from userspace (i.e., the codepath from generic_file_buffered_write() calling ext4_write_begin() and ext4_{writeback,ordered,journalled}_write_end() and/or the codepath from page_mkwrite() calling ext4_page_mkwrite()).

I/O submission

What happens when generic_writepages() calls ext4_write_cache_pages() which then calls ext4_writepage()

Life of a delalloc buffered write

The write request

Description of how a write request happens from userspace (i.e., the codepath from generic_file_buffered_write() calling ext4_da_write_begin() and ext4_da_write_end() and/or the codepath from page_mkwrite() calling ext4_page_mkwrite()).

I/O submission

What happens in ext4_writepages()

Modifications with dioread_nolock

What changes if dioread_nolock is enabled.

Life of a direct I/O write

(Edited by Nauman. Ted or Jiaying, please verify.)

Ext4 has a specially optimized path for direct I/O writes, as long as the writes do not extend the file size.

The most relevant logic is in the ext4_ext_direct_IO() function in fs/ext4/inode.c. At the start of a write, we set the EXT4_STATE_DIO_UNWRITTEN inode state bit, which indicates that writes are under way for this file. The extents being written are not marked initialized while the write is in progress, to prevent stale data from being exposed to parallel buffered reads. If extents are allocated for the write, they are marked uninitialized.

For synchronous (non-AIO) direct I/O, submission and completion coincide, so we immediately convert the newly written extents to initialized by calling ext4_map_blocks() with the EXT4_GET_BLOCKS_IO_CONVERT_EXT flag.

For AIO direct I/O, we defer the conversion to ext4_end_io_dio(), which is called when the I/O completes.

Writes that extend the file size are actually handled by ext4_ind_direct_IO(). The name is misleading, since 'ind' is supposed to indicate indirect-block mapping rather than extent mapping. This function adds the inode to the orphan list, to cover the case of a crash in the middle of the write.
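
The decisions described in this section can be summarized as a small state model. This is a sketch under stated assumptions, not the kernel code: every toy_* name is hypothetical, the model covers only a write into blocks that are allocated during the write, and it does not show when EXT4_STATE_DIO_UNWRITTEN is cleared because the text above does not say.

```c
#include <stdbool.h>

enum toy_extent_state { TOY_EXT_UNINIT, TOY_EXT_INIT };
enum toy_dio_path { TOY_PATH_EXT, TOY_PATH_IND };

/* Hypothetical stand-in for the per-file state involved in a direct write. */
struct toy_file {
    long size;                     /* current file size in bytes */
    enum toy_extent_state extent;  /* state of the extent being written */
    bool dio_unwritten;            /* models EXT4_STATE_DIO_UNWRITTEN */
    bool on_orphan_list;           /* set on the size-extending path */
    bool conversion_pending;       /* AIO: conversion deferred to completion */
};

/* Models ext4_map_blocks() called with EXT4_GET_BLOCKS_IO_CONVERT_EXT. */
static void toy_convert_extent(struct toy_file *f)
{
    f->extent = TOY_EXT_INIT;
    f->conversion_pending = false;
}

/* Models the role of ext4_end_io_dio(): runs when an AIO write completes. */
static void toy_end_io(struct toy_file *f)
{
    if (f->conversion_pending)
        toy_convert_extent(f);
}

/* Returns which of the two paths the write took. */
static enum toy_dio_path toy_direct_write(struct toy_file *f, long off,
                                          long len, bool is_aio)
{
    if (off + len > f->size) {
        /* Size-extending write: the ext4_ind_direct_IO() path. */
        f->on_orphan_list = true;
        return TOY_PATH_IND;
    }

    f->dio_unwritten = true;       /* writes are under way for this file */
    f->extent = TOY_EXT_UNINIT;    /* newly allocated extent is uninitialized */

    if (is_aio)
        f->conversion_pending = true;  /* converted later by toy_end_io() */
    else
        toy_convert_extent(f);         /* submit and completion coincide */

    return TOY_PATH_EXT;
}
```

In this model a synchronous write inside the file size leaves the extent initialized as soon as toy_direct_write() returns, while an AIO write leaves it uninitialized until toy_end_io() runs. A write past the end of the file takes the other path and only records the orphan-list step.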

Life of an async direct I/O write
