Talk:Clarifying Direct IO's Semantics

From Ext4

Solaris's behavior

The Solaris mount_ufs man page suggests:

            If forcedirectio is specified [...] data  is transferred
            directly between user address space and the disk. 
            forcedirectio is a performance option that  is
            of  benefit only in large sequential data transfers.
            The default behavior is noforcedirectio.

[That was a quote: a paraphrase follows at the end of this page]

Note the mention of large sequential I/O: in a recent project we were pleased (and the customer was a little surprised) to find that Solaris UFS was coalescing many contiguous logical writes into a substantially smaller number of large physical writes. This improved their performance when doing full-table scans and large updates.
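To make concrete what "coalescing" means here, the following is a toy model only, not Solaris's actual implementation. It merges logical writes that are exactly contiguous into larger physical extents. The function name `coalesce`, the `(offset, length)` representation, and the 1 MiB cap on a physical write are all assumptions made for illustration; the man pages quoted on this page do not describe any of them.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (byte offset, length in bytes)


def coalesce(writes: List[Extent], max_physical: int = 1 << 20) -> List[Extent]:
    """Merge each write that starts exactly where the previous one ended.

    max_physical is a hypothetical cap on the size of one physical write;
    it is an assumption for this sketch, not a documented UFS limit.
    """
    merged: List[Extent] = []
    for offset, length in writes:
        if merged:
            last_off, last_len = merged[-1]
            contiguous = last_off + last_len == offset
            if contiguous and last_len + length <= max_physical:
                # Extend the previous physical extent instead of starting a new one.
                merged[-1] = (last_off, last_len + length)
                continue
        merged.append((offset, length))
    return merged


if __name__ == "__main__":
    # 256 contiguous 8 KiB logical writes, as a sequential table load might issue.
    logical = [(i * 8192, 8192) for i in range(256)]
    physical = coalesce(logical)
    print(len(logical), "logical writes ->", len(physical), "physical writes")
```

Writes separated by a gap are left as separate extents, which is why the benefit would show up only for sequential access patterns.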

There is more discussion in the directio man page, where they note that buffered I/O is used if the buffer is misaligned or mmap'd, and:

     Large  sequential   I/O   generally   performs   best   with
    DIRECTIO_ON,  except  when  a  file  is  sparse  or is being
    extended  and  is  opened  with  O_SYNC  or   O_DSYNC   

[Another quote]

Again, they recommend direct I/O for large reads or writes.
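The fallback rule summarized above (buffered I/O when the buffer is misaligned or the file is mmap'd) can be written as a small predicate. This is a sketch of the rule as paraphrased on this page, not code from Solaris. The name `takes_direct_path` and the 512-byte alignment granularity are assumptions; the excerpts quoted here do not state the required alignment.

```python
ASSUMED_ALIGNMENT = 512  # assumption: the quoted excerpts do not give the real value


def takes_direct_path(buffer_addr: int, is_mmapped: bool,
                      alignment: int = ASSUMED_ALIGNMENT) -> bool:
    """Return True if a request would bypass the buffer cache under the
    paraphrased rule: the buffer must be aligned and the file not mmap'd."""
    if is_mmapped:
        return False  # mmap'd: falls back to buffered I/O
    return buffer_addr % alignment == 0  # misaligned buffer: buffered I/O


if __name__ == "__main__":
    for addr, mapped in [(0x10000, False), (0x10001, False), (0x10000, True)]:
        path = "direct" if takes_direct_path(addr, mapped) else "buffered"
        print(hex(addr), "mmapped" if mapped else "not mmapped", "->", path)
```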

To paraphrase for copyright purposes, one might say:

Solaris provides "forcedirectio" as a mount option; when it
is applied, data is transferred without being copied through the
buffer cache. Unlike other discussions of direct I/O, the man page
recommends it as a performance optimization only when large
amounts of data are transferred sequentially.

In practice, forcedirectio does indeed appear to
coalesce multiple contiguous logical writes into a substantially 
smaller number of larger physical writes. This improves 
performance when doing full-table scans or other large
I/O operations. 
See man mount_ufs(1M), directio(3C)


Dave, if Solaris is coalescing multiple write requests into a smaller number of physical writes, that implies that the actual write to disk has not been completed at the time when the write(2) system call returns. Otherwise, it would not be possible to coalesce the write request with other write requests. But that raises a major problem; how does the application know when it is safe to reuse the buffer passed to the write(2) system call? Are you sure Solaris really does write coalescing when directio is enabled? I see no documentation of that on the Solaris man pages; just your claim here. How did you test for it? Tytso 19:16, 27 August 2009 (UTC)