| rfc9766.original | rfc9766.txt |
|---|---|
| Network File System Version 4 T. Haynes | Internet Engineering Task Force (IETF) T. Haynes | |||
| Internet-Draft T. Myklebust | Request for Comments: 9766 T. Myklebust | |||
| Intended status: Standards Track Hammerspace | Category: Standards Track Hammerspace | |||
| Expires: 11 August 2025 7 February 2025 | ISSN: 2070-1721 April 2025 | |||
| Add LAYOUT_WCC to NFSv4.2's Flex File Layout Type | Extensions for Weak Cache Consistency in NFSv4.2's Flexible File Layout | |||
| draft-ietf-nfsv4-layoutwcc-07 | ||||
| Abstract | Abstract | |||
| This document specifies extensions to the parallel Network File | This document specifies extensions to NFSv4.2 for improving Weak | |||
| System (NFS) version 4 (pNFS) for improving write cache consistency. | Cache Consistency (WCC). These extensions introduce mechanisms that | |||
| These extensions introduce mechanisms that ensure partial writes | ensure partial writes performed under a Parallel NFS (pNFS) layout | |||
| performed under a pNFS layout remain coherent and correctly tracked. | remain coherent and correctly tracked. The solution addresses | |||
| The solution addresses concurrency and data integrity concerns that | concurrency and data integrity concerns that may arise when multiple | |||
| may arise when multiple clients write to the same file through | clients write to the same file through separate data servers. By | |||
| separate data servers. By defining additional interactions among | defining additional interactions among clients, metadata servers, and | |||
| clients, metadata servers, and data servers, this specification | data servers, this specification enhances the reliability of NFSv4 in | |||
| enhances the reliability of NFSv4 in parallel-access environments and | parallel-access environments and ensures consistency across diverse | |||
| ensures consistency across diverse deployment scenarios. | deployment scenarios. | |||
| Note | ||||
| This note is to be removed before publishing as an RFC. | ||||
| Discussion of this draft takes place on the NFSv4 working group | ||||
| mailing list (nfsv4@ietf.org), which is archived at | ||||
| https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group | ||||
| information can be found at https://datatracker.ietf.org/wg/nfsv4/ | ||||
| about/. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
| provisions of BCP 78 and BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| Internet Standards is available in Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 11 August 2025. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9766. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2025 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
| 1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Definitions | |||
| 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 | 1.2. Requirements Language | |||
| 2. Weak Cache Consistency (WCC) . . . . . . . . . . . . . . . . 4 | 2. Weak Cache Consistency (WCC) | |||
| 3. Operation 77: LAYOUT_WCC - Layout Weak Cache Consistency . . 5 | 3. Operation 77: LAYOUT_WCC - Layout Weak Cache Consistency | |||
| 3.4. Implementation . . . . . . . . . . . . . . . . . . . . . 6 | 3.1. ARGUMENT | |||
| 3.4.1. Examples of when to use LAYOUT_WCC . . . . . . . . . 6 | 3.2. RESULT | |||
| 3.4.2. Examples of what to send in the LAYOUT_WCC . . . . . 7 | 3.3. DESCRIPTION | |||
| 3.5. Allowed Errors . . . . . . . . . . . . . . . . . . . . . 8 | 3.4. Implementation | |||
| 3.6. Extension of Existing Implementations . . . . . . . . . . 9 | 3.4.1. Examples of When to Use LAYOUT_WCC | |||
| 3.7. Flex Files Layout Type . . . . . . . . . . . . . . . . . 9 | 3.4.2. Examples of What to Send in LAYOUT_WCC | |||
| 4. Extraction of XDR . . . . . . . . . . . . . . . . . . . . . . 10 | 3.5. Allowed Errors | |||
| 4.1. Code Components Licensing Notice . . . . . . . . . . . . 11 | 3.6. Extension of Existing Implementations | |||
| 5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | 3.7. Flexible File Layout Type | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | 4. Extraction of XDR | |||
| 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 5. Security Considerations | |||
| 7.1. Normative References . . . . . . . . . . . . . . . . . . 11 | 6. IANA Considerations | |||
| 7.2. Informative References . . . . . . . . . . . . . . . . . 12 | 7. References | |||
| Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 13 | 7.1. Normative References | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 | 7.2. Informative References | |||
| Acknowledgments | ||||
| Authors' Addresses | ||||
| 1. Introduction | 1. Introduction | |||
| In the Network File System version 4 (NFSv4) with a Parallel NFS | In the Parallel NFS (pNFS) flexible file layout (see [RFC8435]), | |||
| (pNFS) Flexible File Layout (see Section 12 of [RFC8435]) server, | ||||
| there is no mechanism for the data servers to update the metadata | there is no mechanism for the data servers to update the metadata | |||
| servers for when the data portion of the file is modified. The | servers when the data portion of the file is modified. The metadata | |||
| metadata server needs this knowledge to correspondingly update the | server needs this knowledge to correspondingly update the metadata | |||
| metadata portion of the file. If the client is using NFSv3 as the | portion of the file. If the client is using NFSv3 as the protocol | |||
| protocol with the data server, it can leverage weak cache consistency | with the data server, it can leverage Weak Cache Consistency (WCC) to | |||
| (WCC) to update the metadata server of the attribute changes. In | update the metadata server of the attribute changes. In this | |||
| this document, we introduce a new operation called LAYOUT_WCC to | document, we introduce a new operation called LAYOUT_WCC to NFSv4.2, | |||
| NFSv4.2 which allows the client to periodically report the attributes | which allows the client to periodically report the attributes of the | |||
| of the data files to the metadata server. | data files to the metadata server. | |||
| Using the process detailed in [RFC8178], the revisions in this | Using the process detailed in [RFC8178], the revisions in this | |||
| document become an extension of NFSv4.2 [RFC7862]. They are built on | document become an extension of NFSv4.2 [RFC7862]. They are built on | |||
| top of the external data representation (XDR) [RFC4506] generated | top of the External Data Representation (XDR) [RFC4506] generated | |||
| from [RFC7863]. | from [RFC7863]. | |||
| 1.1. Definitions | 1.1. Definitions | |||
| For a more comprehensive set of definitions, see Section 1.1 of | For a more comprehensive set of definitions, see Section 1.1 of | |||
| [RFC8435]. | [RFC8435]. | |||
| (file) data: that part of the file system object that contains the | (file) data: that part of the file system object that contains the | |||
| data to be read or written. It is the contents of the object | data to be read or written. It is the contents of the object | |||
| rather than the attributes of the object. | rather than the attributes of the object. | |||
| skipping to change at page 3, line 38 ¶ | skipping to change at line 120 ¶ | |||
| metadata server (MDS): the pNFS server that provides metadata | metadata server (MDS): the pNFS server that provides metadata | |||
| information for a file system object. | information for a file system object. | |||
| storage device: the target to which clients may direct I/O requests | storage device: the target to which clients may direct I/O requests | |||
| when they hold an appropriate layout. Note that each data server | when they hold an appropriate layout. Note that each data server | |||
| is a storage device but that some storage device are not data | is a storage device but that some storage device are not data | |||
| servers. (See Section 2.1 of [RFC8434] for a discussion on the | servers. (See Section 2.1 of [RFC8434] for a discussion on the | |||
| difference between a data server and a storage device.) | difference between a data server and a storage device.) | |||
| weak cache consistency (WCC): In NFSv3, WCC allows the client to | weak cache consistency (WCC): the mechanism in NFSv3 that allows the | |||
| check for file attribute changes before and after an operation | client to check for file attribute changes before and after an | |||
| (See Section 2.6 of [RFC1813]). | operation (see Section 2.6 of [RFC1813]). | |||
| 1.2. Requirements Language | 1.2. Requirements Language | |||
| The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| 'OPTIONAL' in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 2. Weak Cache Consistency (WCC) | 2. Weak Cache Consistency (WCC) | |||
| A pNFS layout type enables the metadata server to inform the client | A pNFS layout type enables the metadata server to inform the client | |||
| of both the storage protocol and the locations of the data that the | of both the storage protocol and the locations of the data that the | |||
| client should use when communicating with the storage devices. The | client should use when communicating with the storage devices. The | |||
| Flex Files Layout Type, as specified in [RFC8435], describes how data | flexible file layout type, as specified in [RFC8435], describes how | |||
| servers using NFSv3 can be accessed. The client is restricted to | data servers using NFSv3 can be accessed. The client is restricted | |||
| performing NFSv3 READ (Section 3.3.6 of [RFC1813]), WRITE | to performing the following NFSv3 operations on the filehandles | |||
| (Section 3.3.6 of [RFC1813]), and COMMIT (Section 3.3.21 of | provided in the layout: READ, WRITE, and COMMIT (see Sections 3.3.6, | |||
| [RFC1813]) operations on the file handles provided in the layout. In | 3.3.7, and 3.3.21 of [RFC1813], respectively). In other words, the | |||
| other words, the client may only use NFSv3 operations that act | client may only use NFSv3 operations that act directly on the data | |||
| directly on the data portion of the file. | portion of the file. | |||
| Because there is no contol protocol (see [RFC8434]) possible with all | Because there is no control protocol (see [RFC8434]) possible with | |||
| data servers, NFSv3 is used as the control protocol. As such, the | all data servers, NFSv3 is used as the control protocol. As such, | |||
| NFSv3 CREATE (see Section 3.3.8 of [RFC1813]), GETATTR (see | the following NFSv3 operations are commonly used by the metadata | |||
| Section 3.3.1 of [RFC1813]), and SETATTR (see Section 3.3.2 of | server: CREATE, GETATTR, and SETATTR (see Sections 3.3.8, 3.3.1, and | |||
| [RFC1813]) are operations commonly used by the metadata server. | 3.3.2 of [RFC1813], respectively). That is, the metadata server is | |||
| I.e., the metadata server is only allowed to use NFSv3 operations | only allowed to use NFSv3 operations that directly act on the | |||
| which directly act on the metadata portion of the data file. GETATTR | metadata portion of the data file. GETATTR allows the metadata | |||
| allows the metadata server to mainly retrieve the mtime (modify | server to mainly retrieve the mtime (modify time), ctime (change | |||
| time), ctime (change time), and atime (access time). The metadata | time), and atime (access time). The metadata server can use this | |||
| server can use this information to determine if the client modified | information to determine if the client modified the file whilst it | |||
| the file whilst it held an iomode of LAYOUTIOMODE4_RW (see | held an iomode of LAYOUTIOMODE4_RW (see Section 3.3.20 of [RFC8881]). | |||
| Section 3.3.20 of [RFC8881]). Then it can determine the time_modify | Then it can determine the following for the metadata file: | |||
| (see Section 5.8.2.43 of [RFC8881]), time_metadata (see | time_modify, time_metadata, and time_access (see Sections 5.8.2.43, | |||
| Section 5.8.2.42 of [RFC8881]), and time_access (see Section 5.8.2.37 | 5.8.2.42, and 5.8.2.37 of [RFC8881], respectively). That is, it can | |||
| of [RFC8881]) for the metadata file. I.e., the information to return | determine the information to return to clients in an NFSv4.2 GETATTR | |||
| to clients in a NFSv4.2 GETATTR response. | response. | |||
| For example, the metadata server might issue an NFSv3 GETATTR | For example, the metadata server might issue an NFSv3 GETATTR | |||
| operation to the data server, which is typically triggered by a | operation to the data server, which is typically triggered by a | |||
| client's NFSv4 GETATTR request to the metadata server. In addition | client's NFSv4 GETATTR request to the metadata server. In addition | |||
| to the cost of each individual GETATTR operation, the data server can | to the cost of each individual GETATTR operation, the data server can | |||
| be overwhelmed by a large volume of such requests. NFSv3 addressed a | be overwhelmed by a large volume of such requests. NFSv3 addressed a | |||
| similar challenge by including a post-operation attribute in the READ | similar challenge by including a post-operation attribute in the READ | |||
| and WRITE operations to report weak cache consistency (WCC) data (see | and WRITE operations to report WCC data (see Section 2.6 of | |||
| Section 2.6 of [RFC1813]). | [RFC1813]). | |||
| Each NFSv3 operation entails a single round trip between the client | Each NFSv3 operation entails a single round trip between the client | |||
| and server. Consequently, issuing a WRITE followed by a GETATTR | and server. Consequently, issuing a WRITE followed by a GETATTR | |||
| would require two round trips. In that situation, the retrieved | would require two round trips. In that situation, the retrieved | |||
| attribute information is regarded as strict server-client | attribute information is regarded as having strict server-client | |||
| consistency. By contrast, NFSv4 enables a WRITE and GETATTR to be | consistency. By contrast, NFSv4 enables a WRITE and GETATTR to be | |||
| combined within a compound operation, which requires only one round | combined within a compound operation, which requires only one round | |||
| trip. This combined approach is likewise considered strict server- | trip. This combined approach is likewise considered to have strict | |||
| client consistency. Essentially, NFSv4 READ and WRITE operations | server-client consistency. Essentially, NFSv4 READ and WRITE | |||
| omit post-operation attributes, allowing the client to determine | operations omit post-operation attributes, allowing the client to | |||
| whether it requires that information. | determine whether it requires that information. | |||
| Whilst NFSv4 got rid of the requirement for WCC information to be | Whilst NFSv4 got rid of the requirement for WCC information to be | |||
| supplied by the WRITE or READ operations, the introduction of pNFS | supplied by the WRITE or READ operations, the introduction of pNFS | |||
| re-introduces the same problem. The metadata server has to | reintroduces the same problem. The metadata server has to | |||
| communicate with the data server in order to get at the data which | communicate with the data server in order to get the data that could | |||
| could be provided by a WCC model. | be provided by a WCC model. | |||
| With the flexible file layout type, the client can leverage the NFSv3 | With the flexible file layout type, the client can leverage the NFSv3 | |||
| WCC to service the proxying of times (See Section 4 of | WCC to service the proxying of times (see Section 5 of [RFC9754]), | |||
| [I-D.ietf-nfsv4-delstid]). But the granularity of this data is | but the granularity of this data is limited. With client-side | |||
| limited. With client side mirroring (See Section 8 of [RFC8435]), | mirroring (see Section 8 of [RFC8435]), the client has to aggregate | |||
| the client has to aggregate the N mirrored files in order to send one | the N mirrored files in order to send one piece of information | |||
| piece of information instead of N pieces of information. Also, the | instead of N pieces of information. Also, the client is limited to | |||
| client is limited to sending that information only when it returns | sending that information only when it returns the delegation. | |||
| the delegation. | ||||
| This document introduces a new NFSv4.2 operation, LAYOUT_WCC, which | This document introduces a new NFSv4.2 operation, LAYOUT_WCC, which | |||
| enables the client to provide the metadata server with information | enables the client to provide the metadata server with information | |||
| obtained from the data server. The client is responsible for | obtained from the data server. The client is responsible for | |||
| gathering the NFSv3 WCC data, returned by the three permissible NFSv3 | gathering the NFSv3 WCC data, returned by the three permissible NFSv3 | |||
| operations, and conveying it back to the metadata server as part of | operations, and conveying it back to the metadata server as part of | |||
| NFSv4.2 attributes. The metadata server MAY therefore avoid issuing | NFSv4.2 attributes. The metadata server MAY therefore avoid issuing | |||
| costly NFSv3 GETATTR calls to the data servers. Because this | costly NFSv3 GETATTR calls to the data servers. Because this | |||
| approach relies on a weak model, the metadata server MAY still | approach relies on a weak model, the metadata server MAY still | |||
| perform these calls if it chooses to strengthen the model. | perform these calls if it chooses to strengthen the model. | |||
| skipping to change at page 6, line 4 ¶ | skipping to change at line 217 ¶ | |||
| 3.1. ARGUMENT | 3.1. ARGUMENT | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| /// struct LAYOUT_WCC4args { | /// struct LAYOUT_WCC4args { | |||
| /// stateid4 lowa_stateid; | /// stateid4 lowa_stateid; | |||
| /// layouttype4 lowa_type; | /// layouttype4 lowa_type; | |||
| /// opaque lowa_body<>; | /// opaque lowa_body<>; | |||
| /// }; | /// }; | |||
| <CODE ENDS> | <CODE ENDS> | |||
| stateid4 is defined in Section 3.3.12 of [RFC8881]. layouttype4 is | stateid4 is defined in Section 3.3.12 of [RFC8881]. layouttype4 is | |||
| defined in Section 3.3.13 of [RFC8881]. | defined in Section 3.3.13 of [RFC8881]. | |||
| 3.2. RESULT | 3.2. RESULT | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| /// struct LAYOUT_WCC4res { | /// struct LAYOUT_WCC4res { | |||
| /// nfsstat4 lowr_status; | /// nfsstat4 lowr_status; | |||
| /// }; | /// }; | |||
| <CODE ENDS> | <CODE ENDS> | |||
| nfsstat4 is defined in Section 3.2 of [RFC8881]. | nfsstat4 is defined in Section 3.2 of [RFC8881]. | |||
| 3.3. DESCRIPTION | 3.3. DESCRIPTION | |||
| The current filehandle and the lowa_stateid identify the specific | The current filehandle and the lowa_stateid identify the specific | |||
| layout for the LAYOUT_WCC operation. The lowa_type indicates how to | layout for the LAYOUT_WCC operation. The lowa_type indicates how to | |||
| interpret the layout-type-specific payload contained in the lowa_body | interpret the layout-type-specific payload contained in the lowa_body | |||
| field. The lowa_type is the corresponding value from the IANA | field. The lowa_type is the corresponding value from the "pNFS | |||
| registry for 'pNFS Layout Types' for the layout type being used. | Layout Types" IANA registry for the layout type being used. | |||
| The lowa_body contains the data file attributes. The client is | The lowa_body contains the data file attributes. The client is | |||
| responsible for mapping NFSv3 post-operation attributes to the fattr4 | responsible for mapping NFSv3 post-operation attributes to the fattr4 | |||
| representation. Similar to the behavior of post-operation | representation. Similar to the behavior of post-operation | |||
| attributes, the client may ignore these attributes, and the server | attributes, the client may ignore these attributes, and the server | |||
| may also choose to ignore any attributes included in LAYOUT_WCC. | may also choose to ignore any attributes included in LAYOUT_WCC. | |||
| However, the server can use these attributes to avoid querying the | However, the server can use these attributes to avoid querying the | |||
| data server for data file attributes. Because these attributes are | data server for data file attributes. Because these attributes are | |||
| optional and the client has no recourse if the server opts to | optional and the client has no recourse if the server opts to | |||
| disregard them, there is no requirement to return a bitmap4 | disregard them, there is no requirement to return a bitmap4 | |||
| indicating which attributes have been accepted in the LAYOUT_WCC | indicating which attributes have been accepted in the LAYOUT_WCC | |||
| result. | result. | |||
| 3.4. Implementation | 3.4. Implementation | |||
| 3.4.1. Examples of when to use LAYOUT_WCC | 3.4.1. Examples of When to Use LAYOUT_WCC | |||
| The only way for the metadata server to detect modifications to the | The only way for the metadata server to detect modifications to the | |||
| data file is to probe the data servers via a GETATTR. It can compare | data file is to probe the data servers via a GETATTR. It can compare | |||
| the mtime results across multiple calls to detect a NFSv3 WRITE | the mtime results across multiple calls to detect an NFSv3 WRITE | |||
| operation by the client. Likewise, the atime results indicate the | operation by the client. Likewise, the atime results indicate the | |||
| client having issued a NFSv3 READ operation. As such, the client can | client having issued an NFSv3 READ operation. As such, the client | |||
| leverage the LAYOUT_WCC operation whenever it has the belief that the | can leverage the LAYOUT_WCC operation whenever it has the belief that | |||
| metadata server would need to refresh the attributes of the data | the metadata server would need to refresh the attributes of the data | |||
| files. While the client can send a LAYOUT_WCC at any time, there are | files. While the client can send a LAYOUT_WCC at any time, there are | |||
| times it will want to do this operation in order to avoid having the | times it will want to do this operation in order to avoid having the | |||
| metadata server issue NFSv3 GETATTR requests to the data servers: | metadata server issue NFSv3 GETATTR requests to the data servers: | |||
| * Whenever it sends a GETATTR for any of the following attributes: | * Whenever it sends a GETATTR for any of the following attributes: | |||
| size (see Section 5.8.1.5 of [RFC8881]), space_used (see | ||||
| Section 5.8.2.25 of [RFC8881]), change (see Section 5.8.1.4 of | - size (see Section 5.8.1.5 of [RFC8881]) | |||
| [RFC8881]), time_access (see Section 5.8.2.37 of [RFC8881]), | ||||
| time_metadata (see Section 5.8.2.42 of [RFC8881]), and time_modify | - space_used (see Section 5.8.2.35 of [RFC8881]) | |||
| (see Section 5.8.2.43 of [RFC8881]). | ||||
| - change (see Section 5.8.1.4 of [RFC8881]) | ||||
| - time_access (see Section 5.8.2.37 of [RFC8881]) | ||||
| - time_metadata (see Section 5.8.2.42 of [RFC8881]) | ||||
| - time_modify (see Section 5.8.2.43 of [RFC8881]) | ||||
| * Whenever it sends an NFS4ERR_ACCESS error via LAYOUTRETURN or | * Whenever it sends an NFS4ERR_ACCESS error via LAYOUTRETURN or | |||
| LAYOUTERROR - it could have already gotten the NFSv3 uid and gid | LAYOUTERROR. It could have already gotten the NFSv3 uid and gid | |||
| values back in the WCC of the WRITE, READ, or COMMIT operation | values back in the WCC of the WRITE, READ, or COMMIT operation | |||
| which got the error. Thus it could report that information back | that got the error. Thus, it could report that information back | |||
| to the metadata server, saving it from querying that information | to the metadata server, saving it from querying that information | |||
| via a NFSv3 GETATTR. | via an NFSv3 GETATTR. | |||
| * Whenever it sends a SETATTR to refresh the proxied times (See | * Whenever it sends a SETATTR to refresh the proxied times (see | |||
| Section 4 of [I-D.ietf-nfsv4-delstid]) - the metadata server is | Section 5 of [RFC9754]). The metadata server will correlate these | |||
| going to want to correlate these times in order to detect later | times in order to detect later modification to the data file. | |||
| modification to the data file. | ||||
| 3.4.2. Examples of what to send in the LAYOUT_WCC | 3.4.2. Examples of What to Send in LAYOUT_WCC | |||
| The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT | The NFSv3 attributes returned in the WCC of WRITE, READ, and COMMIT | |||
| are a smaller subset of what can be transmitted as a NFSv4 attribute. | operations are a smaller subset of what can be transmitted as an | |||
| The mapping of NFSv3 to NFSv4 attributes is shown in Table 1. The | NFSv4 attribute. The mapping of NFSv3 to NFSv4 attributes is shown | |||
| LAYOUT_WCC MUST provide all of these attributes to the metadata | in Table 1. The LAYOUT_WCC MUST provide all of these attributes to | |||
| server. Both the uid and gid are stringified into their respective | the metadata server. Both the uid and gid are stringified into their | |||
| attributes of owner and owner_group. The reason to provide these two | respective attributes of owner and owner_group. In the case of | |||
| attributes is in case of NFS4ERR_ACCESS, the metadata server can | NFS4ERR_ACCESS, the reason to provide these two attributes is that | |||
| compare what it expects the values of the uid and gid of the data | the metadata server can compare what it expects the values of the uid | |||
| file to be versus the actual values. It can then repair the | and gid of the data file to be versus the actual values. It can then | |||
| permissions as needed or modify the expected values it has cached. | repair the permissions as needed or modify the expected values it has | |||
| cached. | ||||
| +=================+===================+ | +=================+===================+ | |||
| | NFSv3 Attribute | NFSv4.2 Attribute | | | NFSv3 Attribute | NFSv4.2 Attribute | | |||
| +=================+===================+ | +=================+===================+ | |||
| | size | size | | | size | size | | |||
| +-----------------+-------------------+ | +-----------------+-------------------+ | |||
| | used | space_used | | | used | space_used | | |||
| +-----------------+-------------------+ | +-----------------+-------------------+ | |||
| | mode | mode | | | mode | mode | | |||
| +-----------------+-------------------+ | +-----------------+-------------------+ | |||
| skipping to change at page 8, line 30 ¶ | skipping to change at line 330 ¶ | |||
| | mtime | time_modify | | | mtime | time_modify | | |||
| +-----------------+-------------------+ | +-----------------+-------------------+ | |||
| | ctime | time_metadata | | | ctime | time_metadata | | |||
| +-----------------+-------------------+ | +-----------------+-------------------+ | |||
| Table 1: NFSv3 to NFSv4.2 Attribute | Table 1: NFSv3 to NFSv4.2 Attribute | |||
| Mappings | Mappings | |||
| 3.5. Allowed Errors | 3.5. Allowed Errors | |||
| The LAYOUT_WCC operation can raise the errors in Table 2. When an | The LAYOUT_WCC operation can raise the errors listed in Table 2. | |||
| error is encountered, the metadata server can decide to ignore the | When an error is encountered, the metadata server can decide to | |||
| entire operation or depending on the layout type specific payload, it | ignore the entire operation, or depending on the layout-type-specific | |||
| could decide to apply a portion of the payload. Note that there are | payload, it could decide to apply a portion of the payload. Note | |||
| no new errors introduced for the LAYOUT_WCC operation and the errors | that there are no new errors introduced for the LAYOUT_WCC operation | |||
| in Table 2 are each defined in Section 15.1 of [RFC8881]. Table 2 | and the errors in Table 2 are each defined in Section 15.1 of | |||
| can be considered as an extension of Section 15.2 of [RFC8881]. | [RFC8881]. Table 2 can be considered as an extension of Section 15.2 | |||
| of [RFC8881]. | ||||
| +============+====================================================+ | +============+====================================================+ | |||
| | Operation | Errors | | | Operation | Errors | | |||
| +============+====================================================+ | +============+====================================================+ | |||
| | LAYOUT_WCC | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | | LAYOUT_WCC | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | |||
| | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | | | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | | |||
| | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | | | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | | |||
| | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | |||
| | | NFS4ERR_INVAL, NFS4ERR_ISDIR, NFS4ERR_MOVED, | | | | NFS4ERR_INVAL, NFS4ERR_ISDIR, NFS4ERR_MOVED, | | |||
| | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | | | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | | |||
| skipping to change at page 9, line 27 | skipping to change at line 361 | |||
| | | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, | | | | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, | | |||
| | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | | | | NFS4ERR_STALE, NFS4ERR_TOO_MANY_OPS, | | |||
| | | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | | |||
| | | NFS4ERR_WRONG_TYPE | | | | NFS4ERR_WRONG_TYPE | | |||
| +------------+----------------------------------------------------+ | +------------+----------------------------------------------------+ | |||
| Table 2: Operations and Their Valid Errors | Table 2: Operations and Their Valid Errors | |||
| 3.6. Extension of Existing Implementations | 3.6. Extension of Existing Implementations | |||
| The new LAYOUT_WCC operation is OPTIONAL for both NFSv4.2 ([RFC7863]) | The new LAYOUT_WCC operation is OPTIONAL for both NFSv4.2 [RFC7863] | |||
| and the flexible file layout type ([RFC8435]). | and the flexible file layout type [RFC8435]. | |||
| 3.7. Flex Files Layout Type | 3.7. Flexible File Layout Type | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| /// struct ff_data_server_wcc4 { | /// struct ff_data_server_wcc4 { | |||
| /// deviceid4 ffdsw_deviceid; | /// deviceid4 ffdsw_deviceid; | |||
| /// stateid4 ffdsw_stateid; | /// stateid4 ffdsw_stateid; | |||
| /// nfs_fh4 ffdsw_fh_vers<>; | /// nfs_fh4 ffdsw_fh_vers<>; | |||
| /// fattr4 ffdsw_attributes; | /// fattr4 ffdsw_attributes; | |||
| /// }; | /// }; | |||
| /// | /// | |||
| /// struct ff_mirror_wcc4 { | /// struct ff_mirror_wcc4 { | |||
| /// ff_data_server_wcc4 ffmw_data_servers<>; | /// ff_data_server_wcc4 ffmw_data_servers<>; | |||
| /// }; | /// }; | |||
| /// | /// | |||
| /// struct ff_layout_wcc4 { | /// struct ff_layout_wcc4 { | |||
| /// ff_mirror_wcc4 fflw_mirrors<>; | /// ff_mirror_wcc4 fflw_mirrors<>; | |||
| /// }; | /// }; | |||
| <CODE ENDS> | <CODE ENDS> | |||
| The flex file layout type specific results MUST correspond to the | The results specific to the flexible file layout type MUST correspond | |||
| ff_layout4 data structure as defined in Section 5.1 of [RFC8435]. | to the ff_layout4 data structure as defined in Section 5.1 of | |||
| There MUST be a one-to-one correspondence between: | [RFC8435]. There MUST be a one-to-one correspondence between the | |||
| following: | ||||
| * ff_data_server4 -> ff_data_server_wcc4 | * ff_data_server4 -> ff_data_server_wcc4 | |||
| * ff_mirror4 -> ff_mirror_wcc4 | * ff_mirror4 -> ff_mirror_wcc4 | |||
| * ff_layout4 -> ff_layout_wcc4 | * ff_layout4 -> ff_layout_wcc4 | |||
| Each ff_layout4 has an array of ff_mirror4, which have an array of | Each ff_layout4 has an array of ff_mirror4, which has an array of | |||
| ff_data_server4. Based on the current filehandle and the | ff_data_server4. Based on the current filehandle and the | |||
| lowa_stateid, the server can match the reported attributes. | lowa_stateid, the server can match the reported attributes. | |||
| But the positional correspondence between the elements is not | But the positional correspondence between the elements is not | |||
| sufficient to determine the attributes to update. Consider the case | sufficient to determine the attributes to update. Consider the case | |||
| where a layout had three mirrors and two of them had updated | where a layout has three mirrors and two of them have updated | |||
| attributes, but the third did not. A client could decide to present | attributes but the third does not. A client could decide to present | |||
| all three mirrors, with one mirror having an attribute mask with no | all three mirrors, with one mirror having an attribute mask with no | |||
| attributes present. Or it could decide to present only the two | attributes present. Or it could decide to present only the two | |||
| mirrors which had been changed. | mirrors that had been changed. | |||
| In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and | In either case, the combination of ffdsw_deviceid, ffdsw_stateid, and | |||
| ffdsw_fh_vers will uniquely identify the attributes to be updated. | ffdsw_fh_vers will uniquely identify the attributes to be updated. | |||
| All three arguments are required. A layout might have multiple data | All three arguments are required. A layout might have multiple data | |||
| files on the same storage device, in which case the ffdsw_deviceid | files on the same storage device, in which case the ffdsw_deviceid | |||
| and ffdsw_stateid would match, but the ffdsw_fh_vers would not. | and ffdsw_stateid would match, but the ffdsw_fh_vers would not. | |||
| The ffdsw_attributes are processed similar to the obj_attributes in | The ffdsw_attributes are processed similar to the obj_attributes in | |||
| the SETATTR arguments (See Section 18.34 of [RFC8881]). | the SETATTR arguments (see Section 18.30 of [RFC8881]). | |||
| 4. Extraction of XDR | 4. Extraction of XDR | |||
| This document contains the external data representation (XDR) | This document contains the XDR [RFC4506] description of the new | |||
| [RFC4506] description of the new open flags for delegating the file | NFSv4.2 operation LAYOUT_WCC. The XDR description is embedded in | |||
| to the client. The XDR description is embedded in this document in a | this document in a way that makes it simple for the reader to extract | |||
| way that makes it simple for the reader to extract into a ready-to- | into a ready-to-compile form. The reader can feed this document into | |||
| compile form. The reader can feed this document into the following | the following shell script to produce the machine-readable XDR | |||
| shell script to produce the machine-readable XDR description of the | description of the new NFSv4.2 operation LAYOUT_WCC. | |||
| new flags: | ||||
| <CODE BEGINS> | <CODE BEGINS> | |||
| #!/bin/sh | #!/bin/sh | |||
| grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' | grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??' | |||
| <CODE ENDS> | <CODE ENDS> | |||
| That is, if the above script is stored in a file called 'extract.sh', | That is, if the above script is stored in a file called 'extract.sh', | |||
| and this document is in a file called 'spec.txt', then the reader can | and this document is in a file called 'spec.txt', then the reader can | |||
| do: | do: | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| sh extract.sh < spec.txt > layout_wcc.x | sh extract.sh < spec.txt > layout_wcc.x | |||
| <CODE ENDS> | <CODE ENDS> | |||
| The effect of the script is to remove leading white space from each | The effect of the script is to remove leading blank space from each | |||
| line, plus a sentinel sequence of '///'. XDR descriptions with the | line, plus a sentinel sequence of '///'. XDR descriptions with the | |||
| sentinel sequence are embedded throughout the document. | sentinel sequence are embedded throughout the document. | |||
| Note that the XDR code contained in this document depends on types | Note that the XDR code contained in this document depends on types | |||
| from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This | from the NFSv4.2 nfs4_prot.x file (generated from [RFC7863]). This | |||
| includes both nfs types that end with a 4, such as offset4, length4, | includes both nfs types that end with a 4 (such as offset4 and | |||
| etc., as well as more generic types such as uint32_t and uint64_t. | length4) as well as more generic types (such as uint32_t and | |||
| uint64_t). | ||||
| While the XDR can be appended to that from [RFC7863], the various | While the XDR can be appended to that from [RFC7863], the various | |||
| code snippets belong in their respective areas of that XDR. | code snippets belong in their respective areas of that XDR. | |||
| 4.1. Code Components Licensing Notice | ||||
| Both the XDR description and the scripts used for extracting the XDR | ||||
| description are Code Components as described in Section 4 of 'Legal | ||||
| Provisions Relating to IETF Documents' [LEGAL]. These Code | ||||
| Components are licensed according to the terms of that document. | ||||
| 5. Security Considerations | 5. Security Considerations | |||
| There are no new security considerations beyond those in [RFC8435]. | There are no new security considerations beyond those in [RFC8435]. | |||
| 6. IANA Considerations | 6. IANA Considerations | |||
| This section is to be removed before publishing as an RFC. | This document has no IANA actions. | |||
| There are no IANA considerations for this document. | ||||
| 7. References | 7. References | |||
| 7.1. Normative References | 7.1. Normative References | |||
| [I-D.ietf-nfsv4-delstid] | ||||
| Haynes, T. and T. Myklebust, "Extending the Opening of | ||||
| Files in NFSv4.2", Work in Progress, Internet-Draft, | ||||
| draft-ietf-nfsv4-delstid-08, 2 October 2024, | ||||
| <https://datatracker.ietf.org/doc/html/draft-ietf-nfsv4- | ||||
| delstid-08>. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC4506] Eisler, M., Ed., "XDR: External Data Representation | [RFC4506] Eisler, M., Ed., "XDR: External Data Representation | |||
| Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May | Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May | |||
| 2006, <https://www.rfc-editor.org/info/rfc4506>. | 2006, <https://www.rfc-editor.org/info/rfc4506>. | |||
| [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor | [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor | |||
| skipping to change at page 12, line 39 | skipping to change at line 501 | |||
| [RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible | [RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible | |||
| File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, | File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, | |||
| <https://www.rfc-editor.org/info/rfc8435>. | <https://www.rfc-editor.org/info/rfc8435>. | |||
| [RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) | [RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) | |||
| Version 4 Minor Version 1 Protocol", RFC 8881, | Version 4 Minor Version 1 Protocol", RFC 8881, | |||
| DOI 10.17487/RFC8881, August 2020, | DOI 10.17487/RFC8881, August 2020, | |||
| <https://www.rfc-editor.org/info/rfc8881>. | <https://www.rfc-editor.org/info/rfc8881>. | |||
| 7.2. Informative References | [RFC9754] Haynes, T. and T. Myklebust, "Extensions for Opening and | |||
| Delegating Files in NFSv4.2", RFC 9754, | ||||
| DOI 10.17487/RFC9754, March 2025, | ||||
| <https://www.rfc-editor.org/info/rfc9754>. | ||||
| [LEGAL] IETF Trust, "Legal Provisions Relating to IETF Documents", | 7.2. Informative References | |||
| November 2008, <http://trustee.ietf.org/docs/IETF-Trust- | ||||
| License-Policy.pdf>. | ||||
| [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | |||
| Version 3 Protocol Specification", RFC 1813, | Version 3 Protocol Specification", RFC 1813, | |||
| DOI 10.17487/RFC1813, June 1995, | DOI 10.17487/RFC1813, June 1995, | |||
| <https://www.rfc-editor.org/info/rfc1813>. | <https://www.rfc-editor.org/info/rfc1813>. | |||
| Appendix A. Acknowledgments | Acknowledgments | |||
| Dave Noveck, Tigran Mkrtchyan, and Rick Macklem provided reviews of | Dave Noveck, Tigran Mkrtchyan, and Rick Macklem provided reviews of | |||
| the document. | the document. | |||
| Authors' Addresses | Authors' Addresses | |||
| Thomas Haynes | Thomas Haynes | |||
| Hammerspace | Hammerspace | |||
| Email: loghyr@gmail.com | Email: loghyr@gmail.com | |||
| End of changes. 49 change blocks. | ||||
| 202 lines changed or deleted | 184 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||
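Section 3.7 of the RFC 9766 text says that the metadata server identifies the attributes to update by the combination of ffdsw_deviceid, ffdsw_stateid, and ffdsw_fh_vers, not by array position. The following is a minimal sketch of that lookup. It assumes a toy text encoding in which each line stands for one data file. Every identifier (m0, dev1, sid1, fhA, and so on), every attribute value, and the function name match_wcc is hypothetical. Each ffdsw_fh_vers list is also reduced to a single filehandle token. A real server would work on XDR-decoded ff_layout4 and ff_layout_wcc4 structures instead.

```sh
#!/bin/sh
# Hypothetical sketch: pair LAYOUT_WCC reports with layout entries by the
# (ffdsw_deviceid, ffdsw_stateid, ffdsw_fh_vers) triple, not by position.
# Toy text encoding; all identifiers and values are made up.

# Layout held by the metadata server, one data file per line:
#   mirror deviceid stateid filehandle
layout='m0 dev1 sid1 fhA
m0 dev1 sid1 fhB
m1 dev2 sid2 fhC
m2 dev3 sid3 fhD'

# Client report, one line per ff_data_server_wcc4 entry:
#   deviceid stateid filehandle attributes
# Only two entries are sent, in a different order than the layout.
report='dev2 sid2 fhC size=8192
dev1 sid1 fhB size=4096'

# match_wcc LAYOUT REPORT: print "mirror filehandle attributes" for
# every layout entry, or "unchanged" when no report matches it.
match_wcc() {
    { printf '%s\n' "$2"; printf '%s\n' '--'; printf '%s\n' "$1"; } |
    awk '
        # Separator between the report and the layout.
        $0 == "--" { in_layout = 1; next }
        # Report lines: index attributes by the full triple.
        !in_layout { attrs[$1 SUBSEP $2 SUBSEP $3] = $4; next }
        # Layout lines: look up the same triple.
        {
            key = $2 SUBSEP $3 SUBSEP $4
            print $1, $4, ((key in attrs) ? attrs[key] : "unchanged")
        }'
}

match_wcc "$layout" "$report"
# Prints:
# m0 fhA unchanged
# m0 fhB size=4096
# m1 fhC size=8192
# m2 fhD unchanged
```

In this sketch the client reports only two of the four data files, and it lists them in a different order from the layout. The files fhA and fhB also share a device and a stateid, so only the filehandle tells them apart. Matching on the full triple still pairs each report with the correct data file. Entries with no report are left untouched, which corresponds to a mirror that is either omitted or presented with an empty attribute mask.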
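Section 4 describes how extract.sh removes the '///' sentinel. The sketch below runs the same grep and sed pipeline over a few sample lines that are indented as in the RFC text. The sample lines and the function name extract are illustrative only. The pipeline drops prose lines and strips the leading blanks together with the '/// ' prefix. It also turns a bare '///' line into an empty line, which is how the blank separators between the Section 3.7 structures survive extraction.

```sh
#!/bin/sh
# Illustrative run of the Section 4 extraction pipeline.
# The function name and the sample lines are hypothetical.

extract() {
    # Keep only sentinel lines, strip leading spaces plus '/// ',
    # then turn a bare '///' line into an empty line.
    grep '^ *///' | sed 's?^ */// ??' | sed 's?^ *///$??'
}

printf '%s\n' \
    '   Each ff_layout4 has an array of ff_mirror4.' \
    '   /// struct ff_mirror_wcc4 {' \
    '   ///     ff_data_server_wcc4 ffmw_data_servers<>;' \
    '   /// };' \
    '   ///' \
    '   /// struct ff_layout_wcc4 {' \
    '   ///     ff_mirror_wcc4 fflw_mirrors<>;' \
    '   /// };' |
    extract

# Prints:
# struct ff_mirror_wcc4 {
#     ff_data_server_wcc4 ffmw_data_servers<>;
# };
#
# struct ff_layout_wcc4 {
#     ff_mirror_wcc4 fflw_mirrors<>;
# };
```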